[Rdkit-discuss] a 2D to 3D (smi to sdf) conformer generator python script using rdkit

Discussion:

Francois BERENGER

2017-06-14 07:27:09 UTC

Hello,

I tried to reproduce the protocol described in:

@article{DBLP:journals/jcisd/EbejerMD12,
author = {Jean{-}Paul Ebejer and Garrett M. Morris and
Charlotte M. Deane},
title = {Freely Available Conformer Generation Methods:
How Good Are They?},
journal = {Journal of Chemical Information and Modeling},
volume = {52},
number = {5},
pages = {1146--1158},
year = {2012},
url = {https://doi.org/10.1021/ci2004658},
doi = {10.1021/ci2004658},
}

The resulting script is here:

https://github.com/UnixJunkie/smi2sdf3d

I hope I have reproduced their protocol exactly.
Sorry, my Python is rather rusty these days.

Comments and contributions are welcome.

Even auditing the code for correctness is welcome since it is
doing some scientific computation.

It is a little too slow for my taste.

You can use it like this to get at most 10 conformers
per molecule in your input.smi file:

./smi2sdf.py 10 input.smi output.sdf

Best regards,
Francois.

Greg Landrum

2017-06-15 06:50:14 UTC

Thanks for letting people know about this. If we can get a consensus form
that people agree makes sense, this might be a nice addition to either the
RDKit/Scripts directory or the cookbook.

A couple of smallish comments after a quick skim:
- I would strongly encourage you to use the ETKDG parameters
(http://pubs.acs.org/doi/abs/10.1021/acs.jcim.5b00654) when doing the
embedding. This really helps a lot with the quality of the conformations
and lets you skip the UFF step.
- The built-in RMSD pruning has improved since JP's article; it may be
worth looking at that.
- If you want to make the embedding step itself robust, it wouldn't be a
bad idea to try switching to random coordinate generation if the initial
embedding fails.
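
The three suggestions above can be sketched together as follows. This is a
minimal illustration, not the code of smi2sdf.py: the function names
embed_conformers and write_sdf, the fixed seed, and the 0.5 RMSD threshold
are assumptions made for the example. ETKDG(), EmbedMultipleConfs,
pruneRmsThresh and useRandomCoords are the RDKit names as I understand them.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def embed_conformers(smiles, max_confs=10, rms_thresh=0.5, seed=42):
    """Embed up to max_confs conformers with ETKDG.

    Returns (molecule with explicit hydrogens, list of conformer ids).
    rms_thresh and seed are illustrative values, not taken from the thread.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    mol = Chem.AddHs(mol)  # explicit hydrogens are needed for sensible 3D geometry

    params = AllChem.ETKDG()
    params.randomSeed = seed            # fixed seed so repeated runs agree
    params.pruneRmsThresh = rms_thresh  # let the embedder do the RMSD pruning
    params.useRandomCoords = False      # set explicitly so the fallback below differs
    cids = list(AllChem.EmbedMultipleConfs(mol, max_confs, params))

    if not cids:
        # Fallback: retry the embedding starting from random coordinates.
        params.useRandomCoords = True
        cids = list(AllChem.EmbedMultipleConfs(mol, max_confs, params))
    return mol, cids


def write_sdf(mol, cids, path):
    """Write each conformer in cids as one record of an SD file."""
    writer = Chem.SDWriter(path)
    for cid in cids:
        writer.write(mol, confId=cid)
    writer.close()
```

For example, mol, cids = embed_conformers("CCO") followed by
write_sdf(mol, cids, "output.sdf") would write at most 10 conformers of
ethanol. Because pruning happens inside the embedding call, fewer than
max_confs conformers may come back.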

Best,
-greg

Rdkit-discuss mailing list
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Francois BERENGER

2017-06-15 07:47:15 UTC

Post by Greg Landrum
Thanks for letting people know about this. If we can get a consensus
form that people agree makes sense, this might be a nice addition to
either the RDKit/Scripts directory or the cookbook.
- I would really strongly encourage you to use the ETKDG parameters
(http://pubs.acs.org/doi/abs/10.1021/acs.jcim.5b00654) when doing the
embedding. This really helps a lot with the quality of the conformations
and lets you skip the UFF step.
- The built-in RMSD pruning has improved since JP's article; it may be
worth looking at that.

It would be nice to have a much faster protocol than the one I implemented.

This protocol (the one from the paper) is very slow because of
the RMSD pruning step (not because of UFF).
The more conformers per molecule you need, the slower it gets.
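
To illustrate why the pruning cost grows with the number of conformers, here
is a minimal pure-Python sketch of a greedy, energy-ordered RMSD filter. It
assumes the pruning works roughly this way (lowest energy first, keep a
conformer only if it is far enough from every conformer already kept); the
function name prune_by_rmsd, the toy data and the 0.35 threshold are
illustrative, and this is not the code of smi2sdf.py.

```python
def prune_by_rmsd(energies, rmsd, threshold, max_keep):
    """Greedy, energy-ordered RMSD pruning.

    energies:  one energy per conformer index
    rmsd:      callable (i, j) -> RMSD between conformers i and j
    threshold: a conformer closer than this to a kept one is discarded
    Returns (kept conformer indices, number of RMSD evaluations).
    """
    order = sorted(range(len(energies)), key=energies.__getitem__)
    kept, n_calls = [], 0
    for i in order:
        if len(kept) >= max_keep:
            break
        is_new = True
        for j in kept:  # in the worst case, compared against every kept conformer
            n_calls += 1
            if rmsd(i, j) < threshold:
                is_new = False
                break
        if is_new:
            kept.append(i)
    return kept, n_calls


# Toy stand-in: each "conformer" is a point on a line and the "RMSD" is the
# absolute difference, which is enough to exercise the bookkeeping.
coords = [0.0, 0.1, 1.0, 1.05, 2.0]
energies = [3.0, 1.0, 2.0, 5.0, 4.0]
kept, n_calls = prune_by_rmsd(
    energies, lambda i, j: abs(coords[i] - coords[j]), 0.35, max_keep=10
)
print(kept, n_calls)  # prints: [1, 2, 4] 6
```

Each candidate may be compared against every conformer kept so far, so the
number of RMSD evaluations scales roughly with candidates times kept
conformers, and each real RMSD also involves an alignment. With RDKit, the
rmsd callable could wrap AllChem.GetConformerRMS.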

But it works, at least.

The problem with changing the protocol to something more modern
is that you have to redo all the statistical validation they
did to confirm that it works well,
which requires quite some time and motivation.

Post by Greg Landrum
- If you want to make the embedding step itself robust, it wouldn't be a
bad idea to try switching to random coordinate generation if the initial
embedding fails.

Thanks for the comment. I might update this part if I see it fail.

Regards,
F.
