Discussion:
[Rdkit-discuss] canonical smiles for fragments with map numbers
Pavel Polishchuk
2017-05-27 12:36:47 UTC
Hi,

I cannot solve a problem and would like to ask for advice.
If the attachment points of the same fragment carry different map
numbers, different canonical SMILES are generated.
I have observed this behavior only for fragments with 3 attachment points.
Below is an example.
I'm looking for a solution or workaround that produces the "same"
SMILES strings irrespective of the mapping, so that the SMILES become
identical once the map numbers are removed.
Any advice would be appreciated.

from rdkit import Chem

smi = ["ClC1=C([*:1])C(=S)C([*:2])=C([*:3])N1",
       "ClC1=C([*:1])C(=S)C([*:3])=C([*:2])N1",
       "ClC1=C([*:2])C(=S)C([*:1])=C([*:3])N1",
       "ClC1=C([*:2])C(=S)C([*:3])=C([*:1])N1",
       "ClC1=C([*:3])C(=S)C([*:1])=C([*:2])N1",
       "ClC1=C([*:3])C(=S)C([*:2])=C([*:1])N1"]

for s in smi:
    print(Chem.MolToSmiles(Chem.MolFromSmiles(s)))

output:
S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
S=c1c([*:1])c(Cl)[nH]c([*:2])c1[*:3]
S=c1c([*:1])c([*:3])[nH]c(Cl)c1[*:2]
S=c1c([*:2])c(Cl)[nH]c([*:1])c1[*:3]
S=c1c([*:1])c([*:2])[nH]c(Cl)c1[*:3]
S=c1c([*:2])c([*:1])[nH]c(Cl)c1[*:3]

Kind regards,
Pavel.
Brian Kelley
2017-05-27 13:03:29 UTC
Pavel, this isn't exactly trivial, so I went ahead and made an example. The
basic issue is that atom maps take part in canonicalization, i.e. their
values are used when the SMILES is generated.
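
A quick way to see this is to clear the map numbers before writing the SMILES. The sketch below is not from the original post; the helper name strip_maps is made up for illustration, and it relies on SetAtomMapNum(0) clearing an atom's map number. Since the six inputs differ only in their map numbers, all of them should collapse to a single string once the maps are gone.

```python
from rdkit import Chem

smi = ["ClC1=C([*:1])C(=S)C([*:2])=C([*:3])N1",
       "ClC1=C([*:1])C(=S)C([*:3])=C([*:2])N1",
       "ClC1=C([*:2])C(=S)C([*:1])=C([*:3])N1",
       "ClC1=C([*:2])C(=S)C([*:3])=C([*:1])N1",
       "ClC1=C([*:3])C(=S)C([*:1])=C([*:2])N1",
       "ClC1=C([*:3])C(=S)C([*:2])=C([*:1])N1"]


def strip_maps(smiles):
    """Hypothetical helper: canonical SMILES with all map numbers cleared."""
    mol = Chem.MolFromSmiles(smiles)
    for atom in mol.GetAtoms():
        atom.SetAtomMapNum(0)  # setting 0 removes the atom map number
    return Chem.MolToSmiles(mol)


# collect the distinct strings produced once the maps are gone
without_maps = {strip_maps(s) for s in smi}
print(len(without_maps))
```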

To solve this problem:
1) back up the atom maps and remove them
2) canonicalize *without* atom maps, but figure out the order in which the
atoms in the molecule are output
3) using the atom output order, relabel the atom maps.

That's a mouthful, but here's some code that should do the trick:

import ast

from rdkit import Chem

smi = ["ClC1=C([*:1])C(=S)C([*:2])=C([*:3])N1",
       "ClC1=C([*:1])C(=S)C([*:3])=C([*:2])N1",
       "ClC1=C([*:2])C(=S)C([*:1])=C([*:3])N1",
       "ClC1=C([*:2])C(=S)C([*:3])=C([*:1])N1",
       "ClC1=C([*:3])C(=S)C([*:1])=C([*:2])N1",
       "ClC1=C([*:3])C(=S)C([*:2])=C([*:1])N1"]


def CanonicalizeMaps(m, *a, **kw):
    # atom maps are canonicalized, so rename them
    # figure out where they would have gone
    # and relabel from 1...N based on output order
    atomMap = "molAtomMapNumber"
    backupAtomMap = "oldMolAtomMapNumber"

    # back up each atom map under another property name and clear it
    for atom in m.GetAtoms():
        if atom.HasProp(atomMap):
            atomNum = atom.GetProp(atomMap)
            atom.SetProp(backupAtomMap, atomNum)
            atom.ClearProp(atomMap)

    # canonicalize; the call is made for its side effect of setting
    # the _smilesAtomOutputOrder property on the molecule
    Chem.MolToSmiles(m, *a, **kw)
    # where did the atoms end up in the output string?
    # (ast.literal_eval parses the list-like string without a bare eval)
    atoms = [(pos, atom_idx) for atom_idx, pos in enumerate(
        ast.literal_eval(m.GetProp("_smilesAtomOutputOrder")))]
    atommap = 1
    atoms.sort()

    # set the new atommap based on output position
    for pos, atom_idx in atoms:
        atom = m.GetAtomWithIdx(atom_idx)
        if atom.HasProp(backupAtomMap):
            atom.SetIntProp(atomMap, atommap)
            atommap += 1

    return Chem.MolToSmiles(m)

for s in smi:
    m = Chem.MolFromSmiles(s)
    print(CanonicalizeMaps(m, True))



Output:

S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
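
For reference, here is a more compact variant of the same three steps. It is only a sketch and not the code from this post: the function name canonicalize_maps is made up, it uses GetAtomMapNum/SetAtomMapNum instead of raw properties, and it assumes that _smilesAtomOutputOrder lists atom indices in the order they were written, so it walks that list directly. Its map numbering may therefore differ from the output shown above; what it should preserve is that all six inputs give one and the same string.

```python
import ast

from rdkit import Chem

smi = ["ClC1=C([*:1])C(=S)C([*:2])=C([*:3])N1",
       "ClC1=C([*:1])C(=S)C([*:3])=C([*:2])N1",
       "ClC1=C([*:2])C(=S)C([*:1])=C([*:3])N1",
       "ClC1=C([*:2])C(=S)C([*:3])=C([*:1])N1",
       "ClC1=C([*:3])C(=S)C([*:1])=C([*:2])N1",
       "ClC1=C([*:3])C(=S)C([*:2])=C([*:1])N1"]


def canonicalize_maps(mol):
    """Hypothetical helper: renumber atom maps 1..N from the map-free output order."""
    # 1) remember which atoms carried a map number, then clear it
    mapped = set()
    for atom in mol.GetAtoms():
        if atom.GetAtomMapNum():
            mapped.add(atom.GetIdx())
            atom.SetAtomMapNum(0)

    # 2) canonicalize without maps; this sets _smilesAtomOutputOrder on mol
    Chem.MolToSmiles(mol)
    # assumption: the property string parses to atom indices in output order
    order = ast.literal_eval(mol.GetProp("_smilesAtomOutputOrder"))

    # 3) hand out new map numbers in the order the atoms were written
    new_map = 1
    for idx in order:
        if idx in mapped:
            mol.GetAtomWithIdx(idx).SetAtomMapNum(new_map)
            new_map += 1

    return Chem.MolToSmiles(mol)


results = {canonicalize_maps(Chem.MolFromSmiles(s)) for s in smi}
print(results)
```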

Now, if you want the atomMaps in 1...2...3 output order, we could do that
as well, but it is even trickier.

Enjoy,
Brian
_______________________________________________
Rdkit-discuss mailing list
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Pavel Polishchuk
2017-05-27 13:35:34 UTC
Thank you, Brian!

Actually, this is the output I expected:

S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
S=c1c([*:1])c(Cl)[nH]c([*:2])c1[*:3]
S=c1c([*:2])c(Cl)[nH]c([*:1])c1[*:3]
and so on

You pointed me in the right direction. I can store the old-to-new map
numbers in a dict, and after relabeling and generating the canonical SMILES
it will be easy to relabel the attachment points back.
Thank you again!

Pavel.
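
The relabel-back step could be sketched with a plain string substitution. This is an illustration only: the function name restore_maps and the new_to_old dict below are made-up examples, and the attachment points are assumed to be written as [*:N]. The result keeps the atom order of the relabeled canonical string and only swaps the numbers, so it is not necessarily the string RDKit itself would emit for that mapping.

```python
import re


def restore_maps(smiles, new_to_old):
    """Hypothetical helper: replace each [*:new] label with [*:old]."""
    def swap(match):
        # re.sub works in a single pass, so swapped labels are never re-swapped
        return "[*:%d]" % new_to_old[int(match.group(1))]

    return re.sub(r"\[\*:(\d+)\]", swap, smiles)


canonical = "S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]"
# example mapping: canonical label -> label in the original input
new_to_old = {1: 2, 2: 3, 3: 1}
restored = restore_maps(canonical, new_to_old)
print(restored)  # S=c1c([*:2])c(Cl)[nH]c([*:1])c1[*:3]
```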