Alexis Parenty
2017-05-19 09:52:46 UTC
Hi everyone,
I need a function that generalizes every aromatic ring in a SMARTS string:
[image: Inline images 1]
I have noticed that most SMARTS strings can be rearranged into a general
aromatic SMARTS string by following these simple rules:
1 Replace every lower-case atom symbol in the SMARTS string with ":[*]"
2 Fix the two ring-closure junctions of the SMARTS:
a. Where a digit (1-9) appears for the first time in the string: insert a
colon after the digit (for example "[*]1" becomes "[*]1:")
b. Where the same digit appears a second time: move the colon
before the digit (for example "[*]1:" becomes "[*]:1")
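As a quick illustration of rule 1 on its own, here is a minimal sketch using a regular expression. The helper name apply_rule_one is made up for this example, and it only assumes the aromatic symbols c, n, o and s (plus their explicit-hydrogen forms such as [cH]).

```python
import re

# Hypothetical helper: rule 1 only. Aromatic c/n/o/s, bare or written as
# [cH]-style bracket atoms, are replaced with ":[*]".
def apply_rule_one(smarts):
    return re.sub(r"\[[cnos]H\]|[cnos]", ":[*]", smarts)

print(apply_rule_one("[*]c1coc(Cl)n1"))
# prints [*]:[*]1:[*]:[*]:[*](Cl):[*]1
```

The result still has a stray colon in front of the first ring atom and no colon before the closing digit, which is what rule 2 is meant to repair.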
I have written a function (see below) that works fine with any SMARTS
containing a single aromatic ring, but it gives wrong results when the
SMARTS contains more than one aromatic ring:
[image: Inline images 2]
def get_aromatic_generalised_smarts(smarts):
    # Mark every aromatic atom with a temporary placeholder "x".
    for arom_atom in ("c", "o", "n", "s"):
        smarts = smarts.replace(arom_atom, "x")
        smarts = smarts.replace("[xH]", "x")  # to take care of explicit hydrogen atoms
    # Rule 1: turn each placeholder into ":[*]".
    for char in smarts:
        if char == 'x':
            smarts = smarts.replace(char, ":[*]")
    # Rule 2: fix the ring-closure junctions.
    for char in smarts:
        if char.isdigit():
            if ("[*]" + char) in smarts:
                for cycle_junction in ("[*]1", "[*]2", "[*]3", "[*]4",
                                       "[*]5", "[*]6", "[*]7", "[*]8", "[*]9"):
                    # This makes the second cycle junction OK but introduces an
                    # error in the first cycle junction, corrected just below.
                    smarts = smarts.replace(cycle_junction, "[*]:" + cycle_junction[-1])
                # Correct the first cycle junction.
                smarts = smarts.replace(":[*]:" + char, "[*]" + char, 1)
                break
    return smarts

print(get_aromatic_generalised_smarts("[*]c1coc(Cl)n1"))
print(get_aromatic_generalised_smarts("[*]c1coc(Cl)c1"))
print(get_aromatic_generalised_smarts("[*]c1coc(Cl)c1Cc2ccccc2"))
Am I heading in the right direction? I can't get my head around SMARTS
with more than one aromatic ring...
Maybe regular expressions would be more appropriate? Maybe there is an
RDKit function that does the trick from a mol object?
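One possible direction, given only as a rough sketch and not as an RDKit feature: instead of chained string replacements, split the SMARTS into tokens and walk through them once, remembering whether the previous atom was aromatic and which ring digits are still open. The names TOKEN, AROMATIC and generalise_aromatic_smarts are invented for this example. It assumes a well-formed string, single-digit ring closures, only c/n/o/s as aromatic symbols, and no recursive SMARTS. It also writes ":" between any two directly adjacent aromatic atoms, so two rings joined without an explicit bond symbol would be linked by an aromatic bond.

```python
import re

# Tokens: bracket atom, Cl, Br, ring digit, or any other single character.
TOKEN = re.compile(r"\[[^\]]*\]|Cl|Br|\d|.")
# Assumption: only these symbols (bare or as [cH]-style atoms) are aromatic.
AROMATIC = re.compile(r"[cnos]|\[[cnos]H?\]")


def generalise_aromatic_smarts(smarts):
    out = []
    prev_aromatic = False  # is the atom the next token attaches to aromatic?
    explicit_bond = False  # was a bond symbol written just before this token?
    branch_stack = []      # prev_aromatic saved at each "("
    open_rings = {}        # ring digit -> was the ring-opening atom aromatic?

    for token in TOKEN.findall(smarts):
        if token == "(":
            branch_stack.append(prev_aromatic)
            out.append(token)
        elif token == ")":
            prev_aromatic = branch_stack.pop()
            out.append(token)
        elif token.isdigit():
            if token in open_rings:
                # Second occurrence: put the colon before the digit.
                if open_rings.pop(token) and prev_aromatic and not explicit_bond:
                    out.append(":")
            else:
                open_rings[token] = prev_aromatic
            out.append(token)
            explicit_bond = False
        elif token[0] == "[" or token.isalpha() or token == "*":
            aromatic = AROMATIC.fullmatch(token) is not None
            if aromatic and prev_aromatic and not explicit_bond:
                out.append(":")
            out.append("[*]" if aromatic else token)
            prev_aromatic = aromatic
            explicit_bond = False
        else:
            # Bond symbols and anything unrecognised pass through unchanged.
            out.append(token)
            explicit_bond = True
    return "".join(out)


print(generalise_aromatic_smarts("[*]c1coc(Cl)n1"))
# prints [*][*]1:[*]:[*]:[*](Cl):[*]:1
print(generalise_aromatic_smarts("[*]c1coc(Cl)c1Cc2ccccc2"))
# prints [*][*]1:[*]:[*]:[*](Cl):[*]:1C[*]2:[*]:[*]:[*]:[*]:[*]:2
```

Because each ring digit is tracked separately, the second ring gets its own opening and closing junction instead of sharing the correction made for ring 1.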
Thanks,
Alexis