Creating Molecules from Smiles Strings#

Smiles strings provide a convenient way to represent a molecule as text.

You can create a molecule from a smiles string using the sire.smiles() function.

>>> import sire as sr
>>> mol = sr.smiles("C1:C:C:C:C:C1")
>>> print(mol.atoms())
Selector<SireMol::Atom>( size=12
0:  Atom( C1:1    [  -0.92,   -1.05,    0.02] )
1:  Atom( C2:2    [  -1.37,    0.27,   -0.04] )
2:  Atom( C3:3    [  -0.45,    1.32,   -0.06] )
3:  Atom( C4:4    [   0.92,    1.05,   -0.02] )
4:  Atom( C5:5    [   1.37,   -0.27,    0.04] )
...
7:  Atom( H8:8    [  -2.43,    0.48,   -0.07] )
8:  Atom( H9:9    [  -0.80,    2.35,   -0.11] )
9:  Atom( H10:10  [   1.63,    1.87,   -0.04] )
10:  Atom( H11:11  [   2.43,   -0.48,    0.07] )
11:  Atom( H12:12  [   0.80,   -2.35,    0.11] )
)
>>> mol.view()
Image of a benzene molecule created from a smiles string

Note how hydrogen atoms and coordinates of all atoms have been generated automatically. You can control this using the add_hydrogens and generate_coordinates options, e.g.

>>> mol = sr.smiles("C1:C:C:C:C:C1", generate_coordinates=False)
>>> print(mol.atoms())
Selector<SireMol::Atom>( size=12
0:  Atom( C1:1 )
1:  Atom( C2:2 )
2:  Atom( C3:3 )
3:  Atom( C4:4 )
4:  Atom( C5:5 )
...
7:  Atom( H8:8 )
8:  Atom( H9:9 )
9:  Atom( H10:10 )
10:  Atom( H11:11 )
11:  Atom( H12:12 )
)
>>> mol = sr.smiles("C1:C:C:C:C:C1", add_hydrogens=False)
>>> print(mol.atoms())
Selector<SireMol::Atom>( size=6
0:  Atom( C1:1 )
1:  Atom( C2:2 )
2:  Atom( C3:3 )
3:  Atom( C4:4 )
4:  Atom( C5:5 )
5:  Atom( C6:6 )
)

Note

Note that coordinates cannot be generated if add_hydrogens is False

The above code works by using the Chem.MolFromSmiles function from rdkit. This is used to create an rdkit Molecule which is converted to a sire Molecule using the functions in the convert module.

Generating smiles strings from molecules#

We can also convert in the other direction, e.g. from a Molecule to an rdkit Molecule. This can be used with rdkit’s MolToSmiles function to generate a smiles string. This is performed automatically using a molecule’s .smiles() function, e.g.

>>> mol = sr.smiles("C1:C:C:C:C:C1")
>>> print(mol.smiles())
c1ccccc1

Note how hydrogens have been left out from the smiles string. They are only included if they are needed to resolve any ambiguity in the structure or chirality. For example;

>>> mol = sr.smiles("C[C@H](N)C(=O)O")
>>> print(mol.smiles())
C[C@H](N)C(=O)O

You can ask for all of the hydrogens to be included explicitly by passing include_hydrogens as True.

>>> print(mol.smiles(include_hydrogens=True))
[H]OC(=O)[C@@]([H])(N([H])[H])C([H])([H])[H]

The smiles function can be called on any molecule, even if it hasn’t been created from a smiles string, e.g.

>>> mols = sr.load(sr.expand(sr.tutorial_url, "ala.crd", "ala.top"))
>>> print(mols[0].smiles())
CNC(=O)C(C)NC(C)=O

You can also call it on a subset of the molecule, e.g.

>>> print(mols[0]["residx 0"].smiles())
C[C-]=O

Note

Note that smiles strings of subsets will have missing bonds, e.g. here we can see that the central carbon has a negative charge because it is missing the bond to the carbon in the next residue.

You can also create smiles strings for all molecules in a collection, e.g.

>>> print(mols[0:10].smiles())
['CNC(=O)C(C)NC(C)=O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']

Note

Note that the smiles string for a water molecule is O

>>> print(mols[0:3].smiles(include_hydrogens=True))
['[H]N(C(=O)C([H])(N([H])C(=O)C([H])([H])[H])C([H])([H])[H])C([H])([H])[H]', '[H]O[H]', '[H]O[H]']