Converting Molecules¶
There are many excellent molecular modelling packages available for Python.
Rather than re-implementing their functionality, a design goal of sire is to support easy interconversion so that functionality from those packages can be built on and re-used.
The sire.convert module contains functions that support the conversion of sire molecular objects into their equivalents in a number of other packages.
BioSimSpace¶
BioSimSpace is an interoperable Python framework for biomolecular simulation that makes it easy for you to write simulation workflow components for, e.g. parameterising molecules, solvating systems, or running molecular dynamics or free energy simulations using a number of different backends (e.g. Amber, Gromacs, Namd etc.).
To convert a sire object to a BioSimSpace object, you need to ensure that you have imported BioSimSpace first in your script, before you import sire.
>>> import BioSimSpace as BSS
>>> import sire as sr
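One way to catch an import-order mistake early is to inspect sys.modules before converting anything. The sketch below is illustrative only: the helper name check_import_order is hypothetical and is not part of sire or BioSimSpace. It assumes that importing BioSimSpace also loads sire, so finding sire without BioSimSpace in the module table suggests that sire was imported first. It cannot detect the wrong order once both packages are loaded.

```python
import sys


def check_import_order(first="BioSimSpace", second="sire", modules=None):
    """Raise ImportError if `second` is loaded but `first` is not.

    Hypothetical helper, not part of sire or BioSimSpace. `modules`
    defaults to sys.modules and can be replaced by a dict for testing.
    """
    modules = sys.modules if modules is None else modules
    if second in modules and first not in modules:
        raise ImportError(
            f"'{second}' is already imported but '{first}' is not; "
            f"restart Python and import '{first}' first."
        )


# Demonstration with stand-in module tables rather than real imports.
check_import_order(modules={})                                # nothing loaded yet: fine
check_import_order(modules={"BioSimSpace": 1, "sire": 1})     # both loaded: fine
try:
    check_import_order(modules={"sire": 1})                   # sire loaded on its own
except ImportError as error:
    print(error)
```

In a real script you would call check_import_order() with no arguments near the top of the file, before the first conversion.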
To convert to BioSimSpace, you just need to pass the sire molecular object to the function sire.convert.to(), e.g.
>>> mols = sr.load(sr.expand(sr.tutorial_url, "ala.crd", "ala.top"))
>>> mol = mols[0]
>>> bss_mol = sr.convert.to(mol, "BioSimSpace")
>>> print(bss_mol)
<BioSimSpace.Molecule: number=2, nAtoms=22, nResidues=3>
You can now use the BioSimSpace Molecule exactly as you would any other BioSimSpace Molecule, e.g. calling BioSimSpace functions to generate forcefield parameters, solvate it, or run molecular dynamics or free energy simulations.
Note
The format argument to sr.convert.to is case-insensitive.
You could have used bss_mol = sr.convert.to(mol, "biosimspace").
sire.convert.to() will convert any sub-view of a molecule (e.g. a Residue or Atom) into the BioSimSpace Molecule that contains that view.
>>> atom = mol[0]
>>> bss_mol = sr.convert.to(atom, "BioSimSpace")
>>> print(bss_mol)
<BioSimSpace.Molecule: number=2, nAtoms=22, nResidues=3>
It will also convert a list of Molecule or sub-view objects into a list of equivalent BioSimSpace Molecules, e.g.
>>> bss_mols = sr.convert.to(mols[0:10], "BioSimSpace")
>>> print(bss_mols)
[<BioSimSpace.Molecule: number=2, nAtoms=22, nResidues=3>,
<BioSimSpace.Molecule: number=3, nAtoms=3, nResidues=1>,
<BioSimSpace.Molecule: number=4, nAtoms=3, nResidues=1>,
<BioSimSpace.Molecule: number=5, nAtoms=3, nResidues=1>,
<BioSimSpace.Molecule: number=6, nAtoms=3, nResidues=1>,
<BioSimSpace.Molecule: number=7, nAtoms=3, nResidues=1>,
<BioSimSpace.Molecule: number=8, nAtoms=3, nResidues=1>,
<BioSimSpace.Molecule: number=9, nAtoms=3, nResidues=1>,
<BioSimSpace.Molecule: number=10, nAtoms=3, nResidues=1>,
<BioSimSpace.Molecule: number=11, nAtoms=3, nResidues=1>]
There is also logic to convert a System object, which is the collection of molecules plus associated metadata read from an input file, into the equivalent BioSimSpace System class.
>>> bss_sys = sr.convert.to(mols, "BioSimSpace")
>>> print(bss_sys)
<BioSimSpace.System: nMolecules=631>
You can convert back from BioSimSpace to sire using the same function. For example,
>>> mol = sr.convert.to(bss_mol, "sire")
>>> print(mol)
Molecule( ACE:2 num_atoms=22 num_residues=3 )
Note that any sub-view of a BioSimSpace object will be converted to the Molecule that contains that view, e.g.
>>> mol = sr.convert.to(bss_mol.getAtoms()[0], "sire")
>>> print(mol)
Molecule( ACE:2 num_atoms=22 num_residues=3 )
A list of BioSimSpace molecule (or sub-view) objects will be converted to a list of Molecule objects.
>>> mols = sr.convert.to(bss_mols, "sire")
>>> print(mols)
SelectorMol( size=10
0: Molecule( ACE:2 num_atoms=22 num_residues=3 )
1: Molecule( WAT:3 num_atoms=3 num_residues=1 )
2: Molecule( WAT:4 num_atoms=3 num_residues=1 )
3: Molecule( WAT:5 num_atoms=3 num_residues=1 )
4: Molecule( WAT:6 num_atoms=3 num_residues=1 )
5: Molecule( WAT:7 num_atoms=3 num_residues=1 )
6: Molecule( WAT:8 num_atoms=3 num_residues=1 )
7: Molecule( WAT:9 num_atoms=3 num_residues=1 )
8: Molecule( WAT:10 num_atoms=3 num_residues=1 )
9: Molecule( WAT:11 num_atoms=3 num_residues=1 )
)
A BioSimSpace System will be automatically converted to a System object.
>>> mols = sr.convert.to(bss_sys, "sire")
>>> print(mols)
System( name=ACE num_molecules=631 num_residues=633 num_atoms=1912 )
RDKit¶
RDKit is a collection of cheminformatics and machine-learning software written in C++ and Python. Assuming you have RDKit installed, you can convert sire molecule and molecule view objects to and from RDKit Mol objects.
The sire.convert.supported_formats() function lists the formats that sire.convert supports for the current installation. This will depend on whether or not you have the package installed in the same conda environment as sire, and whether or not sire was compiled with support for that package.
>>> print(sr.convert.supported_formats())
['biosimspace', 'gemmi', 'openmm', 'rdkit', 'sire']
Note
If rdkit isn’t listed, then you should quit Python and install it, e.g. using the command conda install -c conda-forge rdkit. If it still isn’t listed then please raise an issue on the sire GitHub repository.
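A script can make this check explicit before attempting a conversion. The following is a minimal sketch: require_format is a hypothetical helper name, and the only sire call it relies on is sr.convert.supported_formats() as shown above. The list of formats can also be passed in directly, which is what the demonstration does so that it runs without sire installed.

```python
def require_format(name, supported=None):
    """Return the lower-case format name, or raise if it is unavailable.

    Hypothetical helper. If `supported` is not given, the list is taken
    from sr.convert.supported_formats().
    """
    if supported is None:
        import sire as sr  # imported lazily; only needed for the default list

        supported = sr.convert.supported_formats()

    wanted = name.lower()  # format names are treated case-insensitively
    if wanted not in [fmt.lower() for fmt in supported]:
        raise ValueError(
            f"Format '{name}' is not supported by this installation. "
            f"Available formats: {sorted(supported)}"
        )
    return wanted


# Demonstration using the list printed in the example above.
formats = ["biosimspace", "gemmi", "openmm", "rdkit", "sire"]
print(require_format("RDKit", formats))  # prints: rdkit
```

Failing early with a clear message tends to be easier to debug than an error raised from deep inside a conversion.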
You can convert to RDKit by passing rdkit as the format argument to sire.convert.to(), e.g.
>>> rdkit_mol = sr.convert.to(mol, "rdkit")
>>> print(rdkit_mol)
<rdkit.Chem.rdchem.Mol object at 0x10283da10>
You can now use this RDKit Mol object identically to any other RDKit Mol object, e.g. generating SMILES strings, performing sub-structure searches, maximum common substructure alignments, generating 2D views etc.
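As an illustration of the kind of RDKit operations meant here, the sketch below generates a SMILES string and runs a sub-structure search for amide groups. To keep it self-contained it builds alanine dipeptide from a SMILES string as a stand-in for a Mol returned by sr.convert.to(mol, "rdkit"); the describe helper and the SMARTS pattern are illustrative choices, not part of sire.

```python
try:
    from rdkit import Chem
except ImportError:  # RDKit is optional; the demonstration is skipped without it
    Chem = None


def describe(mol):
    """Return the SMILES string and the number of amide matches for a Mol."""
    amide = Chem.MolFromSmarts("C(=O)N")  # carbonyl carbon bonded to nitrogen
    return Chem.MolToSmiles(mol), len(mol.GetSubstructMatches(amide))


if Chem is not None:
    # Stand-in for a converted molecule: alanine dipeptide (ACE-ALA-NME).
    mol = Chem.AddHs(Chem.MolFromSmiles("CC(=O)NC(C)C(=O)NC"))
    smiles, n_amides = describe(mol)
    print("atoms:", mol.GetNumAtoms())
    print("SMILES:", smiles)
    print("amide groups:", n_amides)
```

The same describe function can be applied to any RDKit Mol, including one produced by sire.convert.to().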
Just as for BioSimSpace, sire.convert.to() will return the RDKit Mol for the entire molecule that contains any sub-views that are passed. For example,
>>> rdkit_mol = sr.convert.to(mol[0], "rdkit")
>>> print(rdkit_mol.GetNumAtoms())
22
Passing in a list of molecules or molecule views to convert will return a list of RDKit Mol objects.
>>> rdkit_mols = sr.convert.to(mols[0:10], "rdkit")
>>> print(rdkit_mols)
[<rdkit.Chem.rdchem.Mol object at 0x102c6a180>, <rdkit.Chem.rdchem.Mol object at 0x102c6a340>,
<rdkit.Chem.rdchem.Mol object at 0x102c6a3b0>, <rdkit.Chem.rdchem.Mol object at 0x102c69d90>,
<rdkit.Chem.rdchem.Mol object at 0x102c6a1f0>, <rdkit.Chem.rdchem.Mol object at 0x102c69af0>,
<rdkit.Chem.rdchem.Mol object at 0x102c698c0>, <rdkit.Chem.rdchem.Mol object at 0x102c69bd0>,
<rdkit.Chem.rdchem.Mol object at 0x102c69e00>, <rdkit.Chem.rdchem.Mol object at 0x102c69cb0>]
Note
RDKit does not have an equivalent of a System object, so these will be converted to a list of RDKit Mol objects.
You can also convert RDKit Mol objects back to Molecule objects, e.g.
>>> mol = sr.convert.to(rdkit_mol, "sire")
>>> print(mol)
Molecule( ACE:633 num_atoms=22 num_residues=1 )
>>> mols = sr.convert.to(rdkit_mols, "sire")
>>> print(mols)
SelectorMol( size=10
0: Molecule( ACE:634 num_atoms=22 num_residues=1 )
1: Molecule( WAT:635 num_atoms=3 num_residues=1 )
2: Molecule( WAT:636 num_atoms=3 num_residues=1 )
3: Molecule( WAT:637 num_atoms=3 num_residues=1 )
4: Molecule( WAT:638 num_atoms=3 num_residues=1 )
5: Molecule( WAT:639 num_atoms=3 num_residues=1 )
6: Molecule( WAT:640 num_atoms=3 num_residues=1 )
7: Molecule( WAT:641 num_atoms=3 num_residues=1 )
8: Molecule( WAT:642 num_atoms=3 num_residues=1 )
9: Molecule( WAT:643 num_atoms=3 num_residues=1 )
)
This is useful, e.g. if you have created the molecule using RDKit’s SMILES functionality, and then want to convert to a Molecule object for continued manipulation.
OpenMM¶
OpenMM is a high-performance toolkit for molecular simulation, which is particularly suited to running GPU-accelerated molecular dynamics (and related) simulations.
The sire.convert.supported_formats() function lists the formats that sire.convert supports for the current installation. This will depend on whether or not you have the package installed in the same conda environment as sire, and whether or not sire was compiled with support for that package.
>>> print(sr.convert.supported_formats())
['biosimspace', 'gemmi', 'openmm', 'rdkit', 'sire']
Note
If openmm isn’t listed, then you should quit Python and install it, e.g. using the command conda install -c conda-forge openmm. If it still isn’t listed then please raise an issue on the sire GitHub repository.
sire.convert.to() can convert a Molecule or molecule view into the equivalent for OpenMM.
>>> mols = sr.load(sr.expand(sr.tutorial_url, "ala.crd", "ala.top"))
>>> omm = sr.convert.to(mols[0], "openmm")
>>> print(omm)
<openmm.openmm.Context; proxy of <Swig Object of type 'OpenMM::Context *' at 0x14e95b510> >
The result is an OpenMM Context object which contains just the first molecule from mols. This can be used just like any other OpenMM Context object, e.g. for running minimisation or dynamics.
An OpenMM Context object is returned because it contains within it:
Representations of the potentials and connectivity of the molecule(s) in the OpenMM System object (obtained via omm.getSystem()).
The current coordinates and (optionally) the velocities of the molecule(s) in the OpenMM Integrator object (obtained via omm.getIntegrator()).
The Context has placed these two objects onto an OpenMM Platform object (obtained via omm.getPlatform()), so that the Context is ready for simulation. You can change the platform or choose a new Integrator by using omm.getSystem() or omm.getIntegrator() to extract these objects and then recombining them with the platform or integrator of your choice.
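The recombination step can be sketched with the standard OpenMM Python API. This is one possible approach under stated assumptions, not the mechanism sire uses internally: rebuild_context is a hypothetical helper, and the platform name, integrator type, temperature, friction and timestep are illustrative defaults. A fresh integrator is created because an OpenMM Integrator can only be bound to one Context.

```python
def rebuild_context(context, platform_name="Reference", timestep_ps=0.002,
                    temperature_k=300.0, friction_per_ps=1.0):
    """Create a new OpenMM Context from an existing one.

    Hypothetical helper: reuses the System from `context`, pairs it with a
    new LangevinMiddleIntegrator on the named platform, and copies the
    current positions, velocities and periodic box vectors across.
    """
    import openmm
    from openmm import unit

    system = context.getSystem()
    state = context.getState(getPositions=True, getVelocities=True)

    integrator = openmm.LangevinMiddleIntegrator(
        temperature_k * unit.kelvin,
        friction_per_ps / unit.picosecond,
        timestep_ps * unit.picoseconds,
    )
    platform = openmm.Platform.getPlatformByName(platform_name)

    new_context = openmm.Context(system, integrator, platform)
    new_context.setPeriodicBoxVectors(*state.getPeriodicBoxVectors())
    new_context.setPositions(state.getPositions())
    new_context.setVelocities(state.getVelocities())
    return new_context
```

With the Context from the example above, this could be used as new_omm = rebuild_context(omm, platform_name="CPU"), after which new_omm.getIntegrator().step(10) would advance the simulation on the chosen platform.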
More detail on how you can control which platform and integrator are chosen for this conversion is available here.
You can convert a single molecule, list of molecules or an entire System to an OpenMM Context in the same way, e.g.
>>> omm = sr.convert.to(mols, "openmm")
>>> print(omm)
<openmm.openmm.Context; proxy of <Swig Object of type 'OpenMM::Context *' at 0x14e9ee220> >
We do plan to add code to allow conversion back from an OpenMM Context to the equivalent sire object, but this is not yet ready for release. Instead, we have lower-level functions that extract coordinates, velocities and Space objects from OpenMM State objects that are extracted from the Context. Please do get in touch with us if you would like to learn about these functions, and would like to contribute to coding a more complete OpenMM to sire converter.
Gemmi¶
Gemmi is a Python library developed primarily for use in macromolecular crystallography (MX). In particular, it can be used to parse PDBx/mmCIF files, refinement restraints, reflection data and 3D grid data, and to deal with crystallographic symmetry. This is useful for structural bioinformatics.
The sire.convert.supported_formats() function lists the formats that sire.convert supports for the current installation. This will depend on whether or not you have the package installed in the same conda environment as sire, and whether or not sire was compiled with support for that package.
>>> print(sr.convert.supported_formats())
['biosimspace', 'gemmi', 'openmm', 'rdkit', 'sire']
Note
If gemmi isn’t listed, then you should quit Python and install it, e.g. using the command conda install -c conda-forge gemmi. If it still isn’t listed then please raise an issue on the sire GitHub repository.
sire.convert.to() can convert a System, list of molecules, or single molecule into a Gemmi Structure object.
>>> mols = sr.load(sr.expand(sr.tutorial_url, "ala.crd", "ala.top"))
>>> gemmi_struct = sr.convert.to(mols, "gemmi")
>>> print(gemmi_struct)
<gemmi.Structure with 1 model(s)>
>>> print(gemmi_struct[0].get_all_residue_names())
['ACE', 'ALA', 'NME', 'WAT']
Passing in a single molecule or subset of molecules will return a Gemmi Structure with just those molecules, e.g.
>>> gemmi_struct = sr.convert.to(mols[0], "gemmi")
>>> print(gemmi_struct)
<gemmi.Structure with 1 model(s)>
>>> print(gemmi_struct[0].get_all_residue_names())
['ACE', 'ALA', 'NME']
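Once you have a Gemmi Structure you can walk its hierarchy in the usual Gemmi way: a Structure is indexed by model, and models, chains and residues are each iterable. The sketch below counts chains, residues and atoms in the first model. The summarise helper is hypothetical, and because it relies only on indexing and iteration, the demonstration uses nested lists as a stand-in so that it runs without Gemmi installed.

```python
def summarise(structure):
    """Count chains, residues and atoms in the first model of a structure.

    Hypothetical helper. Works on any object with the same nesting as a
    Gemmi Structure: structure[0] is a model, which iterates over chains,
    which iterate over residues, which iterate over atoms.
    """
    model = structure[0]
    n_chains = n_residues = n_atoms = 0
    for chain in model:
        n_chains += 1
        for residue in chain:
            n_residues += 1
            n_atoms += sum(1 for _ in residue)
    return n_chains, n_residues, n_atoms


# Stand-in data: one model, one chain, two residues with 2 and 1 atoms.
fake_structure = [[[["N", "CA"], ["C"]]]]
print(summarise(fake_structure))  # prints: (1, 2, 3)
```

Passing the gemmi_struct object from the example above to summarise would report the counts for the converted molecule.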
You can convert a Gemmi Structure back to a System object, e.g.
>>> mols = sr.convert.to(gemmi_struct, "sire")
>>> print(mols)
System( name= num_molecules=1 num_residues=3 num_atoms=22 )
This conversion also preserves user-supplied System metadata, e.g.
>>> mols.set_metadata("name", "alanine dipeptide")
>>> mols.set_metadata("residues", ["ACE", "ALA", "NME"])
>>> mols.set_metadata("atoms", {"element": ["C", "N", "O"],
... "x_coords": [0.0, 1.0, 2.0],
... "y_coords": [3.0, 4.0, 5.0],
... "z_coords": [6.0, 7.0, 8.0]})
>>> sr.save(mols, "test.pdbx")
Saving the System in this way would add the following to the PDBx/mmCIF file:
data_sire
loop_
_atoms.element
_atoms.x_coords
_atoms.y_coords
_atoms.z_coords
C 0 3 6
N 1 4 7
O 2 5 8
_name "alanine dipeptide"
loop_
_residues.value
ACE
ALA
NME
which could be recovered when loading the file…
>>> mols = sr.load("test.pdbx")
>>> print(mols.metadata())
Properties(
residues => [ ACE,ALA,NME ],
name => alanine dipeptide,
atoms => Properties(
element => SireBase::StringArrayProperty( size=3
0: C
1: N
2: O
),
x_coords => SireBase::StringArrayProperty( size=3
0: 0
1: 1
2: 2
),
y_coords => SireBase::StringArrayProperty( size=3
0: 3
1: 4
2: 5
),
z_coords => SireBase::StringArrayProperty( size=3
0: 6
1: 7
2: 8
)
)
)
Note
Note that metadata values loaded from a PDBx/mmCIF file are always stored as strings. You may need to convert them to the appropriate type for your application (e.g., here the coordinate values are the strings “0”, “1”, “2” etc. rather than the numbers 0, 1, 2 etc.).
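A minimal sketch of that conversion step is shown below. It uses a plain dictionary holding the same strings as the atoms metadata above, as a stand-in for the loaded Properties object; how you pull the values out of the Properties object is not covered here, and the to_floats helper name is illustrative.

```python
def to_floats(values):
    """Convert a sequence of numeric strings into a list of floats."""
    return [float(value) for value in values]


# Stand-in for the loaded "atoms" metadata: every value is a string.
atoms = {
    "element": ["C", "N", "O"],
    "x_coords": ["0", "1", "2"],
    "y_coords": ["3", "4", "5"],
    "z_coords": ["6", "7", "8"],
}

# Convert each coordinate column, then regroup into one (x, y, z) tuple per atom.
columns = [to_floats(atoms[key]) for key in ("x_coords", "y_coords", "z_coords")]
coords = list(zip(*columns))

for element, xyz in zip(atoms["element"], coords):
    print(element, xyz)
```

The element column is left as strings, since that is already the appropriate type for it.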
Anything to Anything¶
Above you have seen how sire.convert.to() can convert to and from sire objects and other molecular modelling package objects.
It is actually more powerful than that! It recognises the object being passed and can convert between any two object types that are supported by sire, using the sire object format as an intermediary.
For example, you can convert RDKit objects to BioSimSpace objects…
>>> import BioSimSpace as BSS
>>> import sire as sr
>>> from rdkit import Chem
>>> rdkit_mol = Chem.MolFromSmiles("Cc1ccccc1")
>>> bss_mol = sr.convert.to(rdkit_mol, "BioSimSpace")
>>> bss_mol
<BioSimSpace.Molecule: number=2, nAtoms=7, nResidues=1>
Note
Remember that you may need to exit Python and then restart to ensure that BioSimSpace is imported before sire.
Or you could convert BioSimSpace molecules back to RDKit…
>>> rdkit_mol = sr.convert.to(bss_mol, "rdkit")
>>> print(Chem.MolToSmiles(rdkit_mol))
[C-3]c1[c][c][c][c][c]1
or you could set up and parameterise a molecule in BioSimSpace and convert it to an OpenMM Context ready for minimisation or dynamics…
>>> url = BSS.tutorialUrl()
>>> bss_system = BSS.IO.readMolecules([f"{url}/ala.top", f"{url}/ala.crd"])
>>> omm = sr.convert.to(bss_system, "openmm")
>>> integrator = omm.getIntegrator()
>>> integrator.step(10)
>>> print(omm.getState().getTime())
0.010000000000000002 ps
or you could load a PDBx file with Gemmi and convert a “MAN” molecule within it into an RDKit structure…
>>> import gemmi
>>> import sire as sr
>>> import rdkit
>>> from rdkit import Chem
>>> import urllib.request
>>> urllib.request.urlretrieve("https://files.rcsb.org/download/3NSS.cif.gz",
... filename="3NSS.cif.gz")
>>> struct = gemmi.read_structure("3NSS.cif.gz")
>>> mol = gemmi.Selection("(MAN)").copy_structure_selection(struct)
>>> rdkit_mol = sr.convert.to(mol, "rdkit")
>>> print(Chem.MolToSmiles(rdkit_mol))
O=C=C1OC(OC2=C(=O)C(=C=O)O[C-]=C2[O-])=C(=O)=C(=O)C1=O
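Conceptually, the use of the sire object format as an intermediary can be pictured as two explicit calls to sire.convert.to(): one into sire, and one out to the target format. The sketch below spells this out. It is only a mental model under that assumption, not a description of sire's internal implementation; convert_via_sire is a hypothetical helper, and the demonstration uses a stand-in converter so that it runs without any of the packages installed.

```python
def convert_via_sire(obj, fmt, convert=None):
    """Convert `obj` to `fmt` in two explicit steps, going through sire.

    Hypothetical helper. `convert` defaults to sr.convert.to and can be
    replaced by any function with the same (object, format) signature.
    """
    if convert is None:
        import sire as sr  # imported lazily; only needed for the default

        convert = sr.convert.to

    intermediate = convert(obj, "sire")  # step 1: source package -> sire
    return convert(intermediate, fmt)    # step 2: sire -> target package


def fake_convert(obj, fmt):
    """Stand-in converter that records which format was requested."""
    return f"{fmt}({obj})"


print(convert_via_sire("mol", "rdkit", convert=fake_convert))  # prints: rdkit(sire(mol))
```

In practice the single call sr.convert.to(obj, fmt) is all you need; the two-step form is mainly useful if you want to inspect or edit the intermediate sire object before converting onwards.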
Supporting other formats¶
We are actively looking for other molecular modelling packages to support. Please get in touch if you would like to suggest a package we should look at, or if you want to provide some help with implementation.