Sire-EMLE

In this section we will show how to use the emle-engine package as a QM/MM engine within sire. The emle-engine package provides support for a wide range of backends and embedding models, importantly providing a simple and efficient ML model for electrostatic embedding.

In order to use EMLE you will first need to create the following conda environment:

$ git clone https://github.com/chemle/emle-engine.git
$ cd emle-engine
$ conda env create -f environment_sire.yaml
$ conda activate emle-sire
$ pip install -e .

In this tutorial, we will perform a short ML/MM simulation of alanine dipeptide in water. First, let us load the molecular system:

>>> import sire as sr
>>> mols = sr.load_test_files("ala.crd", "ala.top")

Creating an EMLE calculator

Next we will create an emle-engine calculator to perform the QM (or ML) calculation for the dipeptide along with the ML electrostatic embedding. Since this is a small molecule it isn’t beneficial to perform the calculation on a GPU, so we will use the CPU instead.

>>> from emle.calculator import EMLECalculator
>>> calculator = EMLECalculator(device="cpu")

By default, emle-engine will use TorchANI as the backend for in vacuo calculation of energies and gradients. However, it is possible to use a wide variety of other backends, including your own as long as it supports the standand Atomic Simulation Environment (ASE) calculator interface. For details, see the backends section of the emle-engine documentation. At present, the default embedding model provided with emle-engine supports only the elements H, C, N, O, and S. We plan on adding support for other elements in the near future.

Creating a QM engine

We now need to set up the molecular system for the QM/MM simulation and create an engine to perform the calculation:

>>> qm_mols, engine = sr.qm.emle(
...     mols,
...     mols[0],
...     calculator,
...     cutoff="7.5A",
...     neighbour_list_frequency=20
... )

Here the first argument is the molecules that we are simulating, the second selection coresponding to the QM region (here this is the first molecule), and the third is calculator that was created above. The fourth and fifth arguments are optional, and specify the QM cutoff distance and the neighbour list update frequency respectively. (Shown are the default values.) The function returns a modified version of the molecules containing a “merged” dipeptide that can be interpolated between MM and QM levels of theory, along with an engine. The engine registers a Python callback that uses emle-engine to perform the QM calculation.

Running a QM/MM simulation

Next we need to create a dynamics object to perform the simulation. For QM/MM simulations it is recommended to use a 1 femtosecond timestep and no constraints. In this example we will use the lambda_interpolate keyword to interpolate the dipeptide potential between pure MM (λ=0) and QM (λ=1) over the course of the simulation, which can be used for end-state correction of binding free energy calculations.

>>> d = qm_mols.dynamics(
...     timestep="1fs",
...     constraint="none",
...     qm_engine=engine,
...     lambda_interpolate=[0, 1],
...     platform="cpu",
... )

We can now run the simulation. The options below specify the run time, the frequency at which trajectory frames are saved, and the frequency at which energies are recorded. The energy_frequency also specifies the frequency at which the λ value is updated.

>>> d.run("1ps", frame_frequency="0.05ps", energy_frequency="0.05ps")

Note

If you don’t require a trajectory file, then better performance can be achieved leaving the frame_frequency keyword argument unset.

Once the simulation has finished we can get back the trajectory of energy values. This can be obtained as a pandas DataFrame, allowing for easy plotting and analysis. The table below shows the instantaneous kinetic and potential energies as a function of λ, along with the accumulated non-equilibrium work. (Times are in picoseconds and energies are in kcal/mol.)

>>> nrg_traj = d.energy_trajectory(to_pandas=True)
>>> print(nrg_traj)
           lambda      kinetic      potential          work
time
6000.05  0.000000  1004.360323   -6929.923522      0.000000
6000.10  0.052632   907.430686  -23199.383591   -856.287372
6000.15  0.105263  1103.734847  -39773.815961  -1728.625918
6000.20  0.157895   982.097859  -56012.557224  -2583.296511
6000.25  0.210526  1035.727824  -72437.484783  -3447.766382
6000.30  0.263158  1029.009153  -88803.629979  -4309.142445
6000.35  0.315789  1014.269847 -105159.643486  -5169.985261
6000.40  0.368421  1021.246476 -121532.624612  -6031.721110
6000.45  0.421053  1022.233858 -137904.993921  -6893.424758
6000.50  0.473684  1025.310039 -154284.677129  -7755.513348
6000.55  0.526316  1025.001630 -170655.548776  -8617.138171
6000.60  0.578947  1016.891585 -187011.341345  -9477.969359
6000.65  0.631579  1022.910901 -203389.408932 -10339.972916
6000.70  0.684211  1024.431575 -219765.627241 -11201.879143
6000.75  0.736842  1052.484710 -236168.647435 -12065.195995
6000.80  0.789474  1032.732604 -252520.971205 -12925.844615
6000.85  0.842105  1061.216013 -268919.903129 -13788.946295
6000.90  0.894737  1062.979311 -285305.108112 -14651.325505
6000.95  0.947368  1057.025646 -301673.184597 -15512.803215
6001.00  1.000000  1024.034371 -318006.345331 -16372.443253

Note

In the table above, the time doesn’t start from zero because the example molecular system was loaded from an existing trajectory restart file.

Note

Unlike the sander interface of emle-engine, the interpolated potential energy is non-linear with respect to λ, i.e. it is not precisely a linear combination of MM and QM energies. This is because the sire interface performs a perturbation of the system parameters from MM to QM as λ is changed, e.g. scaling down the force constants for bonded terms in the QM region and scaling down the charges. Perturbing charges linearly results in an energy change between charges that is quadratic in λ.

Interfacing with OpenMM-ML

In the example above we used a sire dynamics object d to run the simulation. This is wrapper around a standard OpenMM context object, providing a simple convenience functions to make it easier to run and analyse simulations. However, if you are already familiar with OpenMM, then it is possible to use emle-engine with OpenMM directly. This allows for fully customised simulations, or the use of OpenMM-ML as the backend for calculation of the intramolecular force for the QM region.

To use OpenMM-ML as the backend for the QM calculation, you will first need to install the package:

$ conda install -c conda-forge openmm-ml

Next, you will need to create an MLPotential for desired backend. Here we will use the ANI-2x, as was used for the EMLECalculator above. The

>>> import openmm
>>> from openmmml import MLPotential
>>> potential = MLPotential("ani2x")

Since we are now using the MLPotential for the QM calculation, we need to create a new EMLECalculator object with no backend, i.e. one that only computes the electrostatic embedding:

>>> calculator = EMLECalculator(backend=None, device="cpu")

Next we create a new engine bound to the calculator:

>>> _, engine = sr.qm.emle(
>>> ... mols, mols[0], calculator, cutoff="7.5A", neighbour_list_frequency=20
>>> ... )

Note

qm_mols is not needed when using OpenMM-ML, since it will perform its own internal modifications for performing interpolation.

Rather than using this engine with a sire dynamics object, we can instead extract the underlying OpenMM force object and add it to an existing OpenMM system. The forces can be extracted from the engine as follows:

>>> emle_force, interpolation_force = engine.get_forces()

The emle_force object is the OpenMM force object that calculates the electrostatic embedding interaction. The interpolation_force is a null CustomBondForce object that contains a lambda_emle global parameter than can be used to scale the electrostatic embedding interaction. (By default, this is set to 1, but can be set to any value between 0 and 1.)

Note

The interpolation_force has no energy contribution. It is only required as there is currently no way to add global parameters to the EMLEForce.

Next we need to save the original molecular system to disk so that we can load it with OpenMM. Here we will use AMBER format files, but any format supported by OpenMM can be used.

>>> sr.save(mols, "ala", ["prm7", "rst7"])

We can now read them back in with OpenMM:

>>> prmtop = openmm.app.AmberPrmtopFile("ala.prm7")
>>> inpcrd = openmm.app.AmberInpcrdFile("ala.rst7")

Next we use the prmtop to create the MM system:

>>> mm_system = prmtop.createSystem(
...     nonbondedMethod=openmm.app.PME,
...     nonbondedCutoff=1 * openmm.unit.nanometer,
...     constraints=openmm.app.HBonds,
... )

In oder to create the ML system, we first define the ML region. This is a list of atom indices that are to be treated with the ML model.

>>> ml_atoms = list(range(qm_mols[0].num_atoms()))

We can now create the ML system:

>>> ml_system = potential.createMixedSystem(
...     prmtop.topology, mm_system, ml_atoms, interpolate=True
... )

By setting interpolate=True we are telling the MLPotential to create a mixed system that can be interpolated between MM and ML levels of theory using the lambda_interpolate global parameter. (By default this is set to 1.)

Note

If you choose not to add the emle interpolation force to the system, then the EMLEForce will also use the lambda_interpolate global parameter. This allows for the electrostatic embedding to be alongside or independent of the ML model.

We can now add the emle forces to the system:

>>> ml_system.addForce(emle_force)
>>> ml_system.addForce(interpolation_force)

In order to ensure that OpenMM-ML doesn’t perform mechanical embedding, we next need to zero the charges of the QM atoms in the MM system:

>>> for force in ml_system.getForces():
...     if isinstance(force, mm.NonbondedForce):
...         for i in ml_atoms:
...             _, sigma, epsilon = force.getParticleParameters(i)
...             force.setParticleParameters(i, 0, sigma, epsilon)

In order to run a simulation we need to create an integrator and context. First we create the integrator:

>>> integrator = openmm.LangevinMiddleIntegrator(
...     300 * openmm.unit.kelvin,
...     1.0 / openmm.unit.picosecond,
...     0.002 * openmm.unit.picosecond,
... )

And finally the context:

>>> context = openmm.Context(ml_system, integrator)
>>> context.setPositions(inpcrd.positions)

Creating an EMLE torch module

As well as the EMLECalculator, the emle-engine package provides Torch modules for the calculation of the electrostatic embedding. These can be used to create derived modules for the calculation of in vacuo and electrostatic embedding energies for different backends. For example, we provide an optimised ANI2xEMLE module that can be used to add electrostatic embedding to the existing ANI2x model from TorchANI.

Note

Torch support is currently not available for our Windows conda pacakge since pytorch is not available for Windows on the conda-forge. It is possible to compile Sire from source using a local pytorch installation, or using the pacakge from the official pytorch conda channel.

As an example for how to use the module, let’s again use the example alanine dipeptide system. First, let’s reload the system and center the solute within the simulation box:

>>> mols = sr.load_test_files("ala.crd", "ala.top")
>>> center = mols[0].coordinates()
>>> mols.make_whole(center=center)

To obtain the point charges around the QM region we can take advantage of Sire’s powerful search syntax, e.g:

>>> mols["mols within 7.5 of molidx 0"].view()
Alanine-dipeptide in water.

Next we will set the device and dtype for our Torch tensors:

>>> import torch
>>> device = torch.device("cuda")
>>> dtype = torch.float32

Now we can create the input tensors for our calculation. First the coordinates of the QM region:

>>> coords_qm = torch.tensor(
...     sr.io.get_coords_array(mols[0]),
...     device=device,
...     dtype=dtype,
...     requires_grad=True,
... )

Next the coordinates of the MM region, which can be obtained using the search term above:

>>> mm_atoms = mols["water within 7.5 of molidx 0"].atoms()
>>> coords_mm = torch.tensor(
...     sr.io.get_coords_array(mm_atoms),
...     device=device,
...     dtype=dtype,
...     requires_grad=True,
... )

Now the atomic numbers for the atoms within the QM region:

>>> atomic_numbers = torch.tensor(
...     [element.num_protons() for element in mols[0].property("element")],
...     device=device,
...     dtype=torch.int64,
... )

And finally the charges of the MM atoms:

>>> charges_mm = torch.tensor([atom.property("charge").value() for atom in mm_atoms],
...     device=device,
...     dtype=dtype
... )

In order to perform a calculation we need to create an instance of the ANI2xEMLE module:

>>> from emle.models import ANI2xEMLE
>>> model = ANI2xEMLE().to(device)

We can now calculate the in vacuo and electrostatic embedding energies:

>>> energies = model(atomic_numbers, charges_mm, coords_qm, coords_mm)
>>> print(energies)
tensor([-4.9570e+02, -4.2597e-02, -1.2952e-02], device='cuda:0',
       dtype=torch.float64, grad_fn=<StackBackward0>)

The first element of the tensor is the in vacuo energy of the QM region, the second is the static electrostatic embedding energy, and the third is the induced electrostatic embedding energy.

Then we can use autograd to compute the gradients of the energies with respect to the QM and MM coordinates:

>>> grad_qm, grad_mm = torch.autograd.grad(energies.sum(), (coords_qm, coords_mm))
>>> print(grad_qm)
>>> print(grad_mm)
tensor([[-2.4745e-03, -1.2421e-02,  1.1079e-02],
        [-7.0100e-03, -2.9659e-02, -6.8182e-03],
        [-1.8393e-03,  1.1682e-02,  1.1509e-02],
        [-3.4777e-03,  1.5750e-03, -1.9650e-02],
        [-3.4737e-02,  7.3493e-02,  3.7996e-02],
        [-9.3575e-03, -3.7101e-02, -2.0774e-02],
        [ 9.2816e-02, -7.5343e-03, -5.0656e-02],
        [ 4.9443e-03,  1.1114e-02, -4.0737e-04],
        [-1.6362e-03,  3.0464e-03,  3.0192e-02],
        [-6.2813e-03, -1.3678e-02, -3.4606e-03],
        [ 4.5878e-03,  3.0234e-02, -2.9871e-02],
        [-3.8999e-03, -1.3376e-02, -2.6382e-03],
        [ 4.4184e-03, -7.4247e-03,  5.1742e-04],
        [ 8.8851e-05, -8.5786e-03,  1.2712e-02],
        [-5.9939e-02,  1.1648e-01,  1.6692e-01],
        [-6.4231e-03, -4.4771e-02,  3.0655e-03],
        [ 1.1274e-01, -6.4833e-02, -1.5494e-01],
        [ 1.8500e-03,  5.5206e-03, -7.0060e-03],
        [-6.3634e-02, -1.5340e-02, -2.7031e-03],
        [ 7.7061e-03,  3.7852e-02,  6.0927e-03],
        [-2.9915e-03, -3.5084e-02,  2.3909e-02],
        [-1.5018e-02,  8.6911e-03, -2.5789e-03]], device='cuda:0')
tensor([[ 1.8065e-03, -1.4048e-03, -6.0694e-04],
        [-9.0640e-04,  5.1307e-04,  9.6374e-06],
        [-8.4827e-04,  9.5815e-04,  1.7164e-04],
        ...,
        [-5.7833e-04, -1.9125e-04,  2.0395e-03],
        [ 3.2311e-04,  2.1525e-04, -7.8029e-04],
        [ 3.5424e-04,  4.0781e-04, -1.5014e-03]], device='cuda:0')

The model is serialisable, so can be saved and loaded using the standard torch.jit functions, e.g.:

>>> script_model = torch.jit.script(model)
>>> torch.jit.save(script_model, "ani2xemle.pt")

It is also possible to use the model with Sire when performing QM/MM dynamics:

>>> qm_mols, engine = sr.qm.emle(
...     mols, mols[0], model, cutoff="7.5A", neighbour_list_frequency=20
... )

The model will be serialised and loaded into a C++ TorchQMEngine object, bypassing the need for a Python callback.