Running Faster Free Energy Simulations#

The previous section showed how to calculate the relative hydration free energy of ethane and methanol using alchemical dynamics. Short dynamics simulation were run for each λ-value, with the energy differences between neighbouring λ-values used to calculate the free energy differences.

The simulations used a timestep of 1 fs, and calculated energy differences every 0.1 ps (100 steps). This calculated a free energy that should be accurate, but the simulation took a long time to run. In this section, we will show how to calculate the same result, but with a much faster simulation.

Timesteps and Constraints#

The easiest way to speed up a simulation is to increase the dynamic timestep. This is the amount of time between each step of the simulation, i.e. the amount of time between each calculation of the forces. The longer the timestep, the faster the simulation will run, but at the cost of more unstable dynamics. At an extreme, the simulation will become totally unstable and an OpenMM Particle Exception will be raised.

As a rule of thumb, the timestep should be less than the fastest vibrational motion in the simulation. Since the fastest vibrations will likely involve bonds with hydrogen atoms, we can make the simulation more stable by contraining the bonds that involve hydrogen atoms.

We can choose the constraint to use via the constraint keyword, e.g. after loading the molecules and minimising,

>>> import sire as sr
>>> mols = sr.load(sr.expand(sr.tutorial_url, "merged_molecule.s3"))
>>> mols = sr.morph.link_to_reference(mols)
>>> mols = mols.minimisation().run().commit()

…we can turn on constraints of the bonds involving hydrogen atoms by setting constraint to h-bonds.

>>> d = mols.dynamics(timestep="2fs", temperature="25oC", constraint="h-bonds")
>>> d.run("5ps")
>>> print(d)
Dynamics(completed=5 ps, energy=-32140.6 kcal mol-1, speed=51.2 ns day-1)

We would hope that this is faster than running a simulation with no constraints and a 1 fs timestep…

>>> d = mols.dynamics(timestep="1fs", temperature="25oC", constraint="none")
>>> d.run("5ps")
>>> print(d)
Dynamics(completed=5 ps, energy=-27197.8 kcal mol-1, speed=67.3 ns day-1)

…but this is not the case. We can see here that the cost of the constraints has out-weighed the benefit of having half the simulation steps.

The simulation can go a lot faster with a 4 fs timestep…

>>> d = mols.dynamics(timestep="4fs", temperature="25oC", constraint="h-bonds")
>>> d.run("4ps")
>>> print(d)
Dynamics(completed=4 ps, energy=-32807 kcal mol-1, speed=100.9 ns day-1)

However, turning on h-bonds constraints would not be enough to keep the simulation stable for larger timesteps. For example, if we use a timestep of 5 fs…

>>> d = mols.dynamics(timestep="5fs", temperature="25oC", constraint="h-bonds")
>>> d.run("5ps")
OpenMMException: Particle coordinate is NaN.  For more information, see
https://github.com/openmm/openmm/wiki/Frequently-Asked-Questions#nan

For such timesteps, we would need to constrain all bonds and angles involving hydrogen, using the h-bonds-h-angles constraint.

>>> d = mols.dynamics(timestep="5fs", temperature="25oC",
...                   constraint="h-bonds-h-angles")
>>> d.run("5ps")
>>> print(d)
Dynamics(completed=5 ps, energy=-34450.6 kcal mol-1, speed=116.2 ns day-1)

You can go even further by constraining all bonds, and all angles involving hydrogen using the bonds-h-angles constraint…

>>> d = mols.dynamics(timestep="8fs", temperature="25oC",
...                   constraint="bonds-h-angles")
>>> d.run("5ps")
>>> print(d)
Dynamics(completed=5 ps, energy=-34130.9 kcal mol-1, speed=152.6 ns day-1)

Note

There is also the bonds constraint which constrains all chemical bonds, but this doesn’t really help unless angles involving hydrogen aren’t also constrained.

Note

Different molecular systems will behave differently, and may produce more, or less stable dynamics than that shown above. As a rule of thumb, start using a 4 fs timestep with h-bonds-h-angles constraints, and then either increase the timestep and dial up the constraints if you can, or reduce the timestep and reduce the timestep if the simulation becomes unstable.

To make things easy, sire automatically chooses a suitable constraint based on the timestep. You can see the constraint chosen using the constraint() function.

>>> d = mols.dynamics(timestep="8fs", temperature="25oC")
>>> print(d.constraint())
bonds-h-angles

You can disable all constraints by setting constraint equal to none, e.g.

>>> d = mols.dynamics(timestep="4fs", temperature="25oC", constraint="none")
>>> print(d.constraint())
none
>>> d.run("5ps")
RuntimeError: The kinetic energy has exceeded 1000 kcal mol-1 per atom (it is 2.2202087996265908e+16 kcal mol-1 atom-1, and
2.701328046505673e+20 kcal mol-1 total). This suggests that the simulation has become unstable. Try reducing the timestep and/or minimising
the system and run again.

but do expect to see a ParticleException or other RuntimeError exceptions raised at some point!

Note

Setting constraint to the Python None type indicates that you don’t have a preference, and so sire will choose a suitable constraint for you. Use the string "none" to explicitly disable constraints.

Constraints and Perturbable Molecules#

While constraints are useful for speeding up simulations, they can cause problems when used with perturbable (merged) molecules. This is because the constraints hold the bonds and/or angles at fixed values based on the starting coordinates of the simulation. Changes in λ, which may change the equilibrium bond and angle parameters, will not be reflected in the free energy. This is because the constraints will stop dynamics from sampling these perturbing bonds and angles.

For example, changing the value of λ to 1.0, and sampling with bond and angle constraints on would force methanol to adopt ethane’s internal geometry.

>>> print(mols[0].bond("element C", "element C").length())
1.53625 Å
>>> d = mols.dynamics(timestep="1fs", temperature="25oC", lambda_value=1.0)
>>> d.run("5ps")
>>> print(d.commit()[0].bond("element C", "element C").length())
1.48737 Å
>>> d = mols.dynamics(timestep="2fs", constraint="bonds-h-angles",
...                   temperature="25oC", lambda_value=1.0)
>>> d.run("5ps")
>>> print(d.commit()[0].bond("element C", "element C").length())
1.53625 Å

As seen here, the C-C bond length for ethane is 1.54 Å, while the C-O bond length for methanol is 1.49 Å. Using bonds-h-angles constrains this bond, meaning that the simulation at λ=1 uses ethane’s bond length (1.54 Å) rather than methanol’s (1.49 Å).

Note

The C-C bond morphs into the C-O bond during the perturbation from ethane to methanol. Earlier, we mapped the default parameters to those of ethane, meaning that the elements property of ethane is used by default. This is why we searched for bond("element C", "element C") rather than bond("element C", "element O"). We would use bond("element C", "element O") if we had used link_to_perturbed() to set the default properties.

One solution is to choose a different constraint for perturbable molecules than for the rest of the system. You can do this using the perturbable_constraint keyword, e.g.

>>> d = mols.dynamics(timestep="2fs",
...                   constraint="bonds-h-angles",
...                   perturbable_constraint="none",
...                   temperature="25oC",
...                   lambda_value=1.0)
>>> d.run("5ps")
>>> print(d.commit()[0].bond("element C", "element C").length())
1.42569 Å

has run dynamics using no constraints on the perturbable molecules, and bonds-h-angles constraints on all other molecules. This has allowed sampling of the C-O bond in methanol, so that it was able to vibrate around its equilibrium bond length (1.42 Å).

The perturbable_constraint argument accepts the same values as constraint, i.e. none, h-bonds, h-bond-h-angles etc.

Note

By default, perturbable_constraint will have the same value as constraint.

Note

You must set perturbable_constraint to the string none if you want to disable constraints. Setting perturbable_constraint to the Python None indicates you are providing no preference for any constraint, and so the code automatically assigns perturbable_constraint as equal to constraint.

Unfortunately, not constraining the bonds and/or angles of the perturbable molecules will impact the stability of dynamics, and thus the size of timestep that will be achievable. For example,

>>> d = mols.dynamics(timestep="4fs",
...                   constraint="h-bonds-h-angles",
...                   perturbable_constraint="none",
...                   temperature="25oC",
...                   lambda_value=1.0)
>>> d.run("5ps")
OpenMMException: Particle coordinate is NaN.  For more information, see
https://github.com/openmm/openmm/wiki/Frequently-Asked-Questions#nan

Note

You can still use constraints on perturbable molecules. Just be careful to minimise and then equilibrate the molecule(s) at the desired value of λ without using constraints, so that the perturbable bonds and angles have the right size for that value of λ. You can then run longer simulations with constraints applied, as they will use the bond / angle sizes measured from these starting coordinates as the constrained values.

Hydrogen Mass Repartitioning#

Bonds involving hydrogen atoms vibrate quickly because vibrational frequency is related to atomic mass - the lighter the atom, the faster it will vibrate. We can reduce the frequency of these vibrations by increasing the mass of the hydrogens. Fortunately, free energy is derived from the potential energy of the molecules, which is independent of their mass. So, we are free to magically move mass from heavy atoms such as carbon to their bonded hydrogen atoms without affecting the free energy.

This method, called “hydrogen mass repartitioning”, is implemented in the sire.morph.repartition_hydrogen_masses() function. It takes a molecule as argument, and returns that same molecule with its hydrogen masses repartitioned.

>>> mol = mols.molecule("molecule property is_perturbable")
>>> repartitioned_mol = sr.morph.repartition_hydrogen_masses(mol)
>>> for atom0, atom1 in zip(mol.atoms(), repartitioned_mol.atoms()):
...    print(atom0, atom0.property("mass"), atom1.property("mass"))
Atom( C1:1    [  25.71,   24.94,   25.25] ) 12.01 g mol-1 10.498 g mol-1
Atom( C2:2    [  24.29,   25.06,   24.75] ) 12.01 g mol-1 10.498 g mol-1
Atom( H3:3    [  25.91,   23.89,   25.56] ) 1.008 g mol-1 1.512 g mol-1
Atom( H4:4    [  26.43,   25.22,   24.45] ) 1.008 g mol-1 1.512 g mol-1
Atom( H5:5    [  25.86,   25.61,   26.13] ) 1.008 g mol-1 1.512 g mol-1
Atom( H6:6    [  24.14,   24.39,   23.87] ) 1.008 g mol-1 1.512 g mol-1
Atom( H7:7    [  24.09,   26.11,   24.44] ) 1.008 g mol-1 1.512 g mol-1
Atom( H8:8    [  23.57,   24.78,   25.55] ) 1.008 g mol-1 1.512 g mol-1

This has repartioned the hydrogen masses using a mass_factor of 1.5. This factor is known to be good for using the (default) LangevinMiddleIntegrator with a timestep of 4 fs. You may need to use a different factor for a different integrator or timestep, e.g.

>>> rep4_mol = sr.morph.repartition_hydrogen_masses(mol, mass_factor=4)
>>> for atom0, atom1 in zip(mol.atoms(), rep4_mol.atoms()):
...    print(atom0, atom0.property("mass"), atom1.property("mass"))
Atom( C1:1    [  25.71,   24.94,   25.25] ) 12.01 g mol-1 2.938 g mol-1
Atom( C2:2    [  24.29,   25.06,   24.75] ) 12.01 g mol-1 2.938 g mol-1
Atom( H3:3    [  25.91,   23.89,   25.56] ) 1.008 g mol-1 4.032 g mol-1
Atom( H4:4    [  26.43,   25.22,   24.45] ) 1.008 g mol-1 4.032 g mol-1
Atom( H5:5    [  25.86,   25.61,   26.13] ) 1.008 g mol-1 4.032 g mol-1
Atom( H6:6    [  24.14,   24.39,   23.87] ) 1.008 g mol-1 4.032 g mol-1
Atom( H7:7    [  24.09,   26.11,   24.44] ) 1.008 g mol-1 4.032 g mol-1
Atom( H8:8    [  23.57,   24.78,   25.55] ) 1.008 g mol-1 4.032 g mol-1

would multiply the hydrogen masses by a factor of 4.

Note

By default, this function will repartition the masses of the hydrogens in both the standard mass property (mass) and also for the end-state mass properties (mass0 and mass1 if they exist). You can disable repartitioning of the end-state masses by setting include_end_states to False. You can choose to repartition specific mass properties by passing in a property map, e.g. map={"mass": "mass0"}.

The repartitioned molecule has the same mass as the original molecule, but the hydrogens have been made heavier. The mass of the carbon atoms has been reduced to compensate.

>>> print(mol.mass(), repartitioned_mol.mass())
30.068 g mol-1 30.068 g mol-1

It is normal to only repartition the hydrogen masses of perturbable molecules. This lets us disable the constraints on the perturbable molecules, but still use a large timestep (e.g. 3 fs or 4 fs). This is because the hydrogens on the perturbable molecules would become heavier, and so the vibrations of those atoms should have a lower frequency.

To aid this, we can use the h-bonds-not-perturbed constraint. This will constrain all bonds involving hydrogen atoms, except for those involving atoms that are directly perturbed. This means that all of the non-perturbing bonds in the perturbable molecule will be constrained, which should aid simulation stability for large timesteps.

>>> mols.update(repartitioned_mol)
>>> d = mols.dynamics(timestep="4fs", temperature="25oC",
...                   constraint="h-bonds-not-perturbed",
...                   perturbable_constraint="none")
>>> d.run("5ps")
>>> print(d)
Dynamics(completed=5 ps, energy=-31950.1 kcal mol-1, speed=100.8 ns day-1)

This has given us the best of both worlds - a fast simulation with a larger timestep, plus no constraints on the bonds that perturb in the perturbable molecules.

Using this protocol, we can now recalculate the relative free energy of ethane and methanol.

>>> for l in range(0, 105, 5):
...     # turn l into the lambda value by dividing by 100
...     lambda_value = l / 100.0
...     print(f"Simulating lambda={lambda_value:.2f}")
...     # minimise the system at this lambda value
...     min_mols = mols.minimisation(lambda_value=lambda_value,
...                                  constraint="h-bonds-not-perturbed").run().commit()
...     # create a dynamics object for the system
...     d = min_mols.dynamics(timestep="4fs", temperature="25oC",
...                           lambda_value=lambda_value,
...                           constraint="h-bonds-not-perturbed")
...     # generate random velocities
...     d.randomise_velocities()
...     # equilibrate, not saving anything
...     d.run("2ps", save_frequency=0)
...     print("Equilibration complete")
...     print(d)
...     # get the values of lambda for neighbouring windows
...     lambda_windows = [lambda_value]
...     if lambda_value > 0:
...         lambda_windows.insert(0, (l-5)/100.0)
...     if lambda_value < 1:
...         lambda_windows.append((l+5)/100.0)
...     # run the dynamics, saving the energy every 0.1 ps
...     d.run("25ps", energy_frequency="0.1ps", frame_frequency=0,
...           lambda_windows=lambda_windows)
...     print("Dynamics complete")
...     print(d)
...     # stream the EnergyTrajectory to a sire save stream object
...     sr.stream.save(d.commit().energy_trajectory(),
...                    f"energy_fast_{lambda_value:.2f}.s3")

The simulation runs 33% faster than before, taking about 30 seconds per λ-window.

We can now calculate the free energy using alchemlyb as before;

>>> df = sr.morph.to_alchemlyb("energy_fast_*.s3")
>>> from alchemlyb.estimators import BAR
>>> b = BAR()
>>> b.fit(df)
>>> print(b.delta_f_.loc[0.00, 1.00])
-2.9972763590251836

Within error, this is the same free energy as before, but calculated in a little less time.

Note

The random error on these calculations can be seen in b.d_delta_f, which shows the error per λ-window is about 0.02 kcal mol-1. Summed over the whole λ-coordinate suggest an error of about 0.4 kcal mol-1. You could reduce this error by running the simulation for longer (e.g. 250 ps per λ-window) or by running multiple repeats and taking and average.

Switching to Reaction Field from PME#

By default, the dynamics simulation uses the particle mesh Ewald (PME) method to calculate long-range electrostatics. This is a very accurate, but comes with a high computational cost. We can switch to the faster reaction field method by setting the cutoff_type option to RF. This cutoff type works well for perturbations that don’t involve a change in net charge (as most perturbations). Switching to RF can improve the simulation speed by 25-50%.

Complete Example Script#

Putting everything together, here is a simple script that does all of the work of calculating the relative hydration free energy of ethane and methanol. The key parameters (e.g. timestep, constraint, run time, cutoff type, λ-values etc) are pulled out as variables at the top. The script then runs a dynamics simulation for each λ-window for the water leg, then a dynamics simulation using the same parameters and λ-windows for the vacuum leg. The free energies are collected and then calculated using BAR from alchemlyb. This is a good starting point for you to adapt for your own simulations. Or take a look at BioSimSpace or the upcoming somd2 software if you are interested in higher-level interfaces to this functionality that automatically run more complex protocols.

import sire as sr

timestep = "4fs"
energy_frequency = "0.4ps"

constraint = "h-bonds-not-perturbed"
perturbable_constraint = "h-bonds-not-perturbed"

cutoff_type = "RF"

equil_time = "2ps"
run_time = "25ps"

lambda_values = [x / 100.0 for x in range(0, 101, 5)]

mols = sr.load(sr.expand(sr.tutorial_url, "merged_molecule.s3"))

for mol in mols.molecules("molecule property is_perturbable"):
    mol = sr.morph.link_to_reference(mol)
    mol = sr.morph.repartition_hydrogen_masses(mol, mass_factor=1.5)
    mols.update(mol)

mols = mols.minimisation(cutoff_type=cutoff_type).run().commit()

print("\nWater leg")

for i, lambda_value in enumerate(lambda_values):
    print(f"Simulating lambda={lambda_value:.2f}")
    # minimise the system at this lambda value
    min_mols = (
        mols.minimisation(
            cutoff_type=cutoff_type,
            lambda_value=lambda_value,
            constraint=constraint,
            perturbable_constraint="none",
        )
        .run()
        .commit()
    )

    # create a dynamics object for the system
    d = min_mols.dynamics(
        timestep=timestep,
        temperature="25oC",
        cutoff_type=cutoff_type,
        lambda_value=lambda_value,
        constraint=constraint,
        perturbable_constraint=perturbable_constraint,
    )

    # generate random velocities
    d.randomise_velocities()

    # equilibrate, not saving anything
    d.run(equil_time, save_frequency=0)
    print("Equilibration complete")
    print(d)

    # get the values of lambda for neighbouring windows
    lambda_windows = lambda_values[max(i - 1, 0) : min(len(lambda_values), i + 2)]

    # run the dynamics, saving the energy every 0.1 ps
    d.run(
        run_time,
        energy_frequency=energy_frequency,
        frame_frequency=0,
        lambda_windows=lambda_windows,
    )
    print("Dynamics complete")
    print(d)

    # stream the EnergyTrajectory to a sire save stream object
    sr.stream.save(
        d.commit().energy_trajectory(), f"energy_water_{lambda_value:.2f}.s3"
    )

print("\nVacuum leg")

for i, lambda_value in enumerate(lambda_values):
    print(f"Simulating lambda={lambda_value:.2f}")
    # minimise the system at this lambda value using the
    # same constraints as the dynamics (but switching
    # off the perturbable constraint)
    min_mols = (
        mols[0]
        .minimisation(
            lambda_value=lambda_value,
            vacuum=True,
            constraint=constraint,
            perturbable_constraint="none",
        )
        .run()
        .commit(return_as_system=True)
    )

    # create a dynamics object for the system
    d = min_mols.dynamics(
        timestep=timestep,
        temperature="25oC",
        lambda_value=lambda_value,
        constraint=constraint,
        perturbable_constraint=perturbable_constraint,
        vacuum=True,
    )

    # generate random velocities
    d.randomise_velocities()

    # equilibrate, not saving anything
    d.run(equil_time, save_frequency=0)
    print("Equilibration complete")
    print(d)

    # get the values of lambda for neighbouring windows
    lambda_windows = lambda_values[max(i - 1, 0) : min(len(lambda_values), i + 2)]

    # run the dynamics, saving the energy every 0.1 ps
    d.run(
        run_time,
        energy_frequency=energy_frequency,
        frame_frequency=0,
        lambda_windows=lambda_windows,
    )
    print("Dynamics complete")
    print(d)

    # stream the EnergyTrajectory to a sire save stream object
    sr.stream.save(
        d.commit().energy_trajectory(),
        f"energy_vacuum_{lambda_value:.2f}.s3",
    )

from alchemlyb.estimators import BAR

df = sr.morph.to_alchemlyb("energy_water_*.s3")

b = BAR()
b.fit(df)

dG_solv = b.delta_f_.loc[0.00, 1.00]

df = sr.morph.to_alchemlyb("energy_vacuum_*.s3")

b = BAR()
b.fit(df)

dG_vac = b.delta_f_.loc[0.00, 1.00]

print(f"Water phase:  {dG_solv} kcal mol-1")
print(f"Vacuum phase: {dG_vac} kcal mol-1")
print(f"Hydration free energy: {dG_solv - dG_vac} kcal mol-1")

The relative hydration free energy calculated using the above script is:

Water phase:  -3.2587718667371606 kcal mol-1
Vacuum phase: 2.980451221216327 kcal mol-1
Hydration free energy: -6.239223087953487 kcal mol-1

This is in good agreement with published results from other codes which are typically -5.99 kcal mol-1 to -6.26 kcal mol-1.

Note

There will be some variation between different codes and different protocols, as the convergence of the free energy estimate is sensitive to the length of the dynamics simulation at each λ-value. In this case, we used very short simulations. Edit the script to increase the amount of simulation time per λ-value to get a more accurate result.

This script can be edited to run a longer, more statistically accurate by setting run_time to a larger value, e.g. 250 ps. On my machine, this results in a simulation that takes 5 minutes per λ-window for the water phase at 100 ns day-1, and 45 seconds per λ-window for the vacuum phase, at 950 ns day-1 (total run time about 120 minutes). The resulting free energy is in remarkable agreement with that from the shorter simulation.

Water phase:  -3.3027026336540715 kcal mol-1
Vacuum phase: 2.9761399609273567 kcal mol-1
Hydration free energy: -6.278842594581429 kcal mol-1

We can check the convergence and get an error estimate as before. First the water phase;

>>> import sire as sr
>>> from glob import glob
>>> from alchemlyb.convergence import forward_backward_convergence
>>> from alchemlyb.visualisation import plot_convergence
>>> dfs = []
>>> s3files = glob("energy_water*.s3")
>>> s3files.sort()
>>> for s3 in s3files:
...    dfs.append(sr.stream.load(s3).to_alchemlyb())
>>> f = forward_backward_convergence(dfs, "bar")
>>> plot_convergence(f)

Note

It is important that the dataframes are passed to alchemlyb in ascending order of λ-value, or else it will calculate the wrong free energy. This is why we had to sort the result of globbing the files, and why a file naming scheme was chosen that names the files in ascending λ order.

Convergence of the free energy estimate for the water phase.

And now the vacuum phase;

>>> dfs = []
>>> s3files = glob("energy_vacuum*.s3")
>>> s3files.sort()
>>> for s3 in s3files:
...    dfs.append(sr.stream.load(s3).to_alchemlyb())
>>> f = forward_backward_convergence(dfs, "bar")
>>> plot_convergence(f)
Convergence of the free energy estimate for the vacuum phase.

From the convergence plots and the output printed by alchemlyb, we can see that the error on the water leg is 0.02 kcal mol-1, and the error on the vacuum leg is 0.01 kcal mol-1. Summed in quadrature, this gives an error of 0.022 kcal mol-1 on the hydration free energy. This gives a final result of -6.279 ± 0.022 kcal mol-1, which is in good agreement with published results from other codes which are in the range of -5.99 kcal mol-1 to -6.26 kcal mol-1.