Running a Free Energy Perturbation Simulation¶

Now that you have your (possibly restrained) alchemical system, you can run a free energy perturbation simulation. There are many ways to do this, and the exact protocol will depend on what type of free energy you are calculating, what level of accuracy you require and what compute resource you have available. sire provides the building blocks to run a simulation in any way you like, but in this tutorial we will use the a very simple protocol. Take a look at BioSimSpace or the upcoming somd2 software if you are interested in higher-level interfaces to this functionality that automatically run more complex protocols.

At its simplest, a free energy perturbation simulation consists of a series of alchemical molecular dynamics simulations run at different values of the alchemical parameter, λ. Let’s start with the ethane to methanol merged molecule from the previous section.

>>> import sire as sr
>>> mols = sr.load(sr.expand(sr.tutorial_url, "merged_molecule.s3"))
>>> print(mols)
System( name=BioSimSpace_System num_molecules=4054 num_residues=4054 num_atoms=12167 )

And lets link the properties to the reference state, so that we start the simulation using the reference state coordinates.

>>> mols = sr.morph.link_to_reference(mols)

Next we will run through 21 evenly-space λ values from 0 to 1. We’ve picked 21 because it is a reasonable number that should over-sample the λ-coordinate.

For each λ-value, we will minimise the system, generate random velocities from a Boltzmann distribution for the simulation temperature, equilibrate the molecules for 2 ps, then run dynamics for 25 ps. We will calculate the energy of each simulation at each λ-value, plus at the neighbouring λ-values. This will allow us to calculate energy differences which we will later use to calculate the free energy difference.

We will calculate these energies every 0.1 ps, and will write them at the end of each block of dynamics via the EnergyTrajectory output which we will stream as a sire object.

This will let us subsequently calculate the free energy across λ using alchemlyb.

>>> for l in range(0, 105, 5):
...     # turn l into the lambda value by dividing by 100
...     lambda_value = l / 100.0
...     print(f"Simulating lambda={lambda_value:.2f}")
...     # minimise the system at this lambda value
...     min_mols = mols.minimisation(lambda_value=lambda_value).run().commit()
...     # create a dynamics object for the system
...     d = min_mols.dynamics(timestep="1fs", temperature="25oC",
...                           lambda_value=lambda_value)
...     # generate random velocities
...     d.randomise_velocities()
...     # equilibrate, not saving anything
...     d.run("2ps", save_frequency=0)
...     print("Equilibration complete")
...     print(d)
...     # get the values of lambda for neighbouring windows
...     lambda_windows = [lambda_value]
...     if lambda_value > 0:
...         lambda_windows.insert(0, (l-5)/100.0)
...     if lambda_value < 1:
...         lambda_windows.append((l+5)/100.0)
...     # run the dynamics, saving the energy every 0.1 ps
...     d.run("25ps", energy_frequency="0.1ps", frame_frequency=0,
...           lambda_windows=lambda_windows)
...     print("Dynamics complete")
...     print(d)
...     # stream the EnergyTrajectory to a sire save stream object
...     sr.stream.save(d.commit().energy_trajectory(),
...                    f"energy_{lambda_value:.2f}.s3")

This is a very simple protocol for a free energy simulation, with simulation times chosen so that it will run relatively quickly. For example, with a modern GPU this should run at about 60 nanoseconds of sampling per day, so take only 45 seconds or so for each of the 21 λ-windows.

Note

This is not enough sampling to calculate a properly converged free energy for most molecular systems. You would need to experiment with different equilibration and simulation times, or use a more sophisticated algorithm implemented via BioSimSpace or somd2.

Note

We set the forwards and backwards λ-values to (l-5)/100.0 and (l+5)/100.0 so that they exactly match the way we set the reference λ-value. This is because alchemlyb matches energies using a floating point representation of λ, so small numerical differences between, e.g. (l+5)/100.0 and lambda_value + 0.05 will lead to errors (as in this case, (l+5)/100.0 is exactly every 0.05, while lambda_value + 0.05 has numerical error, meaning some values are at, e.g. 0.15000000000000002).

At the end of the simulation, you should have 21 energy files, one for each λ-window. These are called energy_0.00.s3, energy_0.05.s3, …, energy_1.00.s3.

Processing Free Energy Data using alchemlyb¶

alchemlyb is a simple alchemistry python package for easily analysing the results of free energy simulations. They have excellent documentation on their website that you can use if you want to go into the details of how to calculate free energies.

Here, we will show a simple BAR analysis of the data that we have just generated. We can do this because we have calculated data which alchemlyb can convert into reduced potentials for each λ-window.

First, we need to import alchemlyb

>>> import alchemlyb

Note

If you see an error then you may need to install (or reinstall) alchemlyb. You can do this using conda e.g. conda install -c conda-forge alchemlyb.

Next, we will load all of the EnergyTrajectory objects for each λ-window, and will convert them into pandas DataFrames arranged into an alchemlyb-compatible format. We could do this manually by first loading all of the s3 files containing the EnergyTrajectory objects…

>>> import sire as sr
>>> from glob import glob
>>> dfs = []
>>> energy_files = glob("energy*.s3")
>>> energy_files.sort()
>>> for energy_file in energy_files:
...     dfs.append(sr.stream.load(energy_file).to_alchemlyb())

Note

If you wanted, you could have put the dataframes generated above directly into the dfs list here, and not saved them to disk via the .s3 files. However, this would risk you having to re-run all of the simulation if you wanted to change the analysis below.

Note

Be careful to load the DataFrames in λ-order. The glob function can return the files in a random order, hence why we need to sort this list. This sort only works because we have used a naming convention for the files that puts them in λ-order. They must be in the right order or else alchemlyb will calculate the free energy incorrectly (it uses the column-order rather than the λ-order when calculating free energies).

…then joining them together all of these DataFrames into a single DataFrame…

>>> import pandas as pd
>>> df = pd.concat(dfs)
>>> print(df)
                         0.00          0.05  0.10  0.15  0.20  0.25  0.30  0.35  ...  0.65  0.70  0.75  0.80  0.85  0.90          0.95          1.00
time fep-lambda                                                                  ...
2.1  0.0        -39300.750665 -39301.636585   NaN   NaN   NaN   NaN   NaN   NaN  ...   NaN   NaN   NaN   NaN   NaN   NaN           NaN           NaN
2.2  0.0        -39231.737759 -39232.504175   NaN   NaN   NaN   NaN   NaN   NaN  ...   NaN   NaN   NaN   NaN   NaN   NaN           NaN           NaN
2.3  0.0        -38982.663906 -38982.683430   NaN   NaN   NaN   NaN   NaN   NaN  ...   NaN   NaN   NaN   NaN   NaN   NaN           NaN           NaN
2.4  0.0        -38950.308505 -38950.357904   NaN   NaN   NaN   NaN   NaN   NaN  ...   NaN   NaN   NaN   NaN   NaN   NaN           NaN           NaN
2.5  0.0        -38732.275522 -38733.221193   NaN   NaN   NaN   NaN   NaN   NaN  ...   NaN   NaN   NaN   NaN   NaN   NaN           NaN           NaN
...                       ...           ...   ...   ...   ...   ...   ...   ...  ...   ...   ...   ...   ...   ...   ...           ...           ...
26.6 1.0                  NaN           NaN   NaN   NaN   NaN   NaN   NaN   NaN  ...   NaN   NaN   NaN   NaN   NaN   NaN -37208.485229 -37208.880770
26.7 1.0                  NaN           NaN   NaN   NaN   NaN   NaN   NaN   NaN  ...   NaN   NaN   NaN   NaN   NaN   NaN -37332.051194 -37332.476611
26.8 1.0                  NaN           NaN   NaN   NaN   NaN   NaN   NaN   NaN  ...   NaN   NaN   NaN   NaN   NaN   NaN -37276.392733 -37276.609020
26.9 1.0                  NaN           NaN   NaN   NaN   NaN   NaN   NaN   NaN  ...   NaN   NaN   NaN   NaN   NaN   NaN -37234.238097 -37234.603762
27.0 1.0                  NaN           NaN   NaN   NaN   NaN   NaN   NaN   NaN  ...   NaN   NaN   NaN   NaN   NaN   NaN -37268.684798 -37269.319345

Note

Do not worry about the large number of NaN values. These just show that we have only calculated free energy differences along the diagonal of this DataFrame, i.e. only between the simulated and neighbouring λ-windows.

…or we can use the in-build sire.morph.to_alchemlyb() function to do all of the above for us.

>>> df = sr.morph.to_alchemlyb("energy*.s3")

This function is not only quicker, but it will automatically sort the data by λ-value, meaning that you don’t need to worry about the naming convention of your files.

Now we can tell alchemlyb to calculate the free energy using the BAR method.

>>> from alchemlyb.estimators import BAR
>>> b = BAR()
>>> b.fit(df)
>>> print(b.delta_f_.loc[0.00, 1.00])
-3.009332126465953

You can get a convergence plot, showing how the free energy changes as a function of the simulation length using the convergence_plot function.

>>> from alchemlyb.convergence import forward_backward_convergence
>>> from alchemlyb.visualisation import plot_convergence
>>> f = forward_backward_convergence(dfs, "bar")
>>> plot_convergence(f)

$Convergence of the free energy estimate as a function of the fraction of simulation length$

All of this shows that the relative free energy for the perturbation of ethane to methanol in water is about -3.0 kcal mol-1.

To get the relative hydration free energy, we would need to complete the cycle by calculating the relative free energy for the perturbation in the gas phase. We could do this using this code (which is almost identical to above, except we only simulate the perturbable molecule, and save the EnergyTrajectory objects to energy_gas_{lambda}.s3 instead of energy_{lambda}.s3).

>>> import sire as sr
>>> mols = sr.load(sr.expand(sr.tutorial_url, "merged_molecule.s3"))
>>> mol = mols.molecule("molecule property is_perturbable")
>>> mol = sr.morph.link_to_reference(mol)
>>> for l in range(0, 105, 5):
...     # turn l into the lambda value by dividing by 100
...     lambda_value = l / 100.0
...     print(f"Simulating lambda={lambda_value:.2f}")
...     # minimise the system at this lambda value
...     min_mol = mol.minimisation(lambda_value=lambda_value,
...                                vacuum=True).run().commit(return_as_system=True)
...     # create a dynamics object for the system
...     d = min_mol.dynamics(timestep="1fs", temperature="25oC",
...                          lambda_value=lambda_value,
...                          vacuum=True)
...     # generate random velocities
...     d.randomise_velocities()
...     # equilibrate, not saving anything
...     d.run("2ps", save_frequency=0)
...     print("Equilibration complete")
...     print(d)
...     # get the values of lambda for neighbouring windows
...     lambda_windows = [lambda_value]
...     if lambda_value > 0:
...         lambda_windows.insert(0, (l-5)/100.0)
...     if lambda_value < 1:
...         lambda_windows.append((l+5)/100.0)
...     # run the dynamics, saving the energy every 0.1 ps
...     d.run("25ps", energy_frequency="0.1ps", frame_frequency=0,
...           lambda_windows=lambda_windows)
...     print("Dynamics complete")
...     print(d)
...     # stream the EnergyTrajectory to a sire save stream object
...     sr.stream.save(d.commit().energy_trajectory(),
...                    f"energy_gas_{lambda_value:.2f}.s3")

Note

The option vacuum=True tells the minimisation and dynamics to remove any simulation space that may be attached to the molecule(s), and instead set the space to a Cartesian space. This has the affect of simulating the molecules in vacuum.

Note

The option return_as_system tells the minimisation’s commit function to return the result as a System object, rather than a molecule. This way, min_mol is a System, and so you can access the energy trajectory via the energy_trajectory() function.

This should run more quickly than the simulation in water, e.g. about 15 seconds per window (at about 150 nanoseconds per day of sampling).

We can then analyse the results using the same analysis code, except we switch to analysing the energy_gas_{lambda}.s3 files instead.

>>> df = sr.morph.to_alchemlyb("energy_gas*.s3")
>>> print(df)
                     0.00      0.05  0.10  0.15  0.20  0.25  0.30  0.35  0.40  ...  0.60  0.65  0.70  0.75  0.80  0.85  0.90      0.95      1.00
time fep-lambda                                                                ...
2.1  0.0         6.966376  7.115377   NaN   NaN   NaN   NaN   NaN   NaN   NaN  ...   NaN   NaN   NaN   NaN   NaN   NaN   NaN       NaN       NaN
2.2  0.0         6.614100  6.434525   NaN   NaN   NaN   NaN   NaN   NaN   NaN  ...   NaN   NaN   NaN   NaN   NaN   NaN   NaN       NaN       NaN
2.3  0.0         6.782235  6.831925   NaN   NaN   NaN   NaN   NaN   NaN   NaN  ...   NaN   NaN   NaN   NaN   NaN   NaN   NaN       NaN       NaN
2.4  0.0         4.196341  4.227257   NaN   NaN   NaN   NaN   NaN   NaN   NaN  ...   NaN   NaN   NaN   NaN   NaN   NaN   NaN       NaN       NaN
2.5  0.0         4.141862  4.205784   NaN   NaN   NaN   NaN   NaN   NaN   NaN  ...   NaN   NaN   NaN   NaN   NaN   NaN   NaN       NaN       NaN
...                   ...       ...   ...   ...   ...   ...   ...   ...   ...  ...   ...   ...   ...   ...   ...   ...   ...       ...       ...
26.6 1.0              NaN       NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN  ...   NaN   NaN   NaN   NaN   NaN   NaN   NaN  8.246601  8.707174
26.7 1.0              NaN       NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN  ...   NaN   NaN   NaN   NaN   NaN   NaN   NaN  7.169596  7.100342
26.8 1.0              NaN       NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN  ...   NaN   NaN   NaN   NaN   NaN   NaN   NaN  6.422485  6.466972
26.9 1.0              NaN       NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN  ...   NaN   NaN   NaN   NaN   NaN   NaN   NaN  5.892950  6.129184
27.0 1.0              NaN       NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN  ...   NaN   NaN   NaN   NaN   NaN   NaN   NaN  7.487893  7.618827
>>> from alchemlyb.estimators import BAR
>>> b = BAR()
>>> b.fit(df)
>>> print(b.delta_f_.loc[0.00, 1.00])
3.186998316374265

This shows that the relative free energy for the perturbation of ethane to methanol in the gas phase is about 3.2 kcal mol-1. Subtracting this from the free energy in water gives a relative hydration free energy of about -6.2 kcal mol-1, which is in reasonable agreement with published results from other codes which are in the range of -5.99 kcal mol-1 to -6.26 kcal mol-1.

Note

The quoted published results are the difference in computed absolute hydration free energies calculated using difference codes and protocols for ethane and methanol, as reported in table 2 of the linked paper.

Note

There will be some variation between different codes and different protocols, as the convergence of the free energy estimate is sensitive to the length of the dynamics simulation at each λ-value. In this case, we used very short simulations.