Residue mutations¶
So far, perturbations have involved changing an entire molecule. However, we can also change individual residues in a molecule. This can be useful to, e.g. study the effect of residue mutation on a protein, and how this could impact ligand binding or protein folding.
Sub-structure matching¶
To start, we need to define the sub-structure that we want to change.
Let’s first load the kigaki protein system.
>>> import sire as sr
>>> mols = sr.load_test_files("kigaki.gro", "kigaki.top")
>>> protein = mols[0]
>>> print(protein)
Molecule( Protein:6 num_atoms=302 num_residues=19 )
>>> print(protein.residues().names())
[ResName('LYS'), ResName('ILE'), ResName('GLY'), ResName('ALA'),
ResName('LYS'), ResName('ILE'), ResName('LYS'), ResName('ILE'),
ResName('GLY'), ResName('ALA'), ResName('LYS'), ResName('ILE'),
ResName('LYS'), ResName('ILE'), ResName('GLY'), ResName('ALA'),
ResName('LYS'), ResName('ILE'), ResName('NH2')]
To start with, let’s mutate the first alanine residue that we find in the protein into a lysine. First, let’s select the first alanine…
>>> ala = protein["resname ALA"][0]
>>> print(ala)
Residue( ALA:4 num_atoms=10 )
To mutate this into a lysine, we need to tell sire
what a
lysine looks like. Fortunately, our protein already contains lysine
residues, so let’s select on of those…
>>> lys = protein["resname LYS"][1]
>>> print(lys)
Residue( LYS:5 num_atoms=22 )
Note
We have selected the second lysine residue in the protein, as the first lysine residue is the N-terminal residue. Since our alanine is a mid-chain residue, it makes more sense to use a mid-chain lysine.
Next, we need to match the alanine residue to the lysine residue,
just as we did when perturbing entire molecules. Calling
sire.morph.match()
on sub-views of molecules will only match
the atoms in those views.
>>> mapping = sr.morph.match(ala, lys, match_light_atoms=True)
>>> print(mapping)
AtomMapping( size=9, unmapped0=1, unmapped1=13
0: MolNum(6) Atom( HA:54 ) <=> MolNum(6) Atom( HA:64 )
1: MolNum(6) Atom( CA:53 ) <=> MolNum(6) Atom( CA:63 )
2: MolNum(6) Atom( O:60 ) <=> MolNum(6) Atom( O:82 )
3: MolNum(6) Atom( C:59 ) <=> MolNum(6) Atom( C:81 )
4: MolNum(6) Atom( HB2:57 ) <=> MolNum(6) Atom( HB2:67 )
5: MolNum(6) Atom( H:52 ) <=> MolNum(6) Atom( H:62 )
6: MolNum(6) Atom( N:51 ) <=> MolNum(6) Atom( N:61 )
7: MolNum(6) Atom( HB1:56 ) <=> MolNum(6) Atom( HB1:66 )
8: MolNum(6) Atom( CB:55 ) <=> MolNum(6) Atom( CB:65 )
)
Warning
This used the default MCS matching algorithm built into sire
.
This is not supported on Windows. You may want to use a different
matching algorithm, e.g. using
Kartograf
We can see that this mapping has nicely matched the atoms in alanine to the corresponding atoms in lysine.
Next, we will align the lysine in the mapping onto the alanine.
>>> mapping = mapping.align()
And finally, we will create the merged molecule.
>>> merged = mapping.merge(as_new_molecule=False)
>>> print(merged)
Molecule( Protein:6 num_atoms=315 num_residues=19 )
This has created the merged molecule as before. To see what has changed, we can check the perturbation…
>>> p_omm = merged.perturbation().to_openmm()
>>> print(p_omm.changed_atoms())
atom charge0 charge1 sigma0 sigma1 epsilon0 epsilon1 alpha0 alpha1 kappa0 kappa1
0 N:51 -0.4157 -0.3479 0.325000 0.325000 0.711280 0.711280 0.0 0.0 0.0 0.0
1 H:52 0.2719 0.2747 0.106908 0.106908 0.065689 0.065689 0.0 0.0 0.0 0.0
2 CA:53 0.0337 -0.2400 0.339967 0.339967 0.457730 0.457730 0.0 0.0 0.0 0.0
3 HA:54 0.0823 0.1426 0.247135 0.247135 0.065689 0.065689 0.0 0.0 0.0 0.0
4 CB:55 -0.1825 -0.0094 0.339967 0.339967 0.457730 0.457730 0.0 0.0 0.0 0.0
5 HB1:56 0.0603 0.0362 0.264953 0.264953 0.065689 0.065689 0.0 0.0 0.0 0.0
6 HB2:57 0.0603 0.0362 0.264953 0.264953 0.065689 0.065689 0.0 0.0 0.0 0.0
7 HB3:58 0.0603 0.0000 0.264953 0.264953 0.065689 0.000000 0.0 1.0 1.0 1.0
8 C:59 0.5973 0.7341 0.339967 0.339967 0.359824 0.359824 0.0 0.0 0.0 0.0
9 O:60 -0.5679 -0.5894 0.295992 0.295992 0.878640 0.878640 0.0 0.0 0.0 0.0
10 Xxx:303 0.0000 0.0187 0.339967 0.339967 0.000000 0.457730 1.0 0.0 1.0 1.0
11 Xxx:304 0.0000 0.0103 0.264953 0.264953 0.000000 0.065689 1.0 0.0 1.0 1.0
12 Xxx:305 0.0000 0.0103 0.264953 0.264953 0.000000 0.065689 1.0 0.0 1.0 1.0
13 Xxx:306 0.0000 -0.0479 0.339967 0.339967 0.000000 0.457730 1.0 0.0 1.0 1.0
14 Xxx:307 0.0000 0.0621 0.264953 0.264953 0.000000 0.065689 1.0 0.0 1.0 1.0
15 Xxx:308 0.0000 0.0621 0.264953 0.264953 0.000000 0.065689 1.0 0.0 1.0 1.0
16 Xxx:309 0.0000 -0.0143 0.339967 0.339967 0.000000 0.457730 1.0 0.0 1.0 1.0
17 Xxx:310 0.0000 0.1135 0.195998 0.195998 0.000000 0.065689 1.0 0.0 1.0 1.0
18 Xxx:311 0.0000 0.1135 0.195998 0.195998 0.000000 0.065689 1.0 0.0 1.0 1.0
19 Xxx:312 0.0000 -0.3854 0.325000 0.325000 0.000000 0.711280 1.0 0.0 1.0 1.0
20 Xxx:313 0.0000 0.3400 0.106908 0.106908 0.000000 0.065689 1.0 0.0 1.0 1.0
21 Xxx:314 0.0000 0.3400 0.106908 0.106908 0.000000 0.065689 1.0 0.0 1.0 1.0
22 Xxx:315 0.0000 0.3400 0.106908 0.106908 0.000000 0.065689 1.0 0.0 1.0 1.0
This shows how the atoms in alanine have been perturbed into the equivalent atoms in lysine, plus how a number of ghost atoms have been added that correspond to the additional atoms in lysine. In addition, the HB3 atom of alanine is turned into a ghost.
One-stop function¶
Just as before, the entire set of steps above can be performed in one
step via the sire.morph.merge()
function.
>>> merged = sr.morph.merge(ala, lys)
>>> print(merged)
Molecule( Protein:3627 num_atoms=315 num_residues=19 )
>>> p_omm = merged.perturbation().to_openmm()
>>> print(p_omm.changed_atoms())
atom charge0 charge1 sigma0 sigma1 epsilon0 epsilon1 alpha0 alpha1 kappa0 kappa1
0 N:51 -0.4157 -0.3479 0.325000 0.325000 0.711280 0.711280 0.0 0.0 0.0 0.0
1 H:52 0.2719 0.2747 0.106908 0.106908 0.065689 0.065689 0.0 0.0 0.0 0.0
2 CA:53 0.0337 -0.2400 0.339967 0.339967 0.457730 0.457730 0.0 0.0 0.0 0.0
3 HA:54 0.0823 0.1426 0.247135 0.247135 0.065689 0.065689 0.0 0.0 0.0 0.0
4 CB:55 -0.1825 -0.0094 0.339967 0.339967 0.457730 0.457730 0.0 0.0 0.0 0.0
5 HB1:56 0.0603 0.0362 0.264953 0.264953 0.065689 0.065689 0.0 0.0 0.0 0.0
6 HB2:57 0.0603 0.0362 0.264953 0.264953 0.065689 0.065689 0.0 0.0 0.0 0.0
7 HB3:58 0.0603 0.0000 0.264953 0.264953 0.065689 0.000000 0.0 1.0 1.0 1.0
8 C:59 0.5973 0.7341 0.339967 0.339967 0.359824 0.359824 0.0 0.0 0.0 0.0
9 O:60 -0.5679 -0.5894 0.295992 0.295992 0.878640 0.878640 0.0 0.0 0.0 0.0
10 Xxx:303 0.0000 0.0187 0.339967 0.339967 0.000000 0.457730 1.0 0.0 1.0 1.0
11 Xxx:304 0.0000 0.0103 0.264953 0.264953 0.000000 0.065689 1.0 0.0 1.0 1.0
12 Xxx:305 0.0000 0.0103 0.264953 0.264953 0.000000 0.065689 1.0 0.0 1.0 1.0
13 Xxx:306 0.0000 -0.0479 0.339967 0.339967 0.000000 0.457730 1.0 0.0 1.0 1.0
14 Xxx:307 0.0000 0.0621 0.264953 0.264953 0.000000 0.065689 1.0 0.0 1.0 1.0
15 Xxx:308 0.0000 0.0621 0.264953 0.264953 0.000000 0.065689 1.0 0.0 1.0 1.0
16 Xxx:309 0.0000 -0.0143 0.339967 0.339967 0.000000 0.457730 1.0 0.0 1.0 1.0
17 Xxx:310 0.0000 0.1135 0.195998 0.195998 0.000000 0.065689 1.0 0.0 1.0 1.0
18 Xxx:311 0.0000 0.1135 0.195998 0.195998 0.000000 0.065689 1.0 0.0 1.0 1.0
19 Xxx:312 0.0000 -0.3854 0.325000 0.325000 0.000000 0.711280 1.0 0.0 1.0 1.0
20 Xxx:313 0.0000 0.3400 0.106908 0.106908 0.000000 0.065689 1.0 0.0 1.0 1.0
21 Xxx:314 0.0000 0.3400 0.106908 0.106908 0.000000 0.065689 1.0 0.0 1.0 1.0
22 Xxx:315 0.0000 0.3400 0.106908 0.106908 0.000000 0.065689 1.0 0.0 1.0 1.0
This merged molecule can be used in a free energy simulation in the same way as any other merged molecule.
Protein Mutation¶
Just as for other merged molecules, we can extract the end states using
the sire.morph.extract_reference()
and
sire.morph.extract_perturbed()
functions.
>>> ref_prot = sr.morph.extract_reference(merged)
>>> pert_prot = sr.morph.extract_perturbed(merged)
>>> print(ref_prot.residues().names())
[ResName('LYS'), ResName('ILE'), ResName('GLY'), ResName('ALA'),
ResName('LYS'), ResName('ILE'), ResName('LYS'), ResName('ILE'),
ResName('GLY'), ResName('ALA'), ResName('LYS'), ResName('ILE'),
ResName('LYS'), ResName('ILE'), ResName('GLY'), ResName('ALA'),
ResName('LYS'), ResName('ILE'), ResName('NH2')]
>>> print(pert_prot.residues().names())
[ResName('LYS'), ResName('ILE'), ResName('GLY'), ResName('LYS'),
ResName('LYS'), ResName('ILE'), ResName('LYS'), ResName('ILE'),
ResName('GLY'), ResName('ALA'), ResName('LYS'), ResName('ILE'),
ResName('LYS'), ResName('ILE'), ResName('GLY'), ResName('ALA'),
ResName('LYS'), ResName('ILE'), ResName('NH2')]
This shows that the first alanine residue has been mutated into a lysine residue.
Interestingly, if you are only interested in mutation, and are not interested in the merged molecule, then mutation is just the process of performing a merge, and then extracting the perturbed end state.
To make this easier, there is a one-stop function for this,
sire.morph.mutate()
.
>>> mutated_protein = sr.morph.mutate(ala, lys)
>>> print(mutated_protein.residues().names())
[ResName('LYS'), ResName('ILE'), ResName('GLY'), ResName('LYS'),
ResName('LYS'), ResName('ILE'), ResName('LYS'), ResName('ILE'),
ResName('GLY'), ResName('ALA'), ResName('LYS'), ResName('ILE'),
ResName('LYS'), ResName('ILE'), ResName('GLY'), ResName('ALA'),
ResName('LYS'), ResName('ILE'), ResName('NH2')]
Warning
By default this uses the (potentially slow) internal MCS matching
algorithm (which also is not supported on Windows). You may want to
use a different matching algorithm, either by passing in the
dictionary of the mapping you want to use via the match
argument, or by using Kartograf.
The match
argument works identically in this function as it
does in the merge
and match
functions.
In this case, we copied and pasted the lysine residue from one part of the protein over the alanine. But, you can copy and paste in this way between different molecules. For example, you could have a template library of different residues that you could use for mutation.
Assuming template["lys"]
contained your template for a lysine residue,
then you could have run the following;
>>> mutated_protein = sr.morph.mutate(ala, template["lys"])
This works for any kind of molecules - not just proteins. In future
versions of sire
we will add functionality to make it easier to
manage libraries of templates, and to perform molecular editing by
copying and pasting between molecules, and between fragments constructed
via, e.g. smiles strings.