Indexing Residues#

Residues are collections of atoms. They typically represent an amino acid residue in a protein. Residues are implemented via the Residue class, which itself is a molecular container for Atom objects. An atom can only belong to one residue at a time (and they don’t need to be assigned to a residue).

You can access residues in a molecule container using the residue() and residues() functions, which are available on all of the molecular container types.

>>> print(mol.residue(0))
Residue( ILE:6   num_atoms=8 )

gives the molecule at index 0, while

>>> print(mol.residues("ALA"))
Selector<SireMol::Residue>( size=155
0:  Residue( ALA:23  num_atoms=5 )
1:  Residue( ALA:30  num_atoms=5 )
2:  Residue( ALA:53  num_atoms=5 )
3:  Residue( ALA:65  num_atoms=5 )
4:  Residue( ALA:85  num_atoms=5 )
...
150:  Residue( ALA:578 num_atoms=5 )
151:  Residue( ALA:584 num_atoms=5 )
152:  Residue( ALA:593 num_atoms=5 )
153:  Residue( ALA:646 num_atoms=5 )
154:  Residue( ALA:691 num_atoms=5 )
)

returns all residues that are named “ALA”.

The residue() will raise a KeyError if more than one residue matches the search.

>>> print(mol.residue("ALA"))
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Input In [33], in <cell line: 1>()
----> 1 print(mol.residue("ALA"))

KeyError: "SireMol::duplicate_residue: More than one residue matches the ID ResName('ALA') (number of matches is 155). (call Sire.Error.get_last_error_details() for more info)"

You can slice residues using the range function, e.g.

>>> print(mol.residues(range(0, 10)))
Selector<SireMol::Residue>( size=10
0:  Residue( ILE:6   num_atoms=8 )
1:  Residue( VAL:7   num_atoms=7 )
2:  Residue( LEU:8   num_atoms=8 )
3:  Residue( LYS:9   num_atoms=9 )
4:  Residue( SER:10  num_atoms=6 )
5:  Residue( SER:11  num_atoms=6 )
6:  Residue( ASP:12  num_atoms=8 )
7:  Residue( GLY:13  num_atoms=4 )
8:  Residue( VAL:22  num_atoms=7 )
9:  Residue( ALA:23  num_atoms=5 )
)

The result, a Selector_Residue_ is also a molecular container, and can be used like Selector_Atom_.

>>> print(mol.residues("ALA")[0:5])
Selector<SireMol::Residue>( size=5
0:  Residue( ALA:23  num_atoms=5 )
1:  Residue( ALA:30  num_atoms=5 )
2:  Residue( ALA:53  num_atoms=5 )
3:  Residue( ALA:65  num_atoms=5 )
4:  Residue( ALA:85  num_atoms=5 )
)

gives the first 5 residues named “ALA”.

Searching for residues#

You can also search for residues, using their name (resname), their number (resnum) and/or their index in their parent molecule (residx).

>>> print(mol.residues("resnum 5"))
Selector<SireMol::Residue>( size=2
0:  Residue( GLU:5   num_atoms=9 )
1:  Residue( GLU:5   num_atoms=9 )
)

Note

There are two residues with number 5 as there are multiple chains in this protein. Note also how the residue’s name (GLU) and number (5) are printed in its output.

You can use the residue search string in a molecular container’s index operator too!

>>> print(mol["resnum 5"])
Selector<SireMol::Residue>( size=2
0:  Residue( GLU:5   num_atoms=9 )
1:  Residue( GLU:5   num_atoms=9 )
)

and you can combine it with atom identifiers, e.g.

>>> print(mol["resname ALA and atomname CA"])
Selector<SireMol::Atom>( size=155
0:  Atom( CA:65   [ -54.77,   13.35,   37.26] )
1:  Atom( CA:117  [ -62.33,   13.58,   32.15] )
2:  Atom( CA:204  [ -45.04,    6.02,   36.66] )
3:  Atom( CA:306  [ -47.63,   28.39,   36.61] )
4:  Atom( CA:352  [ -34.57,   20.94,   29.60] )
...
150:  Atom( CA:10774 [  -4.40,    7.58,   14.84] )
151:  Atom( CA:10816 [  -1.17,    9.47,   25.09] )
152:  Atom( CA:10886 [   9.70,  -11.41,   19.28] )
153:  Atom( CA:11247 [  14.11,    2.16,   14.69] )
154:  Atom( CA:11624 [  22.43,   -6.30,   32.21] )
)

You can also search for multiple residue names or numbers.

>>> print(mol["resname ALA, ARG"])
Selector<SireMol::Residue>( size=255
0:  Residue( ALA:23  num_atoms=5 )
1:  Residue( ALA:30  num_atoms=5 )
2:  Residue( ALA:53  num_atoms=5 )
3:  Residue( ARG:61  num_atoms=11 )
4:  Residue( ALA:65  num_atoms=5 )
...
250:  Residue( ARG:652 num_atoms=11 )
251:  Residue( ARG:657 num_atoms=11 )
252:  Residue( ARG:680 num_atoms=11 )
253:  Residue( ARG:685 num_atoms=11 )
254:  Residue( ALA:691 num_atoms=5 )
)
>>> print(mol["resnum 5, 7, 9"])
Selector<SireMol::Residue>( size=10
0:  Residue( VAL:7   num_atoms=7 )
1:  Residue( LYS:9   num_atoms=9 )
2:  Residue( GLU:5   num_atoms=9 )
3:  Residue( VAL:7   num_atoms=7 )
4:  Residue( GLU:9   num_atoms=9 )
5:  Residue( VAL:7   num_atoms=7 )
6:  Residue( LYS:9   num_atoms=9 )
7:  Residue( GLU:5   num_atoms=9 )
8:  Residue( VAL:7   num_atoms=7 )
9:  Residue( GLU:9   num_atoms=9 )
)
>>> print(mol["resnum 201:205"])
Selector<SireMol::Residue>( size=9
0:  Residue( LEU:201 num_atoms=8 )
1:  Residue( ARG:202 num_atoms=11 )
2:  Residue( GLU:203 num_atoms=9 )
3:  Residue( LEU:204 num_atoms=8 )
4:  Residue( LEU:201 num_atoms=8 )
5:  Residue( ARG:202 num_atoms=11 )
6:  Residue( GLU:203 num_atoms=9 )
7:  Residue( LEU:204 num_atoms=8 )
8:  Residue( PEG:201 num_atoms=7 )
)

Wildcard (glob) searching is also supported for residue names.

>>> print(mol["resname /ala/i"])
Selector<SireMol::Residue>( size=155
0:  Residue( ALA:23  num_atoms=5 )
1:  Residue( ALA:30  num_atoms=5 )
2:  Residue( ALA:53  num_atoms=5 )
3:  Residue( ALA:65  num_atoms=5 )
4:  Residue( ALA:85  num_atoms=5 )
...
150:  Residue( ALA:578 num_atoms=5 )
151:  Residue( ALA:584 num_atoms=5 )
152:  Residue( ALA:593 num_atoms=5 )
153:  Residue( ALA:646 num_atoms=5 )
154:  Residue( ALA:691 num_atoms=5 )
)
>>> print(mol["resname /HI?/"])
Selector<SireMol::Residue>( size=42
0:  Residue( HIS:62  num_atoms=10 )
1:  Residue( HIS:27  num_atoms=10 )
2:  Residue( HIS:39  num_atoms=10 )
3:  Residue( HIS:75  num_atoms=10 )
4:  Residue( HIS:84  num_atoms=10 )
...
37:  Residue( HIS:638 num_atoms=10 )
38:  Residue( HIS:639 num_atoms=10 )
39:  Residue( HIS:662 num_atoms=10 )
40:  Residue( HIS:666 num_atoms=10 )
41:  Residue( HIS:668 num_atoms=10 )
)

This last search is particularly useful for proteins, as it is common for histidine residues to have different names depending on protonation state (e.g. “HIS”, “HIP”, “HIE” or “HID”).

Finding the atoms in a residue#

Because both Residue and Selector_Residue_ are molecular containers, they also have their own atom() and atoms() functions, which behave as you would expect.

>>> print(mol["resname ALA"].atoms("CA"))
Selector<SireMol::Atom>( size=155
0:  Atom( CA:65   [ -54.77,   13.35,   37.26] )
1:  Atom( CA:117  [ -62.33,   13.58,   32.15] )
2:  Atom( CA:204  [ -45.04,    6.02,   36.66] )
3:  Atom( CA:306  [ -47.63,   28.39,   36.61] )
4:  Atom( CA:352  [ -34.57,   20.94,   29.60] )
...
150:  Atom( CA:10774 [  -4.40,    7.58,   14.84] )
151:  Atom( CA:10816 [  -1.17,    9.47,   25.09] )
152:  Atom( CA:10886 [   9.70,  -11.41,   19.28] )
153:  Atom( CA:11247 [  14.11,    2.16,   14.69] )
154:  Atom( CA:11624 [  22.43,   -6.30,   32.21] )
)

You can get all of the atoms in a residue by calling the atoms() function without any arguments.

>>> mol["residx 0"].atoms()
Selector<SireMol::Atom>( size=8
0:  Atom( N:1     [ -54.07,   11.27,   41.93] )
1:  Atom( CA:2    [ -55.43,   11.35,   42.54] )
2:  Atom( C:3     [ -56.06,    9.95,   42.55] )
3:  Atom( O:4     [ -57.04,    9.73,   41.82] )
4:  Atom( CB:5    [ -56.32,   12.33,   41.76] )
5:  Atom( CG1:6   [ -55.68,   13.72,   41.72] )
6:  Atom( CG2:7   [ -57.70,   12.40,   42.39] )
7:  Atom( CD1:8   [ -55.42,   14.31,   43.09] )
)

Another route is to use the atoms in phrase in a search string, e.g.

>>> print(mol["atoms in resname ALA"])
Selector<SireMol::Atom>( size=775
0:  Atom( N:64    [ -54.11,   14.36,   38.13] )
1:  Atom( CA:65   [ -54.77,   13.35,   37.26] )
2:  Atom( C:66    [ -55.92,   14.01,   36.49] )
3:  Atom( O:67    [ -57.09,   13.65,   36.74] )
4:  Atom( CB:68   [ -55.25,   12.19,   38.09] )
...
770:  Atom( N:11623 [  22.09,   -7.64,   32.65] )
771:  Atom( CA:11624 [  22.43,   -6.30,   32.21] )
772:  Atom( C:11625 [  23.84,   -6.28,   31.63] )
773:  Atom( O:11626 [  24.72,   -7.01,   32.08] )
774:  Atom( CB:11627 [  22.32,   -5.30,   33.36] )
)

This has returned all of the atoms in residues that are called “ALA”.

You can get the residues that match atoms using residues with, e.g.

>>> print(mol["residues with atomname CA"])
Selector<SireMol::Residue>( size=1494
0:  Residue( ILE:6   num_atoms=8 )
1:  Residue( VAL:7   num_atoms=7 )
2:  Residue( LEU:8   num_atoms=8 )
3:  Residue( LYS:9   num_atoms=9 )
4:  Residue( SER:10  num_atoms=6 )
...
1489:  Residue( ALA:691 num_atoms=5 )
1490:  Residue( PRO:692 num_atoms=7 )
1491:  Residue( GLU:693 num_atoms=9 )
1492:  Residue( ASN:694 num_atoms=8 )
1493:  Residue( ASP:695 num_atoms=8 )
)

This has returned all of the residues that contain an atom called “CA”.

Another way to do this would be to call the residues() function on the molecular container, e.g.

>>> print(mol["CA"].residues())
Selector<SireMol::Residue>( size=1494
0:  Residue( ILE:6   num_atoms=8 )
1:  Residue( VAL:7   num_atoms=7 )
2:  Residue( LEU:8   num_atoms=8 )
3:  Residue( LYS:9   num_atoms=9 )
4:  Residue( SER:10  num_atoms=6 )
...
1489:  Residue( ALA:691 num_atoms=5 )
1490:  Residue( PRO:692 num_atoms=7 )
1491:  Residue( GLU:693 num_atoms=9 )
1492:  Residue( ASN:694 num_atoms=8 )
1493:  Residue( ASP:695 num_atoms=8 )
)

Uniquely identifying a residue#

You uniquely identify a residue in a molecule using its residue index (residx). You can get the index of a residue in a molecule by calling its index() function.

>>> print(mol.residue(0).index())
ResIdx(0)

Warning

Be careful indexing by residue index. This is the index of the residue that uniquely identifies it within its parent molecule. It is not the index of the residue in an arbitrary molecular container.

Residue identifying types#

Another way to index residues is to use the residue indexing types, i.e. ResIdx, ResName and ResNum. The easiest way to create these is by using the function sire.resid().

>>> print(mol[sr.resid("ALA")])
Selector<SireMol::Residue>( size=155
0:  Residue( ALA:23  num_atoms=5 )
1:  Residue( ALA:30  num_atoms=5 )
2:  Residue( ALA:53  num_atoms=5 )
3:  Residue( ALA:65  num_atoms=5 )
4:  Residue( ALA:85  num_atoms=5 )
...
150:  Residue( ALA:578 num_atoms=5 )
151:  Residue( ALA:584 num_atoms=5 )
152:  Residue( ALA:593 num_atoms=5 )
153:  Residue( ALA:646 num_atoms=5 )
154:  Residue( ALA:691 num_atoms=5 )
)

This returns the residues called “ALA”, as sr.resid("ALA") has created an ResName object.

>>> print(sr.resid("ALA"))
ResName('ALA')

This function will create an ResNum if it is passed an integer, e.g.

>>> print(sr.resid(5))
ResNum(5)
>>> print(mol[sr.resid(5)])
Selector<SireMol::Residue>( size=2
0:  Residue( GLU:5   num_atoms=9 )
1:  Residue( GLU:5   num_atoms=9 )
)

You can set both a name and a number by passing in two arguments, e.g.

>>> print(mol[sr.resid("ALA", 23)])
Selector<SireMol::Residue>( size=2
0:  Residue( ALA:23  num_atoms=5 )
1:  Residue( ALA:23  num_atoms=5 )
)
>>> print(mol[sr.resid(name="ALA", num=23)])
Selector<SireMol::Residue>( size=2
0:  Residue( ALA:23  num_atoms=5 )
1:  Residue( ALA:23  num_atoms=5 )
)

Iterating over residues#

The Selector_Residue_ class is iterable, meaning that it can be used in loops.

>>> for res in mol["resname ALA and resnum < 30"]:
...     print(res)
Residue( ALA:23  num_atoms=5 )
Residue( ALA:16  num_atoms=5 )
Residue( ALA:21  num_atoms=5 )
Residue( ALA:23  num_atoms=5 )
Residue( ALA:16  num_atoms=5 )

This is particulary useful when combined with looping over the atoms in the residues.

>>> for res in mol["residx < 3"]:
...     for atom in res["atomname C, CA"]:
...         print(res, atom)
Residue( ILE:6   num_atoms=8 ) Atom( CA:2    [ -55.43,   11.35,   42.54] )
Residue( ILE:6   num_atoms=8 ) Atom( C:3     [ -56.06,    9.95,   42.55] )
Residue( VAL:7   num_atoms=7 ) Atom( CA:10   [ -56.02,    7.64,   43.47] )
Residue( VAL:7   num_atoms=7 ) Atom( C:11    [ -56.14,    7.05,   42.06] )
Residue( LEU:8   num_atoms=8 ) Atom( CA:17   [ -54.99,    6.39,   39.98] )
Residue( LEU:8   num_atoms=8 ) Atom( C:18    [ -54.61,    4.90,   40.03] )

Counting residues#

Similar to how you did for atom, you can find the set of residue names via

>>> print(set(mol.residues().names()))
{ResName('ALA'),
 ResName('ARG'),
 ResName('ASN'),
 ResName('ASP'),
 ResName('CIT'),
 ResName('CYS'),
 ResName('GLN'),
 ResName('GLU'),
 ResName('GLY'),
 ResName('HIS'),
 ResName('HOH'),
 ResName('ILE'),
 ResName('LEU'),
 ResName('LYS'),
 ResName('MET'),
 ResName('PEG'),
 ResName('PHE'),
 ResName('PRO'),
 ResName('SER'),
 ResName('THR'),
 ResName('TRP'),
 ResName('TYR'),
 ResName('VAL')}

And you can count how many of each residue using;

>>> for name in set(mol.residues().names()):
...     print(name, len(mol.residues(name)))
ResName('VAL') 74
ResName('ILE') 64
ResName('GLN') 32
ResName('PRO') 90
ResName('GLU') 107
ResName('TRP') 24
ResName('GLY') 68
ResName('CYS') 48
ResName('HOH') 18
ResName('CIT') 2
ResName('ARG') 100
ResName('MET') 20
ResName('SER') 102
ResName('PHE') 64
ResName('ASN') 38
ResName('THR') 88
ResName('ASP') 84
ResName('LYS') 46
ResName('TYR') 22
ResName('HIS') 42
ResName('PEG') 4
ResName('ALA') 155
ResName('LEU') 226

This can be a convenient way of finding the residue names of different ligands or cofactors that are bound to the molecule.

You could do a similar thing for residue numbers, e.g.

>>> for number in set(mol.residues().numbers()):
...     print(number, len(mol.residues(number)))
ResNum(5) 2
ResNum(6) 4
ResNum(7) 4
ResNum(8) 4
ResNum(9) 4
ResNum(10) 4
ResNum(11) 4
ResNum(12) 4
ResNum(13) 4
ResNum(14) 2
...