Indexing Segments
Segments are collections of atoms, and those atoms are sometimes non-contiguous. They typically represent a user-defined segment within a molecule or protein. An atom can belong to at most one segment at a time, and atoms do not need to be assigned to a segment at all. Segments are implemented via the Segment class, which is itself a molecular container for Atom objects.
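The membership rules above can be pictured with a toy model in plain Python. This is an illustrative sketch, not how Sire stores segments: it assumes a hypothetical per-atom list, atom_to_segment, where each entry holds the index of that atom's segment, or None if the atom is unassigned. Because each atom has exactly one slot, it can belong to at most one segment, and the atoms of a segment need not be contiguous.

```python
from typing import List, Optional

# Hypothetical toy data: one entry per atom (atom index -> segment index).
# None means the atom is not assigned to any segment.
atom_to_segment: List[Optional[int]] = [0, 0, 1, 0, None, 1]


def atoms_in_segment(assignment: List[Optional[int]], seg_idx: int) -> List[int]:
    """Return the indices of the atoms assigned to segment seg_idx."""
    return [i for i, s in enumerate(assignment) if s == seg_idx]


def unassigned_atoms(assignment: List[Optional[int]]) -> List[int]:
    """Return the indices of the atoms that belong to no segment."""
    return [i for i, s in enumerate(assignment) if s is None]


# Segment 0 is non-contiguous: atoms 0, 1 and 3 (atom 2 is in segment 1).
print(atoms_in_segment(atom_to_segment, 0))  # [0, 1, 3]
print(unassigned_atoms(atom_to_segment))     # [4]
```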
You can access the segments in a molecule container using the segment() and segments() functions, which are available on all of the molecular container types.
Not many molecules have named segments, so first let’s load a molecule that does.
>>> import sire as sr
>>> mols = sr.load(sr.expand(sr.tutorial_url, "alanin.psf"))
Downloading from 'https://sire.openbiosim.org/m/alanin.psf'...
Unzipping './alanin.psf.bz2'...
>>> mol = mols[0]
This molecule contains only a single segment called “MAIN”.
>>> print(mol.segments())
Selector<SireMol::Segment>( size=1
0: Segment( MAIN num_atoms=66 )
)
>>> print(mol.segment(0))
Segment( MAIN num_atoms=66 )
>>> print(mol.segment("MAIN"))
Segment( MAIN num_atoms=66 )
Search for segments
You can search for segments using their name (segname) or their index (segidx).
>>> print(mol.segments("segname MAIN"))
Selector<SireMol::Segment>( size=1
0: Segment( MAIN num_atoms=66 )
)
>>> print(mol.segment("segidx 0"))
Segment( MAIN num_atoms=66 )
Note
Unlike atoms and residues, segments do not have a number. They are identified only by their index in their parent molecule, or by their name.
You can do a segment search via the container's index operator too!
>>> print(mol["segname MAIN"])
Molecule( alanin:2 num_atoms=66 num_residues=12 )
Note
Sire will automatically convert a result from a search string called via the index operator to the largest matching view. In this case, the single segment contains all of the atoms of the whole molecule. So Sire has converted the result up to the whole molecule view.
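To make the promotion rule concrete, here is a deliberately simplified, pure-Python sketch. It is not Sire's algorithm: it only considers three view types (whole molecule, single residue, selection of atoms), ignores other view types such as chains and segments, and uses a hypothetical residues mapping from residue labels to atom indices.

```python
from typing import Dict, Set


def largest_view(matched: Set[int], num_atoms: int,
                 residues: Dict[str, Set[int]]) -> str:
    """Toy rule: promote a set of matched atom indices to the largest
    view type (out of three) that it exactly fills."""
    if matched == set(range(num_atoms)):
        # Every atom in the molecule matched: return the whole molecule.
        return "Molecule"
    for label, atoms in residues.items():
        if matched == atoms:
            # The match is exactly one residue.
            return f"Residue({label})"
    # Anything else stays a selection of atoms.
    return f"Selector<Atom>(size={len(matched)})"


# Hypothetical 6-atom molecule made of two residues.
residues = {"ACE:1": {0, 1, 2}, "ALA:2": {3, 4, 5}}
print(largest_view({0, 1, 2, 3, 4, 5}, 6, residues))  # Molecule
print(largest_view({0, 1, 2}, 6, residues))           # Residue(ACE:1)
print(largest_view({1, 4}, 6, residues))              # Selector<Atom>(size=2)
```

In the alanin example, the MAIN segment covers all 66 atoms, which corresponds to the first case.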
You can combine the search string with chain, residue and/or atom search terms too.
>>> print(mol["segname MAIN and atomname C"])
Selector<SireMol::Atom>( size=11
0: Atom( C:2 )
1: Atom( C:8 )
2: Atom( C:14 )
3: Atom( C:20 )
4: Atom( C:26 )
...
6: Atom( C:38 )
7: Atom( C:44 )
8: Atom( C:50 )
9: Atom( C:56 )
10: Atom( C:62 )
)
>>> print(mol["segname MAIN and resname ACE"])
Residue( ACE:1 num_atoms=3 )
As for other types, you can search for multiple segment names using a comma, and can do wildcard (glob) searching too!
>>> print(mol.segment("segname /M*/"))
Segment( MAIN num_atoms=66 )
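The comma and wildcard rules can be sketched in plain Python with the standard fnmatch module. This is only an illustration of the matching semantics under stated assumptions, not Sire's search parser: it assumes a term wrapped in slashes is a glob pattern, every other term is an exact name, and matching is case-sensitive. Sire's exact pattern rules may differ, and the segment names used here are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List


def match_segnames(names: List[str], query: str) -> List[str]:
    """Toy matcher for segname values: comma-separated terms, where a
    term wrapped in slashes (e.g. /M*/) is treated as a glob pattern."""
    terms = [t.strip() for t in query.split(",") if t.strip()]
    matched = []
    for name in names:
        for term in terms:
            if len(term) > 2 and term.startswith("/") and term.endswith("/"):
                hit = fnmatchcase(name, term[1:-1])  # glob match
            else:
                hit = name == term  # exact, case-sensitive match
            if hit:
                matched.append(name)
                break  # each name is reported at most once
    return matched


segment_names = ["MAIN", "LIG", "SOLV"]  # hypothetical segment names
print(match_segnames(segment_names, "/M*/"))        # ['MAIN']
print(match_segnames(segment_names, "MAIN, SOLV"))  # ['MAIN', 'SOLV']
```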
Finding the atoms in a segment
Because both Segment and Selector_Segment_ are molecular containers, they also have their own atom() and atoms() functions, which behave as you would expect.
>>> print(mol["segname MAIN"].atoms("C"))
Selector<SireMol::Atom>( size=11
0: Atom( C:2 )
1: Atom( C:8 )
2: Atom( C:14 )
3: Atom( C:20 )
4: Atom( C:26 )
...
6: Atom( C:38 )
7: Atom( C:44 )
8: Atom( C:50 )
9: Atom( C:56 )
10: Atom( C:62 )
)
You can also use the "atoms in", "residues in" or "chains in" search terms to get the atoms, residues or chains in a segment.
>>> print(mol["residues in segname MAIN"])
Selector<SireMol::Residue>( size=12
0: Residue( ACE:1 num_atoms=3 )
1: Residue( ALA:2 num_atoms=6 )
2: Residue( ALA:3 num_atoms=6 )
3: Residue( ALA:4 num_atoms=6 )
4: Residue( ALA:5 num_atoms=6 )
...
7: Residue( ALA:8 num_atoms=6 )
8: Residue( ALA:9 num_atoms=6 )
9: Residue( ALA:10 num_atoms=6 )
10: Residue( ALA:11 num_atoms=6 )
11: Residue( CBX:12 num_atoms=3 )
)
>>> print(mol["atoms in segname MAIN"])
Selector<SireMol::Atom>( size=66
0: Atom( CA:1 )
1: Atom( C:2 )
2: Atom( O:3 )
3: Atom( N:4 )
4: Atom( H:5 )
...
61: Atom( C:62 )
62: Atom( O:63 )
63: Atom( N:64 )
64: Atom( H:65 )
65: Atom( CA:66 )
)
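The "in" searches walk down from a segment to the views it contains. The sketch below models that in plain Python with a hypothetical residues dictionary; it is not Sire code, and for simplicity it assumes every residue lies wholly inside a single segment.

```python
from typing import Dict, List, Tuple

# Hypothetical toy molecule: residue label -> (segment name, atom indices).
residues: Dict[str, Tuple[str, List[int]]] = {
    "ACE:1": ("MAIN", [0, 1, 2]),
    "ALA:2": ("MAIN", [3, 4, 5, 6, 7, 8]),
    "WAT:3": ("SOLV", [9, 10, 11]),
}


def residues_in_segment(segname: str) -> List[str]:
    """Labels of the residues in the named segment ("residues in segname X")."""
    return [label for label, (seg, _atoms) in residues.items() if seg == segname]


def atoms_in_segment(segname: str) -> List[int]:
    """Atom indices in the named segment ("atoms in segname X")."""
    return [i
            for label in residues_in_segment(segname)
            for i in residues[label][1]]


print(residues_in_segment("MAIN"))  # ['ACE:1', 'ALA:2']
print(atoms_in_segment("SOLV"))     # [9, 10, 11]
```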
A KeyError will be raised if there are no residues or chains in a segment, e.g.
>>> print(mol["chains in segname MAIN"])
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Input In [24], in <cell line: 1>()
----> 1 print(mol["chains in segname MAIN"])
File ~/sire.app/lib/python3.8/site-packages/Sire/Mol/__init__.py:462, in __fixed__getitem__(obj, key)
458 elif type(key) is str:
459 # is this a search object - if so, then return whatever is
460 # most relevant from the search
461 try:
--> 462 return __from_select_result(obj.search(key))
463 except SyntaxError:
464 pass
KeyError: 'SireMol::missing_chain: This view does not contain any chains. (call Sire.Error.get_last_error_details() for more info)'
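Because an empty result is reported as an exception rather than as an empty selection, code that may hit this case can catch KeyError. The sketch below shows the pattern with a hypothetical stand-in function, chains_in_segment, so that it runs without Sire; with Sire itself, the same try/except KeyError would wrap the mol["chains in segname MAIN"] lookup.

```python
from typing import Dict, List


def chains_in_segment(segment_chains: Dict[str, List[str]], segname: str) -> List[str]:
    """Hypothetical lookup that, like the search above, raises KeyError
    when the segment contains no chains."""
    chains = segment_chains.get(segname, [])
    if not chains:
        raise KeyError(f"segment {segname!r} does not contain any chains")
    return chains


# Hypothetical data: segment MAIN has no chains, as in the alanin example.
segment_chains: Dict[str, List[str]] = {"MAIN": []}

try:
    found = chains_in_segment(segment_chains, "MAIN")
except KeyError:
    found = []  # treat "no chains" as an empty result instead of an error

print(found)  # []
```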
You can go from atoms or residues to the segments that contain them using "segments with", e.g.
>>> print(mol["segments with atomname C"])
Molecule( alanin:2 num_atoms=66 num_residues=12 )
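"segments with" goes the other way, from matching atoms up to the segments that hold them. A plain-Python sketch of that upward lookup follows; the atoms list and the segments_with_atomname function are hypothetical, and an atom with no segment is simply skipped, since it cannot lead to any segment.

```python
from typing import List, Optional, Tuple

# Hypothetical toy molecule: one (atom name, segment name) pair per atom.
# None marks an atom that is not in any segment.
atoms: List[Tuple[str, Optional[str]]] = [
    ("CA", "MAIN"), ("C", "MAIN"), ("O", "MAIN"),
    ("OW", "SOLV"), ("HW1", "SOLV"), ("C", None),
]


def segments_with_atomname(atom_name: str) -> List[str]:
    """Names of the segments containing at least one atom called atom_name,
    in order of first appearance and without duplicates."""
    found: List[str] = []
    for name, seg in atoms:
        if name == atom_name and seg is not None and seg not in found:
            found.append(seg)
    return found


print(segments_with_atomname("C"))   # ['MAIN'] (the unassigned C is skipped)
print(segments_with_atomname("OW"))  # ['SOLV']
```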
Finding the atoms, residues or chains in a segment
Like all molecular containers, segments let you find the contained atoms, residues or chains by calling the appropriate functions:
>>> print(mol["segname MAIN"].atoms())
Selector<SireMol::Atom>( size=66
0: Atom( CA:1 )
1: Atom( C:2 )
2: Atom( O:3 )
3: Atom( N:4 )
4: Atom( H:5 )
...
61: Atom( C:62 )
62: Atom( O:63 )
63: Atom( N:64 )
64: Atom( H:65 )
65: Atom( CA:66 )
)
>>> print(mol["segidx 0"].residues())
Selector<SireMol::Residue>( size=12
0: Residue( ACE:1 num_atoms=3 )
1: Residue( ALA:2 num_atoms=6 )
2: Residue( ALA:3 num_atoms=6 )
3: Residue( ALA:4 num_atoms=6 )
4: Residue( ALA:5 num_atoms=6 )
...
7: Residue( ALA:8 num_atoms=6 )
8: Residue( ALA:9 num_atoms=6 )
9: Residue( ALA:10 num_atoms=6 )
10: Residue( ALA:11 num_atoms=6 )
11: Residue( CBX:12 num_atoms=3 )
)
Uniquely identifying a segment
You uniquely identify a segment in a molecule using its segment index (segidx). You can get the index of a segment in a molecule by calling its index() function.
>>> print(mol.segment(0).index())
SegIdx(0)
Warning
Be careful indexing by segment index. This is the index of the segment that uniquely identifies it within its parent molecule. It is not the index of the segment in an arbitrary molecular container.
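The difference matters whenever a container holds only some of a molecule's segments. The plain-Python sketch below uses a hypothetical ToySegment tuple and made-up segment names to show that position 0 in a filtered container need not be the segment whose index in the molecule is 0.

```python
from typing import List, NamedTuple


class ToySegment(NamedTuple):
    """Hypothetical stand-in for a segment: a name plus the index that
    identifies it within its parent molecule (like SegIdx)."""
    name: str
    idx: int


# Hypothetical molecule with three segments, indexed 0, 1 and 2.
molecule_segments = [ToySegment("MAIN", 0), ToySegment("LIG", 1), ToySegment("SOLV", 2)]

# A container holding only some of those segments.
subset: List[ToySegment] = [s for s in molecule_segments if s.name != "MAIN"]

first = subset[0]              # position 0 within the container...
print(first.name, first.idx)   # ...is LIG, whose index in the molecule is 1
```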
Segment identifying types
Another way to index segments is to use the segment identifying types, i.e. SegName and SegIdx. The easiest way to create these is by using the function sire.segid().
Use strings to create SegName objects,
>>> print(sr.segid("MAIN"))
SegName('MAIN')
>>> print(mol[sr.segid("MAIN")])
Segment( MAIN num_atoms=66 )
and integers to create SegIdx objects.
>>> print(sr.segid(0))
SegIdx(0)
>>> print(mol[sr.segid(0)])
Segment( MAIN num_atoms=66 )
You can set both a name and an index by passing in two arguments, either positionally or using the name and idx keywords.
>>> print(mol[sr.segid("MAIN", 0)])
Segment( MAIN num_atoms=66 )
>>> print(mol[sr.segid(name="MAIN", idx=0)])
Segment( MAIN num_atoms=66 )
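The argument handling described above (a string gives a name, an integer gives an index, two arguments give both) can be mimicked in plain Python. This is a hypothetical toy, not sire.segid(): ToySegID and toy_segid are made-up names, and treating a combined identifier as requiring both the name and the index to match is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class ToySegID:
    """Hypothetical identifier holding an optional name and/or index."""
    name: Optional[str] = None
    idx: Optional[int] = None

    def matches(self, name: str, idx: int) -> bool:
        # Assumption: every field that was set must agree with the segment.
        return ((self.name is None or self.name == name) and
                (self.idx is None or self.idx == idx))


def toy_segid(name: Union[str, int, None] = None, idx: Optional[int] = None) -> ToySegID:
    """A lone string is a name, a lone integer is an index,
    and two arguments set both."""
    if isinstance(name, int) and idx is None:
        # A lone integer is treated as an index.
        return ToySegID(idx=name)
    if name is not None and not isinstance(name, str):
        raise TypeError("name must be a string")
    return ToySegID(name=name, idx=idx)


print(toy_segid("MAIN"))                           # ToySegID(name='MAIN', idx=None)
print(toy_segid(0))                                # ToySegID(name=None, idx=0)
print(toy_segid(name="MAIN", idx=0).matches("MAIN", 0))  # True
```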
Note
Sire will return the Segment from an index operator if a segment identifying type is used as the index. This is slightly different behaviour to how the search string operates. In practice though, all molecular container classes behave in the same way, so you will often not notice or need to know which molecular container class has been returned.
Iterating over segments
The Selector_Segment_ class is iterable, meaning that it can be used in loops.
>>> for segment in mol.segments():
... print(segment)
Segment( MAIN num_atoms=66 )
This is particularly helpful when combined with loops over the atoms in a segment.
>>> for segment in mol.segments():
... for atom in segment.atoms("element carbon"):
... print(segment, atom.residue(), atom)
Segment( MAIN num_atoms=66 ) Residue( ACE:1 num_atoms=3 ) Atom( CA:1 )
Segment( MAIN num_atoms=66 ) Residue( ACE:1 num_atoms=3 ) Atom( C:2 )
Segment( MAIN num_atoms=66 ) Residue( ALA:2 num_atoms=6 ) Atom( CA:6 )
Segment( MAIN num_atoms=66 ) Residue( ALA:2 num_atoms=6 ) Atom( CB:7 )
Segment( MAIN num_atoms=66 ) Residue( ALA:2 num_atoms=6 ) Atom( C:8 )
Segment( MAIN num_atoms=66 ) Residue( ALA:3 num_atoms=6 ) Atom( CA:12 )
Segment( MAIN num_atoms=66 ) Residue( ALA:3 num_atoms=6 ) Atom( CB:13 )
...
Segment( MAIN num_atoms=66 ) Residue( ALA:11 num_atoms=6 ) Atom( C:62 )
Segment( MAIN num_atoms=66 ) Residue( CBX:12 num_atoms=3 ) Atom( CA:66 )
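The same outer-loop/inner-loop shape is handy for summarising each segment. The sketch below is plain Python rather than Sire: the segments dictionary and its (atom name, element) pairs are hypothetical data, and collections.Counter tallies the elements found in each segment.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical toy data: segment name -> list of (atom name, element) pairs.
segments: Dict[str, List[Tuple[str, str]]] = {
    "MAIN": [("CA", "C"), ("C", "C"), ("O", "O"), ("N", "N"), ("H", "H")],
    "SOLV": [("OW", "O"), ("HW1", "H"), ("HW2", "H")],
}

# Outer loop over segments, inner loop over the atoms in each segment,
# mirroring the shape of the nested loop above.
element_counts: Dict[str, Counter] = {}
for segname, atoms in segments.items():
    counts: Counter = Counter()
    for _atom_name, element in atoms:
        counts[element] += 1
    element_counts[segname] = counts

print(element_counts["MAIN"]["C"])  # 2 carbons in MAIN
print(element_counts["SOLV"]["H"])  # 2 hydrogens in SOLV
```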
Finding all segment names
You can find the names of all segments using the names() function.
>>> print(mol.segments().names())
[SegName('MAIN')]