Figure 1.
Overlapping fragment sets cover the query sequence.
For each position in a query sequence (2gb1 in this example) there is a distinct set of 200 3-mer fragments and a distinct set of 200 9-mer fragments. Because a residue can occupy any of the 3 positions of a 3-mer and any of the 9 positions of a 9-mer, its internal degrees of freedom Φ, Ψ, ω are restricted to a set of 200 × 3 + 200 × 9 = 2400 combinations. For three example positions, Φ, Ψ pairs are plotted on a Ramachandran map: THR located in a strand, LYS located in a helix and ASP located in a loop. Each red dot represents a Φ, Ψ pair from a fragment (2400 dots in each plot). The blue background in the maps shows the region allowed for the given amino acid type, computed from a non-redundant PDB subset with the BioShell package [10], [11].
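The count of 2400 combinations quoted in the Figure 1 caption can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `torsion_choices` is hypothetical, and it assumes a residue far enough from the chain termini to be covered by every window position of both fragment lengths.

```python
# Illustrative arithmetic only; names are hypothetical, not part of any package.
FRAGS_PER_POSITION = 200      # fragments per query position (from the caption)
FRAGMENT_LENGTHS = (3, 9)     # 3-mers and 9-mers (from the caption)


def torsion_choices(frags_per_position, lengths):
    """Number of (phi, psi, omega) triples available to one interior residue."""
    # A fragment of length k overlaps the residue when it starts at any of
    # k consecutive query positions, so each length contributes
    # k * frags_per_position torsion triples.
    return sum(k * frags_per_position for k in lengths)


print(torsion_choices(FRAGS_PER_POSITION, FRAGMENT_LENGTHS))  # 2400
```

Residues close to either terminus are covered by fewer windows, so under the same assumptions their count would be smaller.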
Figure 2.
Overview of the fragment picking process.
In the first step, the program reads in a structure database file and creates the scoring system. During the second, iterative stage, each possible fragment (i.e., a local match between the query sequence and a structure from the database) is scored and sent to a candidates collector. In the last stage, a selector object picks the final fragment set from the candidates gathered by the collector.
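The three stages described in the Figure 2 caption can be sketched as a toy pipeline. Everything below is an assumption made for illustration: the function `pick_fragments`, the representation of the database as plain sequence strings, and the mismatch-count score are stand-ins, not the program's actual classes or scoring terms.

```python
from collections import defaultdict


def pick_fragments(query, chunks, frag_len, n_keep):
    """Toy fragment picker; `chunks` is a list of database sequences (strings)."""

    # Stage 1: set up the "scoring system". Here it is a single toy score,
    # the number of mismatched residues (lower is better).
    def score(query_window, db_window):
        return sum(a != b for a, b in zip(query_window, db_window))

    # Stage 2: score every local match between the query and the database
    # and send it to the collector, keyed by query position.
    collector = defaultdict(list)  # position -> [(score, chunk_id, offset)]
    for pos in range(len(query) - frag_len + 1):
        query_window = query[pos:pos + frag_len]
        for chunk_id, seq in enumerate(chunks):
            for offset in range(len(seq) - frag_len + 1):
                db_window = seq[offset:offset + frag_len]
                collector[pos].append(
                    (score(query_window, db_window), chunk_id, offset))

    # Stage 3: the selector keeps the n_keep best candidates per position.
    return {pos: sorted(cands)[:n_keep] for pos, cands in collector.items()}


picks = pick_fragments("ACDE", ["ACDEF", "GGGGG"], frag_len=3, n_keep=2)
print(picks[0])  # best candidates for query position 0
```

Keeping the collector separate from the selector means that the rule for gathering candidates and the rule for choosing the final set can be changed independently.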
Figure 3.
Organization of the structural database (nicknamed vall).
The database is divided into chunks and each chunk is composed of residues.
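The chunk/residue hierarchy of Figure 3 maps naturally onto nested containers. The sketch below is a minimal illustration; the class names and the per-residue fields (`aa`, `phi`, `psi`, `omega`) are assumptions and do not reflect the actual vall file format.

```python
from dataclasses import dataclass, field


@dataclass
class Residue:
    # Hypothetical per-residue record; the real database stores more data.
    aa: str
    phi: float
    psi: float
    omega: float


@dataclass
class Chunk:
    chunk_id: str
    residues: list = field(default_factory=list)

    def windows(self, length):
        """Yield every contiguous run of `length` residues in this chunk."""
        for start in range(len(self.residues) - length + 1):
            yield self.residues[start:start + length]


@dataclass
class Vall:
    chunks: list = field(default_factory=list)

    def windows(self, length):
        """Yield candidate fragments; a window never spans two chunks."""
        for chunk in self.chunks:
            yield from chunk.windows(length)


long_chunk = Chunk("a", [Residue("A", -60.0, -45.0, 180.0) for _ in range(5)])
short_chunk = Chunk("b", [Residue("G", -120.0, 130.0, 180.0) for _ in range(2)])
db = Vall([long_chunk, short_chunk])
print(len(list(db.windows(3))))  # 3: the 2-residue chunk yields no 3-mers
```

Iterating per chunk ensures that no candidate fragment is stitched together from residues belonging to different chunks.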
Figure 4.
UML diagram showing the relations between score types.
For the sake of clarity, only the base classes and the most commonly used score types are shown.
Table 1.
Most common score types for fragment assessment.
Figure 5.
Quota example: the number of fragments assigned to each quota pool, based on the actual predictions for residue 39 of the ubiquitin sequence.
There are nine pools, based on three secondary structure predictors (PsiPred, Jufo and SAM) predicting the three secondary structure types: helical (H, purple), coil (C, gray) and extended (E, blue). The order of columns with predicted probabilities is: C, E, H. Note that SAM predicted coil. PsiPred's prediction, however, reads “C with E possible”, while Jufo gives a slight chance to H. The ninth pool (E for SAM) has size 0 in this case.
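One way to arrive at pool sizes of the kind shown in Figure 5 is sketched below. The caption does not state the exact allocation rule, so this sketch assumes that each predictor receives an equal share of the total and splits it among C, E and H in proportion to its predicted probabilities, with largest-remainder rounding. The function name and the probabilities in the example are made up for illustration and are not the values plotted in the figure.

```python
def quota_pool_sizes(predictions, total):
    """Return {(predictor, ss_type): pool size}.

    `predictions` maps predictor name -> {"C": p, "E": p, "H": p}, where the
    three probabilities of each predictor are assumed to sum to 1.
    """
    n_predictors = len(predictions)
    # Fractional share of each pool: equal weight per predictor, split by
    # that predictor's probabilities (an assumed rule, see the lead-in).
    raw = {(name, ss): total * p / n_predictors
           for name, probs in predictions.items()
           for ss, p in probs.items()}
    sizes = {key: int(share) for key, share in raw.items()}
    # Largest-remainder rounding so that the pool sizes add up to `total`.
    leftover = total - sum(sizes.values())
    by_remainder = sorted(raw, key=lambda k: raw[k] - sizes[k], reverse=True)
    for key in by_remainder[:leftover]:
        sizes[key] += 1
    return sizes


# Made-up probabilities for illustration only.
example = {
    "PsiPred": {"C": 0.7, "E": 0.3, "H": 0.0},
    "Jufo": {"C": 0.8, "E": 0.1, "H": 0.1},
    "SAM": {"C": 1.0, "E": 0.0, "H": 0.0},
}
sizes = quota_pool_sizes(example, total=200)
print(sizes[("SAM", "E")])  # 0: a zero-probability pool stays empty
```

Under this rule, a pool whose predicted probability is zero receives no fragments, which matches the empty ninth pool described in the caption.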
Figure 6.
Phi/Psi distributions of picked fragments using different query sequences and structures.
Each row represents a different target loop structure. Each column is a different method for deriving the fragment sets: poly-alanine, poly-valine, and a structure-specific sequence. Encircled crosses in each panel show the phi/psi values of the prepared fragment (input structure). Each square represents a phi/psi bin, where the color reflects the number of phi/psi values for the middle residue of the selected fragments. The fragment distribution picked using a specific sequence is variable and has density most consistent with the input backbone structure (encircled crosses).
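The binned counts behind a panel like those in Figure 6 can be computed as follows. This is a minimal sketch: the function name and the 10-degree bin width are assumptions, since the caption does not give the bin size.

```python
import math
from collections import Counter


def bin_phi_psi(angles, bin_width=10.0):
    """Count (phi, psi) pairs, in degrees, per square bin of `bin_width`."""
    counts = Counter()
    for phi, psi in angles:
        # Wrap into [-180, 180) so that +180 and -180 share a bin.
        phi_wrapped = (phi + 180.0) % 360.0 - 180.0
        psi_wrapped = (psi + 180.0) % 360.0 - 180.0
        bin_index = (math.floor(phi_wrapped / bin_width),
                     math.floor(psi_wrapped / bin_width))
        counts[bin_index] += 1
    return counts


# Middle-residue torsions of four hypothetical fragments.
counts = bin_phi_psi([(-65.0, -45.0), (-61.0, -41.0),
                      (180.0, 180.0), (-180.0, -180.0)])
print(counts[(-7, -5)])    # 2
print(counts[(-18, -18)])  # 2
```

Each key is the pair of bin indices along phi and psi; multiplying an index by `bin_width` gives the lower edge of that bin in degrees.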
Figure 7.
Ab-initio benchmark: each symbol corresponds to a single protein target, for which an ab-initio structure prediction has been run with the reference and the new fragments (X and Y axis, respectively).
Red (green) points denote targets for which the new algorithm yields worse (better) results. For targets marked by blue symbols, no significant difference was observed.