Figure 1.
(a) A BN with two variables that constitutes the basic (single-frame) template for an HMM, and (b) a DBN representation of an HMM obtained by concatenating a variable number of the BN frames and connecting successive state variables.
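The concatenation described in panel (b) can be sketched as a small graph-unrolling routine. This is a minimal illustration under stated assumptions, not the Philius implementation: the node names `state` and `obs` and the function `unroll_hmm` are hypothetical.

```python
def unroll_hmm(num_frames):
    """Unroll a two-variable HMM template into the edge list of a DBN.

    Each node is a (name, frame_index) pair. The names "state" and "obs"
    are illustrative assumptions, not identifiers from the source.
    """
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    edges = []
    for t in range(num_frames):
        # Within-frame edge from the single-frame template:
        # the hidden state generates the observation.
        edges.append((("state", t), ("obs", t)))
        # Between-frame edge: connect successive state variables.
        if t + 1 < num_frames:
            edges.append((("state", t), ("state", t + 1)))
    return edges
```

A sequence of n frames yields n within-frame edges and n - 1 between-frame edges, so the same template covers sequences of any length.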
Figure 2.
Philius training and decoding graphical models.
(a) Training DBN: only the amino acid and the topoLabel are observed in each frame. The topoLabel is used to constrain the hidden state using an observed child node. The color of the edge between two nodes indicates the type of relationship: black is deterministic, and red is random. (b) First stage decoding DBN: the topoState is hidden and dependent on the state and the previous topoState, and specifies the behavior of pType, an additional hidden variable. (c) Second stage decoding DBN: the observed amino acid node and the duration modeling nodes have been removed, and Pr[topoState_i] is defined by the posterior probabilities computed in the first stage using the virtual evidence node topoVE.
Figure 3.
Each rectangle represents a state, which is characterized by an emission distribution and a duration distribution. The state transition topology of Philius exactly mimics that of Phobius.
Table 1.
Phobius and Philius protein type classification performance on the development set: for each protein class, the fraction of the dataset of that type, and the accuracy, precision, sensitivity, specificity, and Matthews correlation coefficient.
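The five per-class metrics listed above follow from one-vs-rest confusion counts. The sketch below uses the standard definitions; the function names are hypothetical, and the convention of returning 0.0 for an undefined ratio is an assumption, not something stated in the source.

```python
import math

def one_vs_rest_counts(true_labels, predicted_labels, positive_class):
    """Count TP, FP, TN, FN, treating positive_class as positive and all others as negative."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if pred == positive_class and truth == positive_class:
            tp += 1
        elif pred == positive_class:
            fp += 1
        elif truth == positive_class:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, sensitivity, specificity, and Matthews correlation coefficient."""
    def ratio(num, den):
        # Assumed convention: an undefined ratio (zero denominator) is reported as 0.0.
        return num / den if den else 0.0
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }
```

For example, `binary_metrics(*one_vs_rest_counts(truth, predicted, "TM"))` gives the row for the TM class, where `truth` and `predicted` are lists of labels such as G, SP+G, TM, and SP+TM.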
Figure 4.
Protein-type classification precision vs confidence score computed by sorting the proteins by score and computing the average score and precision within a sliding window.
Left: precision vs average score for each of the three main protein types. Right: average (black) and average ± one standard deviation (gray) across all proteins.
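The sliding-window estimate described in this caption can be sketched as follows. The window width, the step of one protein, and the use of the population standard deviation are assumptions made for illustration; the source does not specify them, and `precision_vs_score` is a hypothetical name.

```python
import statistics

def precision_vs_score(scores, correct, window):
    """Sort predictions by score, then summarize each sliding window.

    scores:  confidence score for each protein
    correct: True where the prediction for that protein was correct
    window:  number of proteins per window (an assumed parameter)

    Returns a list of (mean_score, score_std, precision) tuples,
    one per window position.
    """
    if len(scores) != len(correct):
        raise ValueError("scores and correct must have the same length")
    if window < 1 or window > len(scores):
        raise ValueError("window must be between 1 and the number of proteins")
    ranked = sorted(zip(scores, correct), key=lambda pair: pair[0])
    points = []
    for start in range(len(ranked) - window + 1):
        chunk = ranked[start:start + window]
        chunk_scores = [score for score, _ in chunk]
        mean_score = statistics.fmean(chunk_scores)
        score_std = statistics.pstdev(chunk_scores)
        # Precision here is the fraction of correct predictions in the window.
        precision = sum(1 for _, ok in chunk if ok) / window
        points.append((mean_score, score_std, precision))
    return points
```

Plotting precision against mean score for each window gives a calibration-style curve; a well-calibrated score would track the diagonal.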
Table 2.
Segment-level metrics.
Figure 5.
Segment-level classification precision vs score for each of the segment types (excluding the ‘outside’ segments of G and SP+G proteins).
Table 3.
Confusion matrices for Phobius and Philius.
Figure 6.
Full-topology prediction precision vs score for the TM proteins.
The black line is the average score within the sliding window used to estimate the precision, and the gray lines indicate the average plus and minus one standard deviation.
Figure 7.
Original Phobius datasets (G, SP+G, TM and SP+TM) and new SignalP and SCAMPI datasets.
Figure is approximately to scale.
Table 4.
Philius full-topology accuracy on the new merged TM dataset (top row).
Table 5.
Philius signal peptide discrimination (accuracy, precision, sensitivity, specificity, and Matthews correlation coefficient) and cleavage-site accuracy (fraction of all SPs detected for which the cleavage site was predicted exactly).
Figure 8.
The total counts and fraction of correct C-terminal localizations as a function of C-terminal segment confidence score for 546 yeast proteins with experimentally assigned C-terminal locations.
Table 6.
Overall YRC predictions on 6.3 million proteins: number and relative fraction of each protein type, median protein type confidence score, and median TM topology confidence score (when applicable).
Table 7.
For several organisms, the total number of proteins for which predictions were made and the relative fractions of the four basic protein types.
Table 8.
The fraction of predicted TM and SP+TM proteins that have a single membrane-spanning segment in the mature protein.
Figure 9.
Philius topology prediction for the human presenilin protein as shown on the YRC web page.
The diagram shows the nine membrane-spanning regions as vertical cylinders, and the cytoplasmic and non-cytoplasmic segments as horizontal bars. Each segment is colored according to type and shaded according to the confidence score. The seventh membrane helix, which is missed by many topology predictors, is assigned a relatively low confidence score by Philius and is therefore shaded gray. Because of this one low-confidence membrane segment, the location of the C-terminus is less confidently assigned than the location of the N-terminus. On the YRC web page, this diagram is accompanied by the type confidence and topology confidence, as well as a copy of the protein sequence, color-coded by segment type. Placing the mouse over any part of the topology diagram or the color-coded sequence produces a pop-up showing the segment type, confidence, and boundary locations.