Protein Meta-Functional Signatures from Combining Sequence, Structure, Evolution, and Amino Acid Property Information

doi:10.1371/journal.pcbi.1000181

Figure 1.

Accuracy of functional site identification in the Thornton and Lovell datasets by four methods: one that uses sequence information only (HMM_rel_ent), one that adds evolutionary information (HMM_rel_ent+SSR), one that further adds information on the type of amino acids (HMM_rel_ent+SSR+AAType), and one that finally adds structural information (MFS).

Performance was evaluated using ROC scores and top-10 hits scores. Accuracy increases across the four methods, demonstrating the importance of combining sequence, structure, evolution, and amino acid type information when functionally characterizing proteins.
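The two evaluation measures named in this caption can be sketched in a few lines. This is a minimal illustration, not the authors' evaluation code: the function names `roc_auc` and `top_k_hits` are hypothetical, the ROC score is assumed to be the area under the ROC curve, and the top-10 hits score is assumed to be the count of known functional residues among the ten highest-scoring residues of a protein.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (functional, non-functional) residue pairs that are ranked correctly.
    Tied pairs count as half a correct ranking."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative residue")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def top_k_hits(scores, labels, k=10):
    """Number of true functional residues among the k highest-scoring ones
    (assumed definition of the 'top-10 hits' measure)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(1 for i in ranked[:k] if labels[i])


# Toy protein with six residues; 1 marks a known functional site.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [1, 0, 1, 0, 0, 0]
auc = roc_auc(toy_scores, toy_labels)
hits = top_k_hits(toy_scores, toy_labels, k=3)
```

The pairwise formulation is quadratic in the number of residues, which is acceptable for single proteins; a rank-based computation would be preferable for pooled datasets.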

Table 1.

Correlation coefficients between several components of the MFS method in the Thornton dataset (upper-right triangle of the table) and the Lovell dataset (lower-left triangle).

Figure 2.

Performance comparison of the MFS method, the SeqonlyMFS method (HMM_rel_ent+SSR+AAType), the Evolutionary Trace method, and the ConSurf method with the Thornton and Lovell datasets.

Only proteins for which both the Evolutionary Trace (ET) and ConSurf methods are able to give predictions are used in the comparison. Four measures are used to compare performance: ROC scores, the precision at a sensitivity threshold of 20%, the false positive rate at a sensitivity threshold of 20%, and the top-10 hits. ET is included only in the ROC score comparison and not in the other analyses, since it gives many tied scores for top-scoring residues. Both the MFS and SeqonlyMFS methods perform better than the methods that use only one type of information.
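The two threshold-based measures in this caption can be illustrated as follows. This is a hedged sketch under stated assumptions: the function name `metrics_at_sensitivity` is hypothetical, and the operating point is assumed to be the shortest ranked prefix of residues that recovers at least 20% of the known functional sites. Ties are broken arbitrarily by sort order, which hints at why a method producing many tied top scores is hard to evaluate this way.

```python
def metrics_at_sensitivity(scores, labels, target=0.20):
    """Return (precision, false_positive_rate) at the first cutoff where
    sensitivity (recall) reaches `target`. Assumed definition, for illustration."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative residue")
    # Walk down the residues from highest to lowest predicted score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    for i in ranked:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        if tp / n_pos >= target:
            break
    return tp / (tp + fp), fp / n_neg


# Toy protein: ten residues, three known functional sites.
toy_scores = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
toy_labels = [0, 1, 0, 0, 1, 0, 0, 0, 0, 1]
precision, fpr = metrics_at_sensitivity(toy_scores, toy_labels)
```

In the toy case the first functional site is found at rank 2, so one true positive and one false positive have been seen when the 20% sensitivity target is met.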

Figure 3.

Predictive performance of the MFS method, the SeqonlyMFS method, the Evolutionary Trace server, and the ConSurf server on two examples.

The structures of an ornithine decarboxylase (A) (PDB identifier 1ord-A) and a cellobiohydrolase (B) (PDB identifier 1cel-A) are shown in ribbon representation, with the functional sites (223H-316D-355K in 1ord-A, 212E-214D-217E-228H in 1cel-A) represented as spheres. Each residue is colored by its predicted functional importance score, with the color changing from red to white to blue as the score decreases. For 1ord-A (A), both MFS and SeqonlyMFS work well in assigning the highest scores to the functional sites. ET and ConSurf, however, also assign high scores to nearby residues in the surrounding cavity, so the functional sites do not appear in the top-10 hits lists generated by these methods. For 1cel-A (B), all the sequence-based methods assign relatively high scores to the functional sites (different shades of red), but only the MFS method, which uses structural information, boosts the scores of the functional sites enough (more intense red) for them to appear in the top-10 hits list.
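The red-to-white-to-blue coloring described above can be reproduced with a simple piecewise-linear map. This is only a sketch: the function name `score_to_rgb` is hypothetical, scores are assumed to lie in a known range `[lo, hi]`, and the midpoint of that range is assumed to map to white. The actual color scale used to render the figure is not specified in the caption.

```python
def score_to_rgb(score, lo=0.0, hi=1.0):
    """Map a functional importance score to an (r, g, b) tuple in [0, 1]:
    highest scores are red, mid-range scores white, lowest scores blue."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    # Normalize to [0, 1] and clamp out-of-range scores.
    t = min(max((score - lo) / (hi - lo), 0.0), 1.0)
    if t >= 0.5:
        # Upper half: fade from white (t = 0.5) to red (t = 1).
        fade = 1.0 - (t - 0.5) * 2.0
        return (1.0, fade, fade)
    # Lower half: fade from blue (t = 0) to white (t = 0.5).
    fade = t * 2.0
    return (fade, fade, 1.0)


# Example: colors for a high, an intermediate, and a low score.
colors = [score_to_rgb(s) for s in (1.0, 0.5, 0.0)]
```

The resulting tuples can be passed to most molecular viewers or plotting libraries that accept RGB triples in the 0 to 1 range.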

Figure 4.

Application of MFS to understand the role of the btuba/btubb dimer in the bacterial genus Prosthecobacter using the predicted and experimental structures.

In both structures, residues with higher MFS scores are colored a more intense red, and the top-10 scoring residues are represented by spheres. One GTP and one GDP in the predicted structure, as well as one GDP and two SO₄²⁻ ions in the experimental structure, are shown as yellow spheres. The predicted structure was generated by homology modeling using the eukaryotic α/β tubulin dimer (PDB identifier: 1jff) as the template. The taxol ligand and metal ions are omitted from the predicted structure for clarity. In the predicted structure, btubb lies above btuba, with a GDP molecule enclosed by the dimer interface. In the experimental structure (PDB identifier: 2btq), btuba lies above btubb and there is no GDP in the dimer interface. Our MFS analysis first confirmed that btuba and btubb indeed form dimers, as indicated by a high-scoring cluster in their dimer interface, in contrast to previous predictions made using the structural stability score alone. In addition, the MFS analysis suggests that regardless of how btuba and btubb are oriented relative to each other, their interface is functionally important and may bind GDP molecules.

Figure 5.

Prediction of residues with a rare function not represented in the training sets.

MFS was trained on a set of residues experimentally characterized to participate in canonical catalytic functionalities and protein-ligand interfaces. Protein binding to biomineral surfaces is a rare and poorly understood function, for which the only diffraction structure available is osteocalcin binding metal ions (depicted as green spheres with ionic bonds to the γ-carboxy glutamic acid (gla) residues, shown as transparent green tubes) (PDB identifier: 1q8h). The three gla residues of osteocalcin (represented as spheres, similar to the target residues in Figure 3 above), previously shown to bind the hydroxyapatite surface of bone, are clearly selected by MFS within the top six of 49 residues, with or without knowledge of structural and post-translational modification to these residues. ConSurf selects these residues within the top eight, but with much lower discrimination from the scores of the other residues in osteocalcin. ET selects none of these residues within its top 10. This example demonstrates the applicability of MFS to making accurate and specific predictions for proteins of vastly diverse functions.
