Table 1.
Four Categories of Functional Residues Considered in this Study.
Table 2.
17 ESSTs and the Number of Functional Residues Masked from the Alignments.
Figure 1.
Probabilities of Residue Conservation for 21 Amino Acids.
The probability of residue conservation (PCONS) was averaged over the diagonal of each substitution table. (A) PCONS of three matrix-types (ENZ, NOENZ and ALL) compared with OLD. Non-masking models (X) were used for the three matrix-types and for OLD to isolate the effect of the alignment source. (ENZ: enzyme-specific 221 SCOP families, NONENZ: non-enzymes, ALL: all the alignments, OLD: non-masking ESST of Shi et al. [11]. See Table 2 for details.) (B) Five masking tables and one non-masking table compared with the ESST of Shi et al. [11]. The masking and non-masking tables are derived from the 221 enzyme-specific alignments (ENZ). The masking sources A, B, C, and D are listed in Table 1. (R: random-masking, X: non-masking.)
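As a minimal sketch of the averaging described in this caption, the snippet below takes the mean of the diagonal of a square substitution-probability table. The function name `mean_conservation` and the 3×3 toy table are illustrative assumptions; the tables in the study have 21 rows and columns, and their actual values are not reproduced here.

```python
def mean_conservation(table):
    """Average the diagonal of a square substitution-probability table.

    Assumption: table[i][i] holds the probability that residue type i
    is conserved, so the diagonal mean is a single PCONS-like summary.
    """
    n = len(table)
    if n == 0 or any(len(row) != n for row in table):
        raise ValueError("table must be square and non-empty")
    return sum(table[i][i] for i in range(n)) / n


# Hypothetical 3x3 table, for illustration only (not data from the study).
toy_table = [
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.3],
    [0.1, 0.2, 0.6],
]
pcons = mean_conservation(toy_table)
print(f"mean diagonal probability: {pcons:.3f}")
```

A lower diagonal mean for a masked table than for its non-masked counterpart would indicate that the masked residues were, on average, more conserved than the rest.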
Table 3.
Rank Correlation.
Table 4.
Z-Score of CRESCENDO for Functional Residues.
Figure 2.
Performance of 17 ESSTs on Detecting Active Site Residues.
Z-score (blue) and sensitivity (red) are plotted for each of the 17 ESSTs. The Z-score is averaged over the 602 active-site residues in the test-sets (see text). Z-score and sensitivity (SENS) are highly correlated (Spearman's rank correlation of 0.95, Table 3). Any SCOP families in the test-sets that were also used to build an ESST were removed from that ESST to avoid bias. These benchmarking ESSTs are marked with ‘t’ (e.g., At, Bt, Ct and Dt) to distinguish them from the originals. Within the same matrix type (OLD, ENZ, ALL), the Z-score and SENS of the non-masking (X) and random-masking (R) tables are always lower than those of the masking models (At, Bt, Ct, and Dt). All the masking tables outperform the ESST of Shi et al. (J) [11].
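The rank correlation quoted in this caption can be illustrated with a short, dependency-free sketch of Spearman's coefficient (Pearson correlation computed on average ranks). The helper names and the five paired values are hypothetical and are not the per-ESST results of the study.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical paired scores for five tables (illustration only).
z_scores = [1.2, 1.5, 0.9, 2.0, 1.7]
sensitivities = [0.30, 0.40, 0.25, 0.50, 0.35]
rho = spearman(z_scores, sensitivities)
print(f"Spearman rho = {rho:.2f}")
```

Because only the ordering of the tables matters, the coefficient is unaffected by the different scales of Z-score and sensitivity.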
Figure 3.
Predicting Four Categories of Functional Residues by CRESCENDO.
Four case-studies of predicting functional residues are shown: (A) active-sites, (B) PPI (protein–protein interaction), (C) PNI (protein–nucleic acid interaction), (D) PLI (protein–ligand interaction). SCOP domains d1evua4 [23], d1i7kb_ [24], d1k8wa5 [33] and d1ed9a_ [34] were used for A, B, C, and D, respectively. True positives (TP) are coloured in pink, false negatives (FN, missing residues) in orange and false positives (FP) in green. TP and FN are shown as sticks (bold-frame). (A) Cysteine protease. CRESCENDO predicted 27 residues as functional residues. All three catalytic residues (CYS-314, HIS-373 and ASP-396) were correctly identified. The ALL-B type ESST (see Table 2) was used in this figure. FP (green) are clustered around the three real active-site residues (pink). (B) Ubiquitin conjugating (UBC) enzyme. CRESCENDO predicted 12 residues using the ALL-A ESST. Of the 14 residues annotated as PPI residues, five (coloured in pink) were correctly identified. The interacting partner (A chain of 1i7k) is placed at the bottom and coloured in gray. The solvent accessible surface areas (SASA) for the five TP are as follows: ARG-34 (35.64), PRO-90 (4.12), SER-123 (4.74), ALA-124 (0.55), LEU-125 (72.39). SASA for the 9 FN are as follows: PRO-30 (77.26), VAL-31 (24.02), SER-87 (110.40), GLY-88 (16.05), TYR-89 (0.01), TYR-91 (58.29), GLU-120 (108.68), LYS-121 (113.96), TRP-122 (7.20). The SASA values are from InterPare [18]. (C) Pseudouridine synthase. BIPA (S. Lee, unpublished) annotates 43 residues as PNI. Of the 20 residues predicted by CRESCENDO, 14 were TP (coloured in pink). ALL-D was used as the ESST. DNA is coloured in blue. (D) Alkaline phosphatase. UniProt annotates 9 residues as metal-binding (METAL), all of which were correctly identified by CRESCENDO among 30 predicted residues. ALL-B was used as the ESST. ZN (zinc) and MG (magnesium) are coloured in cyan and blue, respectively.
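The counts quoted in this caption can be turned into per-panel FN, FP, sensitivity and precision with a small sketch. The formulas used, sensitivity = TP / annotated and precision = TP / predicted, are the standard definitions and are an assumption here; the study defines its own measures in the main text. The class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseStudy:
    """Counts for one panel, taken from the caption above."""
    name: str
    annotated: int        # residues annotated as functional
    predicted: int        # residues predicted by CRESCENDO
    true_positives: int   # predicted residues that are annotated

    @property
    def false_negatives(self) -> int:
        return self.annotated - self.true_positives

    @property
    def false_positives(self) -> int:
        return self.predicted - self.true_positives

    @property
    def sensitivity(self) -> float:
        # Assumed standard definition: TP / (TP + FN).
        return self.true_positives / self.annotated

    @property
    def precision(self) -> float:
        # Assumed standard definition: TP / (TP + FP).
        return self.true_positives / self.predicted


cases = [
    CaseStudy("A: active site", annotated=3, predicted=27, true_positives=3),
    CaseStudy("B: PPI", annotated=14, predicted=12, true_positives=5),
    CaseStudy("C: PNI", annotated=43, predicted=20, true_positives=14),
    CaseStudy("D: PLI", annotated=9, predicted=30, true_positives=9),
]

for c in cases:
    print(f"{c.name}: FN={c.false_negatives} FP={c.false_positives} "
          f"sensitivity={c.sensitivity:.2f} precision={c.precision:.2f}")
```

For panel B this reproduces the 9 false negatives listed with their SASA values in the caption.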
Table 5.
Performance of 17 ESSTs on Detecting Active Sites.
Table 6.
Performance of ESSTs on Protein–Protein Interaction Residues.