Figure 1.

Overview of the FunSAV method for predicting the functional effect of single amino acid variants (SAVs).

FunSAV derives its features from the amino acid sequence of the protein, from its 3D structure, and from network properties calculated on a residue-residue contact network representation of that structure. A full list of the extracted features is given in Table 1. After feature selection, the features that distinguish disease-associated from neutral SAVs are statistically analyzed and used as input to build random forest (RF) models. Prediction performance is evaluated by both 5-fold cross-validation and independent tests.
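To make the residue-residue contact network idea concrete, the following is a minimal sketch, not the FunSAV implementation. It assumes one representative atom per residue and a distance cutoff of 6.5 Å; both the cutoff value and the function names `contact_network` and `degree` are illustrative assumptions that do not come from the source.

```python
import math


def contact_network(coords, cutoff=6.5):
    """Build an undirected residue-residue contact network.

    coords: list of (x, y, z) tuples, one representative atom per residue.
    cutoff: contact distance in angstroms (assumed value, not from the source).
    Returns a dict mapping residue index -> set of contacting residue indices.
    """
    graph = {i: set() for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            # Two residues are in contact if their atoms lie within the cutoff.
            if math.dist(coords[i], coords[j]) <= cutoff:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def degree(graph):
    """Number of contacts per residue, one simple network feature."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


# Toy coordinates for four residues (made-up values for illustration only).
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 3.8, 0.0)]
network = contact_network(toy_coords)
degrees = degree(network)
```

Here residues 0 and 2 are 7.6 Å apart and therefore not connected, so they end up with two contacts each while residues 1 and 3 have three. Network features such as degree can then be attached to the mutated residue alongside sequence and structure features.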

Table 1.

Features used in this study, grouped into nine major types: sequence or sequence-derived features, structure features, residue-contact network features, computed scores, database annotations, solvent exposure features, coevolutionary features, solvent accessibilities, and conservation score.

Table 2.

Abbreviations of the 15 features retained after final feature selection in this study.

Figure 2.

The relative importance and ranking of the optimal feature groups, as evaluated by the mean MDGI Z-Score.

Each bar represents the mean MDGI Z-Score of the corresponding feature group. NACCESS: solvent accessibilities calculated by NACCESS [50]; exposure: solvent exposure features calculated by the Biopython package [51]; network: residue-contact network features calculated by the JUNG library, available at http://jung.sourceforge.net/; PSSM: PSSM features calculated by PSI-BLAST [28]; co-evolution: coevolutionary features, including MIr, MIp, MI and Kai value; DSSP_ACC: the number of water molecules in contact with the residue of interest, extracted from DSSP [39]; conserve_score: conservation score defined in the Feature extraction section; SSpro: solvent accessibility calculated by the SSpro program [30]; MW_change: molecular weight change upon mutation; B_factor: the temperature factor extracted from the PDB file; DISOPRED: native disorder predicted by DISOPRED [31].
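As a rough illustration of how a per-group mean Z-score could be obtained from raw importance values, the sketch below standardizes each feature's importance across all features and then averages within each group. This is an assumption about the procedure, not a description confirmed by the caption; the feature names, group names, and importance values are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(importance):
    """Standardize importance values: (value - mean) / standard deviation."""
    mu = mean(importance.values())
    sigma = pstdev(importance.values())
    if sigma == 0:
        raise ValueError("all importance values are identical")
    return {name: (value - mu) / sigma for name, value in importance.items()}


def group_means(scores, groups):
    """Average the Z-scores of the features belonging to each group."""
    return {
        group: mean(scores[name] for name in members)
        for group, members in groups.items()
    }


# Hypothetical importance values and grouping, for illustration only.
importance = {"f1": 1.0, "f2": 2.0, "f3": 3.0, "f4": 6.0}
groups = {"group_a": ["f1", "f2"], "group_b": ["f3", "f4"]}

scores = z_scores(importance)
means = group_means(scores, groups)

# Rank groups from most to least important.
ranking = sorted(means, key=means.get, reverse=True)
```

With these toy numbers `group_b` ranks above `group_a`, mirroring how the bars in the figure order the feature groups.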

Figure 3.

Comparison of the mean values and standard deviations of the 15 optimal features of disease-associated and neutral SAVs.

“*” denotes a P-value between 0.01 and 0.05, “**” a P-value between 2.2e-16 and 0.01, and “***” a P-value < 2.2e-16. See Table 2 for details of the feature abbreviations.
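The significance coding in this caption can be written as a small lookup. The sketch below is illustrative only: the function name `significance_stars` is hypothetical, and the treatment of values that fall exactly on a boundary (0.01 or 0.05) is an assumption, since the caption does not specify it.

```python
def significance_stars(p_value):
    """Map a P-value to the star code used in the caption.

    Boundary handling is an assumption: 0.05 counts as "*", 0.01 as "*".
    """
    if p_value < 2.2e-16:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value <= 0.05:
        return "*"
    return ""  # not significant at the 0.05 level


# Hypothetical P-values, for illustration only.
codes = [significance_stars(p) for p in (1e-20, 0.001, 0.03, 0.2)]
```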

Figure 4.

Effect of the removal or inclusion of the 15 individual optimal features on the prediction performance of the first-stage FunSAV classifier.

Performance was evaluated using the Matthews correlation coefficient (MCC). (A) Performance of the classifier trained on each individual feature alone; (B) decrease in MCC when the corresponding feature is removed. See Table 2 for details of the feature abbreviations.
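For reference, MCC (the Matthews correlation coefficient) is computed from the four cells of a binary confusion matrix. The sketch below uses the standard formula; the counts are made up for illustration, and returning 0.0 when the denominator vanishes is a common convention assumed here rather than something stated in the source.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when a row or column is empty
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for a full model and for the same model with one
# feature removed; the difference corresponds to an "MCC decrease".
full_model = mcc(tp=40, tn=45, fp=5, fn=10)
ablated_model = mcc(tp=35, tn=40, fp=10, fn=15)
mcc_decrease = full_model - ablated_model
```

MCC ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (every prediction correct), so a larger decrease after removing a feature suggests that the feature contributes more to the classifier.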

Table 3.

Prediction performance of the first-stage and two-stage FunSAV classifiers in comparison with six other prediction tools.

Figure 5.

The ROC curves of nine classifiers based on 5-fold cross-validation tests.

Results are shown for the benchmark dataset (A) and the independent test dataset (B).
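ROC curves are often summarized by the area under the curve (AUC), which equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes AUC directly from that definition; the labels and scores are hypothetical, and the function name `roc_auc` is illustrative rather than taken from any tool compared in the figure.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for disease-associated, 0 for neutral (assumed encoding).
    scores: classifier scores, higher meaning more likely disease-associated.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and scores, for illustration only.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
```

In this toy case three of the four positive-negative pairs are ordered correctly, giving an AUC of 0.75. The pairwise loop is quadratic in the number of examples, which is fine for a sketch but slower than rank-based implementations on large datasets.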

Figure 6.

Prediction examples of the functional effect of SAVs in two proteins by FunSAV.

All-atom (A, B), surface (C, D), and network (E, F) representations of the proteins hATR (PDB ID: 2IDX, chain A; panels A, C, E) and PAF-AH (PDB ID: 3D59, chain A; panels B, D, F). Red denotes disease-associated variants and green denotes neutral variants. 3D structures were rendered using PyMOL [71] and network graphs were drawn using Cytoscape [72].

Figure 7.

Example of a false-negative prediction by FunSAV for an SAV in the Noggin protein.

(A) All-atom, (B) surface, and (C) network representations of the Noggin protein. Red denotes the disease-associated variant. 3D structures were rendered using PyMOL [71] and network graphs were drawn using Cytoscape [72].
