Fig 1.
(A) Feature representation. A total of 1521 sequence, Euclidean and Voronoi neighborhood features are initially generated. (B) Two-step feature selection. Stability selection is used as the first step. We select the top 152 features with scores larger than 0.2. The second step is performed using wrapper-based feature selection. Features are evaluated by 5-fold cross-validation with the gradient tree boosting (GTB) algorithm. (C) Prediction model. Gradient boosted trees are finally built for prediction.
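The two-step selection in panel (B) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. The base selector inside the stability step (absolute Pearson correlation on random half-samples) and the nearest-centroid classifier inside the wrapper step are stand-ins chosen so the example runs without external packages; the caption specifies GTB as the evaluator and does not name the base selector. All function names, the synthetic data, and the parameters `n_rounds` and `top_k` are hypothetical. Only the 0.2 score cutoff and the 5-fold cross-validation come from the caption.

```python
import random
from statistics import mean

def abs_corr(xs, ys):
    """Absolute Pearson correlation; 0.0 if either vector is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return 0.0 if sxx == 0 or syy == 0 else abs(sxy) / (sxx * syy) ** 0.5

def stability_scores(X, y, n_rounds=50, top_k=2, seed=0):
    """Step 1: fraction of random half-samples in which a feature ranks in the top_k."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    counts = [0] * d
    for _ in range(n_rounds):
        idx = rng.sample(range(n), n // 2)
        ys = [y[i] for i in idx]
        scores = [abs_corr([X[i][j] for i in idx], ys) for j in range(d)]
        ranked = sorted(range(d), key=lambda j: scores[j], reverse=True)
        for j in ranked[:top_k]:
            counts[j] += 1
    return [c / n_rounds for c in counts]

def cv_accuracy(X, y, feats, k=5):
    """k-fold CV accuracy of a nearest-centroid classifier (stand-in for GTB)."""
    n, correct = len(X), 0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        cents = {}
        for label in set(y[i] for i in train):
            rows = [X[i] for i in train if y[i] == label]
            cents[label] = [mean(r[j] for r in rows) for j in feats]
        for i in test:
            pred = min(cents, key=lambda c: sum(
                (X[i][j] - m) ** 2 for j, m in zip(feats, cents[c])))
            correct += pred == y[i]
    return correct / n

def forward_wrapper(X, y, candidates, k=5):
    """Step 2: greedily add the feature that most improves the CV score."""
    selected, best, remaining = [], 0.0, list(candidates)
    while remaining:
        score, feat = max((cv_accuracy(X, y, selected + [f], k), f)
                          for f in remaining)
        if score <= best:  # stop when no candidate improves the score
            break
        selected.append(feat)
        remaining.remove(feat)
        best = score
    return selected, best

# Synthetic data (hypothetical): feature 0 tracks the label, features 1-3 are noise.
rng = random.Random(1)
y = [i % 2 for i in range(100)]
X = [[2.0 * y[i] + rng.gauss(0, 0.2)] + [rng.gauss(0, 1) for _ in range(3)]
     for i in range(100)]

scores = stability_scores(X, y)
kept = [j for j, s in enumerate(scores) if s > 0.2]  # cutoff from the caption
selected, best = forward_wrapper(X, y, kept, k=5)
print("stability scores:", scores)
print("kept after step 1:", kept)
print("selected after step 2:", selected, "CV accuracy:", best)
```

In this sketch the stability step acts as a cheap filter that discards features rarely chosen across resamples, so the more expensive wrapper step only searches the surviving candidates. Swapping `cv_accuracy` for a cross-validated gradient boosted tree model would bring the second step closer to what the caption describes.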
Table 1.
Performance of selected attributes with the two-step feature selection method.
The first column lists different cutoffs of stability selection scores.
Fig 2.
ROC curves of our two-step algorithm and three other existing feature selection methods.
Table 2.
Rankings of feature importance for the optimal selected features.
SN, EN and VN represent sequence neighborhood, Euclidean neighborhood and Voronoi neighborhood, respectively. The numbers in the brackets denote the positions in the sliding window for sequence neighborhood features.
Fig 3.
The relative importance and ranking of the optimal feature group, as evaluated by gradient tree boosting.
Each bar represents the importance score of the corresponding feature group.
Fig 4.
Comparison of the AUC values of the three methods using 5-fold cross-validation on the benchmark dataset.
Table 3.
Prediction performance of PredSAV classifiers in comparison with six other prediction tools on the benchmark dataset.
Fig 5.
The ROC curves of seven classifiers on the benchmark dataset.
Table 4.
Prediction performance of PredSAV classifiers in comparison with six other prediction tools on the independent test dataset.
Fig 6.
The ROC curves of seven classifiers on the independent test dataset.
Fig 7.
Prediction examples of the functional effects of SAVs in two proteins by PredSAV and other methods.
Red denotes disease-associated variants, while blue represents neutral variants. (A) and (B) represent proteins PAH (PDB ID: 1J8U, chain A) and LSS (PDB ID: 1W6K, chain A), respectively. 3-D structures are rendered using PyMOL [75].