Figure 1.
Schematic illustration of the PredPPCrys approach.
The details of each of the six major steps are discussed within the main text.
Table 1.
Number of selected features after one-step and two-step mRMR feature selection for 5-class prediction.
Table 2.
Performance comparison of SVM models trained on feature subsets selected by different methods, evaluated on the 5-class benchmark datasets.
Table 3.
Prediction performance of the primary classifier built on the best-performing final feature subset, along with the number of final selected features for each class.
Table 4.
Performance comparison of SVM classifiers with different kernel functions and parameters.
Figure 2.
Correlations between the probability outputs of any two classes.
Results were evaluated on the training dataset.
Figure 3.
ROC curves for different predictors.
(A), CLF; (B), MF; (C), PF; (D), CF; and (E), CRYS class. Taking the CLF class as an example, the ROC curves compare the first-level predictor PredPPCrys I (the curve corresponding to the CLF class feature in panel A), the predictors built using the outputs of classifiers for other classes as inputs, and the second-level predictor PredPPCrys II. All predictors were built using the optimized SVM parameters on the respective training datasets and subsequently tested on the corresponding independent test datasets.
Table 5.
Performance comparison of PredPPCrys I, PredPPCrys II and previous methods, including PPCPred, ParCrys, OBScore, CRYSTALP2, XtalPred, SVMCRYS, SCMCRYS and XtalPred-RF.
Figure 4.
ROC curves comparing the performance of our methods (PredPPCrys I and II predictors) with previous methods on independent test datasets for predicting the propensity of targets to successfully pass each experimental step.
(A), CLF; (B), MF; (C), PF; (D), CF; and (E), CRYS class. PredPPCrys-I denotes the first-level predictors of PredPPCrys and PredPPCrys-II denotes the second-level predictors, while PredPPCrys-II_POLY, PredPPCrys-II_RBF and PredPPCrys-II_SIG denote the best-performing second-level SVM classifiers built with the SVM_POLY, SVM_RBF and SVM_SIG kernels, respectively.
Figure 5.
Statistical significance of the contributions of selected features to the prediction performance for the five classes, expressed as the negative logarithm of the p-value (-log(P)) calculated using t-tests.
Contribution significance was determined using t-tests, and only the final selected feature types that made a significant contribution (p<0.01) to performance were included in the analysis. The vertical and horizontal axes display the contributory features. The pie chart insets denote the percentages of selected feature types in the final feature subset for each class.