Fig 1.
Categorization of variants based on their location within the genome and their type.
Fig 2.
Workflow diagram describing the construction of the dataset of variants related to Mendelian diseases.
The dataset was prepared by combining deleterious variants from the ClinVar database with neutral variants from the VariSNP database. The resulting dataset was then divided into independent training and testing subsets for each individual category of variants.
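The merge-and-split step described in this caption can be illustrated with a minimal sketch. The record layout (`variant_id`, `category`), the 50% test fraction, and the fixed random seed are assumptions made for illustration only; the caption does not state the actual split ratio or any further independence criteria that were applied.

```python
import random
from collections import defaultdict


def build_dataset(deleterious, neutral):
    """Merge two lists of (variant_id, category) pairs into labelled records.

    Label 1 marks a deleterious variant, label 0 a neutral one.
    """
    labelled = [(vid, cat, 1) for vid, cat in deleterious]
    labelled += [(vid, cat, 0) for vid, cat in neutral]
    return labelled


def split_by_category(dataset, test_fraction=0.5, seed=0):
    """Split records into disjoint train/test subsets within each category.

    test_fraction and seed are hypothetical parameters, not values from the paper.
    """
    rng = random.Random(seed)
    by_category = defaultdict(list)
    for record in dataset:
        by_category[record[1]].append(record)

    splits = {}
    for category, records in by_category.items():
        shuffled = records[:]
        rng.shuffle(shuffled)
        n_test = int(len(shuffled) * test_fraction)
        splits[category] = {"test": shuffled[:n_test], "train": shuffled[n_test:]}
    return splits
```

Splitting within each category, rather than over the pooled dataset, keeps every category represented in both subsets, which is what per-category evaluation requires.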
Fig 3.
The use of category-optimal thresholds improves the predictive performance of individual tools by increasing their ability to capture differences in the distribution of prediction scores for the different categories of variants.
(A) Distribution of scores for deleterious and neutral variants provided by each evaluated tool for individual categories of variants from the training subsets of the Mendelian diseases dataset. The locations of the general and category-optimal thresholds used to obtain predictions are shown for each tool. (B) Normalized accuracies achieved by individual tools when using category-optimal (blue bars) and general (red bars) thresholds, evaluated using testing subsets of the Mendelian diseases dataset.
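The idea of a category-optimal threshold can be sketched as follows. This sketch assumes that normalized accuracy is the mean of sensitivity and specificity and that higher scores indicate deleteriousness. The exhaustive search over observed scores is an illustrative choice, not necessarily the procedure used for the figure, and the example scores are invented.

```python
def normalized_accuracy(scores, labels, threshold):
    """Mean of sensitivity and specificity when score >= threshold means deleterious."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both deleterious and neutral variants are required")
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    return 0.5 * (tp / n_pos + tn / n_neg)


def optimal_threshold(scores, labels):
    """Return the observed score that maximizes normalized accuracy on training data."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: normalized_accuracy(scores, labels, t))


# Invented training scores for one tool in two hypothetical categories.
training = {
    "missense": ([0.8, 0.9, 0.7, 0.2, 0.3, 0.6], [1, 1, 1, 0, 0, 0]),
    "intronic": ([0.4, 0.5, 0.1, 0.2], [1, 1, 0, 0]),
}
thresholds = {cat: optimal_threshold(s, y) for cat, (s, y) in training.items()}
```

Because each threshold is fitted on the training subset of a single category, its benefit has to be confirmed on the corresponding testing subset, as in panel (B).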
Table 1.
Performance of individual prediction tools employing category-optimal thresholds and their PredictSNP2 consensus score for individual variant categories, evaluated using the testing subset of variants associated with Mendelian diseases.
Table 2.
Performance of different consensus scores for specific variant categories, evaluated using the testing subset of variants associated with Mendelian diseases.
Fig 4.
Performance of nucleotide-based and protein-based prediction tools and their consensuses, evaluated using the dataset of variants associated with Mendelian diseases.
(A) Observed normalized accuracy and (B) area under the receiver operating characteristic curve (AUC). Blue bars show nucleotide-based tools and their consensus; red bars show protein-based tools and their consensus. The horizontal dashed lines represent the average performance for each tool type.
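For readers unfamiliar with the AUC metric in panel (B), the following minimal sketch computes it through its pairwise (Mann-Whitney) interpretation: the probability that a randomly chosen deleterious variant scores higher than a randomly chosen neutral one, with ties counted as one half. The function name and example values are illustrative assumptions; the evaluation behind the figure may have used a different implementation.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both deleterious and neutral variants are required")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # deleterious variant ranked above neutral one
            elif p == n:
                wins += 0.5   # ties contribute half a win
    return wins / (len(positives) * len(negatives))


# Invented example: three of the four positive/negative pairs are ordered correctly.
example_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

Unlike normalized accuracy, this quantity does not depend on any decision threshold, which is why the two panels can rank tools differently.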
Fig 5.
Workflow diagram of the PredictSNP2 webserver.
Submitted variants are evaluated by the integrated prediction tools. The raw scores produced by the individual tools are transformed into overall decisions about deleteriousness and interpretable confidence scores according to the variant category detected by ANNOVAR. In addition, links to relevant databases and online tools are provided to help the user understand the genomic context and potential function of the corresponding genome region. Optionally, evaluation of missense mutations by PredictSNP1 can be requested.
Fig 6.
The graphical user interface of the PredictSNP2 webserver.
(A) On the input page, the variants to be analyzed can be provided in several established formats using one of two reference genome assemblies. (B) On the output page, the predictions of the individual tools and their PredictSNP2 consensus score are reported together with links to eight relevant databases.