Analysis of Stop-Gain and Frameshift Variants in Human Innate Immunity Genes
Figure 3
Receiver operating characteristic (ROC) analysis of pathogenicity scores for stop-gain variants.
Panel A: The classification power of three pathogenicity scores was evaluated on a set of 1160 pathogenic stop-gain variants from the OMIM database and 125 common stop-gain variants not known to be pathogenic. Shown are the ROC curves for the sequence-based classifier (SB) developed in this work, for the gene-based score reported in [19] (GB), and for the joint classifier (SB×GB). Dashed curves correspond to a randomization test in which the rows of the sequence features are shuffled column-wise (denoted SB(r)). Panel B: AUC improvement achieved by combining the sequence-based scores with a gene-based score. The panel shows AUC values of ROC curves obtained using two independent gene-based scores (MacArthur 2012 [19] and RVIS [6]), on two independent datasets of variants (ESP and 1000 Genomes), and for two types of variants (stop-gains and frameshifts). The corresponding ROC curves and the numbers of pathogenic and common variants used for benchmarking are shown in Figure S4. Inclusion of sequence features increased the area under the ROC curve in all evaluated settings.
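As an illustration of the comparison described in this caption, the following minimal Python sketch computes the area under the ROC curve for a score, combines two scores, and shuffles a feature table column-wise for a randomization control. It is not the authors' implementation. The function names (`roc_auc`, `joint_score`, `shuffle_columns`) are hypothetical, the toy values are invented for demonstration only, and treating SB×GB as an element-wise product of the two scores is an assumption based on the notation.

```python
import random


def roc_auc(labels, scores):
    """AUC via the Mann-Whitney U statistic (1 = pathogenic, 0 = common).

    Tied scores receive the average of the ranks they span.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def joint_score(sb, gb):
    """Assumption: the joint classifier multiplies the two scores."""
    return [s * g for s, g in zip(sb, gb)]


def shuffle_columns(rows, rng):
    """Permute each feature column independently across variants (rows)."""
    columns = [list(col) for col in zip(*rows)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]


if __name__ == "__main__":
    # Invented toy data: four variants, two pathogenic and two common.
    labels = [1, 1, 0, 0]
    sb = [0.9, 0.4, 0.5, 0.1]  # hypothetical sequence-based scores
    gb = [0.6, 0.8, 0.3, 0.4]  # hypothetical gene-based scores
    print("AUC SB:   ", roc_auc(labels, sb))
    print("AUC GB:   ", roc_auc(labels, gb))
    print("AUC SBxGB:", roc_auc(labels, joint_score(sb, gb)))

    # Randomization control on a hypothetical two-feature table.
    features = [[0.2, 1.0], [0.7, 0.0], [0.1, 1.0], [0.9, 0.0]]
    print(shuffle_columns(features, random.Random(0)))
```

In a real analysis the shuffled feature table would be passed back through the sequence-based classifier to obtain the SB(r) scores; that step is omitted here because the classifier itself is not specified in this caption.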