Analysis of Stop-Gain and Frameshift Variants in Human Innate Immunity Genes
Figure 7
Pathogenicity score distributions for rare stop-gain variants in innate immunity genes.
Rank percentile distributions of pathogenicity scores for rare stop-gain variants (MAF<1%) are shown for different sets of genes: the protein-coding genome background (grey, “Genome”), innate immunity genes (light turquoise, “Inn Imm”) and their subset of interferon-stimulated genes (dark turquoise, “ISGs”). The same categories are shown for OMIM disease variants. All variants are reported in the ESP and 1000 Genomes datasets, except for the sets indicated with the § symbol (dashed boxes), which present scores for OMIM disease variants reported only in the OMIM database. Only three variants reported in ESP and 1000 Genomes were found to affect ISGs and to be annotated as pathogenic in OMIM; this category is not represented in the figure. Variants with the highest probability of being pathogenic have rank percentiles closer to zero (top of the panels). Panel A shows precomputed gene-based pathogenicity scores from [19]. Panel B shows sequence-based pathogenicity scores, i.e. posterior probabilities computed using the features described in the present work (see main text). Distributions of rank percentiles are represented as boxes, where each box spans the first to the third quartile and the median is denoted by a bold line in the middle. The total number of variants within each distribution is indicated. Differences in the number of variants in equivalent categories between panels A and B arise because gene-based scores are unavailable for some genes. Statistical differences against the genome reference (one-sided Wilcoxon rank sum tests) are indicated with asterisks according to Bonferroni-corrected p-values: <5e-02 (*), <5e-03 (**) and <5e-04 (***). The genome-wide median is denoted by a red line. The Spearman correlation between the sequence-based and gene-based pathogenicity scores was below 0.31 in all sets of genes analyzed (Figure S5).
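As an illustration of the conventions used in this caption, the following minimal sketch converts scores to rank percentiles (highest score closest to zero) and maps a p-value to the asterisk thresholds after Bonferroni correction. The function names, the averaging of tied ranks, and the number of tests passed in are assumptions made for this example; the caption does not specify them, and this is not the authors' code.

```python
def rank_percentiles(scores):
    """Rank percentiles in (0, 100]; the highest score gets the value closest to zero.

    Assumption: tied scores share the average of the ranks they span.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: -scores[i])  # indices, descending score
    pct = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = ((i + 1) + (j + 1)) / 2  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            pct[order[k]] = 100.0 * avg_rank / n
        i = j + 1
    return pct


def significance_stars(p_value, n_tests):
    """Asterisks for a Bonferroni-corrected p-value, using the caption's thresholds."""
    p_adj = min(1.0, p_value * n_tests)  # Bonferroni: multiply by number of tests
    if p_adj < 5e-4:
        return "***"
    if p_adj < 5e-3:
        return "**"
    if p_adj < 5e-2:
        return "*"
    return ""


if __name__ == "__main__":
    toy_scores = [0.9, 0.1, 0.5, 0.5]  # hypothetical posterior probabilities
    print(rank_percentiles(toy_scores))
    print(significance_stars(0.001, n_tests=4))
```

The raw p-value itself would come from a one-sided rank sum test of each gene set against the genome background, which is not reimplemented here.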