Analysis of Stop-Gain and Frameshift Variants in Human Innate Immunity Genes
Figure 4
Correlation between pathogenicity scores of truncating variants and their impact on gene expression levels.
Shown are the distributions (y-axis) of three pathogenicity scores (Panel A: the sequence-based score developed in this work; Panel B: the gene-based score from MacArthur 2012 [19]; Panel C: the gene-based score RVIS [6]) within quintile bins (x-axis) of the average expression z-scores of individuals carrying stop-gain variants (PEER-factor-normalized RPKM values from [22] were used; see Methods and Figure 2). A total of 1060 stop-gain variants are represented, 212 in each quintile. Quintiles 1 to 5 are ordered by decreasing impact on gene expression levels and correspond, respectively, to the following z-score intervals: z < −1.25, (−1.25, −0.66], (−0.66, −0.23], (−0.23, 0.23], and (0.23, 5.15]. To allow comparison across scores, each score is represented as a rank percentile, where the value for a given variant is the percentage of all stop-gain variants that have a more pathogenic score than that variant. Therefore, a rank percentile of 0 indicates a variant with the highest predicted probability of being pathogenic, while a rank percentile of 100 indicates a variant with the lowest predicted severity. The sequence-based score showed a stronger correlation with expression levels (Spearman rank correlation = 0.21 ± 0.03, p < 5e-12) than either gene-based score (0.06 ± 0.04, p > 0.05, for the MacArthur 2012 score; 0.13 ± 0.03, p < 5e-05, for the RVIS score). For frameshift variants, none of the scores was associated with gene expression levels.
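The rank-percentile transformation, the quintile binning, and the Spearman correlation described in this caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline and makes several assumptions. The function names (`rank_percentile`, `expression_quintile`, `average_ranks`, `spearman`) are hypothetical. The percentile is computed by dividing by the total number of variants, taken literally from the caption. A z-score that falls exactly on a cut point is assigned to the lower bin. The orientation of each raw score (whether higher values mean more pathogenic) is supplied by the caller. The toy scores and z-scores at the end are invented for illustration only.

```python
from bisect import bisect_left

# Quintile cut points on the average expression z-score, as reported in the caption.
# Assumption: a z-score equal to a cut point goes to the lower bin, so bin 1 is
# z <= -1.25 and bins 2-5 are the half-open intervals (a, b].
CUT_POINTS = [-1.25, -0.66, -0.23, 0.23]


def rank_percentile(scores, higher_is_more_pathogenic=True):
    """Percentage of all variants whose score is strictly more pathogenic.

    0 marks the most pathogenic variant; values approach 100 for the least
    pathogenic one. The orientation flag is an assumption, because some
    scores (e.g. RVIS) treat lower values as more intolerant.
    """
    n = len(scores)
    out = []
    for s in scores:
        if higher_is_more_pathogenic:
            more = sum(1 for t in scores if t > s)
        else:
            more = sum(1 for t in scores if t < s)
        out.append(100.0 * more / n)
    return out


def expression_quintile(z):
    """Map an average expression z-score to bin 1 (strongest decrease) .. 5."""
    return bisect_left(CUT_POINTS, z) + 1


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical toy data: higher raw score = more pathogenic.
scores = [0.9, 0.7, 0.4, 0.2, 0.1]
zs = [-1.8, -0.9, -0.3, 0.0, 0.6]
pct = rank_percentile(scores)                # [0.0, 20.0, 40.0, 60.0, 80.0]
bins = [expression_quintile(z) for z in zs]  # [1, 2, 3, 4, 5]
rho = spearman(pct, zs)                      # 1.0
```

A positive rho in this setup means that variants ranked as more pathogenic (lower percentile) tend to fall in the bins with a stronger decrease in expression, which matches the direction of the correlations reported above. The ± values in the caption are not modeled here.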