Fig 1.
MLH1 is frequently disrupted by splicing mutations.
A. Disease coding mutations in exons 4, 5, 7, 8 and 15 of MLH1 were analyzed with MaPSy. None of the mutations in exons 4, 5 and 7 (blue bars) were found to disrupt splicing, whereas all of the mutations tested in exon 8 and most of those tested in exon 15 (red bars) significantly altered splicing (100% and 71%, respectively). B. Splicing efficiency of wildtype (blue) and mutant (red) alleles tested with MaPSy in exons 8 and 15 of MLH1.
Fig 2.
MLH1 ESM affect different stages of spliceosome assembly.
The percentages of mutant mRNA retained at each stage of assembly, relative to wildtype mRNA, are shown for all ESM identified in MLH1 exons 8 and 15. The majority of ESM were blocked in the transition from A to B complex. Two of the ESM in exon 8 (CM082944 and CM04546) also slowed the final transesterification reactions that yield spliced mRNA and the lariat.
Fig 3.
Non-uniform distribution of splicing mutations across disease genes.
A. SSM versus all exonic mutations in the HGMD, with the 99.9% confidence interval region shown in gray. Genes with more SSM than expected, the expected number of SSM, and fewer SSM than expected are shown in red (Upper), blue (Expected), and green (Lower), respectively. The locations of MLH1, MSH2, and PMS2 are highlighted and labeled. B. Percent ESM of total mutations tested using MaPSy in each category. C. Because MaPSy cannot observe mutant-specific exon skipping events (the flanking exons are identical), ESM found in MLH1, BRCA1, and OPA1 were validated as individual wildtype and mutant minigene constructs. All three mutant constructs showed exon skipping events that were not observed in the wildtype constructs.
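The Upper, Expected, and Lower categories in panel A can be illustrated with a minimal sketch. It assumes, hypothetically, that the gray region is a binomial interval around a background SSM fraction; the caption does not state how the interval was derived, and the function names, the `background_fraction` parameter, and any inputs are illustrative only.

```python
from math import exp, lgamma, log


def binomial_interval(n, p, level=0.999):
    """Return (lo, hi) counts bounding the central `level` mass of Binomial(n, p).

    Assumes 0 < p < 1. The pmf is evaluated in log space so that genes with
    thousands of mutations do not overflow.
    """
    alpha = (1.0 - level) / 2.0
    log_p, log_q = log(p), log(1.0 - p)
    lo, hi = None, n
    cumulative = 0.0
    for k in range(n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_q)
        cumulative += exp(log_pmf)
        # Lower bound: first count whose cumulative mass reaches alpha.
        if lo is None and cumulative >= alpha:
            lo = k
        # Upper bound: first count whose cumulative mass reaches 1 - alpha.
        if cumulative >= 1.0 - alpha:
            hi = k
            break
    return (0 if lo is None else lo), hi


def classify_gene(n_ssm, n_exonic, background_fraction, level=0.999):
    """Label a gene Upper, Expected, or Lower relative to the assumed interval."""
    lo, hi = binomial_interval(n_exonic, background_fraction, level)
    if n_ssm > hi:
        return "Upper"
    if n_ssm < lo:
        return "Lower"
    return "Expected"
```

Under these assumptions, a gene whose SSM count falls above the interval for its total number of exonic mutations would be plotted in the Upper category, and one below it in the Lower category.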
Fig 4.
Enrichment of cancer genes in SSM-prone genes.
A. SSM versus all exonic mutations in the HGMD, with the 99.9% confidence interval region shown in gray. COSMIC cancer genes are highlighted in red. MLH1, BRCA1, BRCA2, and NF1 are highlighted and labeled. B-C. Average percent of SSM or ESM in cancer genes versus non-cancer genes reported in HGMD. D. Average HI score of cancer genes in the Upper, Expected, and Lower categories of genes.
Fig 5.
Random forest classification and prediction of SSM-prone genes.
A. Variables ranked by importance, measured as the mean decrease in accuracy, for SSM-prone genes versus genes with an expected number of SSM. The direction in which each variable associates with SSM-prone genes is indicated: positive directions are green and negative directions are red. B. Classification performance of the random forest models and the logistic regression models, calculated as the area under the curve (AUC) in receiver operating characteristic (ROC) analysis. C. Scheme of random forest classification applied to all genes in the genome. D. Average proportion of low-frequency ExAC splice-site variants per splice site in predicted SSM-prone genes (probability: 0.60–0.86) versus genes not predicted to be SSM-prone (P = 6.1043e-18, Mann-Whitney). E. Common variants are depleted from the category of variants that cause loss of splice-site signal at the 5′ splice site (upper plot). Rare variants are enriched in the range of splice-site signal scores that abolish 5′ splice-site recognition (lower plot).
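The AUC in panel B and the Mann-Whitney test in panel D rest on the same rank comparison: the AUC equals the Mann-Whitney U statistic divided by the number of positive-negative pairs. The following is a minimal sketch of that relationship, not the authors' pipeline; `roc_auc` and its arguments are hypothetical names, and the inputs stand for classifier probabilities assigned to SSM-prone genes and to other genes.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties contribute 0.5, which makes the numerator the Mann-Whitney U statistic.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both score lists must be non-empty")
    u_statistic = 0.0
    for s_pos in scores_pos:
        for s_neg in scores_neg:
            if s_pos > s_neg:
                u_statistic += 1.0
            elif s_pos == s_neg:
                u_statistic += 0.5
    return u_statistic / (len(scores_pos) * len(scores_neg))
```

The pairwise loop is quadratic in the number of genes, which is acceptable for illustration; a rank-based formulation would be preferable for genome-scale inputs.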