The impact of genetically controlled splicing on exon inclusion and protein structure

Common variants affecting mRNA splicing are typically identified though splicing quantitative trait locus (sQTL) mapping and have been shown to be enriched for GWAS signals by a similar degree to eQTLs. However, the specific splicing changes induced by these variants have been difficult to characterize, making it more complicated to analyze the effect size and direction of sQTLs, and to determine downstream splicing effects on protein structure. In this study, we catalogue sQTLs using exon percent spliced in (PSI) scores as a quantitative phenotype. PSI is an interpretable metric for identifying exon skipping events and has some advantages over other methods for quantifying splicing from short read RNA sequencing. In our set of sQTL variants, we find evidence of selective effects based on splicing effect size and effect direction, as well as exon symmetry. Additionally, we utilize AlphaFold2 to predict changes in protein structure associated with sQTLs overlapping GWAS traits, highlighting a potential new use-case for this technology for interpreting genetic effects on traits and disorders.

Reviewer #1: In the Einson et al. manuscript, the authors identified splicing quantitative trait loci (sQTLs) using exon percent spliced in (PSI) scores as a quantitative phenotype.They also utilized AlphaFold2 to predict changes in protein structure associated with sQTLs that overlapped with GWAS traits, highlighting a potential new use-case for this technology in interpreting genetic effects on traits and disorders.This study presents novel insights into the effects of sQTLs on splicing events and GWAS signals.
However, there are some major comments to be addressed.1.The study lacks a direct comparison with those protein coding genes with sQTLs from previously published studies, such as the one by Garrido-Martín et al. (2021) Garrido-Martín, D., Borsari, B., Calvo, M. et al.Identification and analysis of splicing quantitative trait loci across multiple tissues in the human genome.Nat Commun 12, 727 (2021).https://doi.org/10.1038/s41467-020-20578-2RE: We had reported the comparison of our sQTL calls to the GTEx v8 sQTL calls based on LeafCutter, which is the most commonly used method.However, we agree that a comparison to the sQTLs from the referenced paper is valuable.Thus, we obtained the sQTL calls from Garrido-Martin et al., and now report the comparison to these in line 133, and Supplementary Figure 1E.
2. It is not clear which exon is considered the "top exon" in the statement "removing genes where the 3' or 5' terminal exon was the top exon."If the top exon refers to the first or last exon, then the majority of genes would be removed since most genes have a 3' or 5' exon.accurate deleteriousness scores are available only for coding variants, and biologically informative only when the sVariant is the true causal variant rather than LD proxy.Secondly, because sVariants are relatively common in the population and are associated with common changes in alternative splicing, they are unlikely to be severely damaging.
4. The threshold (i.e.Euclidean distance between (1,1) and (PP.power,PP.coloc) less than .25)for colocalization analysis is not widely used, and a more typical threshold of PP.coloc > 0.5 or 0.8 should be used.RE: As explained in line 560, we chose this looser definition of colocalization to allow for more data in downstream analysis, and where the false positive rate is less critical.Originally, we had implemented a strict cutoff of PP.coloc > 0.8, but this is highly conservative and produces many false negatives, and implementing that substantially limited our power to detect associations with colocalization status and splicing change.There is no systematic reason that a looser threshold with potentially same false positive colocalizations would bias the patterns or outcomes that we report.We have added a note of this in Methods advising caution in interpretation of loci where the evidence for colocalization is not very strong.
5. The colocalization analysis between sQTLs and eQTLs from GTEx should be provided even though the overlap is marginal.RE: Thank you for this suggestion.We have included the credible sets overlaps between sQTLs and eQTLs in the Zenodo repository.It is now live with the new version.
Minor comments: 1. Fig 1B ."Significant genes" for x-axis should be "Genes with a significant ψQTL" RE: Thank you for the comment.The axis label of this figure has been updated.

2.
The following text should be moved to the discussion section: "While Leafcutter [8] identifies more splicing events and finds more sQTLs, it presents an interpretability challenge.It is often difficult to identify which exon a Leafcutter cluster corresponds to, and effect directions are sometimes unclear.While ψQTLs are less powerful in a statistical sense, the method clearly links splicing events to exons, genes, and effect directions, which was advantageous to the purpose of this study.As an additional follow up, we performed ψQTL mapping in the Geuvadis [30]dataset, and checked for concordance with GTEx EBV-transformed lymphocytes.(See methods for details) Of the 1,119 sGenes with coverage in Geuvadis and GTEx, 423 had a significant sExon in both.The coverage in Geuvadis was smaller overall, only capturing splicing for 853 out of 1431 exons with a significant ψQTL in GTEx lymphocytes.Among this set, however, the concordance was reasonably high, with a π1 score of 0.63.This replication strengthens evidence that ψQTLs are robust in many contexts."RE: We think that the overall clarity of the manuscript would suffer if this text was moved.The description of the LeafCutter comparison presents a rationale for the work, and moving this to the end could leave a reader wondering why we did not just use the GTEx sQTLs as is.This topic is elaborated further in Discussion.The replication analysis reports results of an analysis, which in our opinion belongs to Results.
3. "p < 2e-16" is a default minimum p value.Please use the true p value.RE: Thank you for this comment.We have updated statistics to the true p value at line 148, and twice at line 269.Figure 3 has also been updated accordingly.
4. When the Fisher's Exact test is applied, it is not clear what is the background used?For example, what is the background in the following test: "We found that indeed, among sExons, 41.20% were symmetric compared to 38.77% of all non-terminal exons annotated in gencode v26 (Fig 1C, Fisher's Exact Test p = 6.64x10-4)." RE: Thank you for pointing out this confusing language.We have clarified the background in the relevant sections.In the analysis pointed out here, we are comparing sExons to non-sExons (exons without a significant ψQTL), and stratifying them by being symmetric or not.
Reviewer #2: The authors have addressed all concerns.One additional suggestion would be to include a supplemental table with the locations and variant ID associated with all of the ~4k psi-QTLs/sExon's analyzed in the study, so that other scientists can easily work with the same data.