Deciphering novel TCF4-driven mechanisms underlying a common triplet repeat expansion-mediated disease

Fuchs endothelial corneal dystrophy (FECD) is an age-related cause of vision loss, and the most common repeat expansion-mediated disease in humans characterised to date. Up to 80% of European FECD cases have been attributed to expansion of a non-coding CTG repeat element (termed CTG18.1) located within the ubiquitously expressed transcription factor encoding gene, TCF4. The non-coding nature of the repeat and the transcriptomic complexity of TCF4 have made it extremely challenging to experimentally decipher the molecular mechanisms underlying this disease. Here we comprehensively describe CTG18.1 expansion-driven molecular components of disease within primary patient-derived corneal endothelial cells (CECs), generated from a large cohort of individuals with CTG18.1-expanded (Exp+) and CTG 18.1-independent (Exp-) FECD. We employ long-read, short-read, and spatial transcriptomic techniques to interrogate expansion-specific transcriptomic biomarkers. Interrogation of long-read sequencing and alternative splicing analysis of short-read transcriptomic data together reveals the global extent of altered splicing occurring within Exp+ FECD, and unique transcripts associated with CTG18.1-expansions. Similarly, differential gene expression analysis highlights the total transcriptomic consequences of Exp+ FECD within CECs. Furthermore, differential exon usage, pathway enrichment and spatial transcriptomics reveal TCF4 isoform ratio skewing solely in Exp+ FECD with potential downstream functional consequences. Lastly, exome data from 134 Exp- FECD cases identified rare (minor allele frequency <0.005) and potentially deleterious (CADD>15) TCF4 variants in 7/134 FECD Exp- cases, suggesting that TCF4 variants independent of CTG18.1 may increase FECD risk. In summary, our study supports the hypothesis that at least two distinct pathogenic mechanisms, RNA toxicity and TCF4 isoform-specific dysregulation, both underpin the pathophysiology of FECD. We anticipate these data will inform and guide the development of translational interventions for this common triplet-repeat mediated disease.


Introduction
Fuchs endothelial corneal dystrophy (FECD) is a common, age-related, and visually disabling disease that represents a leading indication for corneal transplantation in high-income countries) [1,2].It primarily affects the corneal endothelium, a monolayer of post-mitotic cells on the posterior corneal surface that function to maintain relative corneal dehydration.As the disease progresses, collagenous excrescences (guttae) form on the thickened corneal endothelial basement membrane (Descemet membrane) and corneal endothelial cells (CECs) undergo accelerated cell death, which results in corneal oedema and opacity [3].Affected individuals typically experience symptoms in their 5 th to 6 th decade, with glare, reduced contrast sensitivity, and blurred vision [4].Transplantation of donor endothelial tissue is the preferred treatment for cases with visually significant symptoms.However, given an increasing societal burden of age-related disease coupled with a global shortage of suitable donor tissue, there is a pressing clinical need for alternative treatments [3,5].
Up to 80% of FECD cases of European descent have one or more expanded copies of a CTG trinucleotide repeat element (termed CTG18.1;MIM #613267) located within an intronic region of the transcription factor gene 4, TCF4 [4,6].We have previously demonstrated, in our large (n = 450) FECD cohort, that an expanded copy of this CTG repeat (defined as �50 copies) confers >76-fold increased risk for developing FECD [7].Expansion of non-coding repeat elements within the human genome have been identified to cause more than 30 human diseases to date [8], including fragile X tremor ataxia syndrome (MIM #300623), myotonic dystrophy types 1 and 2 (MIM #160900 and #602668) and C9orf72-associated amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (MIM #105550) [8].Expansion-positive (Exp+) FECD represents by far the most common disease within this category.However, to date, it has only been established that it affects the cornea, which is in contrast to the majority of the other much rarer non-coding repeat expansion disorders that typically have devastating outcomes on neurological and/or neuromuscular systems.Despite these differences, strong mechanistic molecular parallels exist between CTG18.1-mediatedFECD and other non-coding repeat expansion diseases, whereby multiple pathogenic mechanisms occur concurrently to drive pathology [8,9].
We, and others, have demonstrated that RNA toxicity comprising splicing dysregulation attributed to the accumulation of CUG-containing RNA foci and sequestration of splicing factors MBNL1 and MBNL2 within affected cells, is a key feature of FECD in the presence of one or more expanded CTG18.1 alleles [6,7,[10][11][12].Alongside this RNA induced toxicity, repeatassociated non-AUG (RAN) dependent translation of the expanded repeat, and dysregulation of TCF4 itself have also been hypothesised to contribute to the underlying pathology of CTG18.1 Exp+ mediated FECD [13].However, a comprehensive understanding of consequences and role of both the spliceopathy and downstream effects of the TCF4 dysregulation have yet to be characterised in parallel in the same model system.Furthermore, because TCF4 comprises >90 alternative spatially and temporally expressed isoforms, deciphering transcript-specific level dysregulation has proven to be extremely challenging and a consistent pattern of TCF4 dysregulation has yet to be associated with CTG18.1 expansions in corneal endothelial cells (CECs) [14][15][16].Such diversity is reflected in the varied functional roles TCF4 plays in human development, alongside the range of human disease associated with TCF4 variants in addition to FECD.For example, TCF4 haploinsufficiency results in Pitt-Hopkins syndrome (MIM #610954) [17,18], a severe neurodevelopmental disorder, whereas somatic TCF4 mutations are commonly identified in Sonic hedgehog medulloblastoma [19] and common germline variants have been associated with Schizophrenia risk [20,21].
In this study, we used a large cohort of biologically independent primary FECD casederived CECs to isolate the effects of the multiple parallel pathogenic processes underlying CTG18.1-mediatedFECD.We present long-read transcriptomic data to enhance our understanding of the increased isoform diversity observed in Exp+ FECD, and spatial transcriptomics to demonstrate TCF4-isoform dysregulation.For the first time, we also isolate downstream transcriptomic signatures of dysregulation that we hypothesise to be driven by TCF4-isoform dysregulation that may, in part, explain the tissue specificity of the disease.We go on to explore the potential relevance of TCF4 dysregulation to FECD, independent of the CTG18.1 expansion, by interrogating expansion-negative FECD cases for rare TCF4 variants through exome sequencing and subsequently applying a gene-burden style approach.

RNA-seq data from CECs reveal CTG18.1 expansion-mediated FECD to be transcriptomically distinct from non-expanded FECD
To further increase our understanding of CTG18.1 expansion-associated splicing dysregulation events, and better understand the relationship between individual events to transcriptspecific isoforms, we generated long-read RNA-seq data from biologically independent unpassaged primary CEC cultures derived from healthy control (n = 4, mean age 68.8 years) and Exp+ FECD (mono-allelic CTG18.1 allele repeat length �50) (n = 4, mean age 66.8 years) tissues (Tables 1 and S1).After alignment and stringent quality control, reads that displayed evidence of 5' degradation, intrapriming and reverse transcription template switching were removed.Exp+ and control samples were subsequently analysed with SQANTI3 to assess isoform diversity (Fig 1A and 1B and Tables 2 and S2).We also generated short-read RNAseq data from unpassaged primary CEC cultures derived from healthy control (n = 4, mean age 58.5 years), Exp+ FECD (mono-allelic CTG18.1 allele repeat length �50) (n = 3, mean age 63.3 years) and Exp-FECD (bi-allelic CTG18.1 allele repeat lengths �30) (n = 3, mean age 63.3 years) tissues (Table 1).Exome data generated from the Exp-FECD cases utilised in this transcriptomic work were analysed for the presence of variants in any previously FECD-associated genes, such as TCF4, COL8A2, SLC4A11, ZEB1, and AGBL1 [4], to identify any potential underlying genetic cause.No rare (MAF <0.005) or potentially deleterious (CADD >15) variants were identified.Hierarchical clustering using heatmap visualisation and PCA analysis including sex and ethnicity as covariates revealed that Exp+ samples grouped separately from not only the controls, but also the Exp-samples (Fig 1C and 1D).Driving genes and additional principal components are listed in S2 Table .Notably, Exp-samples also clustered together on the PCA plot, but the heatmap derived from hierarchical clustering highlighted that greater intra-group variability existed between Exp-samples compared to variability within control and Exp+ groups (Fig 1C).We hypothesise that the increased transcriptomic diversity within the Exp-group is due to potentially distinct, as yet unidentified, genetic causes driving disease for each FECD case-derived sample.We included all three groups in the downstream comparative pipelines using the following pairwise comparisons; (PWC1) control versus Exp+ FECD, (PWC2) Exp+ FECD versus Exp-FECD, and (PWC3) control versus Exp-FECD.Alternative splicing analysis of long-read and short-read transcriptomic data demonstrate the complexity of increased splicing events underlying CTG18.

expansion-mediated disease in CECs
Iso-seq and subsequent SQANTI3 analysis (Tables 2 and S3) revealed a higher proportion of isoforms per gene in Exp+ samples (1.7 unique isoforms per unique gene in Exp+ vs 1.3 in control).Additionally, we see a higher proportion of known non-canonical splice junctions in Exp+ samples.When characterising transcripts based on splice junctions, we see similar proportions of isoform subtypes (S3 Table ).This is consistent with previous work [6,[10][11][12] and validated by our short read data, discussed below, (S1 and S2 Figs), whereby Exp+ samples showed greater isoform diversity compared to control and Exp-samples, but not an overall skew in the types of alternative splicing observed.
To further characterise and quantify alternative splicing events within our short-read dataset, we utilised rMATS, a junction-based method that employs a hierarchical framework, to model unpaired datasets and evaluate and quantify all major alternative splice types [23][24][25].Comparing Exp+ to control samples (PWC1) we identified 2,118 significant alternative splicing events in 1,437 genes (S1 and S2 Figs and S4 Table ).Alternative splicing comparisons including Exp+ (PWC1 and PWC2) yielded the most alternative splice types in every category (S1 and S2 Figs) further reinforcing the spliceopathy aspects of this genetic subtype of disease.
We then compared the rMATS and Iso-seq results to a subset of published datasets exploring differential splicing in FECD samples (S5 Table ) and selected 24 published differential splicing events (from 23 genes) that had previously been identified by bioinformatic or in vitro methods to be strongly associated with CTG18.1 expansion-mediated FECD [4,6,7].Alternative splicing analysis of our transcriptomic dataset identified that 75.0%(18/24) of these events were significant in PWC1 (control vs Exp+, S5 Table ).In addition to these 24 selected events, rMATS identified another 10 novel and significant differential splicing events within Exp+ in these genes, highlighting the increasing complexity of splicing dysregulation within Exp+ FECD.Long-read sequencing confirmed the presence of these novel splicing events in Exp+ FECD (S6 Table ).Also, it was interesting to note that FN1 was identified to have robust alternative splicing in Exp+, and further investigation showed that alternatively spliced exons in Exp+ corresponded with skipping of the Extra Domain A (EDA) and Extra Table 2. Summary of SQANTI3 analysis of IsoSeq data.In-depth characterisation of isoforms in control and Exp+ long-read RNA-seq data utilising SQANTI3 pipeline following stringent quality control.SQANTI3 subdivides splice junctions into known (present in reference transcriptome) and novel (not in reference transcriptome).Canonical splice junctions refer to the most common type of splice junction (GTAG, GCAG, ATAC)), while non-canonical refers to other rarer and tissue specific splice junctions with different donor and acceptor sites.SQANTI3 defines novel as genes or splice junctions that are absent from the reference transcriptome.

Control (n = 4)
Exp+ FECD (n = 3) Genes Domain B (EDB) protein domains that play a critical role in signal transduction and fibrillar formation [26] (S7 Table ).Skipped exon alternative splicing events, where an additional exon is either included or excluded, were the most common in all three pairwise comparisons (S1 Fig) .Prior work regarding alternative splicing in FECD has focused on MBNL1/2 sequestration [6,7,27] as MBNL proteins are key regulators of exon splicing [28,29].Given their complex role, we attempted to determine if the flanking intronic sequences of alternatively spliced exons were enriched for common RNA-binding protein (RBP) binding motifs.We focussed our RBP motif enrichment analysis (rMAPS2 [30]) on the 250bp region upstream of alternatively spliced exons and isolated those areas where there was enrichment in PWC1 and PWC2, but not PWC3 (i.e.events enriched within Exp+ CEC lines).By filtering rMAPS results in this manner, a trend whereby skipped exons are significantly (p < 0.05) upregulated by RNA-binding proteins with TT-containing binding motifs and downregulated by RNA-binding proteins with CG-containing binding motifs was observed (S8 Table ).As MBNL proteins favour YCGY binding motifs, this suggests that other RNA-binding proteins with TT-containing motifs could also play a role in driving the spliceopathy underpinning RNA toxicity, a key mechanism of disease in Exp+ FECD.

Evidence of TCF4 dysregulation appears enriched in CTG18.1-mediated FECD CECs, highlighting specific candidate biomarkers for this genetic subtype of disease
Differential gene expression was assessed with DESeq2 [31] in all three groups including sex and ethnicity as covariates.For all comparisons an FDR corrected P-value (padj) of <0.05 was used as a threshold for significance (S9 Table ).A total of 4,288 genes were identified to be differentially expressed between the Exp+ and controls (PWC1; Fig 2A).Of these, 2,089 genes had a shrunkLFC value (moderated log 2 fold values of expression accounting for high dispersion and low expression counts) of greater than 1 or less than -1.Overall, PWC2 and PWC3 groups resulted in lowered differential gene expression, with 1,695 and 1,911 significantly differentially expressed genes, respectively (Fig 2A and S9 Table).
Pairwise analysis revealed considerable overlap between dysregulated genes identified by all three pairwise comparisons (Fig 2A ), however, highest convergence was observed by PWC1 (Exp+ versus control) and PWC2 (Exp+ v Exp-), suggesting these events are driven by CTG18.1 expansions.In total, both pairwise comparisons reveal 1,028 dysregulated genes uniquely and significantly dysregulated in Exp+ FECD case-derived samples.Conversely, by comparing the overlap between PWC1 and PWC3 (Exp+ vs control and Exp-vs control) we identified a further 873 genes that are dysregulated in FECD, irrespective of the underlying genetic cause, and thus representing more ubiquitous signatures of the disease.Dysregulated genes in PWC1, including Exp+ unique and common to FECD, did not skew directionally and were evenly up or down-regulated within our analysis (Fig 2C).We isolated 3,292 genes that are uniquely dysregulated in Exp+ FECD, 2,264 genes dysregulated solely in PWC1 and a further 1,028 dysregulated genes shared with PWC2; (highlighted in orange Fig 2A ) and proceeded with downstream analysis (S9 Table ).
We conducted enrichment analysis using g:profiler [32] (querying gene ontology (GO), KEGG pathway, and Reactome pathways).GO term enrichment and KEGG pathway analysis of PWC1 dysregulated genes was consistent with previously published transcriptomes [6,11,12], with several enriched terms and pathways relevant to RNA toxicity (i.e.GO:0000166 etc) (S10 Table ).While total TCF4 gene expression was not significantly downregulated in Exp+ (Fig 2B), intriguingly, we observed strong parallels to pathway enrichment analyses highlighted from a study of TCF4 neural progenitor knockdown cell model [33] within the 3,292 genes uniquely dysregulated in Exp+, such as protein processing in endoplasmic reticulum, MAPK signalling, actin cytoskeletal regulation, endocytosis and apoptosis (Fig 2D).Critically, these pathways are not enriched in PWC3, suggesting that these are unique to Exp+ CECs.Furthermore, several enriched pathways and GO terms involving Wnt signalling were seen in the 3,292 dysregulated genes unique to Exp+ that are not present in PWC3, again suggesting TCF4 function may be modified within Exp+ FECD CECs (S10 Table ).Furthermore, enrichment of dysregulated genes in pathways involving other repeat expansion-mediated diseases and misfolded protein aggregate diseases suggest mechanistic parallels which reinforce Exp+ CECs as a model system with broad relevance to repeatome biology (S10 Table ).
The consistent presence of PI3K-Akt pathway enrichment within KEGG results in Exp + dysregulated genes (and its consistent absence in PWC3 pathway enrichment) suggests that Exp+ FECD CECs may interact with growth factors and the extracellular matrix in a unique manner.Hence, components of the PI3K-Akt pathway may serve as informative biomarkers for Exp+ FECD in future clinical trial settings, and thus warrant further investigation.Additional pathways are enriched in this set of genes (and not in PWC3) that involve RNA processing and binding, other repeat-mediated disease related mechanisms, epithelial to mesenchymal transition, and multiple signal transduction cascades with extracellular matrix interactions (S10 Table ).
To determine if other RNA-binding proteins may be implicated in the alternative splicing seen in Exp+ FECD, we cross-referenced our short-read data with a list of ~1500 human RNA-binding proteins [34], and discovered that there were 141 RBPs uniquely dysregulated in Exp+ (45 downregulated and 106 upregulated), including MBNL1, CELF1, ESRP2, and a variety of hnRNP and DDX encoding genes (S11 Table ).Both the hnRNP and DDX family of RNA-binding proteins have previously been implicated in other repeat expansion diseases [32,[35][36][37].It is likely that a combination of these dysregulated RBPs plays a role in driving the alternative splicing and RNA toxicity seen in Exp+ FECD.

Differential TCF4 exon usage suggests isoform ratio disruption of TCF4 in CECs of CTG18.1-mediated FECD individuals
Given signatures of TCF4 dysregulation were apparent in the pathway enrichment analysis of Exp+ CECs (Fig 2D), we then aimed to determine if any tangible patterns of TCF4 transcriptspecific dysregulation was observed in the Exp+ CECs, despite total TCF4 levels not being significantly different between Exp+, Exp-and control CECs (Fig 2B).Due to transcriptional diversity and complexity of TCF4 transcripts, DEXSeq [38] was used to assess differential exon usage utilising our short-read transcriptomic data.
DEXSeq utilises generalised linear models to detect differential exon usage while controlling for biological variation.This analysis allowed us to query gene expression and splicing at the exon level comparing PWC1 (with and without (PWC1_nocov) sex and ethnicity as covariates), PWC2, PWC3, FECD compared to control (FvC) and Exp+ compared to all expansion negative samples (Exp+ v Ctr+Exp-).In concordance with our rMATs and DESeq2 analysis, higher levels of transcriptome-wide differential exon usage events were observed in PWC1 compared to PWC3 (S3 and S12 and S13 Tables).Transcripts containing downregulated exons are primarily included in longer isoforms of the gene which contain 3 AD domains, while the upregulated exons (excluding those surrounding the repeat) appear in shorter TCF4 isoforms containing only two AD domains.This suggests that Exp+ CECs may be employing certain compensatory measures in response to the transcript-specific dysregulation that is induced by the presence of CTG18.1 repeat expansions.Interestingly, our rMATS analysis independently detects this same phenomenon (S14 Table ), which showed a significant mutually exclusive exon splicing event whereby the last exon in common with most 3 AD domain-containing isoforms (Exon 6 in ENST000000354452.8) was favoured in the control group, while the first exon in common with all isoforms (Exon 7 in ENST000000354452.8) was favoured in the Exp+ group.
We also detect retention of the intron flanking the CTG18.1 repeat, previously described by Sznajder et al. 2018 [39] in Exp+ FECD (S12 and S13 Tables).The intronic region is significantly upregulated in Exp+ CECs (E173 and E174) (S12 and S13 Tables).While this intron represents the 5' untranslated region (UTR) of several transcripts of TCF4, it is also included within an intronic region in 32 out of 93 currently annotated TCF4 transcripts.

RNAScope shows robust detection of altered tissue-specific TCF4 isoform expression in CTG18.1-mediated FECD case-derived corneal endothelial cells
We performed spatial transcriptomics, via RNAScope, to further validate our hypothesis that a subset of TCF4 isoforms containing the third AD3 domain are being selectively downregulated due to the presence of an expanded CTG18.1 allele (�50 repeats).For the first time, we visualised TCF4 transcripts within FECD and control CECs and fibroblasts using two distinct sets of probes, termed A and B (Fig 4A).Probe set A comprises a set of probes that allow the visualisation of all TCF4 transcripts in the cell (probe A covers all exons in the transcript Isoform B + which contains 3 AD domains, TCF4-201, ENST00000354452.8,NM_001083962.2).Probe set B targets exons in the canonical longer TCF4-201 TCF4 transcript which do not overlap with the canonical shorter TCF4 transcript (Isoform A+/J which contains 2 AD domains, AD1 and AD2, TCF4-204, ENST00000457482.7,NM_001243234.2).Hence, these probe sets enable the detection, visualisation, and quantification of the ratio of longer AD3-containing transcripts, previously identified to be downregulated in Exp+ CECs via our differential exon usage analysis, compared to total TCF4 transcripts (Fig 3A).
The TCF4 transcripts were visualised in control, Exp+ and Exp-FECD case-derived CECs (n = 3 for each group) (Fig 4B and Tables 1, S1, and S15).Each independent primary CEC culture was also probed with a negative control (targeting bacterial genes) and a (CAG) 7 -Cy3 probe, using a standard FISH protocol, to validate the presence or absence of CUG-specific RNA foci derived from expanded copies of CTG18.1 (S15 Table ).RNA foci were consistently detected in all CTG18.1 Exp+ CECs, and were absent from control and Exp-CECs.Using Cell-Profiler (version 4), puncta representing TCF4 transcripts, detected with both probe set A and B, were quantified and puncta/cell values were established for each image and averaged across images with at least 100 nuclei imaged per condition per line.Finally, a ratio was calculated establishing the proportion of longer AD3 containing transcripts compared to total TCF4 transcripts for each biological replicate.This showed that, on average, Exp+ CECs had a significantly smaller proportion of the longer AD3-containing transcripts (16.33% ± 4.53) compared to unaffected control CECs (36% ± 6.9, p = 0.0145) and non-expanded FECD CECs (30.9% ± 5.17, p = 0.0213) (Fig 4C).To explore potential cell-type specificity of the observed skewed TCF4 transcript ratios we also analysed a series of dermal fibroblast lines generated from unrelated Exp+ patients with FECD (n = 6) [7] and control adult dermal fibroblast lines (n = 3) (S16 Table ).No significant difference (p = 0.61) was observed between the Exp+ (44.2% ± 5.68) and control fibroblast (41.17% ± 12.1) lines, suggesting the dysregulation of TCF4 is cell-type specific.This result suggests that the presence of the expanded repeat alone consistently skews TCF4 transcript ratios to favour shorter isoforms, specifically in CECs.As this subgroup of isoforms typically contains only two out of three activation domains, a higher prevalence of shorter TCF4 isoforms may affect potential dimerization and TCF4 transcriptional function in Exp + CECs.
Rare TCF4 variants identified within a genetically refined CTG18.1 expansion-negative FECD cohort suggest isoform-specific TCF4 dysregulation may be a risk factors for FECD in the absence of CTG18.

expansions
We next wanted to explore if rare and potentially deleterious TCF4 variants could be contributing to disease in FECD cases without a CTG18.1 expansion, given that Exp+ CECs display alternative TCF4 exon usage and skewed ratios of TCF4 isoforms (Figs 3 and 4), with detectable downstream features of TCF4-driven dysregulation (Fig 2D).To test this hypothesis, we interrogated exome data generated from 134 Exp-FECD cases, acquired from our ongoing inherited corneal disease research program, representing approximately 22% of total FECD cases recruited to date.Overall, 8 rare and potentially deleterious (MAF <0.005; CADD score >15) TCF4 variants were identified in 7/134 probands (Table 3).Sanger sequencing confirmed the presence of all 8 variants and exome data from these cases were also interrogated using the same thresholds (MAF <0.005, CADD >15) to determine if any other rare variants in genes previously associated with FECD could also be contributing to disease [4] (S17 Table ).The large number of rare and potentially deleterious TCF4 variants detected in this cohort seemed intriguing given the high levels of evolutionary constraint on this essential and ubiquitously expressed transcription factor encoding gene (pLI = 1, missense Z = 4.1 gno-mAD).We therefore decided to apply CoCoRV, a rare variant analysis framework to determine if the Exp-FECD cohort was indeed enriched for rare and potentially deleterious TCF4 variants compared to the gnomAD dataset.CoCoRV utilises the publicly available genotype summary counts to prioritise disease-predisposition genes in case cohorts [40].Under a dominant model, applying the ethnically-stratified (S18 Table ) analysis pipeline, and filtering for rare (MAF <0.005) and potentially deleterious (CADD score threshold of >15) variants, an inflation factor estimate (λemp) 1.21 was achieved.At the exome wide level TCF4 was ranked as 34/17,025 genes with a p-value of 0.001 suggesting the Exp-FECD cohort could be enriched for rare and potentially deleterious TCF4 variants.However, after applying CoCoRV's stringent false discovery rate correction methodology TCF4 no longer remained significant (corrected p-value = 0.249).Interestingly rare variants identified in Proband A (Variant 1a/1b), B (Variant 2) and D (Variant 4) and E (Variant 5) are all predicted to only affect a small fraction of total TCF4 transcripts (Tables 3 and S19 and Fig  (Glu22 =)).Despite being synonymous, Splice AI predicts the variant introduces loss of the splice donor site (SpliceAI Δ score 0.78).Furthermore, SpliceRover also predicts that this could result in the activation of a cryptic splice donor site downstream (from 0.098 to 0.233) of the wildtype donor site, which would introduce a short frameshift insertion followed by a premature termination codon (PTC) c.66_67insGTGCTCGATGAATTTTC, p.(Arg23Valfs*12).The variant identified in Proband D (Variant 4) is within the protein coding region of three (out of 93) TCF4 transcripts and in the 5' UTR of one additional transcript.Variant 5 is only encompassed by the 5' UTR of a single TCF4 transcript.Collectively, all of these identified variants have the potential to exert a functional impact on TCF4, but importantly only on a small subset of overall transcripts.

Discussion
FECD is now recognized to be the most common repeat expansion-mediated disease in humans, with up to 80% of European FECD cases harbouring CTG18.1 expansions [4].The non-coding nature of the repeat and the transcriptomic complexity of TCF4 have made it extremely experimentally challenging to decipher the molecular mechanisms underlying this disease.Thus, many aspects of its pathophysiology remain elusive, despite the strong mechanistic parallels with other repeat-mediated diseases, and the urgent clinical need for new treatments for this common and visually disabling disease.To tackle this challenge, we employed a large, biologically independent set of primary FECD case-derived CEC lines and numerous complementary transcriptomic methodologies to gain novel mechanistic insight into both RNA toxicity driven and TCF4 dysregulation-related molecular features of this repeat-mediated disease.By including both Exp+ and Exp-FECD data, we have been able to dissect out more generic downstream biomarkers that occur irrespective of CTG18.1 expansion status.Our extensive long-and short-read RNA-Seq data and spatial transcriptomic analysis of FECD case-derived CEC lines demonstrates that, in addition to being a spliceopathy, expansion-positive FECD is concurrently associated with isoform-specific patterns of TCF4 dysregulation, hence providing novel mechanistic insights into the pathophysiology of this triplet-repeat mediated disease.Furthermore, we have identified a subset of Exp-FECD cases with rare and potentially deleterious TCF4 variants, supporting the hypothesis that dysregulation of TCF4 itself irrespective of CTG18.1 expansions may also be a risk factor for FECD.
Our robust transcriptomic analysis, presented within a biologically relevant cellular and genomic context, highlights an array of novel molecular signatures driven by CTG18.1 expansions, as well as validating a range of previously described Exp+-associated transcriptomic events.We anticipate that many of the novel events identified here could serve as informative biomarkers to aid the development of CTG18.1 targeted therapeutic approaches.To date, CUG-containing RNA foci and MBNL1/2 missplicing have been adopted as the standard for therapeutic outcome measures [7,27,41,42].Data presented here support and expand the hypothesis that accumulation of repeat expansion-derived RNA structures induce toxic gainof-function effects that disrupt cellular homeostasis and induce aberrant RNA metabolism [7,27].For instance, we have discovered a novel array of RBP encoding transcripts, including MBNL3, CELF1 and a range of hnRNP and DDX family members, uniquely dysregulated in Exp+ CECs that warrant future follow up to elucidate their respective contributions to aberrant RNA metabolism related mechanisms.However, such biomarkers may only relate to the spliceopathy aspects of the disease, and our data go on to also illustrate that translational efforts would also benefit from using more diverse outcome measures.In addition to RNA metabolism biomarkers, global Exp+-mediated differential gene expression profiles, alongside TCF4-specific dysregulation itself, should also be included as relevant outcome measures for Exp+ therapies being developed, given we hypothesise these processes are also pivotal to the pathology.Novel cell surface biomarkers reported here that are specifically upregulated in Exp+ (e.g.ITGA5 and ITGB1) could have particular utility in a clinical trial setting due to their accessible locations.Other membrane encoding proteins interfacing with the extracellular matrix or signal transduction could also be a rich source of potential external markers, such as cell surface receptors involved in the P13K-Akt pathway (e.g.EGFR, MET, and LTK) that we show are specifically upregulated in Exp+ CECs.
Genetic stratification of patient-derived CECs in this study has also highlighted that Exp+ FECD is transcriptionally distinct from Exp-disease.The data show that 76% of all differentially expressed genes, 81% of skipped exon events, and 67% of all enriched pathways detected in Exp+ are unique to the repeat expansion disease.For example, in accordance with previous studies [12,43], we observe that pronounced upregulation of fibronectin (FN1) is a robust transcriptomic feature of FECD, irrespective of genetic underlying cause.However, our comprehensive transcriptomic analysis has enabled us to determine that the EDA/B+ forms of fibronectin [26] are significantly favoured in Exp+ FECD.This illustrates that FN1 expression and splicing can act as both a general FECD molecular biomarker and an Exp+ specific biomarker.Similarly, AQP1 previously documented to be downregulated in FECD [44], has here been demonstrated to only display marked downregulation in Exp+ CECs, illustrating the importance of stratifying cases based on CTG18.1 status.Our long-read data suggests that the splicing diversity seen in Exp+ FECD is not predominantly a result of abnormal missplicing, given the proportion of annotated full splice match transcripts remains the same between control and Exp+.This further exemplifies the unique transcriptome of Exp+ FECD endothelial cells.However, the increased proportion of non-canonical splice junctions seen in Exp+ also suggests that CTG18.1-specificisoforms may be present within the endothelium and warrants future investigation.However, transcriptome wide differences in coverage were observed between the Exp+ FECD versus the control group within the long-read data, thus it will be important for future long-read approaches to further validate these findings.These examples, alongside the emerging evidence that the Exp+ FECD displays distinctive pathological features and epidemiological characteristics [5,7], suggests that Exp+ FECD should be categorised as a distinct clinical entity.We suggest integrating CTG18.1 genotyping into routine FECD clinical care pathways is warranted, especially given the numerous international efforts underway to develop gene-directed therapies for CTG18.1-mediatedFECD [27,41,42,45].
Our complementary RNA-seq approaches, in combination with the spatial resolution of TCF4 transcripts subgroups, has highlighted a unique and distinctive pattern of TCF4-specific dysregulation that occurs within Exp+ FECD.Importantly, this signature of dysregulation was not detected within a patient and control dermal fibroblast cell series, implying it may be unique to affected cell types.Additionally, our pathway analysis has isolated features of dysregulation that we hypothesise represent downstream consequences of this isoform shift and an important pathogenic component of the disease.Notably, we have observed commonality between signalling transduction pathways dysregulated in our Exp+ CECs (e.g.P13K-Akt and MAPK) and those dysregulated in neural progenitor cell TCF4 knockdown transcriptomes [33].We postulate that this convergence is being driven by TCF4 functional dysregulation, as the shift in isoform distributions towards shorter (1-2 AD domains) versus longer (2-3 AD domains) isoforms would result in lowered TCF4 activity within the corneal endothelial cell.TCF4 isoforms with 3 AD domains have been shown to have an order of magnitude more transcription activation function than the shorter isoforms containing 1 or 2 activation domains [16].Furthermore, given that domains in the AD3-containing isoforms can also dictate intracellular location and dimerization partners of TCF4, we also hypothesise that additional TCF4-driven functional implications of this shift occur within affected CECs that requires future experimental investigation.Interestingly, a recent single-cell (sc)RNA-seq study has identified that varying levels of TCF4 expression exist within discrete populations of healthy CECs [46].It was not possible as part of this study to genotype CTG18.1 whilst concurrently analysing transcriptomic profiles of the CEC populations, thus future development and application of scRNA-seq in combination with single cell genotyping of CTG18.1 within FECD-case derived CEC samples is anticipated to shed light on if and how genomic instability of this repeat element within the corneal endothelium may drive and/or exacerbate the pathophysiology and explain the unique vulnerability of this cell layer to CTG18.1 expansions.[47,48] Additionally, we identify rare and predicted deleterious TCF4 variants in 7/134 Exp-FECD cases.This was notable to us given the high levels of evolutionary constraint acting on TCF4 and suggests such rare variants may be acting as risk factors for FECD, independently of CTG18.1 repeat expansion.This could also in-part explain the established missing heritability of the disease [4] and does further reinforcing the hypothesis that dysregulation of TCF4 itself is pivotal to the pathophysiology of FECD.Despite a rare variant gene-burden style analysis of these data not reaching significance, we observed an emerging trend whereby a subset of qualifying variants identified notably affected regions of the gene not included by the vast majority of TCF4 isoforms.Hence, they can only be predicted to exert a functional impact on a small subset of total TCF4 transcripts.This observation is in keeping with the relatively mild and tissue specific nature of FECD, which is in stark contrast to the severe neurodevelopmental disorder Pitt-Hopkins associated with total TCF4 haploinsufficiency [49].Hence, we propose that TCF4 variants which subtly affect TCF4 functionality may induce CEC-specific disease.Further experimental investigation of these variants and the potential tissue specific role they may exert in the corneal endothelium warrants further future investigation.
Parallels can be drawn to other repeat expansion-mediated disorders, such as fragile X syndrome (FRM1) [50], spinocerebellar ataxia type 7 (ATXN7, MIM #164500) [51] and C9orf72mediated ALS [52], whereby rare loss-of-function mutations have been proposed to induce comparable phenotypic outcomes to non-coding repeat expansions.However, in these instances loss of total protein isoform function, in combination with toxic gain-of-function mechanisms, is thought to underlie the respective pathogenic mechanisms, in contrast to this scenario whereby we hypothesise only a fraction of total TCF4 isoforms are susceptible to the effect of either CTG18.1 expansions or TCF4 variants within the corneal endothelium.In a limited number of DM1 cases, FECD has been recognised to be part of the phenotypic spectrum, suggesting that CEC dysfunction can also arise from CTG repeat expansions in DMPK [53,54].Data presented in this study suggests that TCF4 variants, independent of the repeat, may also represent a potential path to FECD.We suggest future FECD genetic screening approaches should include sequencing of coding and regulatory regions of TCF4, to further elucidate the potential role that rare TCF4 variants may play in the absence of CTG18.1 expansions.
In conclusion, this study supports the hypothesis that at least two distinct pathogenic mechanisms, RNA toxicity and TCF4 isoform-specific dysregulation, underpin the pathophysiology of the transcriptionally distinct CTG18.1-mediatedFECD.In our established primary corneal endothelial system, we see evidence of TCF4 isoform dysregulation at the TCF4 transcript level and can detect the potential downstream effects of the isoform-specific pattern of TCF4 dysregulation.Our data also suggests that, in rare instances, irrespective of CTG18.1 expansions, FECD cases can harbour TCF4 variants that may independently induce isoform-specific TCF4 dysregulation and act as risk factors for disease.This supports the hypothesis that TCF4 dysregulation is a key mechanism underlying this common, age-related and visually disabling disease.We anticipate that concepts and biomarkers identified by this study will inform and guide the development of future translational interventions targeting this triplet-repeat mediated disease, and these in turn may act as exemplars for much rarer and currently untreatable and severe repeat expansion disorders that typically affect less accessible neurological/neuromuscular systems.

Materials and methods
endothelium and Descemet membrane (DM) removed as part of the surgical procedure (MEH).All individuals recruited with FECD undergoing DMEK surgery had confluent guttae affecting the central cornea with visual symptoms and were of similar age.Importantly, all individuals at the time of bio-sample recruitment had similarly advanced disease (Krachmer Grade 5).
We applied strict criteria in the selection for control tissues.We obtained control tissue as donor (age >45 years) corneoscleral discs classed as unsuitable for clinical transplantation due to non-CEC related reasons (Miracles in Sight Eye Bank, Texas, USA).We selected the age of the donor tissues to ensure there was no statistically significant differences in the ages between control and case at the time of bio-sample recruitment.All control corneas were examined and were shown to have a high CEC counts (�2,600 cells/mm 2 ) with no guttae or other endothelial abnormality.To standardise handling, the control DM and corneal endothelium were excised by a consultant-grade surgical clinician in accordance with standardised clinical practice.All excised case and control tissues were stored in Lebovitz L-15 media (Life Technologies) before transferring to the laboratory for processing within 24 hours of excision from either donor corneas or patients undergoing elective posterior corneal transplantation surgery.
Participant genomic DNA for CTG18.1 genotyping was extracted from whole blood (QIAgen Gentra Puregene Blood Kit), whereas control DNA was extracted from the residual corneoscleral disc (QIAgen Blood and Tissue kits).

CTG18.1 genotyping
We determined the CTG18.1 repeat length using a short tandem repeat (STR) genotyping assay as previously described in Zarouchlioti et al. 2018 [7], originally adapted from Wieben et al. 2012 [55].Briefly, STR-PCR was performed using a 6'-FAM-conjugated primer (5' CAGATGAGTTTGGTGTAAGAT 3') upstream of the repeat and an unlabeled primer (5' ACAAGCAGAAAGGGGGCTGCAA 3') downstream.Post-PCR, products were combined with GeneMarker ROX500 ladder and separated on an ABI 3730 Electrophoresis capillary DNA analyzer (Applied Biosystems).Sizing and data analysis was performed on GeneMarker software (SoftGenetics).To distinguish between samples with presumed bi-allelic nonexpanded alleles of equal length and samples with a larger repeat length beyond the detection limit of the STR-PCR, we performed a further triplet-primed PCR assay to confirm the presence or absence of a CTG18.1 allele above the detection range of the STR assay (approximately 125 repeats), following previously published methods [56].

Fibroblast cell culture
Fibroblast cell lines from dermal skin biopsies were cultured DMEM/F-12, GlutaMAX (Life Technologies) supplemented with 10% FBS and 1% penicillin/streptomycin.Throughout culture, cells were kept in an incubator at 37˚C, 5% CO 2 and medium was refreshed every 48 hr until the cells showed appropriate confluence for experimentation or passage.

RNA extraction
All primary CEC monolayer cultures selected for downstream analysis displayed a consistent morphology characteristic of CECs in vivo.Total RNA was extracted from the primary CEC lines using a NucleoSpin RNA XS kit (Macherey-Nagel), in accordance with the manufacturer's protocol.The RNA was eluted and stored at -80˚C prior to sequencing.To assess RNA integrity, RNA integrity numbers (RIN) were determined for all RNA samples using a Bioanalyzer 2100 (Agilent).All samples analysed as part of this study had a RIN value �9.6.

Short-read RNA-seq
Stranded RNA-seq libraries were prepared by BGI Genomics using the TruSeq RNA Library Prep Kit (Illumina) following the manufacturer's protocol.We sequenced these with a HiSeq 4000 platform (Illumina) with short-read strand-specific library prep with poly A enrichment (Illumina).Paired-end reads were filtered to remove low-quality reads (Q �5) or those with >10% of N base calls.Adaptor sequences were also removed during data quality analysis.We checked the quality of the resulting fastq files for the short-read data using FastQC Quantification [58].Reads were aligned to the human genome (hg38) patch 13 using STAR [59] (v2.7.10a) and Salmon (V1.4.0) [60].After alignment, hierarchical clustering was performed using PoiClaClu [61].We also performed principal component analysis (PCA) using PCAtools [62] (v2.2.0) to determine global sample clustering.Sex and ethnicity have been factored in as covariates in all applicable downstream pipelines.
We assessed differential splicing at a junction-based level with rMATS-turbo 4.1.2[23] with the Ensembl hg38 release 105 GTF file as reference.Default rMATS settings were used.For this study, we focused on splicing events that are fully/partly annotated by Ensembl.Significant alternative splicing events were identified as an absolute delta percent spliced in (psi) magnitude � 0.1 and FDR � 0.05.rMATs output was subsequently analysed using rMAPS2 [30] and maser 1.12.1 [67].All script utilised to generate data can be found at: https://github.com/InheritedCornealDisease/TCF4-transcriptomic-paper-2024

RNAScope detection of TCF4 transcripts
We seeded primary FECD and control CECs and fibroblasts on chamber slides (Lab-tek) and confluent cell cultures were stained using standard RNAScope protocols with minor adjustments (ACD Bio).We used the Hs-TCF4-C2 (cat.557411) probe set to detect all TCF4 isoforms, as well as a custom set of TCF4 probes to detect all 5' exons in the TCF4 transcript NM_001083962.2excluding all 3' exons contained in the shorter transcript NM_001243234.2.Akoya TSA Cy3 fluorophores were used to visualise probe staining.Negative control probes (cat.320871) were provided by ACD-Bio to verify specificity.All cell lines analysed by RNA-Scope were also tested for RNA foci in parallel via fluorescence in situ hybridization (FISH) using a CUG-specific probe (Cy3-(CAG) 7 ) [7].A minimum of 100 cells and four images were acquired for each cell line and each probe.

Exome sequencing and rare variant analysis
Exome sequencing libraries were generated from 134 whole blood derived gDNA samples from individuals with Exp-FECD, using a SureSelect Human All Exome V6 capture kit (Agilent, USA) or SeqCap EZ MedExome Enrichment Kit (Roche) and sequenced on a HiSeq 4000 or HiSeq 2500 platform (Illumina).All raw FECD case sequencing data was aligned and annotated following our previously published methods [71].In brief, sequencing reads were aligned using Novoalign (version 3.02.08)and variants and indels were called according to Genome Analysis Toolkit (GATK) (version 4.4.0.0) best practices (joint variant calling followed by variant quality score recalibration) were normalised and quality-controlled, as recommended by CoCoRV [40].CoCoRV, a rare variant analysis framework, was applied to prioritise diseasepredisposition genes in the Exp-FECD cohort, utilising publicly available genotype summary counts from gnomAD (genomes version 2.1.1)as a control dataset.For consistency, both case and control variants were annotated using the Variant Effect Predictor (VEP) (release 110) [72].The CoCoRV pipeline was then run, applying a dominant model and ethnically stratified in accordance with ethnicity predictions generated from the case cohort using Fraposa [22].Fraposa is based on principal component analysis, using 1000 Genomes project as the reference panel (with known ancestral background, categorised into 5 distinctive superpopulations: Africans, admixed Americans, East Asians, Europeans and South Asians).Common variants between the control and exome cohort were identified and filtered (based on minor allele frequency, linkage equilibrium and missing genotype call rate) by plink and the principal components were computed with the default OADP approach (Online Augmentation, Decomposition, and Procrustes Transformation).Variants of interest were defined as having a CADD score (CADD) >15 [73], and a minor allele frequency (MAF) <0.005 in the Genome Aggregation Database (version 2.1.1)[74].FECD cases identified to harbour rare TCF4 variants were additionally interrogated for rare and potentially disease-associated variants (MAF<0.005 in gnomAD and Kaviar Genomic Variant Database; CADD>15) in previously reported FECD-associated genes, including COL8A2, SLC4A11, AGBL1 and ZEB1 [4].Splice site predictions were subsequently made using SpliceAI [75] and SpliceRover [76] for all qualifying variants in TCF4 and any other FECD-associated genes.

Fig 1 .
Fig 1. Exp+ FECD samples are transcriptomically distinct in both short-and long-read datasets when compared to controls.A) Bar chart showing proportion of annotated genes in long read transcriptomic dataset with only one transcript detect vs 2 or more transcripts in Control andExp+ samples.Exp+ long-read samples have a higher proportion of genes with more than one isoform identified.B) Proportion of annotated transcripts with full splice matches.Exp+ FECD and Control samples have similar percentages of full splice matches suggesting additional splicing diversity seen in Exp+ is not predominantly due to aberrant splicing.C) Hierarchical clustering heatmap of sample-to-sample distances within short-read RNA-Seq data.Darker colours denote higher similarities between samples, demonstrating that Exp+ and control are more similar within each subgroup than with other samples.Exp-shows a similar result but is less pronounced, likely due to differences in unidentified underlying genetic causes.D) Principal component plot (PCA) of all short-read samples with sex, disease state and ethnicity as covariates.All three subsets of data cluster separately in the PCA plot.Control samples appear as circles, FECD Exp+ appear as orange triangles, FECD Exp-appear as green triangles.Male individuals are represented as filled shapes while female individuals are outlined.https://doi.org/10.1371/journal.pgen.1011230.g001

Fig 2 .
Fig 2. Differential gene expression analysis on Exp+ FECD, Exp-FECD, and control primary corneal endothelial cell (CEC) transcriptomes.A) Differential gene expression was assessed via DESeq2.Three pairwise comparisons were conducted: PWC1 (Exp+ vs control), PWC2 (Exp+ vs Exp-), and PWC3 (Exp-vs control).Venn diagrams show overlap of significantly differentially expressed genes (adjusted p-value < 0.05) within these three comparisons.B) Total TCF4 expression showed no significant dysregulation between the different groups.C) Volcano plot shows the distribution of gene expression in Exp+ FECD compared to the control group.Horizontal dotted lines denote the boundary between significance (adjusted p-value < 0.05), whereas vertical dotted lines show a shrunkLFC cutoffs for -1 and 1. Dark grey solid markers show significantly expressed genes with a fold change magnitude greater than 1.Solid orange markers delineate uniquely Exp+ differentially expressed genes.D) Pathways enriched in Exp+ vs Control (g:profiler).Shaded grey boxes show direction of pathway enrichment in TCF4 knockdown models (Doostparast Torshizi et al. 2019) [33].Green dots show pathways that are significantly enriched in PWC3 as well.Solid orange dots and open orange circle show that pathway enrichment persists in genes unique to Exp+ (1,028 genes and 3,292 genes, respectively).https://doi.org/10.1371/journal.pgen.1011230.g002 Fig 3. CTG18.1 expansion-mediated FECD results in alternative TCF4 exon usage within corneal endothelial cells (CECs).A) Differences in exon usage between control (grey dotted line) and Exp+ (orange dotted line) CECs.Significantly upregulated and downregulated exons are highlighted with blue and red vertical lines, respectively.The location of CTG18.1 is depicted with an orange line.Inset boxes magnify dysregulated regions.A genomic schematic of TCF4 with transcripts aligned is presented below the axis.Key protein domains highlighted (green-basic helix loop helix (bHLH) domain, blue-activation domains (ADs), red-main nuclear localization signal (NLS).B) Differences in exon usage between control (grey dotted line) and Exp-(green dotted line) CECs.No exons were statistically dysregulated.https://doi.org/10.1371/journal.pgen.1011230.g003

Fig 4 .
Fig 4. RNAScope probes targeting TCF4 transcripts demonstrate that longer transcripts containing activation domain 3 (AD3) account for a smaller proportion of total transcripts specifically in Exp+ FECD corneal endothelial cells (CECs).A) Probe targeting schematic for TCF4.Probe A targets all exons in NM_001083962.2and visualises total TCF4 transcripts, while Probe B targets only the 5' exons, excluding those that are contained within shorter transcripts (NM_001243234.2).Probe B visualises the longer subset of AD3-containing transcripts within the cell.CTG18.1 is highlighted in orange.Green bar denotes TCF4 bHLH region, blue regions highlight TCF4 activation domains, red box shows the bipartite TCF4 NLS signal and orange shows the genomic region containing the CTG18.1 repeat.Key domains highlighted (green-basic helix loop helix (bHLH) domain, blue-activation domains (ADs), red-main nuclear localization signal (NLS)).B) Representative confocal images of Exp+, Exp-and control CECs and Exp+ and control fibroblasts with nuclei (blue, DAPI) and TCF4 puncta (visualised with Cy3 but represented here with white).Scale bars represent 20 μm.C) Plotting the proportion of longer AD3-containing TCF4 transcripts in Exp+ (orange), control (black), and Exp-(green) shows that Exp+ have significantly lower proportion of AD3-containing transcripts compared to both control (p = 0.0145) and Exp-(p = 0.0213) CECs, while there is no significant difference between control and Exp-(p = 0.362).D) Plotting the proportion of longer AD3-containing TCF4 transcripts in Exp+ (orange), and control (black) shows that there is no statistically significant (p = 0.61) difference in TCF4 isoform ratios between the two groups.Open circle denotes mean and error bars show standard deviation.Diamond datapoint shows bi-allelic expanded FECD cells.Significance determined with student T-test.https://doi.org/10.1371/journal.pgen.1011230.g004 5).For Proband A, two consecutive in cis heterozygous missense and nonsense variants were identified.These occur within the coding region of only one (out of 93) TCF4 transcripts (ENST00000566286.5;c.[57G>T; 58A>T]; p.[(Arg19Ser; p. Lys20*)]) and in the 5' UTR of a further 5/93 transcripts.The variant identified in Proband B is only encompassed by the same 6 TCF4 transcripts (ENST00000566286.5;c.66G>A, p.

Fig 5 .
Fig 5. Schematic map of rare and potentially deleterious variants identified in individuals with molecularly unsolved FECD within TCF4, without CTG18.1 expansions.All 8 rare (minor allele frequency (MAF)<0.005)and potentially pathogenic (CADD>15) variants in 7 Exp-Probands are mapped along TCF4 genomic region and transcriptomic locations contextualised.Proband A variants are found in cis within the proband.Conserved intronic transcripts refer to variants located in intronic regions identified as conserved via Genomic Evolutionary Rate Profiling (GERP) conservation scores on Ensembl.Splice modifying refers to transcripts where the variant identified would induce alternative splicing.Transcript numbers where the variant occurred within introns were not included.UTR: untranslated region.NLS: nuclear localization signal.bHLH: basic helix loop helix.https://doi.org/10.1371/journal.pgen.1011230.g005