KLF3 and PAX6 are candidate driver genes in late-stage, MSI-hypermutated endometrioid endometrial carcinomas

Endometrioid endometrial carcinomas (EECs) are the most common histological subtype of uterine cancer. Late-stage disease is an adverse prognosticator for EEC. The purpose of this study was to analyze EEC exome mutation data to identify late-stage-specific statistically significantly mutated genes (SMGs), which represent candidate driver genes potentially associated with disease progression. We exome sequenced 15 late-stage (stage III or IV) non-ultramutated EECs and paired non-tumor DNAs; somatic variants were called using Strelka, Shimmer, SomaticSniper and MuTect. Additionally, somatic mutation calls were extracted from The Cancer Genome Atlas (TCGA) data for 66 late-stage and 270 early-stage (stage I or II) non-ultramutated EECs. MutSigCV (v1.4) was used to annotate SMGs in the two late-stage cohorts and to derive p-values for all mutated genes in the early-stage cohort. To test whether late-stage SMGs are statistically significantly mutated in early-stage tumors, q-values for late-stage SMGs were re-calculated from the MutSigCV (v1.4) early-stage p-values, adjusting for the number of late-stage SMGs tested. We identified 14 SMGs in the combined late-stage EEC cohorts. When the 14 late-stage SMGs were examined in the TCGA early-stage data, only Krüppel-like factor 3 (KLF3) and Paired box 6 (PAX6) failed to reach significance as early-stage SMGs, despite the inclusion of enough early-stage cases to ensure adequate statistical power. Within TCGA, nonsynonymous mutations in KLF3 and PAX6 were, respectively, exclusive or nearly exclusive to the microsatellite instability (MSI)-hypermutated molecular subgroup and were dominated by insertions-deletions at homopolymer tracts. In conclusion, our findings are hypothesis-generating and suggest that KLF3 and PAX6, which encode transcription factors, are MSI target genes and late-stage-specific SMGs in EEC.


Introduction
Defects in mismatch repair can result in DNA strand slippage and the appearance of microsatellite instability (MSI) [1]. MSI is common in endometrial carcinoma (EC), in which it occurs in ~30% of sporadic tumors. In this context, MSI generally results from MLH1 hypermethylation and is associated with a hypermutated genome [2][3][4]. MSI/hypermutated ECs are one of four distinct molecular subgroups of EC, defined by The Cancer Genome Atlas (TCGA) [2]. The three remaining subgroups are referred to as POLE/ultramutated, copy number-low/microsatellite stable (MSS), and copy number-high (serous-like) [2]. Each molecular subgroup has distinct clinical outcomes [2] (and reviewed in [5]) and the prognostic utility of this molecular classification is an area of active exploration.
Endometrial carcinoma (EC) exacts a significant toll on women's health. It resulted in 89,929 deaths globally in 2018 [6], and is projected to cause 12,940 deaths within the United States in 2021 [7]. Importantly, EC incidence is increasing annually in the US and many other countries [8]. This phenomenon is likely partly due to increasing rates of obesity [9], a well-recognized epidemiological risk factor for endometrioid endometrial carcinomas (EECs), which make up 75%-80% of all newly diagnosed endometrial tumors. EECs most often present as low-grade, early-stage (stage I or II) tumors that are confined within the uterus [10]. Five-year survival rates for patients with low-grade, early-stage disease are high because surgery is often curative for this patient population, due to the limited extent of disease [10]. In contrast, patients with late-stage EEC have relatively poor outcomes [11], despite more aggressive treatment approaches of surgery with adjuvant chemotherapy or radiotherapy [12][13][14]. Thus, increasing tumor stage is an adverse prognosticator for EEC that is used in the clinical setting, as are high tumor grade (Grade 3; G3) and extent of lymphovascular space invasion [15]. The prognostic utility of molecular classification, according to POLE, microsatellite instability (MSI), and TP53/p53 status, is an area of active exploration originating from The Cancer Genome Atlas (TCGA) discovery that EECs can be subclassified into four molecular subgroups associated with distinct clinical outcomes [2] (and reviewed in [5]).
Given the dynamic nature of tumor genomes during disease initiation and progression, it is conceivable that the repertoire of pathogenic driver genes may differ in late-stage compared to early-stage EEC. However, the annotation of SMGs in primary EEC exomes by TCGA was performed in a stage-agnostic manner [2,16]. An improved understanding of the molecular etiology of late-stage EEC may provide novel insights into disease pathogenesis and progression. The aim of this study was to delineate SMGs in late-stage EEC exomes, and to determine whether these genes are also significantly mutated in early-stage disease. To this end, we exome sequenced 15 "in-house" late-stage EECs (National Human Genome Research Institute (NHGRI) cohort) and reanalyzed somatic mutation calls from 66 late-stage and 270 early-stage non-ultramutated EECs within TCGA (Fig 1). Collectively, we identified 14 SMGs in 81 late-stage tumors. Krüppel-like factor 3 (KLF3) and Paired box 6 (PAX6), which encode transcription factors, were SMGs in late-stage tumors, but were not statistically significantly mutated in early-stage tumors. All KLF3 mutations, and almost all PAX6 mutations, were in the MSI-hypermutated EEC subgroup; within this subgroup, KLF3 and PAX6 mutations were more frequent in late-stage than early-stage tumors. The mutation spectrum of both genes included recurrent insertions-deletions (indels) at homopolymer tracts, consistent with strand slippage resulting from mismatch repair defects and suggesting that PAX6 and KLF3 are likely MSI target genes.

Ethics statement
The NHGRI cohort of de-identified, fresh-frozen endometrioid endometrial tumors and matched non-tumor (normal) samples were obtained from the Cooperative Human Tissue Network (CHTN). The National Institutes of Health Office of Human Subjects Research Protections determined that research using these specimens was exempt from IRB review. Because the specimens were obtained from CHTN as de-identified specimens with an agreement that we will never request re-identification, we do not have information on whether consent was written or oral.

NHGRI clinical specimens
For 15 cases in the NHGRI cohort, de-identified, fresh-frozen endometrioid endometrial tumors and matched non-tumor (normal) samples were obtained from the Cooperative Human Tissue Network (CHTN) (Table A in S1 Table). The National Institutes of Health Office of Human Subjects Research Protections determined that this research was not human subject research, per the Common Rule (45 CFR 46). For each tumor sample, an H&E stained section was reviewed by an experienced gynecologic pathologist to identify regions containing ≥70% neoplastic cellularity; accompanying surgical pathology reports were retrospectively evaluated by the same gynecologic pathologist to annotate tumor stage using the International Federation of Gynecology and Obstetrics (FIGO) 2009 classification (Table A in S1 Table).

Genomic DNA preparation and next-generation sequencing
Genomic DNA extraction, identity testing and MSI analysis of tumor and normal samples in the NHGRI cohort were performed as previously described [19]. DNA was purified by phenol-chloroform extraction prior to library preparation. DNA libraries were prepared using the SeqCap EZ Exome + UTR capture kit (Roche) and sequenced with the Illumina HiSeq 2000 platform (Illumina). A flow diagram summarizing the approaches and methods used to generate and analyze the NHGRI exomes is provided in S1 Fig.

Alignment and variant calling
Short sequence reads from NHGRI cohort exomes were aligned to the Hg19 human reference sequence using NovoAlign version 2.08.02 (University of California at Santa Cruz). Four somatic mutation detection algorithms, Strelka [20], Shimmer [21], SomaticSniper [22], and MuTect [23], were used to call potential somatic variants. Insertions and deletions (indels) were identified by Shimmer and Strelka, while single nucleotide variants (SNVs) were identified by all four somatic algorithms. Strelka workflow version 1.0.14 (https://doi.org/10.1093/bioinformatics/bts271) was run with default parameters. Shimmer version 0.2 (https://github.com/nhansen/shimmer) was run with -min_som_reads = 6 and -minqual = 20 [21]. SomaticSniper version 1.0.5 was run with options -Q 40 -G -L, followed by the "standard somatic detection filters" described in Larson et al [22]. MuTect version 1.1.5 was run with default parameters, and data were then filtered to include only calls designated as "KEEP" in the program's output [23]. Following analysis with each algorithm, a VarSifter-formatted file was generated containing the somatic variant allele frequencies observed in each tumor and matched normal sample for every called variant [24]. ANNOVAR (downloaded on August 12, 2014) was used to annotate all variants using the UCSC "known genes" gene structures [25].

Variant filtering
Coding, splicing, and non-coding (intronic, 3' or 5' untranslated region (UTR), and 1 kb upstream of the transcription start or downstream of the transcription end site) somatic variant calls in the NHGRI cohort were displayed using VarSifter [24]. We prioritized mutations for the NHGRI tumors using criteria similar to those that have been shown to yield accurate mutation datasets in past studies [26][27][28][29][30][31]. A minimum of 14 reads covering a site in the tumor and 8 in the normal were required for mutation calling [26,27]; potential germline variants (those with a variant allele frequency (VAF) of greater than 3% in matched normal samples) were excluded. Coding and splice-site single nucleotide variants (SNVs) were annotated against dbSNP Build 135 and nonpathogenic single nucleotide polymorphisms (SNPs) with a minor allele frequency (MAF) greater than 5% were excluded. Indel variants that were present in dbSNP Build 135 were excluded without further evaluation of MAF. SNVs called by all four algorithms and indels called by either Strelka or Shimmer were retained and further annotated against GENCODE hg19 using Oncotator (v1.5.3.0) (http://www.broadinstitute.org/oncotator) [32]; noncoding variants (those with a variant classification of UTR, Flank, lincRNA, RNA, Intron, or De novo start) were excluded.
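The filtering criteria above can be summarized as a short sketch. It is an illustration under stated assumptions and not the pipeline used in the study: the `Call` record and its field names are hypothetical, and the dbSNP step is simplified because it does not model the pathogenicity check applied to SNPs.

```python
from dataclasses import dataclass
from typing import Optional, Set

SNV_CALLERS = {"Strelka", "Shimmer", "SomaticSniper", "MuTect"}
INDEL_CALLERS = {"Strelka", "Shimmer"}


@dataclass
class Call:
    """Hypothetical record for one candidate somatic variant."""
    kind: str                 # "SNV" or "indel"
    tumor_depth: int          # reads covering the site in the tumor
    normal_depth: int         # reads covering the site in the matched normal
    normal_vaf: float         # variant allele frequency in the matched normal
    callers: Set[str]         # algorithms that reported the variant
    in_dbsnp: bool = False
    dbsnp_maf: Optional[float] = None


def passes_filters(call: Call) -> bool:
    # Coverage thresholds: at least 14 tumor reads and 8 normal reads.
    if call.tumor_depth < 14 or call.normal_depth < 8:
        return False
    # Potential germline variant: VAF above 3% in the matched normal.
    if call.normal_vaf > 0.03:
        return False
    if call.kind == "SNV":
        # Simplification: drop any dbSNP SNP with MAF above 5%.
        if call.in_dbsnp and call.dbsnp_maf is not None and call.dbsnp_maf > 0.05:
            return False
        # SNVs must be reported by all four algorithms.
        return SNV_CALLERS <= call.callers
    # Indels present in dbSNP are dropped regardless of MAF.
    if call.in_dbsnp:
        return False
    # Indels need support from Strelka or Shimmer.
    return bool(INDEL_CALLERS & call.callers)
```

For example, `passes_filters(Call("indel", 20, 10, 0.0, {"Shimmer"}))` keeps an indel seen only by Shimmer, whereas an SNV reported by only two of the four algorithms is rejected.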

TCGA data analysis
A subset of TCGA Uterine Corpus Endometrial Carcinoma (UCEC) somatic mutation data (TCGA UCEC PanCancer Atlas [16]) was extracted from the MC3 Public MAF file (mc3.v0.2.8.PUBLIC.maf.gz, https://gdc.cancer.gov/about-data/publications/mc3-2017) [33]. Briefly, the MC3 Public MAF file was filtered to include somatic variants from 336 EECs from the MSI-hypermutated (n = 141), copy number-low/MSS (n = 140) or copy number-high (n = 55) molecular subgroups; variants from EECs within the ultramutated-POLE molecular subgroup or those without a molecular subgroup assignment were excluded (Table B in S1 Table). The TCGA mutation dataset used in our manuscript had been previously filtered to retain only the highest quality calls using both coverage and population frequency information [33]. Molecular subtype annotation for each sample was obtained from the cBioPortal for Cancer Genomics [34,35]. Variants with a PASS, WGA, or Native_WGA_mix designation as described by [33] were retained and further filtered to include SNVs called by MuTect and indels called by Indelocator [16]. The final set of selected variants was annotated against GENCODE hg19 using Oncotator (v1.5.3.0) (http://www.broadinstitute.org/oncotator) [32]; noncoding variants (those with a variant classification of UTR, Flank, lincRNA, RNA, Intron, or De novo start) were excluded. Additional clinicopathologic information for each tumor, including histology, stage, and grade, was obtained from Berger et al [16], and the cBioPortal for Cancer Genomics (URL: https://www.cbioportal.org/) [34,35] (Table B in S1 Table). Early-stage tumors were defined herein as stage I or II; late-stage tumors were defined as stage III or IV. A flow diagram summarizing the approaches and methods used to analyze the TCGA mutation calls is provided in S2 Fig.

Power analysis
MutSigCV's statistical power to detect SMGs was estimated using the binomial model described in [36]. Briefly, the probability of obtaining a p-value ≤ 0.1/14 (for 14 tests) was calculated assuming a background mutation rate of $p_0 = 1 - (1 - \mu f_g)^{3L/4}$, where $\mu$ is the background mutation rate, and $f_g = 3.9$ and $L = 1500$ are the 90th percentile gene-specific mutation rate factor and gene length, respectively. We also assumed a signal mutation rate of $p_1 = p_0 + r(1 - m)$, where $r$ is the frequency of non-silent mutations in tumor samples and $m = 0.1$ is the mis-detection rate. Power estimates were performed and plotted for a range of mutation rates and frequencies (S3 Fig) using an R script available at https://github.com/nhansen/LateStageEECs.
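A minimal Python sketch of the binomial power model described above may help make the calculation concrete. It is not the R script from the repository. The decision rule (the smallest mutated-tumor count whose background tail probability is at most 0.1/14) is our assumption about how the binomial test is applied, and the background mutation rate in the usage note is an illustrative value only.

```python
from math import comb


def binom_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def smg_power(n_tumors, mu, r, alpha=0.1 / 14, f_g=3.9, gene_len=1500, m=0.1):
    """Estimated power to call a gene significant in n_tumors samples."""
    # Background model: p0 = 1 - (1 - mu * f_g)^(3L/4).
    p0 = 1 - (1 - mu * f_g) ** (3 * gene_len / 4)
    # Signal model: p1 = p0 + r * (1 - m), capped at 1.
    p1 = min(1.0, p0 + r * (1 - m))
    # Assumed decision rule: smallest count k with P(X >= k | p0) <= alpha.
    k_crit = next(
        k for k in range(n_tumors + 2) if binom_tail(k, n_tumors, p0) <= alpha
    )
    # Power is the probability of reaching that count under the signal model.
    return binom_tail(k_crit, n_tumors, p1)
```

Calling `smg_power(270, mu=1e-6, r=0.05)` evaluates the early-stage cohort size for a hypothetical background rate of one mutation per megabase and a gene mutated in 5% of tumors; sweeping `mu` and `r` yields a grid of power estimates of the kind plotted in S3 Fig.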

Annotation of SMGs
SMGs were annotated using MutSigCV (v1.4). Briefly, MutSigCV (v1.4) was run on the NIH high-performance computing Biowulf cluster (http://hpc.nih.gov) using the coverage, covariate, and mutation type dictionary files provided by the Broad Institute. Filtered somatic variants for each data set were annotated against GENCODE hg19 using Oncotator (http://www.broadinstitute.org/oncotator) [32]; noncoding variants were excluded in accordance with a published approach [37], and the resulting coding mutation annotation format (maf) files were uploaded to the Biowulf cluster. Somatically mutated genes with a false discovery rate (q-value) ≤0.10 were defined as SMGs in accordance with a published approach [36].

Determining whether late-stage SMGs are statistically significantly mutated in early-stage tumors
MutSigCV (v1.4) was run as described above on the set of filtered somatic variants from the 270 early-stage EECs to obtain p-values for all mutated genes. For all genes annotated as SMGs in late-stage tumors, q-values were re-calculated from the MutSigCV (v1.4) p-values assigned to the early-stage data, adjusting for 14 tests (reflecting the total number of SMGs identified in late-stage tumors).
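The re-calculation of q-values over a restricted gene set can be sketched as follows, assuming the Benjamini-Hochberg procedure that the Results name for this step. The function is a generic textbook implementation, not code from the study, and the p-values in the usage note are hypothetical.

```python
def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    qvals = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        qvals[idx] = running_min
    return qvals
```

Passing only the early-stage p-values of the 14 late-stage SMGs makes `m` equal to 14, which is what "adjusting for 14 tests" amounts to here; for instance, `bh_qvalues([0.005, 0.01, 0.03, 0.04])` adjusts four hypothetical p-values for four tests.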

Survival analyses
We utilized the cBioPortal for Cancer Genomics (https://www.cbioportal.org/) to query the relationship between SMG mutation status and survival (overall-, disease-free-, progression-free-, and disease-specific-survival), stratifying cases by stage (all stages, early-stage, late-stage) and molecular subgroup (MSI-hypermutated, CN-low, CN-high, all non-ultramutated), and applying a Bonferroni correction to account for multiple testing.

Identification of SMGs among late-stage EECs
For the NHGRI late-stage cohort (n = 15), the average depth of coverage within regions targeted by the capture kit for tumor and normal samples was 67.2x and 65.5x, respectively; 90.87% of targeted bases for each tumor/normal pair had sufficient coverage for variant calling (Table C in S1 Table). Using a combination of somatic variant calling algorithms and stringent filtering parameters, we identified 2,879 high-confidence coding and splice-site somatic variants (consisting of 2,214 nonsynonymous (1,405 SNVs, 809 indels), 92 splice-site, and 573 synonymous variants) (Table D in S1 Table). Combined, the 2,306 nonsynonymous and splice-site variants affected 1,968 protein-coding genes and averaged 153.7 variants per tumor (range 9-542 per tumor) (Table D and Table E in S1 Table). For the TCGA late-stage cohort (n = 66), we extracted a total of 28,996 somatic coding and splice-site variants distributed among 10,504 protein-encoding genes (Table F and Table G in S1 Table). Using MutSigCV (v1.4), we identified a total of 14 unique late-stage SMGs (Fig 2), representing 6 SMGs (q-value ≤0.1) in the NHGRI (Table 1) and 12 SMGs in the TCGA late-stage EEC cohorts (Table 2).

KLF3 and PAX6 are SMGs in late-stage but not early-stage EEC
To test whether each of the 14 late-stage SMGs is also statistically significantly mutated in the TCGA early-stage EECs (n = 270), we first estimated MutSigCV's power to detect genes as significantly mutated in the early-stage cohort. Estimating power using a binomial model as described in [42], we determined that the data from 270 tumors, when tested on 14 genes, yield >95% power to detect genes as significantly mutated across a wide range of background mutation rates (Table I in S1 Table). To determine whether any of the 14 late-stage SMGs were significantly mutated in this dataset, p-values for all somatically mutated genes in early-stage tumors were calculated and used to determine q-values adjusting for 14 tests (reflecting the 14 late-stage SMGs queried) using the Benjamini-Hochberg procedure [43] (Table 3). Results showed that 12 of 14 late-stage SMGs were statistically significantly mutated (q-value <0.1) in early-stage EECs, whereas two late-stage SMGs, KLF3 and PAX6, were not (Table 3). Somatic mutations were more frequent among late-stage tumors than early-stage tumors for both KLF3 (10.6% (7 of 66) late-stage vs 4.8% (13 of 270) early-stage) and PAX6 (10.6% (7 of 66) late-stage vs 1.9% (5 of 270) early-stage) (Table 4).
We constructed Q-Q plots to verify that our q-values, calculated using the Benjamini-Hochberg procedure on MutSigCV's p-values, are the result of real statistical significance and not stratification of our dataset (S4 Fig). The Q-Q plots show significant deviation from ideal behavior due to MutSigCV's testing model [44], and the limited number of tumors analyzed.

KLF3 and PAX6 mutations occur in MSI-hypermutated EEC and are predicted to affect protein function
For the TCGA cohorts, we evaluated the distribution of KLF3 and PAX6 mutations across the MSI-hypermutated (n = 141 cases), CN-low (n = 140 cases), and CN-high (n = 55 cases) [...] Table). All but one of the PAX6 mutations (11 of 12) were in the MSI subgroup; the PAX6 X306_splice mutation was present in a CN-low tumor (Table 4). The higher frequency of PAX6 mutations [...] 114)). There was no significant difference in the frequency of PAX6 mutations between tumors of differing grade; PAX6 mutations were present in 3.6% of grade 1 (1 of 28), 13.5% of grade 2 (5 of 37) and 7.9% of grade 3 (6 of 76) MSI-hypermutated tumors (Table J in S1 Table). We observed no statistically significant differences in KLF3 or PAX6 mutation frequencies between POLE/POLD1-mutated and POLE/POLD1-wildtype cases within the MSI-hypermutated subgroup (Table K in S1 Table).
Table footnote: ξ Data were extracted from previously published TCGA data [16].

Survival analysis
We utilized the cBioPortal for Cancer Genomics (https://www.cbioportal.org/) to query the relationship between patient survival and somatic mutation status of all 14 late-stage SMGs identified herein, applying a Bonferroni correction to account for multiple testing (456 tests). With respect to KLF3 and PAX6 in the MSI-hypermutated subgroup, no significant differences in overall survival (OS), progression-free survival (PFS), disease-free survival (DFS) or disease-specific survival (DSS) were observed between mutated and non-mutated tumors when all stages were combined or when early- and late-stage tumors were considered separately (Table M in S1 Table). For the remaining 12 SMGs, there were no statistically significant differences in survival for any stage or molecular subgrouping (Table N through Table V in S1 Table).

Discussion
The mutational landscape of EEC was reported by TCGA in an initial 2013 study and a subsequent "pan-gyn" study which included the 2013 EEC cohort and additional cases. Both studies performed in silico annotation of SMGs, which represent candidate driver genes, in a stage-agnostic manner. However, cancer genomes are dynamic and the mutational repertoire of tumors can evolve during progression and metastasis [45]. Recent comparisons of primary and metastatic endometrial cancer genomes have demonstrated divergence in their mutational landscapes [46][47][48], but exome-wide comparisons of late-stage and early-stage primary tumors are lacking. Here, our stage-specific analysis of TCGA mutation data for non-ultramutated EECs showed that KLF3 and PAX6 are SMGs in late-stage (III/IV) but not early-stage (I/II) disease, raising the possibility that KLF3 and PAX6 mutations undergo positive selection during tumor progression.
KLF3 encodes a zinc finger transcription factor with roles in adipogenesis, erythroid maturation, B-cell differentiation, and cardiovascular development (reviewed in [49]). In the Human Protein Atlas, KLF3 expression was detected at "medium" levels in the normal glandular epithelium of the endometrium (https://www.proteinatlas.org/ENSG00000109787-KLF3/tissue/endometrium) by immunohistochemistry. The encoded protein includes an N-terminal CtBP-binding motif, three C-terminal Cys2His2 zinc finger domains, and a primary phosphorylation site at serine-249 that is important for DNA binding and enhancing transcriptional repression [49]. In our analysis of NHGRI EEC exomes and TCGA mutation data, the majority of KLF3 mutations, including three mutation hotspots, were frameshift mutations that occur N-terminal to the zinc finger domains and to serine-249. Because frameshift mutations often generate a downstream premature stop codon, they may result in the production of a truncated protein or the transcript may be subjected to nonsense-mediated decay resulting in haploinsufficiency [50]. Based on the positional rules for nonsense-mediated decay [51], it is likely that the KLF3 frameshift mutations among the ECs in this study result in nonsense-mediated decay and haploinsufficiency because the associated premature stop codons are located more than 50-55 nucleotides upstream of the final exon-exon junction [51]. In addition, in silico analyses predicted deleterious effects for the KLF3 R257W and KLF3 R261G missense mutants that occur in EEC; KLF3 R257W also occurs somatically in 2 colorectal cancers (1 MSI-high/CIMP (CpG island methylator phenotype)-low; 1 CIN (chromosome instability)-subgroup) [52,53].
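The positional rule invoked above can be written as a one-line check. This is a sketch under stated assumptions: the coordinate convention (positions along the spliced mRNA, with the junction given as the last nucleotide of the penultimate exon) and the example coordinates in the usage note are hypothetical, and the default threshold uses the lower bound of the 50-55 nucleotide range.

```python
def predicted_nmd(stop_end: int, last_junction: int, threshold: int = 50) -> bool:
    """Apply the positional rule for nonsense-mediated decay (NMD).

    stop_end:      mRNA coordinate of the last nucleotide of the premature stop codon
    last_junction: mRNA coordinate of the last nucleotide before the final
                   exon-exon junction (end of the penultimate exon)
    threshold:     minimum distance, in nucleotides, for NMD to be predicted
    """
    # Distance from the stop codon to the final junction; it is negative
    # when the stop codon lies in the last exon.
    distance_upstream = last_junction - stop_end
    return distance_upstream > threshold
```

With hypothetical coordinates, `predicted_nmd(300, 900)` flags a stop codon 600 nucleotides upstream of the final junction as an NMD candidate, whereas a stop codon in the last exon or within the threshold distance of the junction is predicted to escape decay and yield a truncated protein.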
The observations that KLF3 mutations in EEC occur predominantly at homopolymer tracts, were restricted to the MSI-hypermutated EEC subgroup, and were more frequent in late-stage than early-stage MSI-hypermutated tumors (25.9% versus 11.4%, respectively) indicate that KLF3 is an MSI target gene that may be involved in the etiology and progression of a subset of hypermutated EECs. Consistent with the idea that KLF3 is an MSI target gene, frameshift mutations at codons 106 and 227, which are recurrent in MSI-EECs, are also recurrent in the MSI-colorectal and MSI-stomach TCGA molecular subgroups [35,54,55].
Studies in other tumor types have reported KLF3 alterations as adverse prognosticators. For example, decreased KLF3 expression in colorectal and cervical cancers is associated with lymph node positivity and poorer outcomes [56,57]. Conflicting data exist regarding the occurrence and effects of reduced KLF3 levels in lung cancer. However, one study reported lower levels of KLF3 mRNA and protein expression in lung adenocarcinomas compared with adjacent normal tissues and more frequent loss of KLF3 expression in late- versus early-stage disease [58]. Although we found that KLF3 is a late-stage-specific SMG in EEC, there was no significant association between KLF3 mutation status and survival for EEC patients, possibly reflecting tissue-specific differences in the association of KLF3 with outcome, and/or outcome differences between mutation and reduced expression of KLF3.
The second late-stage-specific SMG identified in our study was PAX6. PAX6 encodes a highly conserved paired box transcription factor that includes paired box and homeobox DNA-binding domains and a C-terminal transactivation domain (TAD); the final 40 residues of the TAD influence homeobox-DNA binding [59]. In the Human Protein Atlas, PAX6 expression was undetectable by immunohistochemical analysis of the normal glandular epithelium of the endometrium (https://www.proteinatlas.org/ENSG00000007372-PAX6/tissue/endometrium). PAX6 has important roles in the development of several tissue types, including the eye (reviewed in [60]). Inherited and de novo nonsense and frameshift mutations in PAX6 cause the autosomal dominant eye disorder aniridia 1, whereas germline missense mutations are associated with attenuated ocular phenotypes [61]. Dysregulation of PAX6 expression has been implicated in a variety of human cancers, resulting in tumor suppressive or oncogenic phenotypes depending on the cellular context [62][63][64][65][66][67][68][69][70][71][72][73][74]. A recent study reported a potential role for epigenetic silencing of PAX6 in EC progression based on hypermethylation of PAX6 in primary EC versus endometrial hyperplasia, and in metastatic EC versus primary EC [75]. Our analysis of TCGA mutation data found that PAX6 mutations almost exclusively occur in MSI-hypermutated tumors. This observation, coupled with the fact that PAX6 mutations were more frequent among late-stage than early-stage MSI-hypermutated tumors (25.9% versus 3.5%, respectively), raises the possibility that, like KLF3 mutations, PAX6 mutations may be pathogenic drivers of tumor progression in the context of MSI-hypermutated EECs.
Most PAX6 mutations in TCGA MSI-hypermutated EECs were the recurrent PAX6 P375Hfs*7 frameshift mutation in the transactivation domain [2,16]. We predict that PAX6 P375Hfs*7 and an adjacent PAX6 H376Tfs*36 frameshift mutation encode truncated proteins with reduced transactivation capacity, because the associated premature stop codons are located within 50 nucleotides of the penultimate exon-exon junction [51] and are located proximal to a synthetic nonsense mutation (PAX6 Q422X) that exhibits reduced transactivation capacity in vitro [76]. Moreover, the fact that the PAX6 P375Q aniridia-associated missense mutation results in attenuated DNA binding affinity in vitro [76] raises the possibility that the recurrent PAX6 P375Hfs*7 mutant also may have attenuated DNA binding. Similar to KLF3 frameshift mutations, the PAX6 P375Hfs*7 and PAX6 H376Tfs*36 frameshift mutations in EEC both arise within a (C)7 homopolymer tract, indicating that PAX6 is an MSI target gene. Consistent with this idea is the fact that PAX6 frameshift mutations originating at codon 375 and/or codon 376 are also recurrent in MSI-stomach cancer and MSI-colorectal carcinoma [34,35,77].
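To illustrate what locating a homopolymer tract such as (C)7 involves, the following sketch scans a DNA string for runs of a single repeated base. It is not part of the study's methods; the function name and the toy sequence in the usage note are hypothetical and do not correspond to the PAX6 coding sequence.

```python
import re


def homopolymer_tracts(seq: str, min_len: int = 7):
    """Return (start, base, length) for each run of one base with length >= min_len.

    start is a 0-based offset into seq.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # One base followed by at least (min_len - 1) copies of the same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [
        (match.start(), match.group(1), len(match.group(0)))
        for match in pattern.finditer(seq.upper())
    ]
```

Running `homopolymer_tracts("ATGCCCCCCCATAAAAG")` on this toy sequence reports the seven-cytosine run and ignores the shorter adenine run; an indel whose position falls inside a reported interval is the kind of event expected from strand slippage in mismatch repair-deficient tumors.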
Compared to frameshift mutations, PAX6 missense mutations are relatively rare in the non-ultramutated TCGA cohort, occurring in three cases. The PAX6 A33T EC-mutant occurs in the N-terminal paired box domain at a residue highly conserved across paired domains in Pax family members and other proteins and is predicted to impact function [78]. A different substitution at this residue (PAX6 A33P ) exhibits altered transactivation activity in vitro and is a germline variant associated with partial aniridia [78,79]. The other two PAX6 missense mutations in EC (PAX6 E220G and PAX6 G141S ) were not uniformly predicted to be functionally significant in our analysis and, to our knowledge, are not pathogenic variants for ocular phenotypes.
In conclusion, our findings indicate that KLF3 and PAX6 are candidate driver genes in a subset of late-stage hypermutated EECs and are MSI target genes. Despite sufficient power, neither KLF3 nor PAX6 was detected as a candidate driver gene in early-stage EECs. To our knowledge, this is the first study to annotate KLF3 and PAX6 as late-stage-specific SMGs in EEC. Our findings warrant future studies to independently validate the enrichment of PAX6 and KLF3 mutations in late-stage, MSI-hypermutated EECs, to determine expression levels of KLF3 and PAX6 proteins in endometrial tumors, and to determine the functional effects of recurrent frameshift mutations in these genes, particularly with regard to phenotypic properties associated with tumor progression.