KAT6A amplifications are associated with shorter progression-free survival and overall survival in patients with endometrial serous carcinoma

Somatic copy number alterations (CNA) are common in endometrial serous carcinoma (ESC). We used The Cancer Genome Atlas Pan Cancer dataset (TCGA Pan Can) to explore the impact of somatic CNA on the gene expression (mRNA) levels of cancer-related genes in ESC. Results were correlated with clinicopathologic parameters such as age of onset, disease stage, progression-free survival (PFS) and overall survival (OS) (n = 108). We identified 1,449 genes with recurrent somatic CNA, defined as CNA observed in 10% or more of tumor samples. Somatic CNA and mRNA expression levels were highly correlated (r ≥ 0.6) for 383 genes. Among the 1,449 genes, 45 were classified in the Tier 1 category of the Cancer Gene Census of the Catalogue of Somatic Mutations in Cancer. Eighteen of these 45 Tier 1 genes had highly correlated somatic CNA and mRNA expression levels: ARNT, PIK3CA, TBL1XR1, ASXL1, EIF4A2, HOOK3, IKBKB, KAT6A, TCEA1, KAT6B, ERBB2, BRD4, KEAP1, PRKACA, DNM2, SMARCA4, AKT2 and SS18L1. Our results are in agreement with previously reported somatic CNA for ERBB2, BRD4 and PIK3CA in ESC. In addition, AKT2 (p = 0.002) and KAT6A (p = 0.015) amplifications were more frequent in tumor samples from younger patients (< 60 years), and CEBPA (p = 0.028) and MYC (p = 0.023) amplifications were more common in advanced-stage (stage III and IV) disease. Patients with tumors carrying KAT6A and MYC amplifications had shorter PFS and OS. The hazard ratio (HR) for KAT6A amplification was 2.82 [95% CI 1.12–7.07] for PFS and 3.87 [95% CI 1.28–11.68] for OS. The HR for MYC amplification was 2.25 [95% CI 1.05–4.81] for PFS and 2.62 [95% CI 1.07–6.41] for OS.


Introduction
Somatic copy number alterations (CNA), including aneuploidy, segmental duplications and focal aberrations, are frequently observed in neoplasia. For critical oncogenes and tumor suppressor genes, changes in gene copy number might alter gene expression and drive the neoplastic process. For example, PTEN [1] and RB1 [2] deletions result in decreased expression of these tumor suppressor genes, whereas MET [3], ERBB2 [4] and MYC [5] amplifications lead to increased expression. The frequency of somatic CNAs varies significantly according to the histologic type of the neoplasm as well as its anatomical site. For example, somatic CNAs are very common in endometrial serous cancers (ESC), but not in other histologic types of endometrial cancer (EC). In fact, ESC overlaps with the "copy number (CN) high" group at the molecular level to such an extent that, in the current molecular classification of EC, the CN-high group is also known as "serous-like" carcinoma [6]. ESC is a high-grade EC with a worse clinical outcome than low-grade (type 1) EC [7].
We hypothesized that frequently observed somatic CNAs are highly relevant in the pathogenesis of ESC because they change the expression levels of critical cancer-related genes. To address this hypothesis, we took advantage of the publicly available TCGA Pan Can dataset deposited in cBioPortal (cbioportal.org). The main objective of this study was to identify candidate oncogenes and tumor suppressor genes that have been implicated in other human neoplasms but not in ESC. We pursued this objective by correlating copy number and mRNA expression levels in patients with ESC in the TCGA Pan Cancer dataset (TCGA Pan Can), and by cross-tabulating the highly correlated genes with known cancer genes (i.e. Tier 1 genes of the Cancer Gene Census of the Catalogue of Somatic Mutations in Cancer (CGC-COSMIC) [8]). The secondary objective was to explore associations of the identified Tier 1 CGC-COSMIC genes with clinicopathological parameters such as disease stage, age of onset, overall survival (OS) and progression-free survival (PFS), in order to identify potential biomarkers for these cancers.

TCGA endometrial serous cancer cohort
ESC samples (n = 108) were identified in the public TCGA dataset [9] through cBioPortal [10]. Frozen tumor samples with companion normal tissue were collected at diagnosis according to the consent provided by the relevant institutional review boards of participating institutions. Patients were selected only if their treatment plan required surgical resection and they had received no prior treatment for their disease. Pathologic diagnoses were made at local laboratories using formalin-fixed, paraffin-embedded (FFPE) sections. Each frozen, OCT-embedded tumor was processed centrally by TCGA, and a hematoxylin-eosin-stained section was reviewed by a pathologist to confirm the tumor subtype and grade [6,9]. For each patient, clinical data such as age of onset, tumor stage, OS and PFS were extracted using the visualization tools of cBioPortal.

Identification of recurrent somatic CNA
The copy number status of each gene (n = 24,881) in the genome was determined according to the TCGA analysis methods described elsewhere. Using cBioPortal tools [10], we downloaded the dataset specifying the somatic copy number aberration (CNA) frequency of each gene in the 108 patients. We then identified 1,449 genes with recurrent somatic CNA, i.e. amplification or deletion in at least 10% of tumor samples (S1 Table). The GRCh38 coordinates of each identified gene were obtained from the Galaxy platform [11] (S2 Table). Based on these coordinates, genomic blocks with recurrent somatic CNAs were determined (Table 1).
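To make the filtering step concrete, the sketch below shows one way to apply the 10% recurrence threshold and group neighboring recurrent genes into blocks. It is an illustration under stated assumptions, not the TCGA or cBioPortal pipeline: it assumes per-sample discrete calls in the cBioPortal/GISTIC convention (2 = amplification, -2 = deep deletion), tests amplification and deletion frequencies separately, and merges genes using a hypothetical `max_gap` rule, since the study does not state how block boundaries were drawn. It also does not require merged genes to share the same direction of change, which a real analysis would likely check. All gene names and coordinates in the example are invented.

```python
# Assumed cBioPortal-style discrete CNA codes: 2 = amplification, -2 = deep deletion
AMP, DEL = 2, -2


def recurrent_genes(calls, min_freq=0.10):
    """Return {gene: (amp_freq, del_freq)} for genes amplified or deleted in at
    least `min_freq` of tumors. `calls` maps gene -> list of per-sample codes.
    Amplification and deletion frequencies are tested separately (an assumption)."""
    result = {}
    for gene, codes in calls.items():
        n = len(codes)
        amp = sum(c == AMP for c in codes) / n
        dele = sum(c == DEL for c in codes) / n
        if amp >= min_freq or dele >= min_freq:
            result[gene] = (amp, dele)
    return result


def merge_blocks(genes, max_gap=1_000_000):
    """Group recurrent genes into genomic blocks.

    `genes` is a list of (name, chrom, start, end) tuples in GRCh38 coordinates.
    Genes on the same chromosome separated by <= `max_gap` bp (a hypothetical
    threshold) are merged into one block: (chrom, start, end, [names]).
    """
    blocks = []
    # Sorting by (chromosome, start) keeps each chromosome's genes contiguous
    for name, chrom, start, end in sorted(genes, key=lambda g: (g[1], g[2])):
        if blocks and blocks[-1][0] == chrom and start - blocks[-1][2] <= max_gap:
            b_chrom, b_start, b_end, names = blocks[-1]
            blocks[-1] = (b_chrom, b_start, max(b_end, end), names + [name])
        else:
            blocks.append((chrom, start, end, [name]))
    return blocks


# Toy input: 10 samples; gene names and coordinates are invented
calls = {"G1": [2, 2, 0, 0, 0, 0, 0, 0, 0, 0],
         "G2": [0] * 10,
         "G3": [-2, 0, 0, 0, 0, 0, 0, 0, 0, 0]}
print(sorted(recurrent_genes(calls)))  # ['G1', 'G3']
blocks = merge_blocks([("G1", "chr8", 100, 200), ("G3", "chr8", 500, 900),
                       ("G5", "chr19", 50, 60)])
print(blocks)
```

A distance-based merge is only one choice; segment-level calls, where available, would give block boundaries directly.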

Correlation of gene expression and copy number data for genes with recurrent somatic CNA
To determine the impact of somatic CNA on gene expression at the mRNA level, relative linear copy number values were plotted against mRNA expression z-scores (RNA Seq V2), and Pearson correlation coefficients were obtained using the cBioPortal visualization tool. The cutoff for "high correlation" was arbitrarily set at r ≥ 0.6.
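In the study the coefficients were computed by cBioPortal; the plain-Python sketch below only illustrates the arithmetic of Pearson's r and the r ≥ 0.6 filter. It assumes, hypothetically, that copy number values and mRNA z-scores are supplied per gene as lists in the same sample order; the dictionary layout and the toy values are not from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of copy number values x and mRNA z-scores y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either variable is constant
    return sxy / sqrt(sxx * syy)


def highly_correlated(copy_number, expression, cutoff=0.6):
    """Return {gene: r} for genes with r >= cutoff.

    Both arguments map gene -> list of per-sample values in the same sample order.
    """
    result = {}
    for gene in copy_number.keys() & expression.keys():
        r = pearson_r(copy_number[gene], expression[gene])
        if r >= cutoff:  # NaN compares False, so constant genes are dropped
            result[gene] = r
    return result


# Toy data (invented): G1 tracks copy number, G2 moves the opposite way
cn = {"G1": [1, 2, 3, 4], "G2": [1, 2, 3, 4]}
expr = {"G1": [1, 2, 3, 5], "G2": [4, 3, 2, 1]}
print(sorted(highly_correlated(cn, expr)))  # ['G1']
```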

Identification of cancer-relevant genes among genes with recurrent somatic CNAs
To identify cancer-relevant genes among the genes with recurrent somatic CNA, we cross-tabulated them with the Tier 1 Cancer Gene Census (CGC) genes (n = 576) from the Catalogue of Somatic Mutations in Cancer (COSMIC) (S3 Table). CGC is an ongoing curation effort under the auspices of COSMIC to catalogue genes whose mutations have been causally implicated in cancer. To be classified as Tier 1, a gene must have a documented activity relevant to cancer, along with evidence of mutations in cancer that change the activity of the gene product in a way that promotes oncogenic transformation [8]. Somatic CNA and point mutation data were obtained from cBioPortal.
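The two-criterion cross-tabulation reduces to set intersections over the recurrent-CNA genes. A minimal sketch follows; the gene names are placeholders, not study genes, and the function name is hypothetical.

```python
def cross_tabulate(recurrent, high_corr, tier1):
    """Apply the two criteria to the recurrent-CNA genes.

    recurrent -- genes with recurrent somatic CNA
    high_corr -- genes whose copy number/expression r >= 0.6
    tier1     -- Tier 1 CGC-COSMIC genes
    """
    criterion1 = recurrent & high_corr   # copy number tracks expression
    criterion2 = recurrent & tier1       # gene implicated in cancer
    both = criterion1 & criterion2       # highly correlated Tier 1 genes
    return criterion1, criterion2, both


# Toy example with placeholder gene names (not study data)
c1, c2, both = cross_tabulate({"G1", "G2", "G3", "G4"},
                              {"G1", "G2", "G9"},
                              {"G2", "G3", "G8"})
print(len(c1), len(c2), sorted(both))  # 2 2 ['G2']
```

Restricting both criteria to the recurrent set mirrors the counts reported in the Results (criterion sizes counted within the recurrent genes, with their overlap as the final list).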

Association of cancer-relevant genes with clinicopathological data
GraphPad Prism (v8.0.0) and Minitab (v18) were used for statistical analysis. Fisher's exact tests were applied to categorical variables such as somatic CNA, disease stage and age at diagnosis. For age, we used 60 years as an arbitrary cut-off, since ESC is usually diagnosed at older ages, typically in the eighth decade [12]. The Kaplan-Meier method was used to estimate PFS and OS. For PFS, patients who were alive with no evidence of progression were censored at the date of last follow-up; for OS, patients who were alive at last follow-up were censored. Using the curve comparison module in GraphPad Prism (v8.0.0), we calculated median PFS differences, hazard ratios (Mantel-Haenszel) and p-values (Mantel-Cox test). p < 0.05 was considered statistically significant.
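For readers who want to see what the Kaplan-Meier and Mantel-Cox (log-rank) calculations involve, here is a plain-Python sketch. It is not GraphPad Prism's implementation, whose details may differ. It assumes the common convention that patients censored at time t remain at risk for events at t, and it uses one standard form of the Mantel-Haenszel hazard ratio, HR = exp((O - E)/V), with a 95% CI of exp(ln HR ± 1.96/√V), where O and E are the observed and expected events in group A and V is the log-rank variance. The function names and toy data are hypothetical.

```python
from math import erfc, exp, sqrt


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(t, S(t))] at each distinct event time.

    events[i] is 1 if progression/death occurred at times[i], 0 if censored.
    Patients censored at t are kept at risk for events at t.
    """
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, d, c = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # tally ties at time t
            if data[i][1]:
                d += 1
            else:
                c += 1
            i += 1
        if d:
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        at_risk -= d + c
    return curve


def logrank_mh(times_a, events_a, times_b, events_b):
    """Log-rank (Mantel-Cox) test of group A vs B and a Mantel-Haenszel-type HR."""
    pooled = [(t, e, "A") for t, e in zip(times_a, events_a)]
    pooled += [(t, e, "B") for t, e in zip(times_b, events_b)]
    o_minus_e = var = 0.0
    for t in sorted({t for t, e, _ in pooled if e}):
        n = sum(1 for s, _, _ in pooled if s >= t)                 # at risk
        n_a = sum(1 for s, _, g in pooled if s >= t and g == "A")
        d = sum(1 for s, e, _ in pooled if s == t and e)           # events at t
        d_a = sum(1 for s, e, g in pooled if s == t and e and g == "A")
        o_minus_e += d_a - d * n_a / n                             # observed - expected
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("no informative event times")
    chi2 = o_minus_e ** 2 / var
    log_hr, half = o_minus_e / var, 1.96 / sqrt(var)
    return {"chi2": chi2,
            "p": erfc(sqrt(chi2 / 2)),   # upper tail of chi-square, 1 df
            "hr": exp(log_hr),
            "ci95": (exp(log_hr - half), exp(log_hr + half))}


# Invented toy data (months): group A = "amplified", group B = "not amplified"
res = logrank_mh([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
print(res["hr"] > 1, res["p"] < 0.05)  # True True
print(kaplan_meier([1, 2, 2, 3], [1, 1, 0, 1]))
```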

Cohort characteristics
The median patient age was 68 years (range, 45-90). There were 38 patients with stage I, 12 with stage II, 45 with stage III and 13 with stage IV disease. Patients younger than 60 years presented more often with advanced-stage (III and IV) disease than older patients (p = 0.002). More advanced disease stage was associated with shorter PFS (p < 0.001) and OS (p < 0.001).

Recurrent somatic CNA in endometrial serous carcinoma
Somatic CNAs affecting 1,449 genes were observed in at least 10% of tumors (S1 Table).

Impact of recurrent somatic CNA in endometrial serous carcinoma
The 1,449 genes with recurrent somatic CNA were evaluated against two criteria. The first criterion was whether gene copy number and gene expression were highly correlated (r ≥ 0.6), and the second was whether the gene was implicated in cancer (Tier 1 CGC-COSMIC gene). The numbers of genes that fulfilled the first (highly correlated) and second (implicated in cancer) criteria were 383 and 45, respectively. Eighteen genes fulfilled both criteria, i.e. they were highly correlated Tier 1 CGC-COSMIC genes (Fig 1; Fig 2A-2Q).

Association of Tier 1 CGC-COSMIC gene amplifications with clinicopathological parameters
Association with age. The frequency of AKT2 and KAT6A amplifications was much higher in patients younger than 60 years. For AKT2, the amplification frequency was 36% in younger patients (n = 11) and 7% in older patients (n = 97) (p = 0.002). For KAT6A, younger and older patients had amplification rates of 45% and 15%, respectively (p = 0.015). No association was found between age and the other genes (Table 2).
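Fisher's exact test on a 2x2 table (for example, age < 60 vs ≥ 60 by amplified vs not amplified) can be sketched with the standard library alone. The two-sided p-value sums the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed table. The function name is hypothetical, and the counts in the usage line are invented; they are not meant to reproduce Table 2.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]],
    e.g. rows = younger / older patients, columns = amplified / not amplified."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric P(top-left cell = x) with all margins fixed
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are counted
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Hypothetical counts: 5/11 younger vs 10/97 older patients amplified
print(fisher_exact_two_sided(5, 6, 10, 87))
```

The same function applies unchanged to the stage comparisons, with rows representing advanced vs early stage.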
Association with disease stage. The frequency of CEBPA and MYC amplifications was much higher in patients with advanced-stage disease. For CEBPA, the amplification frequency was 21% in patients diagnosed at advanced stage (n = 58) and 6% in patients diagnosed at stage I and II disease (n = 50) (p = 0.028). For MYC, tumor samples from advanced- and early-stage disease had amplification rates of 33% and 14%, respectively (p = 0.023). No association was observed for the other genes. […] (Fig 4C and 4D).

PLOS ONE

Discussion
The primary objective of this study was to identify candidate oncogenes and tumor suppressor genes in ESC by correlating DNA copy number and mRNA expression in the TCGA cohort. We identified 18 amplified known oncogenes that were also overexpressed in ESC. A PubMed search of these 18 genes in relation to "endometrial cancer", "endometrial serous cancer" and "gynecologic cancers" sorted them into four groups. The first group comprises genes previously reported in ESC pathogenesis: for ERBB2 [13], BRD4 [14] and PIK3CA [15], our findings agree with previous reports in ESC suggesting overexpression due to amplification. The second group of genes has been implicated in EC, but not in the serous histologic type; it includes IKBKB [16,17], KEAP1 [18,19], AKT2 [20,21] and SMARCA4 [22]. The third group, ARNT [23], KAT6B [24], DNM2 [25] and ASXL1 [26], has been associated with other gynecological cancers, but not with endometrial cancer. To the best of our knowledge, the fourth group, TBL1XR1, EIF4A2, HOOK3, KAT6A, TCEA1, PRKACA and SS18L1, has never been implicated in any gynecological cancer. Point mutations or fusions of the amplified genes were not frequently observed in these tumors, with the notable exception of PIK3CA, for which point mutations and amplifications were observed in approximately half of the tumor samples.
The secondary objective of this study was to explore potential associations of the identified cancer genes with clinicopathological parameters. In this respect, the associations of several genes at three recurrent somatic CNAs, at 8p11, 8q24.13-q24.31 and 19q11-q13.2, are worth mentioning. The 2.1 Mb recurrent somatic CNA at 8p11 (chr8:41261956-43363185) contains three Tier 1 CGC-COSMIC genes: HOOK3, IKBKB and KAT6A. Expression of all three genes was highly correlated with their copy numbers. No associations with clinicopathologic parameters were noted for HOOK3 or IKBKB, but KAT6A amplification was associated with shorter PFS and OS and with earlier age of onset. KAT6A has never been implicated in gynecological malignancies, and its role in ESC is unknown. KAT6A, also known as monocytic leukemia zinc finger protein (MOZ), is a member of the histone lysine acetyltransferase (KAT) family and has an important role in the regulation of chromatin organization and function. Translocations involving KAT6A (and KAT6B) have been identified in acute myeloid leukemia [27]. In an animal study, inhibitors of KAT6A/B induced senescence and arrested lymphoma growth [28]. Even partial blockade of KAT6A reduced proliferation of Myc-induced lymphoma and leukemia [29]. Our results indicate that KAT6A is a candidate gene for further evaluation in ESC pathogenesis.
The 16.92 Mb recurrent somatic CNA at 8q24.13-q24.31 (Table 1) contains two Tier 1 CGC-COSMIC genes, MYC and NDRG1. Neither MYC (r = 0.48) nor NDRG1 (r = 0.55) expression was highly correlated with copy number. However, MYC amplification was associated with more advanced disease stage and with shorter PFS and OS. MYC amplification in EC has been reported in other studies [30,31]. In agreement with our results, MYC amplification together with high HER-2/neu and cyclin E protein expression has been associated with tumor progression, higher tumor grade and deep myometrial invasion [32]. Although MYC copy number and mRNA expression levels were not highly correlated, PVT1 was co-amplified with MYC (p < 0.001) in 25% of the cases (Fig 5), and its copy number was highly correlated with its expression (r = 0.60). PVT1 is not a CGC-COSMIC gene, but it encodes a long non-coding RNA with oncogenic properties whose amplification and overexpression have been implicated in several cancers, including breast and ovarian carcinomas [33]. Therefore, PVT1, together with or instead of MYC, might be the cancer driver gene in this setting.
The 12.16 Mb recurrent somatic CNA at 19q11-q13.2 (Table 1) contains three Tier 1 CGC-COSMIC genes: CEBPA, CCNE1 and AKT2. Expression of AKT2 was highly correlated with its copy number (r = 0.73), whereas this correlation was weaker for CEBPA (r = 0.23) and CCNE1 (r = 0.39). AKT2 amplification was associated with younger age at disease onset. AKT2 belongs to a family of three serine/threonine-protein kinases, the AKT kinases (encoded by AKT1, AKT2 and AKT3), which regulate cell proliferation, cell survival, growth and angiogenesis. Among the three, AKT2 amplification and overexpression have been demonstrated in many cancers, including EC, and AKT2 has been associated with cancer cell invasion, metastasis and survival [34]. Another amplified gene at this locus in our analysis was CEBPA, whose amplification was associated with more advanced disease stage. CEBPA is highly expressed in normal endometrial tissue but is not expressed in clinical endometrial cancer samples [35]. DNA hypermethylation of the upstream CEBPA promoter region is responsible for very low CEBPA expression in lung and endometrial cancers [36]. Decreased expression of CEBPA through posttranscriptional regulation has also been shown in myeloid leukemia [32]. Therefore, CEBPA does not appear to be a likely candidate driver gene in ESC. Lastly, CCNE1 amplification has been reported previously in EC [37][38][39], and we also identified CCNE1 amplification in serous-like cancers in the TCGA samples. However, we did not detect a strong correlation (r = 0.39) between copy number and mRNA expression for this gene. It is plausible that, in this dataset, AKT2 rather than CCNE1 or CEBPA is the driver gene at this locus. More detailed studies of the genes at this locus are required.
This study has several limitations. First, this is a retrospective analysis of a multicenter study, and the PFS and OS data are not derived from a randomized clinical trial, or even from the clinical practice of a single center; therefore, some heterogeneity is expected. A second limitation concerns the somatic CNA calls. GISTIC [40] is the standard algorithm used to call somatic CNA from Affymetrix data in TCGA studies; however, other algorithms such as Hidden Markov Models (HMM) [41] and Circular Binary Segmentation (CBS) [42] are widely used in commercial software applied in clinical practice. For the purposes of this study, we accepted the TCGA calls at face value. A third limitation is the arbitrary selection of a cut-off for the Pearson coefficient (r ≥ 0.6). However, this approach appears sensitive enough to identify ERBB2 [13], BRD4 [14] and PIK3CA [15], in agreement with previously generated data. A fourth limitation is the use of CGC-COSMIC Tier 1 genes as the source of "cancer genes". There are other initiatives for the curation of cancer genes, such as CIViC [43], My Cancer Genome [44] or OncoKB [45], but we consider CGC-COSMIC a reputable dataset. A fifth limitation is the use of mRNA levels as a marker of gene expression. Protein levels from reverse phase protein arrays (RPPA) and immunohistochemistry studies would be ideal, but these data are not available for EC in the TCGA dataset. Finally, the major limitation of this study is that the analysis was performed only on the TCGA dataset. Although the ESC collection (n = 108) in the TCGA Pan Cancer dataset is the largest known cohort with clinical and genomic data on these tumors, we are aware that, ideally, our findings should be replicated in an independent study group. Unfortunately, the rarity of ESC precluded the prospective collection of the same or a higher number of tumors with relevant clinical information for the purposes of this study.
In conclusion, despite the aforementioned limitations, our analysis of ESC in TCGA samples identified several novel candidate genes that may be important in ESC pathogenesis. KAT6A is among the most interesting and novel candidates: its amplification correlated with increased gene expression and was associated with shorter PFS and OS. The results of this study also reaffirm previously known clinicopathological associations for loci such as 8q24 and 19q11-q13.2. More research is warranted to determine the impact of gene copy number changes in the pathogenesis of ESC.
Supporting information S1