A comprehensive molecular characterization of the 8q22.2 region reveals the prognostic relevance of OSR2 mRNA in muscle invasive bladder cancer

Technological advances in molecular profiling have enabled the comprehensive identification of common regions of gene amplification on chromosomes (amplicons) in muscle invasive bladder cancer (MIBC). One such region is 8q22.2, which is largely unexplored in MIBC and could harbor genes with potential for outcome prediction or targeted therapy. To investigate the prognostic role of 8q22.2 and to compare different amplicon definitions, an in-silico analysis of 357 patients from The Cancer Genome Atlas, who underwent radical cystectomy for MIBC, was performed. Amplicons were generated using the GISTIC2.0 algorithm for copy number alterations (DNA_Amplicon) and z-score normalization for mRNA gene overexpression (RNA_Amplicon). Kaplan-Meier survival analysis, univariable, and multivariable Cox proportional hazard ratios were used to relate amplicons, genes, and clinical parameters to overall (OS) and disease-free survival (DFS). Analyses of the biological functions of 8q22.2 genes and genomic events in MIBC were performed to identify potential targets. Genes with prognostic significance from the in silico analysis were validated using RT-qPCR of MIBC tumor samples (n = 46). High 8q22.2 mRNA expression (RNA-AMP) was associated with lymph node metastases. Furthermore, 8q22.2 DNA and RNA amplified patients were more likely to show a luminal subtype (DNA_Amplicon_core: p = 0.029; RNA_Amplicon_core: p = 0.01). Overexpression of the 8q22.2 gene OSR2 predicted shortened DFS in univariable (HR [CI] 1.97 [1.2; 3.22]; p = 0.01) and multivariable in silico analysis (HR [CI] 1.91 [1.15; 3.16]; p = 0.01) and decreased OS (HR [CI] 6.25 [1.37; 28.38]; p = 0.0177) in RT-qPCR data analysis. Alterations in different levels of the 8q22.2 region are associated with manifestation of different clinical characteristics in MIBC. An in-depth comprehensive molecular characterization of genomic regions involved in cancer should include multiple genetic levels, such as DNA copy number alterations and mRNA gene expression, and could lead to a better molecular understanding. In this study, OSR2 is identified as a potential biomarker for survival prognosis.

Introduction Bladder cancer is one of the most common urologic malignancies with estimated 80,470 new cases and 17,670 new deaths in the United States in 2019 [1]. Despite treatment advances, muscle invasive bladder cancer (MIBC) is associated with a poor 5-year survival rate of 40-60% [2]. To develop targeted therapies for MIBC, much effort has been directed at understanding the pathogenesis of this disease. Technological advances in molecular profiling, such as highthroughput sequencing technologies, have augmented cytogenetic, fluorescence in situ hybridization (FISH) and array-comparative-genomic-hybridization (aCGH) studies. These results have created an additional complexity through the generation of an unprecedented series of copy number alterations (CNA) and mRNA gene expression data, in addition to classical oncogene mutations [3][4][5][6][7]. The complexity of this molecularly altered landscape and its potential interactions requires an in-depth comprehensive analysis and correlation with clinical phenotype and prognosis [8].
Common regions of gene amplification on chromosomes are known for various solid cancers and hematologic malignancies [9]. In an effort to define these regions of co-amplified neighboring genes, many authors have used the term 'amplicon' [10], but the exact definition of an amplicon is currently a topic of discussion. Bignell et al. defined an amplicon as "a somatically acquired increase in copy number of a restricted genomic region," which "is often found in cancer cells as a mechanism of increasing the transcript and therefore, protein levels of dominantly acting cancer genes" [11][12][13]. Based on this definition, the GIS-TIC2.0 algorithm has been used to select potential target genes in a specific region with a CNA of �2. If co-amplification of these genes is detected, this region is referred to as an amplicon [14,15]. Since amplicons are usually investigated in the context of survival or prognostic influence on cancer, using CNA may underestimate functionally relevant levels of the genetic code, such as mRNA and protein expression. As current methods to measure protein expression are limited, mRNA expression is suggested as the closest and most accurate surrogate for gene activity [16].
The mRNA-based approach for selecting amplicons has been applied by Luo et al., who used a multigene mRNA expression signature to identify FGFR1-amplified estrogen receptor-positive (ERfl) breast tumors [17]. While this study and other research have mainly focused on amplicons in breast and gastric cancer, limited knowledge exists for bladder cancer [18][19][20].
i. The primary aim of this study is to provide a method for a comprehensive molecular characterization of the 8q22.2 region in bladder cancer using in silico CNA and mRNA gene expression data.
ii. A second aim of this study is to compare the CNA and mRNA amplicon structure of 8q22.2, investigate the prognostic value of 8q22.2 amplification on survival, and identify predictive genes. Prognostically significant genes are validated using RT-qPCR in an independent cohort of MIBC tumor samples.
iii. Finally, this study is also designed to explore whether the number of chromosomal alterations is associated with a distinct group of patients with clinically or histologically defined parameters.

TCGA cohort
Data for the first muscle invasive bladder cancer (MIBC) cohort for this study was derived from The Cancer Genome Atlas and has been produced in earlier analyses [7]. Clinicopathological data on the cohort, CNA data, mRNA data, and mutation data were downloaded from the open access portal, cBioPortal (https://www.cbioportal.org) provided by the Memorial Sloan Kettering Cancer Center [25,26]. Molecular subtypes and focal amplifications and deletions were downloaded from a supplementary excel file from the TCGA analysis performed by Robertson et al., named S1 Table: Clinical and Molecular covariates, Related to STAR Methods [7]. Focal gains and amplifications, as well as focal losses and deletions, were grouped together and analyzed as events of genomic amplification or deletion.
Receipt of neoadjuvant chemotherapy and unknown tumor stage (Tx) or T<2 were defined as exclusion criteria. After exclusion, 357 patients remained for final analysis.
Validation cohort. In silico findings were retrospectively validated in an independent MIBC (�2) cohort from the Clinic for Urology and Urosurgery at the University Hospital Mannheim. Formalin-fixed, paraffin-embedded (FFPE) tissue samples were obtained from 46 patients (24% female, with a median age of 72.5 years; 78% T3 and T4 tumors and 27% lymph node positive patients (N+)) (S2 Document). Histopathologic information has been provided by ZVP. Specimens were graded according to the most recent TNM classification (2017) and the WHO 2010 classification of genitourinary tumors. Prior to commencement the review board 2 of the University of Heidelberg approved the study under the number 2015-549N-MA, in accordance with the Declaration of Helsinki. All patients provided written informed consent for this study.

TCGA bladder cancer copy number alterations and gene expression data
Varying degrees of co-amplified genes in the 8q22.2 region for patients in the TCGA bladder cancer cohort were visualized using a heatmap (oncoprint) (S1 Fig). For this study we only addressed high CN gains or amplifications and did not differentiate between balanced and unbalanced gains. We chose to define genes as amplified on the DNA level if they showed a GISTIC2.0 algorithm value �2, as reported by Mermel et al. [29].
Amplification at the mRNA level was defined as mRNA overexpression with a z-score-normalization value �2 (https://docs.cbioportal.org/5.1-data-loading/data-loading/file-formats/zscore-normalization-script). This value was based on the fact that a z-score of 2 (2 standard deviations from the mean) is the default setting used when downloading mRNA data from cBioPortal [26]. We further performed analyses with more or less restrictive z-scores (S1 and S2 Tables).
As in the work of Heidenblad et al., the size of the 8q22.2 amplicon was defined by the minimal common region (i.e., number of genes) of amplification present in a given percentage of the patient population [22]. Based on the presence of a regional amplification in 8q22.2, patients were divided into amplified (AMP) and non-amplified (NONAMP) groups.
We then compared the mRNA gene expression between AMP and NONAMP to determine if amplification of genes resulted in mRNA overexpression (S3 Table).
Prognostic value of 8q22.2 and clinical parameters. To identify whether the number of chromosomal alterations in a patient was associated with specific clinical criteria, clinicopathologic data for patients were compared for each amplicon definition. Respective parameters included age (�70 vs. <70), gender (male vs. female), tumor stage (T2 vs. T3/4), lymph node status (N0 vs. N+), and molecular subtype (luminal vs. basal). Molecular subtype analysis was further extended to include five mRNA expression based molecular subtypes, as described by Robertson et al. [7] (S4 Table). Univariable Cox proportional hazard ratios and Kaplan-Meier curves were used to assess overall survival (OS) and disease-free survival (DFS). Clinical variables and 8q22.2 amplicon definitions or individual genes (RNA_AMP_Gene) with a p-value of <0.2 were further subjected to a multivariable Cox regression analysis [30].
We further assessed, whether amplification at the 8q22.2 locus was associated with genomic events such as mutations and chromosomal aberrations, including amplifications and deletions.
A list of 18 highly mutated (i.e., mutations in more than ten percent of patients in the TCGA cell 2017 cohort) known oncogenes and tumor suppressor genes in MIBC, as reported by Robertson et al., were chosen for comparison between AMP and NAMP groups [7] (S6 and S7 Tables).
The frequency of amplifications and deletions of frequently altered genomic regions in MIBC were compared for the AMP and NAMP groups (S8 Table).
Validation of prognostic markers with RT-qPCR. Gene expression of COX6C and OSR2 was quantified using reverse transcription quantitative real-time polymerase chain reaction (RT-qPCR) methodology, as described previously [33,34]. Briefly, RNA was extracted from 10 μm thick FFPE tissue slides with the bead-based XTRAKT FFPE kit (Xtract1 kit; STRA-TIFYER Molecular Pathology GmbH, Cologne, Germany). After lysing and purifying, nucleic acid isolates were treated with DNAse I. After DNA digestion RNA was transcribed with Super Script III reverse transcriptase (Thermo Fisher Scientific, Waltham, MA, USA) and sequence specific primers (S9 Table). Relative quantification of mRNA expression of COX6C (ENSG00000164919), OSR2 (ENSG00000164920) and the reference gene CALM2 (ENSG00 000143933) was measured on a StepOnePlus Real-Time PCR system (Life Technologies, Karlsbad, California, USA) with the SuperScript III Platinum One-Step quantitative RT-PCR system (Invitrogen, Karlsruhe, Germany) [35]. To normalize Ct values the Ct Value of the housekeeping gene CALM2 was subtracted from the Ct values of the target genes (ΔCt). mRNA expression was further reported as 40-ΔCt. [35][36][37]. Median, 1 st and 3 rd quartile 40-ΔCt values of COX6C and OSR2 were used as cut-off points for survival analysis.
A protocol of the workflow used in this study can be accessed at protocols.io (http://dx.doi. org/10.17504/protocols.io.bmewk3fe).

Statistics
All p-values were calculated for two sided tests. Spearman coefficient analysis was performed to assess gene correlations using a cutoff of �0.5 for positive correlations, including all statistically significant negative correlations (S10 Table). Contingency analysis was visualized using mosaic plots and differences between DNA_Amplicon_Core and RNA_Amplicon_Core definitions were compared using Fisher's exact test. Chi-Square tests were used to compare clinicopathologic data of patients for different amplicon definitions. Comparisons included age, gender, tumor stage, lymph node status, and molecular subtype.
Association with mutations and genomic events was investigated using Chi-Square tests, and the Bonferroni correction was used to correct for multiple testing.
Survival analyses were conducted using univariable Kaplan-Meier regressions and tested for significance with the log-rank and Wilcoxon tests. Multivariable analyses were performed using Cox-proportional hazard regression models. For results from the univariable analysis a p value cut-off of <0.2 was chosen to include relevant clinical or pathologic parameters that would have been missed with a more restrictive p value of <0.05 [30]. Variables for the multivariable analysis included significant (p<0.2) clinicopathological characteristics on univariable analysis (pT-Stage, pN-Stage, age, gender,) and genes (RNA_AMP_COX6C, RNA_AM-P_OSR2), or amplicon definitions (DNA_Amplicon_Core, RNA_Amplicon_Core). Statistical analyses of numeric continuous variables were performed with non-parametric tests (Wilcoxon rank-sum test, Kruskal-Wallis test). Contingency analyses of nominal variables were performed with Pearson's chi-squared test. All statistical analyses were performed using GraphPad Prism 8.0 (GraphPad Software Inc., La Jolla, California, USA) and JMP SAS 14.0 (SAS, Cary, North Carolina, USA). Results were considered to be significant if significance levels lower than 0.05 were obtained (the exception being significance levels lower than 0.2 on univariable analysis that were included in the multivariable analysis).

8q22.2 is a highly amplified region in TCGA cohort and can be divided into a core and extended region
More than ten percent of patients in the Cancer Genome Atlas (TCGA) cohort show CNA amplifications within the 8q22.2 region (https://www.cbioportal.org). The 8q22.2 amplicon can further be defined by a minimal common or core region consisting of two genes, RNF19A (ENSG00000034677) and SPAG1 (ENSG00000104450), that are present in all patients with amplifications in this region. This region can be extended to seven genes from RNF19A to VPS13B (ENSG00000132549), Ext1, and 15 genes from RNF19A to ERICH5 (ENSG00000 177459), Ext2 (Table 1).
When using mRNA gene expression data only the core amplicon definition could be replicated due to a high variance in gene co-expression at the mRNA level.

Univariable and multivariable survival analysis (DFS and OS) of the 8q22.2 region and genes in the TCGA cohort
Kaplan-Meier analysis of 5-year DFS and 5-year OS could not detect a statistically significant prognostic influence of 8q22. 2    Note that an mRNA expression greater than the median is associated with a significantly lower OS for OSR2 (p = 0.0066) and COX6C (p = 0.0287). OSR2 � median is also associated with a significantly reduced DFS (p = 0.0399).

Association of copy number alterations and mRNA levels in the 8q22.2 region
Concordance of DNA and mRNA level could only be assessed between the two core amplicon definitions, and was significant between both definitions (p<0.001). Twenty-three patients were identified as AMP by both 8q22.2 amplicon core definitions, with RNA_Amplicon_Core showing a higher sensitivity (23 of 30 RNA_AMP_Core patients vs. 23 out of 58 CNA_AMP_-Core patients). DNA_Amplicon_Core was associated with higher specificity (292 CNA_NO-NAMP_Core out of 299 patients vs. RNA_NONAMP_Core 292 out of 327).
A classical amplicon definition would suggest an increase in transcript levels in DNA amplified samples. As anticipated, RNF19A and SPAG1 showed a higher median gene expression in the

PLOS ONE
Comprehensive molecular characterization of 8q22.2 reveals prognostic relevance of OSR2 in MIBC AMP group. A marked difference between both groups could also be observed for POLR2K (ENSG00000147669), COX6C, and RIDA (ENSG00000132541). Interestingly, RGS22 (ENSG 00000132554), KCNS2 (ENSG00000156486), and NIPAL2 (ENSG00000104361) showed negative gene expression in both groups. Several genes also showed a switch from negative gene expression in the NONAMP group to positive gene expression in the AMP group (Fig 5, S3 Table) [38].

PLOS ONE
Comprehensive molecular characterization of 8q22.2 reveals prognostic relevance of OSR2 in MIBC

Associations with genomic events and gene functions of the 8q22.2 amplicon
When analyzing mutations in 18 known oncogenes and tumor suppressor genes, the DNA_AMP_ Core showed a higher rate of mutations in TP53 (65% vs. 46%) and KMT2A (19% vs. 10%), as compared to the DNA_NONAMP_Core. In contrast, FGFR3, CREBBP and ERBB2 were more frequently mutated in the DNA_NONAMP_Core group (DNA_AMP_Core vs. DNA_NO-NAMP_Core; FGFR3: 4% vs. 15%; CREBBP: 4% vs. 13%; ERBB2: 4% vs. 13%). (S5 Table) Interestingly, there was a significant difference in the mutational status of ERBB2 between the RNA_AMP_Core and RNA_NONAMP_Core. No patient in the RNA_AMP_Core group showed mutations in ERBB2, whereas 13% of RNA_NONAMP_Core patients exhibited mutations in this gene (p = 0.04  Table). Biological and functional gene analysis revealed most genes in the 8q22.2 region to be involved in protein or RNA/DNA processing pathways (S5 Table).
RNF19A and FBXO43 (ENSG00000156509) are both involved in protein ubiquitination, whereas POLR2K mediates the transcription of DNA into RNA. COX6C and STK3 (ENSG00000104375) show potential involvement in apoptosis, as cytochrome oxidases can initiate the apoptotic process; however, STK3 produces a pro-apoptotic kinase, which is part of the Hippo signaling pathway.

Comprehensive molecular analysis within multiple levels of the genetic code
This is the first in-depth comprehensive molecular characterization of the 8q22.2 region in the MIBC genome using the TCGA cell 2017 cohort. This study highlights the importance of placing comprehensive characterization at the center of any analysis regarding the potential role of restricted genomic regions in cancers. In this study, a comprehensive approach was used to take the structure and genomic proximity of the region into account, and it included multiple levels of the genetic code to allow for a more accurate assessment of interactions and gene

PLOS ONE
Comprehensive molecular characterization of 8q22.2 reveals prognostic relevance of OSR2 in MIBC activity. Once a restricted genomic region has been structurally defined, its correlation with phenotype and prognostic impact on survival can be assessed using clinicopathologic and survival data. A similar concept has already been used by Jones et al., who used integrative analysis of gene copy number alterations and mRNA expression to identify potential drivers of tumor recurrence in breast cancer [39].

Genes of the 8q22.2 region exhibit typical and paradoxical patterns of copy number alterations and RNA expression levels
While data from breast cancer and other cancers suggest that amplified CNA leads to subsequent mRNA overexpression, our results show that genes of the 8q22.2 region express a heterogenous concordance pattern between CNA and mRNA gene expression levels [38,[40][41][42]. While the classical pattern of CNA amplification leading to mRNA overexpression seems to be conserved in the core region of RNF19A and SPAG1, genes such as KCNS2 and NIPAL2 demonstrate a contrasting pattern, with a higher median gene expression in non-amplified patients. This paradoxical pattern of opposing CNA and mRNA expression levels has been described in lung adenocarcinoma by Tokar et al., who proposed that epigenetic regulatory mechanisms were responsible for these changes [43]. These mechanisms include microRNAmediated control of mRNA expression levels and methylation.
Genomic proximity of genes does not seem to be responsible for the heterogenous concordance pattern observed between CNA and mRNA in 8q22.2, as KCNS2 and NIPAL2 exhibit a paradoxical pattern and lie in the middle of the region, with STK3 between them, exhibiting the classical CNA-mRNA expression pattern.
Additionally, potential tumor growth and progression in the CNA_AMP_Core group could have increased through TP53 and KMT2A mutations, as this group harbored a higher percentage of these mutations than did the CNA_NONAMP_Core. When investigating head and neck squamous cell cancer cell lines, Cheng et al. found concurrent 3q26.3 amplification and TP53 mutation to be associated with a reduced survival rate [44]. Additionally, Zhang et al. found that KMT2A promoted melanoma cell growth by targeting the hTERT signalling pathway [45]. Interestingly, FGFR3, CREBBP and ERBB2 were more frequently mutated in the NONAMP group. Future research into the associations of 8q22.2 amplifications and mutations in bladder cancer is needed.
Amplification of 8q22.2 could also be viewed in an overarching context of chromosomal aberrations, as amplified patients showed significantly more frequent gains of 1q23.3, 8q22.3, 3p25.2 and 19q12, even after correcting for multiple testing. Concomitant gain of 3p25.2 was also reported by Hurst et al. [21]. Findings by Riester et al. further support the association of amplifications in specific regions of MIBC with survival, since gains in the 1q23.3 region were associated with poor survival in two cohorts of metastatic urothelial carcinoma [15].
Additional investigations are necessary to determine whether 8q22.2 amplification is a relevant genomic event in the pathogenesis of BC or simply a bystander of more complex chromosomal changes.

RNA_HIGH_OSR2 is an independent prognostic factor for survival
Analysis of the prognostic impact of individual genes of the 8q22.2 region revealed OSR2 to be an independent prognostic factor for survival. RNA_HIGH_OSR2 was associated with a worse prognosis for DFS (HR [CI]; 0.5 [0.32; 0.85]; p = 0.01) in the in silico analysis and decreased OS (HR [CI]; 6.25 [1.37; 28.38]; p = 0.018) in RT-qPCR data.
While the exact role of OSR2 in tumorigenesis is unknown, its methylation appears to play a role. Li et al. investigated the diagnostic value of OSR2 hypermethylation in gastric cancer patients and found significant differences in methylation status between cancer samples and normal controls. The authors concluded that hypermethylation of OSR2 and two other genes could provide a good alternative for non-invasive detection of gastric cancer [46]. Kostareli et al. found that promoter hypermethylation of ALDH1A2, OSR2, GATA4, GRIA4, and IRX4 showed a significant inverse correlation to transcript levels in oropharyngeal squamous cell carcinoma (OPSCC). A signature of low methylation levels in ALDH1A2 and OSR2 promoters, as well as high methylation levels in GATA4, GRIA4, and IRX4 promoters correlated well with improved survival in 3 independent patient cohorts [47].
Thus, the promotor hypermethylation status of OSR2 could be a potential predictor of survival in MIBC and a potential target of epigenetic therapies. In addition to copy number amplifications on the 8q22.2 OSR2 promotor, methylation could potentially influence OSR2 mRNA expression. Further research is needed to investigate this association.

Amplicon definitions are associated with unique respective clinicopathologic variables
Comparative analysis should include two gene expression levels (CN and mRNA) in correlations with phenotype and prognostic impact on survival. This is important because the prognostic value of the 8q22.2 core region, containing amplifications of RNF19A and SPAG1, seems to be highly dependent on the chosen amplicon definition. A potential relationship between an mRNA defined amplicon and lymph node status may exist, as RNA_AMP_Core patients were more likely to show lymph node positive disease. This is consistent with research conducted by Lindquist et al., who found gains in 22 genes on chr3p25 and chr11p11, which remained significantly associated with lymph node involvement in MIBC. The addition of CNA data improved discrimination relative to the use of clinical variables alone (p = 0.04). Gains in chr3p25 and chr11p11 were further associated with shorter overall survival periods [14].
Associations of chemotherapy resistance and 8q22 amplification have been shown by other groups. Clearly additional experiments are needed to address the potential involvement of 8q22.2 in treatment resistance [48,49].
Furthermore, luminal subtype MIBC may be associated with the presence of amplicons, as patients with core and extended amplicons in the 8q22.2 region were more likely to harbor a luminal subtype. The luminal infiltrated subtype shows a mesenchymal expression signature [7]. Thus, genes on 8q22.2 involved in mesenchymal differentiation, such as COX6C, could play a role in the pathogenesis of luminal MIBC. Further studies in this regard are essential in order to understand the underlying mechanisms.
This study has several weaknesses, including a lack of uniformity in techniques, algorithms, bioinformatics, and definitions used for the in silico investigation of amplicons in cancer.
Results of this study should be interpreted cautiously due to the heterogeneity of the cohorts, the small sample size of the validation cohort and the use of varying overexpression definitions (z-scores and median). Validation in larger independent cohorts is needed.
Additionally, we could not address the relevance of balanced or unbalanced gains because loss of heterozygosity was not assessed.
Nonetheless, our study is the first one to highlight the significance of the 8q22.2 region in MIBC and to identify potential targets in this region.

Conclusions
High OSR2 mRNA levels are associated with decreased DFS in silico and OS in RT-qPCR data, respectively, and could serve as a potential biomarker for MIBC. Although RNA_Amplicon_Core mRNA expression is high in lymph node positive patients, future research on this association is necessary.
The typical pattern of CNA and high mRNA expression could not be validated in every gene of the 8q22.2 region, as some genes showed an inverse relationship between CNA and mRNA expression.
The associations found in this study were detected through an in-depth and comprehensive molecular characterization, which included multiple levels of the genetic code and correlations with phenotype and prognostic impact on survival.
Clearly, this approach needs to be further validated in large independent cohorts and other genomic regions of MIBC.