Survival outcomes are associated with genomic instability in luminal breast cancers

Breast cancer is the leading cause of cancer related death among women. Breast cancers are generally diagnosed and treated based on clinical and histopathological features, along with subtype classification determined by the Prosigna Breast Cancer Prognostic Gene Signature Assay (also known as PAM50). Currently the copy number alteration (CNA) landscape of the tumour is not considered. We set out to examine the role of genomic instability (GI) in breast cancer survival since CNAs reflect GI and correlate with survival in other cancers. We focussed on the 70% of breast cancers classified as luminal and carried out a comprehensive survival and association analysis using Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) data to determine whether CNA burden quartiles derived from absolute CNA counts are associated with survival. Luminal A and B patients were stratified by PAM50 subtype and tumour grade and then tested for association with CNA burden using multiple statistical tests. Analysis revealed that patients diagnosed with luminal A grade 3 breast cancer have a CNA landscape associated with disease specific survival, suggesting that these patients could be classified as at-risk. Furthermore, luminal A grade 3 cases largely occupy a region of stratification based on gene expression at the boundary where luminal A and luminal B cases overlap. We conclude that GI reflected by absolute CNA score is a statistically robust prognostic factor for survival in luminal A grade 3 breast cancer. Therefore, luminal A grade 3 breast cancer patients in CNA burden quartiles 3 or 4 may benefit from more aggressive therapy. This demonstrates how individual genomic landscapes can facilitate personalisation of therapeutic interventions to optimise survival outcomes.

Advances in areas such as next generation sequencing have now led to breast cancer 8 being regarded as a collection of highly heterogeneous diseases with distinct molecular 9 and clinical phenotypes including disease progression rate, treatment response and 10 survival [1]. The molecular classification of breast cancer currently makes use of PAM50 11 intrinsic subtyping determined by the Prosigna Breast Cancer Prognostic Gene 12 Signature Assay (formerly called the PAM50 test) [6] based on gene expression 13 profiling [7,8]. This distinguishes luminal A (lumA), luminal B (lumB), human 14 epidermal growth factor receptor 2 (HER2 )-enriched and basal-like subtypes [6]. The 15 differences in gene expression patterns among these intrinsic subtypes reflect basic 16 alterations in the cell biology of the tumours [9]. Importantly, it has been observed that 17 ∼ 85% of the variations in gene expression patterns of breast cancers are as a result of 18 CNAs [1,10]. 19 Approximately 70% of breast cancers belong to the luminal subtypes lumA and 20 lumB [11] characterised by increased levels of estrogen receptor (ER) and progesterone 21 receptor (PR). LumA tumours display lower levels of genomic instability (GI) compared 22 to lumB tumours [11]. GI is regarded as a hallmark of cancer [12] and refers to an 23 increased tendency toward alterations in the genome during the life of cells. These 24 alterations range from single nucleotide alterations to large scale structural changes of 25 chromosomes, aneuploidy and whole genome duplications [12]. GI has the ability to 26 initiate cancer, affect progression and influence patient prognosis [13]. 27 Recent studies suggest that the relationship between lumA and lumB may be a 28 continuum rather than a strict division of subtypes [9][10][11]. It has also been hypothesised 29 that lumA tumours may evolve into lumB tumours as a result of stochastic acquisitions 30 of mutations in genes associated with worse prognosis, including HER2 and tumour 31 protein p53 (TP53 ) [11,14]. 32 At present breast cancer diagnosis and treatment follows an integrative approach 33 whereby both clinical and histopathological features such as age at diagnosis, tumour 34 size, lymph node metastasis and tumour grade are utilised alongside tissue derived 35 biomarkers [15]. However, it is widely accepted that breast cancer is largely dominated 36 by chromosomal rearrangements [1], and a growing body of evidence suggests that the 37 incorporation of the genomic landscape of the tumour into treatment decisions is 38 extremely beneficial to the patient [16,17]. 39 Several studies have shown that the copy number landscape of a tumour can affect 40 survival [1,18,19]. A pan-cancer analysis suggests that the CNA burden measured as 41 the percentage of the tumour genome with CNAs is associated with both overall 42 survival (OS) and disease specific survival (DSS) in a range of cancers including breast, 43 endometrial, renal, thyroid, and colorectal cancer [19]. Assessing aneuploidy in prostate 44 cancers at diagnosis has been shown to be more predictive of long term survival than Materials and methods 60 A CNA score was developed using the absolute CNA profiles of all luminal patients 61 profiled within the METABRIC consortium. This was calculated by summing the 62 absolute value of the scores for all genes. Cases were then assigned to ranked quartiles 63 as a first-order means of segmenting the CNA scores for analysis ( Figure 1). Survival 64 analysis was carried out for these quartiles using associated clinical data to determine 65 survival associated variables. Statistical association tests were then applied to validate 66 that the association between a given CNA score quartile and its survival outcomes was 67 due to the CNA score quartile and not to a confounding variable. Finally, Cox models 68 and the associated assumption tests were used to confirm that the survival outcomes are 69 associated with GI in specific cohorts of luminal breast cancers. In addition, the 70 quantile classification based on the gene expression analysis of Tishschenko et al.

71
(2016) [11] was utilised to examine the luminal stratification associated with cases where 72 GI affects survival. All analysis was conducted using the R statistical processing

76
METABRIC provides a well-annotated dataset of over 2,000 breast cancer cases with 77 long-term clinical follow-up data, transcriptomic and genomic data [10]. Cases have an 78 average follow-up time of 125 months and a maximum follow-up time of 355 months.

79
All CNA profiles, clinical patient and sample annotations for luminal patients (n = 80 1175) were obtained from cBioportal [20]. The METABRIC consortium [10] utilised 81 both the circular binary segmentation algorithm [21] and an adapted Hidden Markov 82 model [22] for segmentation, followed by CNA calling. The patient-specific somatic 83 CNA profile calls for each gene have values indicating homozygous deletion (-2), 84 hemizygous deletion (-1), diploidy (0), single copy gain (+1) and high level 85 amplification (+2). Quantile classification based on relative gene expression for luminal 86 METABRIC cases was obtained from Tishchenko et al. [11]. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted February 26, 2020. .

88
Clinical data and CNA profiles were extracted using Python (version 3.6.3) and 89 analysed using R (version 3.5.1) and RStudio (version 1.2.1335) with R packages 90 survival, survminer and gglplot2 [23][24][25]. Additional functions such as mutation analysis 91 using the R package maftools [26] were also implemented. These packages and 92 associated processing scripts were packaged into a bespoke R Shiny app with multiple  A number of recent studies report that CNAs reflecting GI are associated with survival 102 outcomes in several types of cancer [1,18,19]. We hypothesised that CNA quartiles 103 based on absolute CNA score would be associated with both OS and DSS in luminal 104 breast cancer patients. Patients within quartile 4 (Q4) have higher CNA scores 105 indicative of higher levels of GI and significantly worse survival outcomes than patients 106 in quartiles 1-3 (Q1-3) with less GI (Figure 3, p-values < 0.0001). Patients in Q4 had 107 OS and DSS rates of 32% and 61% respectively, while patients in Q1 had OS and DSS 108 rates of 48% and 82% respectively. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted February 26, 2020.  [27][28][29]. A number of 112 steps were taken to determine whether the association between survival outcomes and 113 CNA quartiles was the result of confounding variables, which are additional factors 114 influencing survival outcomes that are correlated with CNA quartiles. 115 First, a survival analysis using Kaplan-Meier (KM) plots and univariate Cox models 116 was carried out to identify whether any of 23 clinical variables (Supplementary Table 1) 117 and CNA quartiles in the luminal data were associated with survival outcome. We 118 found that 20 of the former and the CNA quartiles were associated with either OS or 119 DSS, or both, (Supplementary Table 3) and these were taken forward for examination 120 using statistical tests. A χ 2 test was used to test the association between two categorical 121 variables with sufficient cell sizes in the two-way table of categorical variables. Fisher's 122 exact test was used in the case where any cell size was sufficiently small. The  Table 4).  Table 5). The assumptions of the multivariate Cox regression model for 133 OS and DSS were tested and showed that the test was not statistically significant for The results of the univariate Cox analyses showed that the relationship between survival 143 and PAM50 subtype was highly statistically significant (Supplementary Table 3).

144
Furthermore, as the CNA quartiles progress from Q1 to Q4 so too does the proportion 145 of lumB tumours (Supplementary Figure 1). This is expected as lumA tumours 146 generally display lower levels of GI than lumB tumours [11].  Figures 4 and 9). 150 The subtype groups were found to be significantly associated with OS and DSS for both 151 lumA ( Supplementary Figures 5 and 6, and Supplementary  Analysis of lumA and lumB cases revealed that NPI was highly associated with both 161 survival and the CNA quartiles, so NPI can be considered as a confounding factor. NPI 162 incorporates tumour grade so the lumA and lumB cases were further stratified by grade. 163 CNA quartiles were not associated with OS or DSS in lumA grade 1 and 2 cases and  . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted February 26, 2020.  Table 16). Statistical tests were used to determine which of these 178 variables associated with survival are also associated with the CNA quartiles and thus 179 could be potential confounding variables (Supplementary Table 17). Only age at 180 diagnosis was significantly associated with the CNA quartiles meaning it was a 181 confounding factor in the OS analysis.

182
To formally quantify the effects of CNA quartiles on OS and DSS in lumA grade 3 183 patients, Cox PH models were fitted to the data. A multivariate Cox regression model 184 was used to examine the survival association of CNA quartiles and age at diagnosis.  test p-value = 0.025). This is in agreement with the KM analysis ( Figure 4B). PH 191 assumptions were checked using statistical tests and graphical diagnostics based on 192 scaled Schoenfeld residuals as the Cox PH model makes several assumptions. The test 193 was not statistically significant for each of the variables, and the global test was also not 194 statistically significant (Supplementary Figures 21-22). Hence, it can be concluded that 195 the association between CNA quartiles and DSS in lumA grade 3 breast cancer patients 196 in the METABRIC data was a direct result of CNA quartiles.  . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted February 26, 2020.   Figure 32). This is consistent with increased GI detected by our CNA score preceding 222 the transition in expression of cell proliferation genes. Together with the association of 223 CNA score with survival outcome for lumA grade 3 cases, this suggests GI profiles could 224 be informative for treatment decisions for luminal breast cancers.

226
We analysed the effect of CNA burden on breast cancer survival by implementing a 227 series of statistically robust tools for interrogation of the rich and well annotated 228 METABRIC dataset. We assigned each patient to a group by quartile segmentation of 229 the summed distribution profile of absolute CNA scores, providing a first order measure 230 of CNA burden that is a more realistic representation of GI compared to the more 231 simplistic binary segmentation used in previous studies [1]. Our analysis revealed that 232 the presence of high CNA levels in the tumour genomes of patients with lumA grade 3 233 breast cancer is associated with worse DSS outcomes, based on the clinical information 234 and CNA profiles of 1175 luminal breast cancer patients registered in the METABRIC 235 archive.

236
The observed difference in survival outcomes could be the result of either a true 237 association between survival and CNA quartiles, or the result of confounding factors.

238
LumA and lumB cases were considered separately since the PAM50 subtype was 239 determined to be a significant confounding factor in the overall luminal analysis. A 240 significant association was again observed between survival outcomes and CNA quartiles 241 for both subtypes.

242
It was determined that NPI was one of the most significant confounding factors for 243 the association in both lumA and lumB subtypes using the Kruskal-Wallis test. NPI 244 incorporates tumour grade so lumA and lumB patients were further stratified based on 245 grade. CNA score quartiles within lumA grade 1 and 2 and lumB grade 2 and 3 cases 246 were not associated with OS or DSS whereas CNA quartiles within lumA grade 3 and 247 lumB grade 1 cases did show a statistically significant association with DSS. Due to the 248 small sample size per quartile for lumB grade 1 patients this association was viewed with 249 caution. Further analysis of lumA grade 3 cases revealed that CNA score quartiles were 250 February 22, 2020 9/17 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted February 26, 2020. with approximately 25% of patients in each group. The incidence of lumB tumours was 256 found to increase along with the patients risk level in the progression from quantile 1 to 257 4 by these authors. This led to their hypothesis that luminal tumours represent a 258 continuum whose subtype range correlates with increasing genomic damage.

259
The lumA grade 3 tumours identified in our analysis largely correspond to quantiles 260 2 and 3 of the Tishchenko et al. study and occupy the region of PAM50 subtype 261 stratification where the boundaries between lumA and lumB cases overlap. This implies 262 that the lumA grade 3 cases we have identified display gene expression levels that more 263 closely resemble those associated with the lumB category. Therefore, our work provides 264 further support for the proposal of a gradient in luminal classification [11] by providing 265 a robust statistical validation of the association between CNA burden and survival 266 outcome for lumA cases at the boundary where lumA and lumB cases overlap in cell 267 profileration gene expression.

268
LumA grade 3 patients who belong to both higher CNA score quartiles and higher 269 gene expression quantiles are at particular risk for long term survival outcome. This has 270 potential clinical utility because these patients are potentially not well stratified by the 271 PAM50 subtype but can be identified by the simple measure of a CNA score. Therefore, 272 patients classified as lumA grade 3 that display gene expression levels more akin to 273 lumB tumours and also have high CNA burden may benefit from the more aggressive 274 treatment regime used for lumB patients in contrast to standard endocrine therapy for 275 lumA patients [30]. 276 We defined CNA score as the sum of the absolute CNA values over all genes per   . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted February 26, 2020. . https://doi.org/10.1101/2020.02. 25.20027920 doi: medRxiv preprint number of women diagnosed is increasing and the majority of cases belong to luminal 301 subtypes. We analysed freely available clinical and genomic patient data from the 302 METABRIC dataset to study the impact of CNA burden on overall survival within the 303 luminal subtypes. We observed that CNA quartiles based on absolute CNA score are a 304 prognostic factor for breast cancer survival outcomes in a subset of patients suffering  . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted February 26, 2020. . https://doi.org/10.1101/2020.02.25.20027920 doi: medRxiv preprint S13