Functionally-focused algorithmic analysis of high resolution microarray-CGH genomic landscapes demonstrates comparable genomic copy number aberrations in MSI and MSS sporadic colorectal cancer

Array-based comparative genomic hybridization (aCGH) emerged as a powerful technology for studying copy number variations at higher resolution in many cancers including colorectal cancer. However, the lack of standardized systematic protocols including bioinformatic algorithms to obtain and analyze genomic data resulted in significant variation in the reported copy number aberration (CNA) data. Here, we present genomic aCGH data obtained using highly stringent and functionally relevant statistical algorithms from 116 well-defined microsatellites instable (MSI) and microsatellite stable (MSS) colorectal cancers. We utilized aCGH to characterize genomic CNAs in 116 well-defined sets of colorectal cancer (CRC) cases. We further applied the significance testing for aberrant copy number (STAC) and Genomic Identification of Significant Targets in Cancer (GISTIC) algorithms to identify functionally relevant (nonrandom) chromosomal aberrations in the analyzed colorectal cancer samples. Our results produced high resolution genomic landscapes of both, MSI and MSS sporadic CRC. We found that CNAs in MSI and MSS CRCs are heterogeneous in nature but may be divided into 3 distinct genomic patterns. Moreover, we show that although CNAs in MSI and MSS CRCs differ with respect to their size, number and chromosomal distribution, the functional copy number aberrations obtained from MSI and MSS CRCs were in fact comparable but not identical. These unifying CNAs were verified by MLPA tumor-loss gene panel, which spans 15 different chromosomal locations and contains 50 probes for at least 20 tumor suppressor genes. Consistently, deletion/amplification in these frequently cancer altered genes were identical in MSS and MSI CRCs. Our results suggest that MSI and MSS copy number aberrations driving CRC may be functionally comparable.


Introduction
Comparative genomics have been extensively used to identify DNA copy number variations in cancer. At the chromosomal level, Mertens et al. assessed the distribution of chromosomal gains and losses in published karyotypes from 11 tumour types including 333 cases of colorectal carcinomas (CRCs) [1]. In CRC, recurrent gains in chromosomes 7, 8q, 13 and 20; and losses of lp, 5q, 8p, 14p, 17p, 18 and 22 were found [2,3]. Later, metaphase-based comparative genomic hybridization (m-CGH), a technique of about 5 million bases resolution, was utilized to decipher chromosomal copy number changes in CRC progression. A plethora of studies used m-CGH to identify chromosomal imbalances in CRC, which included gain of chromosomes 1, 13 and 20 and chromosome arms 7p and 8q, whereas chromosome 4 and chromosome arms 8p and 10q were frequently deleted [2][3][4][5][6][7]. However, the large size of genomic aberrations identified by these low resolution techniques, which may contain hundreds of genes, prohibited the precise identification of DNA stretches involved in CRC progression [8][9][10][11][12]. In 2004, array-based comparative genomic hybridization (aCGH) emerged as a more promising technology for studying copy number variations at higher resolutions even from formalin-fixed paraffin-embedded (FFPE) archived material [13][14][15][16][17]. Bacterial artificial chromosome or BAC-based microarrays have a resolution of 1 million bases, while oligonucleotide-based microarrays have a much higher resolution of 8.9 KB overall median probe spacing (7.4 KB in Refseq genes) with 44,000 to 4.5 million probes dotted on slides, many of which densely cover all known or possible human genes [8]. Oligonucleotide-based aCGH is now capable of identifying copy number alterations in few thousands of bases or smaller [18][19][20].
This powerful technology, however, comes with its own limitations. For example, similar or even the same samples performed on different platforms may yield significantly different results. Moreover, the lack of standardized bioinformatic algorithms or analytical methods used to detect genomic aberrations complicates conclusions even further. These are compounded by the inherent heterogeneous nature of cancer evolution [21]. Seldom do aCGHbased studies account for such biologically confounding variables. Consequently, previously published colorectal cancer-related aCGH studies have yielded a high level of discordance in the reported genomic aberrations of colorectal cancer [20,22,23]. In addition, traditional means of classifying the importance of cancer-related copy number aberrations (CNAs) include the frequency of their occurrence in different patients. However, cancer genomes are highly complex and frequently harbor random 'passenger' CNAs that are of no functional significance [24]. To alleviate interference from those non-random CNAs, a systematic method, termed Genomic Identification of Significant Targets in Cancer (GISTIC) was recently developed and used in identifying biologically significant CNAs in several cancer types. The GISTIC algorithm determines a 'G' score based on the frequency and amplitude of the gains and losses. By giving more weight to high copy gains and homozygous losses (amplitude), the GISTIC algorithm argues that such aberrations may be more functionally relevant to the successful evolution of the cancer genome [25][26][27].
Here, we have utilized genomic high-density oligonucleotide-based microarrays to identify CNAs in well-defined colorectal cancers. In addition, we used the GISTIC algorithm to identify 'driver' chromosomal aberrations in colorectal cancer. We identified 3 distinct CNA patterns in CRCs and show that although CNAs in MSI and MSS CRCs differ with respect to their size, number and chromosomal distribution, the evolutionary and biologically relevant driver mutations of MSI and MSS CRC are not as dissimilar with respect to non-randomcopy number aberrations as traditional methods have previously suggested [15,28].

CRC samples
A total of 116 CRC patients were recruited for this study. The study's protocols were approved by by the Health Sciences Center (HSC) and Kuwait Institute for Medical Specialization (KIMS) joint committee for the protection of human subjects in research. Written informed consent was obtained from all patients before their inclusion in the study. DNA extracted from formalin-fixed paraffin-embedded (FFPE) tissues from 96 patients with sporadic early stage II CRC were used for genomic profiling. Genomic DNA was isolated from microdissected FFPE CRC tissues as described previously [29].
aCGH for FFPE samples a. Labeling of genomic DNA. Human Genome CGH Microarray 244A slides (Agilent Technologies, CA, USA) were used for FFPE extracted DNA samples. We followed the protocol described in [30,31]. A total of 2.5 μg sex matched control DNA (Promega, WI, USA) was fragmented by sonication. The FFPE DNA was fragmented only if there was any large molecular weight DNA. About 500 ng of the fragmented samples were then run on 1.5% agarose gel for 1 hour to check the extent of fragmentation of the DNA. Once fragmentation was deemed appropriate, 2 μg of the control DNA was labeled with Cy3 (Agilent technologies, CA, USA) and 2μg of FFPE DNA with Cy5 (Agilent technologies, CA, USA) for 30 minutes at 85˚C. After labeling the DNA was purified using KREA pure columns (Agilent technologies, CA, USA). The samples were then measured on a Nanodrop and Degree of Labeling (DOL) was calculated according to the following formula; The samples were hybridized onto microarray slides only if the DOL was between 1.5-2.5%.
b. Hybridization of labeled DNA. Appropriate volumes of Human Cot-I DNA (Invitrogen, CA, USA), 10X Blocking Agent (Agilent Technologies) and 2X Hybridization buffer (Agilent Technologies) were added to the paired Cy5 and Cy3 labeled DNA and the hybridization mix was mixed by pipetting gently. The Hybridization cocktail was then incubated at 95˚C for 3 minutes, then immediately followed by 30 minutes at 37˚C. The tubes were spun to collect the samples and an appropriate volume of Agilent-CGH block buffer (Agilent technologies, CA, USA) was added to the hybridization cocktail. The samples were mixed gently and centrifuged for collection. We dispensed 490 μl of the hybridization cocktail onto a clean gasket slide, which was already placed into a hybridization chamber. A microarray slide was placed active side down onto the gasket. The chamber was assembled and incubated in the hybridization rotating oven (Agilent technologies, CA, USA) for 40 hours at 60˚C and 20 rpm.
c. Washing of the microarray slide. The hybridization chamber was disassembled carefully and the microarray slide sandwich was completely submerged into wash buffer 1 (Agilent technologies, CA, USA) at room temperature. The slides were then gently pried open using a pair of forceps and the gasket was allowed to drop to the bottom of the jar. The microarray slide was quickly transferred to a slide rack submerged in wash buffer 1 and incubated for 5 minutes. Then the rack was transferred to the next dish containing wash buffer 2 (Agilent technologies, CA, USA) at 37˚C for 1 minute. The slide rack was then transferred to a dish containing Acetonitrile for 1 minute followed by 30 seconds in Stabilization solution (Agilent technologies). The rack was removed carefully in order to minimize the number of droplets on the slide. The slides were scanned immediately on an Agilent scanner. d. Data analysis of the sample. Scanned images were imported; background subtracted and normalized using Feature extraction software version 10.7.1.1 (Agilent Technologies, CA, USA). The feature extraction software generates a quality Control Report which helps determine the quality of the aCGH. Quality Control metrics such as derivative of log ratio spread (DLRS), background noise (BG noise), signal intensity, reproducibility and signal to noise ratio are generated in the QC report.DLR Spread is defined as the spread of the Log Ratio differences between consecutive probes along all chromosomes. It is the most important metric as it gives us the ability to measure noise of the log ratio independent from the number and severity of aberrations found, making it instrumental in assessing the overall quality of each microarray experiment. If the DLR Spread value was higher than 0.5, the cases were excluded from further analysis. The text files representing data ratio points log2 of test/control ratios were imported to Nexus software (Biodiscovery, CA, U.S.A). Quality values ranged between 0.05-0.4, which are excellent values given the degraded nature of the samples. To minimize false positive calls and random CNV variations, Fast Adaptive State Segmentation Technique (FASST2) with a stringent significance threshold of 5.0E-6 was used to determine copy number aberrations. Moreover, we utilized two algorithms to more accurately reflect functional CNVs and separate them from bystander genomic aberrations. The first termed Significance Testing for Aberrant Copy number (STAC) algorithm, and the second is a systematic method termed Genomic Identification of Significant Targets in Cancer (GISTIC) o identify biologically significant copy number aberrations in these samples. Both algorithms were calculated using Nexus software version 8 (Biodiscovery, El Segundo CA, USA)

Multiplex Ligation-dependent Probe Amplification (MLPA)
DNA was diluted to a working stock concentration of 50 ng/μl. A total of 250 ng (5μl) of DNA was aliquoted into sterile 0.2 ml tubes and denatured and then cooled to 25˚C in a thermal cycler. A master mix containing hybridization components supplied as part of the kit (SALSA MLPA probemix P294-B1 Tumour-Loss from MRC Holland, Amsterdam, Netherlands) was prepared and added to the samples at 25˚C and the reaction was mixed by pipetting. The hybridization reaction was carried out according to the manufacturer's instructions and the samples were incubated overnight at 60˚C. A ligase buffer mix was prepared with reagents supplied with the kit and ligation was carried out with Liagse-65. Following ligation, a PCR master mix was prepared using SALSA PCR reagents from the kit and PCR reaction was carried out by mixing 10μl of ligation product with the PCR master mix in new tubes at temperatures recommended in the protocol. All reactions were carried according to manufacturer's protocol. The amplified PCR product was mixed with formamide, CEQ-600 marker, and 3μl of this MLPA PCR sample was then added to each well of a 96 well plate and a drop of mineral oil was added on top. Fragment separation was carried out by loading the plate into the CEQ8000 Genetic Analysis System according to manufacturer's protocol. CSV files generated from these runs were then imported into the Coffalyser Software (MRC Holland, Amsterdam, Netherlands) for MLPA analysis.

MSI fragment analysis
DNA was extracted from 116 macro-dissected colorectal tumors and MSI fragment analysis was performed on them and their matching normal using MSI analysis system version 1.2 kit (Promega Corporation, WI, USA). Powerplex Matrix Standards 3100/3130 kit (Promega Corporation, WI, USA) was used to perform spectral calibration of the Applied Biosystems 3130 Genetic Analyzer. The system allowed co-amplification of a total of seven markers including mononucleotide repeat markers (Bat-25, BAT-26, NR-21, NR-24 and MONO-27) and pentanucleotide repeat markers (Penta C and Penta D). MSI status was determined using the mononucleotide markers by comparing results from tumor and its matching normal, a cancer was classified as MSS when no length variations were detected between the samples for all the markers. The cancer was classified as MSI-high when 2 or more markers showed length variations between the tumors sample and its matching normal. We further classified each cancer into MSI or MSS subclasses by analyzing the expression of MLH1, MSH2, PMS2 and MSH6 using immunohistochemistry. Statistical correlation was performed for samples were both methods were used to determine MSI status.

Results and discussion
Genetic aberration in colorectal cancer detected using high resolution oligo-microarrays We have performed aCGH on a total of 150 cases of CRC out of which 116 yielded acceptable DLR Spread value below 0.5 and were utilized for further analysis. Table 1 shows the clinical characteristics of the cases that were utilized for aCGH analysis. The resultant aCGH profiles showed many chromosomal gains and deletions. The most frequent chromosomes involved in copy number gains in colorectal cancer were: Chromosomes 7 (56%), 8q (56%), 13 (61%), and 20 (79%). Chromosomal losses most frequently involved were chromosome arm 1p (71%), 8p (72%), 17p (55%), 22q (60%) and chromosomes 14 (77%), 15 (66%) and 18 (80%) (Fig 1). This data is consistent with previously published data from our and other groups [28][29][30][31][32][33][34]. However, these data represent a summation of all aberrations involved in colorectal cancer regardless of the type, stage, location and other molecular characteristics that may have a significant impact on the aCGH data obtained. Next, we subdivided this heterogeneous colorectal cancer set into genetically and phenotypically wellcharacterized sets and analyzed the copy number aberrations for each set using the GISTIC algorithm to obtain a functional list of genes involved in the aberrations within different subsets. It is worthy to note that our cohort was selectively biased towards TNM stage II (Dukes' B). Therefore, the frequencies of MSI and stage do not reflect population incidence. Microsatellite instability data was successfully obtained from 108 of 116 CRC cases (93%). Table 2 shows the clinico-pathological characteristics of MSS and MSI CRC cases. As expected, most MSI cases were associated with poor cancer differentiation and right sidedness.

Genomic landscape patterns in MSI and MSS CRC
Subdivision of aCGH profiles was performed to delineate distinct CNAs pertinent to the evolution of gentically and phenotypically distinct CRC subsets. Microarray data revealed 3 distinct patterns of genomic aberrations in CRC (Table 3). Pattern 1, characterized by small-sized and few gains and losses copy number alterations, was observed in 52.6% of MSI cases, and in 10.1% of MSS cases (Fig 2A). CRC with Pattern 2, on the contrary, have numerous losses and gains that cyclically alternate resembling a pattern previously described in solid tumors termed Chromothripsis [25][26][27][28][29][30][31][32][33][34][35][36][37][38]. Recently, using genome-wide long mate-pair sequencing and SNP microarray of CRC and their metastases, chromothripsis rearrangements have been found to Genomic profiling in MSI and MSS sporadic colorectal cancer occur frequently in CRC [39][40][41]. Since aCGH can only detect copy number alterations, and only that, we are not sure that these rearrangements represent true chromothripsis events. We, therefore, will refer to this pattern as chromothripsis-like or alternating copy state ( Fig 2B). Chromothripsis-like events were observed in 31.6% of MSI CRC cases. In MSS CRC cases, chromothripsis-like events were uncommon and documented in only 10% of the cases. This data indicated that chromothripsis-like events were more significantly associated with MSI than MSS genotypes (p = 0.024). Pattern 3, exemplified by large sized gains and losses, which may span chromosomal arms or whole chromosomes ( Fig 2C) were found in 15.8% of MSI CRC cases. As expected, this particular pattern was the most common in the genomic landscape of MSS CRC cases (89.9%).

Genomic aberrations in MSI and MSS CRC using supervised clustering analysis
Genetic aberrations generated by Supervised clustering from well-defined sets of samples, in this case 19 and 89 cases of MSI and MSS CRC respectively, are shown in (Fig 5A). Consistent with unsupervised clustering, the genomic aberration profiles of MSS and MSI CRC were largely identical to group 1 and group 2 respectively (Fig 5A). Direct comparison of the generated genomic profiles between MSS and MSI CRC, highlighted the extensive differences in Table 4. Clinicopathological characteristics of the 5-clusters generated by unsupervised complete Linkage Hierarchical clustering. genomic aberrations between the two groups ( Fig 5B). Chromosome arms 1p, 8p, 10q, 17p and chromosomes 4, 14, 15, 18 deletions were significantly more frequent in MSS than MSI CRC (Table 5). Similarly, chromosome arms 7p, 8q, 12q, 20q and chromosome 13 gains were more frequent in MSS compared to MSI CRC ( Fig 5B and Table 6). Other small aberrations on chromosomes 3p, 5, 21 and 22 were interesting. For example, the 3p small deletion was a common event (33%) in our CRC cases (Fig 5A and 5B). The deletions were well demarcated around exon 5 of the FHIT gene and appeared in both MSS and MSI CRC, although more significantly deleted in MSI cancers (Fig 5B). FHIT gene deletion and its reduced expression have been reported before in association with MSI CRC by our group [42].

Functional relevance of genomic aberrations in MSS and MSI CRC
It is well accepted that CNA in cancer involves many random genetic events that frequently distort and may conceal the important genomic aberrational events that are functionally relevant. Our classical comparison of the frequencies of genetic events between MSS and MSI CRC cases highlighted the extensive and widespread differences in genomic aberrations between the two groups ( Fig 5B). However, how 'genomically' close or apart are MSS and MSI CRC in terms of functional or driver mutations, is a question that has not been well-addressed before. Therefore, we next reanalyzed the 108 dataset using 2 independent algorithms that  focus on the functional relevance of a particular genomic aberration. First, we used STAC algorithm approach to identify statistically significant and recurrent genomic CNAs within subsets of samples under the same settings mentioned above. The second algorithm GISTIC, which identifies functionally significant CNAs by giving more weight to high copy gains and homozygous losses (amplitudes), which may be functionally relevant to the successful evolution of the cancer genome [27]. Such permutations are not considered in the STAC analysis we used. Therefore, the two methods may be considered complementary rather than comparable. Fig 6 shows functionally significant CNAs generated by the STAC algorithm in MSI and MSS CRC. The peaks identify aberrations that are significantly recurrent. As demonstrated before with the traditional frequency-based method, that MSS CRC accumulated more CNAs than MSI CRC, albeit, more functionally relevant here. A remarkable observation that emanated from the use of the STAC algorithm was the high-degree of similarity between the functionally-relevant copy number alterations in MSI and MSS CRC. On almost all the autosomes, functionally significant CNAs found in MSI CRC mapped precisely onto those found in MSS CRC (Fig 6). All genomic aberrations specific to MSI CRC are highlighted with arrows in Fig  6. These aberrations did not reach the 35% cutoff, or did not contain functional genes (chromosome 4) and therefore excluded by the algorithm.
We next employed the GISTIC algorithm on the same data set. Fig 7 shows the significant CNAs in both MSS ( Fig 7A) and MSI (Fig 7B) as broad and narrow regions represented by vertical grey lines. Again, these functionally significant aberrations were more common in MSS compared to MSI CRC cases (Tables 5 and 6). Genes involved in cellular senescence, S-phase control of the cell cycle, histone H4-K5 acetylation, nucleosome assembly and telomere maintenance were among the top 10 most significant GO-functions identified.
Deletions were the most overwhelming functional aberrations identified by the GISTIC algorithm in MSI CRC ( Fig 7B). As proposed by the STAC algorithm, the GISTIC-generated regions in MSI corresponded precisely to regions on the MSS histogram (Fig 7 and Table 5). However, GISTIC analysis identified 8 narrow autosomal regions specific to MSI CRC (Fig 7B  and 7C and Table 6). Our STAC profiling suggests, with the exception of 8 private genomic copy alterations identified by the GISTIC algorithm, that MSI and MSS CRC functional CNAs are similar. These shared genomic events may be key in the evolution of both types of CRC. These reserved CNAs might be critical to the oncogenesis in primary cellular transformation stages, later distinct CNAs specific to MSI and MSS might occur due to the influence of specific tumor microenvironments. For example, it has been shown that MSI distinct CNAs include immune related genes that alters their susceptibility to the immune system, hence MSI CRC's favorable prognosis [43][44]. Regardless of the mechanism of how such similarities in CNAs are attained in these two distinct groups, our data argue that, at least at the genomic copy number level, the evolutionary and biologically relevant driver mutations of MSI CRC are a subset of those found in MSS CRC, and that the two groups are not as dissimilar with respect to functional CNAs as traditional methods have previously suggested. We next tested this notion using MLPA tumor-loss panel (P294-A1), which profiles 20 key tumor suppressor genes that are frequently deleted in cancer with 50 probes spanning different exons of the corresponding genes. Due to the limited tissues available to us, MLPA was performed on microdissected DNA extracted from 44 cases (subset of 116 cases, used for aCGH). The MMR status was known for 40 cases (MSI n = 7 and MSS n = 33). Fig 8 shows the normalized copy number ratios for the 20 tumor suppressor genes distributed according to MMR status. The data show loss of 7 tumor suppressor genes in this cohort; CHD5, FHIT, TSC1 (exon 7), PTEN, NF1, SMAD4 (exon 5), and SMARCB1 (exon 9) irrespective of the MMR status (Fig 8).
The genomic aberrations detected in MSS and MSI may have a direct effect on genes' expression, which are thought to be the driving force of disease pathology of both types of tumors as suggested previously by several groups [45][46][47][48]. Kheurekseid et al., (2013) identified a comprehensive list of genes showing clear differential expression patterns in CRC. Most of these genes are located in genomic regions well affected by the aberrations found in our study. Unfortunately, we were limited by the use of FFPE tissues to pursue each of the genes located within the CNAs identified. Understanding the molecular impact of these expression The vertical grey lines indicate significant peaks of copy number alterations. Notice how these peaks align precisely in both MSS and MSI CRC. Copy number aberrations that are private for MSI CRC cases are marked with red arrows for deletions and blue arrows for amplifications. Below each red arrow are the corresponding chromosomal region ideogram and a zoomed histogram focused on the red arrowed region. The histograms show deletions in red and amplifications in blue and significant STAC-generated peaks in grey as stated above. Deletions marked with asterisks correspond to the deletions marked with the red arrows. All these aberrations are not considered in the exported regions because they do not reach the 35% cutoff value (marked horizontal black lines on chromosome 9). Note how the marked FHIT deletion crosses the 35% line. The individual copy number alteration for each patient from 19 MSI cases are shown as small vertical red and blue bars indicating deletions and amplifications respectively. Note how well-demarcated are these recurrent aberrations. alterations would provide a better understanding of the molecular pathology of CRC. Localizing the culprit genes involved in CNAs can also serve as molecular markers to aid in diagnosis, follow up and most importantly potential development of individualized therapeutic strategies. This study therefore was limited in its approach and may benefit from a direct correlation between genes located within the CNAs and their expression. Nevertheless, this study illuminated genomic deletions as a possible mechanism for loss or reduced expression of the 7 tumor suppressor genes regardless of the MMR status. The FHIT gene deletion was identified previously by us and others as a mechanism for the reduced FHIT protein expression in CRC [42,49]. The tumor suppressors TSC1 and PTEN protein expression were shown to be reduced in CRC compared to normal tissues or adenomas [50,51]. Similarly, SMAD4 and SMARCB1 expression were absent/reduced in 64% [52], 67% [53] of CRC respectively. Interestingly, CHD5 protein expression was repressed in the majority of adenomas via genetic deletion, hypermethylation or by microRNA-211 suppression [54,55]. Our study is supportive of genomic deletions as partly the reason behind the lack or reduced expression of these tumor suppressor genes in CRC regardless of the MMR status.

Conclusion
The last decade witnessed a substantial increase in our understanding of the genomic landscape of CRC. The use of low resolution genome-profiling technologies, like BAC-microarrays to identify copy number differences between MSS and MSI CRCs propagated the notion that these two subtypes of CRC are substantially different at the copy number level. Our high resolution genetic profiling of CRC supports this genomic heterogeneity between MSS and MSI CRC. For example, MSS-CRC had more numerous larger aberrations while MSI-CRC had smaller-sized aberrations. Moreover, there were distinct patterns of genomic aberrations in CRC, one of which, pattern 2 (chromothripsis-like) was significantly enriched in MSI CRC, while MSS CRC was significantly associated with Pattern 3 (loss/gain of whole chromosomes or chromosome arms). In fact, the examination of the genomic histograms showed significant differences between MSI and MSS as previously reported. However, closer inspection of the patterns generated suggests that, although there were unique copy number differences, MSI and MSS CRC functional copy number aberrations are similar. These shared genomic events could highlight key events in the evolution of both types of CRC. This finding supports the hypothesis that driver mutations incurred by CNAs in MSI CRC are also found in MSS CRC, and that the two groups are not as dissimilar in respect to CRC development. Our results suggest that MSI and MSS copy number aberrations driving CRC development may be functionally comparable.