Mapping of Genetic Abnormalities of Primary Tumours from Metastatic CRC by High-Resolution SNP Arrays

Background: For years, the genetics of metastatic colorectal cancer (CRC) have been studied using a variety of techniques. However, most of the approaches employed so far have a relatively limited resolution, which hampers detailed characterization of the common recurrent chromosomal breakpoints as well as the identification of small regions carrying genetic changes and the genes involved in them.
Methodology/Principal Findings: Here we applied 500K SNP arrays to map the most common chromosomal lesions present at diagnosis in a series of 23 primary tumours from sporadic CRC patients who had developed liver metastasis. Overall, our results confirm that the genetic profile of metastatic CRC is defined by imbalanced gains of chromosomes 7, 8q, 11q, 13q, 20q and X together with losses of the 1p, 8p, 17p and 18q chromosome regions. In addition, SNP-array studies allowed the identification of small (<1.3 Mb) and extensive/large (>1.5 Mb) altered DNA sequences, many of which contain cancer genes known to be involved in CRC and the metastatic process. Detailed characterization of the breakpoint regions for the altered chromosomes showed four recurrent breakpoints at chromosomes 1p12, 8p12, 17p11.2 and 20p12.1; interestingly, the most frequently observed recurrent chromosomal breakpoint was localized at 17p11.2 and systematically targeted the FAM27L gene, whose role in CRC deserves further investigation.
Conclusions/Significance: In summary, in the present study we provide a detailed map of the genetic abnormalities of primary tumours from metastatic CRC patients, which confirms and extends previous observations regarding the identification of genes potentially involved in the development of CRC and the metastatic process.


Introduction
The development and progression of CRC is a multistep process leading to the accumulation of genomic alterations that occur at the single cell level over the lifetime of a tumour, from benign to invasive and metastatic states leading to patient death [1,2]. For many years, the genetics of metastatic CRC have been studied with an increasingly wide variety of techniques, from conventional cytogenetics [3] and fluorescence in situ hybridization (FISH) [4] to comparative genomic hybridization (CGH) [5] and array CGH (aCGH) [6]. Based on these techniques, many different recurrent genetic abnormalities have been identified in metastatic CRC, which frequently include gains of chromosomes 8q, 13q and 20q [7,8] together with losses of the 1p, 8p, 17p and 18q chromosomal regions [9]. By contrast, detailed characterization of the common breakpoint regions, as well as identification of the specific genes targeted by such abnormalities, has proven difficult with these approaches. This is partially because these techniques have a relatively limited resolution, which hampers identification of the specific cancer-associated genes recurrently targeted by such alterations. In fact, the highest resolution approaches applied so far to the study of CRC are based on aCGH (e.g. Camps et al, who applied a 185K oligonucleotide array with an estimated resolution of 16 kb to the analysis of 32 primary CRC tumours) [10].
In recent years, the availability of high-density single nucleotide polymorphism (SNP) arrays has allowed identification of small regions of chromosomal gains and losses with a much higher resolution, down to 2.5 kb [11]. Thus, based on genome wide SNP arrays, fine mapping of chromosomal breakpoints and subsequent identification of the specific genes recurrently altered (deleted, gained or amplified) is achieved for individual samples. This allows for a more precise and detailed comparison of the breakpoint regions found in different tumours and their correlation with the clinical features of the disease.
In the present study we used 500K SNP mapping arrays with a mean distance between interrogated SNPs of 5.8 kb (median intermarker distance of 2.5 kb) to map genetic lesions present at diagnosis in primary tumours from a group of 23 sporadic CRC patients who developed liver metastasis. Our major goal was to define the most frequent recurrent breakpoint regions in metastatic CRC and the commonly gained and/or deleted genes in the altered chromosomes. In order to evaluate the reproducibility of the SNP-array results we performed parallel interphase FISH (iFISH) analyses of the same tumour samples using 24 probes directed against an identical number of regions from 20 different human chromosomes frequently altered in sporadic CRC.

Patients and samples
Tissue specimens were obtained from primary tumours from 23 patients (15 males and 8 females; median age of 68 years, ranging from 48 to 80 years) suffering from metastatic sporadic CRC. The study was approved by the local ethics committee of the University Hospital of Salamanca (Salamanca, Spain) and, prior to entering the study, each individual gave informed consent.
In each case, the diagnosis and the classification of the tumours were performed according to the WHO criteria [12]. According to tumour grade, 13 cases corresponded to well-differentiated CRC, 8 to moderately- and 2 to poorly-differentiated tumours. Histopathological grade was confirmed in all cases in a second independent evaluation by an experienced pathologist.
Of the 23 primary tumors, 16 were localized in the right (caecum, ascending or transverse) or the left (descending and sigmoid) colon and 7 in the rectum. Mean size of the primary tumors was 5.2 ± 1.8 cm, with the following distribution according to the TNM stage [13]: T3N0M1, 3 cases; T3N1M1, 9; T3N2M1, 3; T4N0M1, 5; T4N1M1, 1; and T4N2M1, 2 patients. In all cases, paired liver metastases were identified either at the time of colorectal surgery (n = 14) or during the first year after initial diagnosis (n = 9); the mean size of the largest liver metastasis per patient was 5.3 ± 2.8 cm (range: 2 to 10 cm).
After histopathological diagnosis was established, samples from representative areas of the primary tumours showing macroscopic infiltration were used to prepare single cell suspensions, which were stored (−20 °C) in methanol/acetic acid (3/1; vol/vol) for further iFISH analyses [14]. The remaining tissue was either fixed in formalin and embedded in paraffin or frozen in liquid nitrogen, and stored at room temperature (RT) and at −80 °C, respectively. From the paraffin-embedded tissue samples, sections were cut from three different areas representative of the tumoural tissue used to prepare single cell suspensions and placed over poly-L-lysine coated slides. All tissues were evaluated after hematoxylin-eosin staining to confirm the presence of tumour cells and evaluate their quantity in samples to be studied by both iFISH and SNP-arrays. For SNP-array studies, tumour DNA was extracted from freshly-frozen tumour tissues, mirror cut to those used for iFISH analyses, which contained ≥65% epithelial tumour cells. In turn, normal DNA was extracted from matched peripheral blood (PB) leucocytes from the same patient. For both types of samples (tumour tissue and PB leucocytes), DNA was extracted using the QIAamp DNA mini kit (Qiagen, Hilden, Germany) following the manufacturer's instructions.

Analysis of single nucleotide polymorphism (SNP) arrays
Paired samples of purified tumoural DNA and normal PB DNA from individual patients were hybridized to two 250K Affymetrix SNP Mapping arrays (NspI and StyI SNP arrays, Affymetrix, Santa Clara, CA) using a total of 250 ng of DNA per array, according to the instructions of the manufacturer. Fluorescence signals were detected using the GeneChip Scanner 3000 (Affymetrix). Average genotyping call rates of 94.4% and 97.3% were obtained for tumoural and paired normal PB DNA samples, respectively. Only those SNPs with a call rate ≥92.3% were used for further analyses.
In order to calculate genome-wide copy number (CN) changes in tumoural vs. normal samples, the aroma.affymetrix algorithm was used, following the CRMA v2 method, as described elsewhere (R software package, Berkeley, CA) [15]. The following sequential steps were used for this purpose: i) calibration for crosstalk between pairs of allele probes; ii) normalization for probe nucleotide-sequence effects; and iii) normalization for PCR fragment length- and probe localization-dependent effects. Then, data derived from both the 250K StyI and the 250K NspI arrays were integrated into a single database and raw CN values were calculated as transformed log2 values of the tumoural/normal ratio obtained for paired SNP fluorescence signals.
Log2 ratio values were then used to identify DNA regions which showed similar CN values, using the Circular Binary Segmentation (CBS) algorithm [16]. For the identification of altered (gained or lost) DNA regions, a threshold was established based on the changes observed in the log2 CN values (fluorescence intensity ratio) of sequential tumour DNA segments found for each individual. Accordingly, log2 ratios >0.09 and <−0.09 were used as cut-off thresholds to define the presence of increased and decreased CN values, respectively. High-level gains (amplifications) were defined as regions with a mean log2 CN ratio ≥0.22 for ≥3 contiguous SNPs. The specific frequencies of both CN gains and losses per SNP were established and plotted along individual chromosomes for each individual case analyzed. Minimal common regions (MCR) of gain and loss were defined as the smallest group of contiguous SNPs (≥3) with a high frequency of gains and losses, respectively (Z-score threshold ≥2.1), according to the overall distribution of CN values found in the entire tumour cell genome. Common recurrent breakpoint regions were defined as those chromosomal regions which recurrently showed transition from one CN state (gain, loss or no-change) to another for the whole set of individual samples analyzed, at a frequency of ≥35% of the cases (n = 8/23 samples).
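The raw CN value described above can be illustrated with a minimal sketch. It assumes that the normalized tumour and normal signals for each SNP are already available as positive numbers; the function names are hypothetical, and the sketch does not reproduce the CRMA v2 calibration and normalization steps of aroma.affymetrix.

```python
import math


def log2_ratio(tumour_signal: float, normal_signal: float) -> float:
    """Raw CN value for one SNP: log2 of the tumour/normal signal ratio."""
    if tumour_signal <= 0 or normal_signal <= 0:
        raise ValueError("signals must be positive")
    return math.log2(tumour_signal / normal_signal)


def paired_log2_ratios(tumour, normal):
    """Apply log2_ratio to paired lists of per-SNP signals (hypothetical input)."""
    if len(tumour) != len(normal):
        raise ValueError("tumour and normal signals must be paired")
    return [log2_ratio(t, n) for t, n in zip(tumour, normal)]
```

A value of 0 corresponds to equal tumour and normal signal, positive values to a relative excess of tumour signal and negative values to a relative deficit.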
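As an illustration of the cut-offs stated above, the following sketch labels a segment from its mean log2 ratio and its number of contiguous SNPs. It assumes the segments have already been produced by a segmentation step such as CBS; the function and constant names are hypothetical and this is not the authors' pipeline.

```python
GAIN_CUTOFF = 0.09    # log2 ratio above this value: increased CN
LOSS_CUTOFF = -0.09   # log2 ratio below this value: decreased CN
AMP_CUTOFF = 0.22     # mean log2 ratio at or above this value: high-level gain
MIN_AMP_SNPS = 3      # minimum contiguous SNPs required for an amplification


def classify_segment(mean_log2: float, n_snps: int) -> str:
    """Return the CN state of one segment using the thresholds given in the text."""
    if mean_log2 >= AMP_CUTOFF and n_snps >= MIN_AMP_SNPS:
        return "amplification"
    if mean_log2 > GAIN_CUTOFF:
        return "gain"
    if mean_log2 < LOSS_CUTOFF:
        return "loss"
    return "no change"
```

Under these assumptions, a segment with a mean log2 ratio of 0.30 covering only two SNPs is reported as a gain rather than an amplification, because it does not meet the three-SNP requirement.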

Interphase fluorescence in situ hybridization (iFISH) studies
In all cases, iFISH studies were performed on an aliquot of the single cell suspension prepared from the tumour sample. A set of 24 locus-specific FISH probes directed against DNA sequences localized in 20 different human chromosomes, specific for those chromosomal regions most frequently gained or deleted in sporadic CRC [4,6,8,17,18], was systematically used to validate the results obtained with the SNP arrays (Table 1).
The methods and procedures used for the iFISH studies have been previously described in detail [19]. Briefly, dried slides containing both the tumour cells' and the probes' DNA were denatured (1 min at 75 °C) and hybridized overnight (37 °C) in a Hybrite thermocycler (Vysis Inc, Downers Grove, IL, USA). After this incubation, slides were sequentially washed (5 min at 46 °C) in 50% formamide in a 2× saline sodium citrate buffer (SSC) and in 2× SSC. Finally, nuclei were counterstained with 35 µL of a mounting medium containing 75 ng/ml of 4′,6-diamidino-2-phenylindole (DAPI; Sigma, St Louis, MO, USA); Vectashield (Vector Laboratories Inc, Burlingame, CA, USA) was used as antifading agent.
A BX60 fluorescence microscope (Olympus, Hamburg, Germany) equipped with a 100× oil objective was used to count the number of hybridization spots per nucleus for ≥200 cells/sample. Only those spots with a similar size, intensity and shape were counted in areas with <1% unhybridized cells; doublet signals were considered as single spots. A tumour was considered to carry a numerical abnormality for a given chromosomal region when the proportion of cells displaying an abnormal number of hybridization spots for the corresponding probe was at a percentage higher or lower than the mean value plus two standard deviations (SD) of the mean percentage obtained with the same probe in control samples (n = 10).
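The control-based cut-off can be sketched as follows. This is a simplified reading that only implements the upper cut-off (mean plus two SD of the control percentages) and uses the sample standard deviation; both choices, like the function names, are assumptions rather than details given in the text.

```python
from statistics import mean, stdev


def abnormality_cutoff(control_percentages):
    """Mean + 2 SD of the percentage of abnormal cells seen in control samples."""
    return mean(control_percentages) + 2 * stdev(control_percentages)


def is_abnormal(tumour_percentage, control_percentages):
    """True when the tumour percentage exceeds the control-derived cut-off."""
    return tumour_percentage > abnormality_cutoff(control_percentages)
```

In practice the cut-off would be computed separately for each probe from the 10 control samples, since background signal varies between probes.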

Quantitative Real-Time PCR
In order to validate the results obtained in the SNP-array studies, quantitative real-time polymerase chain reaction (RQ-PCR) was performed using the Step One Plus Real-Time PCR System (Applied Biosystems, Foster City, CA) in matched normal and tumoural samples in 18/23 cases. Expression of the MAP2K4, MYC and BIRC7 genes was analyzed. We employed TaqMan® Gene Expression Assays designed by Applied Biosystems (Applied Biosystems, Foster City, CA) according to the manufacturer's instructions, and the assay IDs for the genes studied were as follows: Hs_00387426-m1 (MAP2K4), Hs_00153408-m1 (MYC) and Hs_00223384-m1 (BIRC7).
Each PCR was carried out in duplicate in a 10 µL volume using the TaqMan® Fast Universal Mastermix (Applied Biosystems) and the following cycling parameters: incubation at 95 °C (20 sec), followed by 50 cycles at 95 °C (1 sec) and an incubation at 60 °C (20 sec). Analysis was made using StepOne software v2.0. The obtained data were normalized by using the internal housekeeping gene, GAPDH. Relative quantification was calculated using the equation $2^{-\Delta C_T}$, where $\Delta C_T = C_{T,\mathrm{gene}} - C_{T,\mathrm{GAPDH}}$. The final mRNA expression index in each sample was calculated as follows (arbitrary units; AU): mRNA expression index = (MYC or MAP2K4 or BIRC7 mRNA value / GAPDH mRNA value) × 10,000 AU.
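A minimal sketch of this calculation is given below. It assumes that the gene/GAPDH mRNA ratio in the expression index is the 2^(−ΔCT) value, which the text suggests but does not state explicitly; function names are hypothetical.

```python
def delta_ct(ct_gene: float, ct_gapdh: float) -> float:
    """Delta CT: target gene CT minus housekeeping gene (GAPDH) CT."""
    return ct_gene - ct_gapdh


def relative_quantity(ct_gene: float, ct_gapdh: float) -> float:
    """Relative quantification as 2 raised to minus delta CT."""
    return 2.0 ** (-delta_ct(ct_gene, ct_gapdh))


def expression_index(ct_gene: float, ct_gapdh: float, scale: float = 10_000) -> float:
    """mRNA expression index in arbitrary units (relative quantity x 10,000)."""
    return relative_quantity(ct_gene, ct_gapdh) * scale
```

Because CT values are on a log2 scale, each additional cycle needed by the target gene relative to GAPDH halves the resulting index.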

Statistical methods
For all continuous variables, mean values (and SD) and range were calculated using the SPSS software package (SPSS 12.0 Inc, Chicago, IL USA); for dichotomous variables, frequencies were reported. In order to evaluate the statistical significance of differences observed between groups, the Mann-Whitney U and χ² tests were used for continuous and categorical variables, respectively (SPSS).
A multivariate stepwise regression analysis (regression, SPSS) was performed to determine the correlation between the structural and/or numerical abnormalities found by both the iFISH and SNP-array techniques, and their relationship with the expression of those genes analyzed by RQ-PCR. Only those iFISH probes with ≥12 SNPs localized in the iFISH-mapped region (Table 1) were used for correlation studies with the CN status identified by the SNP array (gain vs. loss vs. no change) for those SNPs localized at each iFISH region. P-values <0.01 were considered statistically significant.
Of note, SNP arrays allowed the identification of 43 small DNA sequences (arbitrarily defined as regions of <1300 kb) which displayed recurrent CN changes (gains and losses). Interestingly, most of those regions which showed recurrent CN changes (n = 28/43) contained at least one known well-characterized gene, five contained known cancer-associated genes and one region held a microRNA gene (MIR1208), localized at chromosome 8q24.21 (Table 2). The exact number of small regions characterized by CN changes, as well as the relative proportion of CN gains vs. losses, varied widely among the different chromosomes. The 43 small regions containing CN gains and losses were located in those chromosomes more frequently affected by CN changes: losses were localized in chromosomes 1p, 8p, 17p and 18, whereas gains involved the whole chromosome 7 and the 8q, 13q and 20q chromosome regions.
Each of these larger regions has been previously associated with malignancy and contained genes i) relevant to the metastatic process (i.e.: TPD52, FABP5, MAP2K4, LLGL1, TOP3A, ALDH3A2, UPK3A, FBLN1, TYMP), ii) associated with intracellular signaling processes (i.e.: PAG1, ELAC2, RASD1 and TNFRSF13B) and iii) involved in the regulation of the cell cycle (i.e.: FLCN, PEMT and XIAP); in turn, three of these large CN regions showing CN losses and one with CN gains contained a total of 8 known microRNAs (Table 3).
Interestingly, we recorded a statistically significant association between tumour grade and the presence of gains/amplifications at the 20p13 chromosomal region, localized between the 2,574,587 and 2,993,797 bp positions and assessed by 66 SNPs: this chromosomal alteration was more frequent among well-differentiated than among moderately-differentiated tumours (11/13 (85%) vs 2/8 (25%); p = 0.005).
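For illustration, the 2×2 comparison of well- vs moderately-differentiated tumours can be tested with a plain Pearson χ² statistic, as sketched below. The function name is hypothetical, and the exact test variant used in SPSS (for example, with or without continuity correction) is not stated in the text, so the p-value from this sketch need not match the reported one exactly.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square (1 degree of freedom, no continuity correction)
    for the 2x2 table [[a, b], [c, d]]; returns (statistic, p_value)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column must have a non-zero total")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: well- and moderately-differentiated tumours.
# Columns: with and without the 20p13 gain (counts taken from the text).
chi2, p_value = chi_square_2x2(11, 2, 2, 6)
```

With expected counts this small, an exact test would often be preferred in practice; the sketch only shows how the counts enter the statistic.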

Recurrent chromosomal breakpoints identified by SNP-arrays
Based on the analysis of the distribution of chromosomal breakpoints defined by the SNP-arrays, four recurrent chromosomal breakpoints (arbitrarily defined as DNA segments showing CN changes in more than one third of the cases) were identified at chromosomes 1p12, 8p12, 17p11.2 and 20p12.1 (Figure S1). Chromosomes 1, 8 and 20 showed a high number (>145) of different breakpoint regions with a variable and heterogeneous distribution; in contrast, a highly prevalent breakpoint region was identified in the centromeric portion of chromosome 17p, between the genome coordinates 20,156,497 bp and 22,975,771 bp (15/19 patients with abnormalities for this chromosome), corresponding to a size of 2.82 Mb for the recurrent breakpoint region, as given by these coordinates. In these 15 cases, the first gene affected on the retained telomeric side of the breakpoint region was the CYTSB gene and the first constantly deleted gene on the centromeric side was the FAM27L gene. Interestingly, in 13 of these 15 patients a preferential breakpoint occurred at the 21,769,828-22,975,771 genome coordinates, where the FAM27L gene is coded.
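The recurrence criterion (a breakpoint region seen in more than one third of the cases) and the size of a coordinate interval can be sketched as follows. The input format, a list of breakpoint region labels per sample, and the function names are hypothetical.

```python
from collections import Counter


def recurrent_breakpoints(breakpoints_per_sample):
    """Return {region: number of samples} for regions found in more than
    one third of the samples; each sample counts a region at most once."""
    n_samples = len(breakpoints_per_sample)
    counts = Counter()
    for regions in breakpoints_per_sample:
        counts.update(set(regions))
    # Integer comparison avoids floating-point issues at the 1/3 boundary.
    return {region: c for region, c in counts.items() if 3 * c > n_samples}


def interval_span_bp(start_bp: int, end_bp: int) -> int:
    """Length in base pairs between two genome coordinates."""
    if end_bp < start_bp:
        raise ValueError("end coordinate must not precede start coordinate")
    return end_bp - start_bp


# Span of the 17p interval quoted in the text, in base pairs.
span_17p = interval_span_bp(20_156_497, 22_975_771)
```

With 23 samples, this criterion requires a region to be present in at least 8 cases.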

Correlation between the chromosomal changes detected by SNP-arrays and both iFISH and RQ-PCR studies
In order to evaluate the consistency of the chromosomal changes identified by the SNP-arrays, iFISH analyses were performed in parallel for a total of 24 chromosome regions from 20 different chromosomes. Overall, our results showed a high degree of correlation (mean r² of 0.73 ± 0.02; range: 0.65 to 0.91) between both methods, including when such analysis was restricted to the most frequently altered regions (r² ≥ 0.67) (Table 5).
In order to assess the impact of the information generated by SNP arrays, the expression of three genes (MAP2K4, MYC and BIRC7) was further analyzed in detail using RQ-PCR. As expected from the SNP-array data, the MYC and BIRC7 relative transcript levels were up-regulated in 15/18 (83%) and 14/18 (78%) tumours analyzed, respectively. Conversely, the MAP2K4 gene was down-regulated in 16/18 (89%) tumours (Figure 3). Upon comparing the results obtained with the two methods, a significant (p<0.001) correlation was observed between the microarray data and the expression of the three genes evaluated by RQ-PCR techniques, with correlation coefficients (r²) of 0.88, 0.66 and 0.64 for the MAP2K4, MYC and BIRC7 genes, respectively.

Discussion
In this study we describe a comprehensive map of the genetic abnormalities present in primary tumors from metastatic CRC through the usage of high-resolution 500K SNP arrays. To our knowledge this is the most extensive study using high-resolution SNP-arrays to define the genetic alterations in this subgroup of CRC patients. Overall, our results confirm previous analyses using chromosome banding techniques [20], CGH [5], SKY [21], aCGH [6,10] and low-resolution 50k SNP-arrays [22].
Previous reports in which similar SNP-array tools have been applied to investigate the genetic profile of non-metastatic CRC [23] have shown, in a subset of patients with advanced carcinomas in the absence of liver metastases (n = 18), a relatively low frequency of 1p, 8p, 9q, 14 and 17p losses and unique amplifications at chromosome 20q. Interestingly, among our series of metastatic CRC patients the frequency of losses at the same chromosomal regions was strikingly higher: 1p, 74% vs 11%; 8p, 78% vs 33%; 9q, 35% vs 6%; 14, 65% vs 39%; and 17p, 83% vs 33%. In turn, we also detected additional amplifications at 7p, 8q and 13q, as well as at the 20q chromosomal region. In line with our observations, Al-Mulla et al [24] also found that, compared to patients without metastatic disease (n = 30), CRC patients with liver metastases (n = 26) more frequently displayed losses of chromosomes 1p, 4, 5q, 8p, 9p, and 14q. Altogether, those results indicate that the genetic profile of metastatic CRC is defined by imbalanced gains/amplifications of chromosomes 7p, 8q, 13q and 20q together with losses of the 1p, 8p, 9p, 14q and 17p chromosomal regions [5,20,25-27]. In addition, here we describe new recurrently altered regions that contain cancer genes, many of which have been previously involved in the pathogenesis of CRC; at the same time, we provide detailed characterization of the recurrent chromosomal breakpoints most frequently occurring in primary tumours from CRC patients who had developed liver metastases.
Interestingly, a relatively high degree of correlation was found between the cytogenetic alterations detected by SNP-arrays and iFISH studies. Despite this, slight differences were noted between both techniques. On one hand, these were due to the lower sensitivity of the SNP-array vs. iFISH for the identification of chromosomal abnormalities present in only a small proportion of all cells in the sample (i.e. secondary genetic lesions absent in the ancestral tumour cell clones) [28]. On the other hand, they were attributable to the increased sensitivity of the SNP-array vs. iFISH studies as regards identification of small interstitial changes [11]. In this regard, our results show occurrence of a high number of CN changes involving minimal/small regions (<1.3 Mb) and, to a lesser extent, also extensive/large (>1.5 Mb) regions which frequently went undetected by iFISH. Interestingly, several of these small and large altered regions contain cancer-associated genes known to be involved in CRC and/or the metastatic process: i.e. the TPD52 [29], FABP5 [30], MAP2K4 [31], LLGL1 [32], FBLN1 [33] and TYMP [34] genes. Among all human chromosomes, chromosomes 17 and 18 were those more frequently found to be altered in our series, their abnormalities typically consisting of extensive deletions involving the TP53 and DCC genes, respectively, in addition to other tumor suppressor genes, such as MAP2K4 at 17p12.
Table 4. Most frequently detected high-level amplified chromosome regions (average log2 copy number ratio ≥0.22) containing genes commonly associated with cancer in primary sporadic colorectal tumors genotyped on the Affymetrix 500K SNP array platform (n = 23).
A potential role for chromosome 18q in the development of CRC with associated liver metastases has been previously reported [35]; in this regard, decreased expression of Smad4, in addition to DCC, has been pointed out as a potential target protein coded in chromosome 18q, since it is associated with both liver and lymph node metastases [36]. In line with these findings we also identified loss of the SMAD4 gene in the great majority (83%) of the metastatic cases analyzed. By contrast, the most frequently (78% of cases) amplified region was found in chromosome 20, at 20q11.22. This is a relatively small region of 178,817 bp which harbors 8 known genes, half of which have been associated with CRC: TNFRSF6B [37], OGFR [38], NTSR1 [39] and CDH4 [40]. Among these genes, overexpression of TNFRSF6B, a gene that belongs to the tumor necrosis factor receptor (TNFR) super-family, has been reported in advanced stages of CRC [37] and other tumors of the gastrointestinal tract [41], in association with an increased resistance to adjuvant chemotherapy [42]; in turn, increased NTSR1 expression has been reported as an early event in colon tumorigenesis that contributes to tumor progression and an aggressive clinical behavior [39]. Similarly, we also identified amplification and overexpression of the MYC gene at 8q24 in the great majority of the primary tumors, which have both been previously suggested to be involved in disease progression to a metastatic tumour [28,43].
From the clinical point of view, gain/amplification of 20p13 was associated with a higher frequency of well- vs. moderately-differentiated tumours. Notably, this chromosomal region contains genes which have been previously associated with disease progression. Accordingly, Miyoshi N et al have recently suggested that overexpression of the TGM2 gene in CRC patients is associated with a shorter overall survival [44], and expression of the PTPRA gene has been recurrently associated with progression of gastric cancer, including lymphovascular invasion and liver/peritoneal dissemination [45,46].
Apart from defining the most frequently altered genes in metastatic CRC, this study was also aimed at detailed characterization of the most frequent recurrent breakpoint regions associated with such genetic changes. The number of different breakpoints detected within individual chromosomes is usually considered as a surrogate marker for chromosomal instability in cancer. In the present study, we found 245 different breakpoints for chromosome 1. This number is markedly higher than that reported by others using aCGH analyses of CRC without distant metastases: 16 different chromosome breakpoints found in a group of 32 patients [10]. These results suggest that advanced-stage and metastatic CRC could be associated with a greater number of breakpoints and higher chromosomal instability. In line with this hypothesis, Knutsen et al [21] found 407 chromosomal breakpoints in 15 CRC cell lines using spectral karyotyping, with a high frequency of recurrent breakpoints in the centromeric (p11 to q11) or pericentromeric (p11.2 and q11.2) regions of chromosomes 12, 13, 14, 15, 17, 18 and 20. Interestingly, in this latter study Knutsen et al [21] also found recurrent breakpoints at 17p11.2 in 6/15 cell lines.
In the present study, a high percentage of cases showed recurrent breakpoints for chromosomes 1, 8, 17 and 20. Most interestingly, breakpoints at chromosome 17p were preferentially localized at the genome coordinates 20,156,497-22,975,771 bp at 17p12 (15/23 cases); in most of these cases (12/15 cases), the breakpoint was restricted to the genome coordinates 21,769,828-22,975,771 bp, which map to the FAM27L gene, a gene whose function remains to be elucidated. Whether disruption of the FAM27L gene may also play a role in the malignant transformation and/or the metastatic process of CRC into the liver, in addition to inactivation of TP53 and inhibition of apoptosis [47,48], remains to be elucidated. Nevertheless, it should be noted that Camps et al [10] have shown a higher frequency of 17p11.2 breakpoints in CRC patients with positive (8/16) vs. negative (4/16) lymph nodes using aCGH. This breakpoint has been previously associated with a homogeneous genetic profile defined by a higher frequency of abnormalities of chromosomes 1p, 7, 8, 13q, 18q and 20q and an adverse clinical outcome [35,49-52].
Other recurrent chromosomal breakpoints found in our patients were localized in the 1p12, 8p12 and 20p12.1 chromosomal regions. Previous studies suggest that genes typically deregulated by these chromosome breaks include the REG4 [53] and NOTCH2 [54] genes at chromosome 1p12, EIF4EBP1 [55] and FGFR [56] at chromosome 8p12, and the FOXA2 [57] gene at chromosome 20p12; all these genes have been associated with the development and progression of CRC and the metastatic process in a variety of human cancers, including the development of liver metastases in CRC [53-57]. Additional GEP and functional studies, as well as direct comparison of paired primary and metastatic tumours, are required to validate our findings and to gain further insight into their role in metastatic CRC patients.