High Resolution Copy Number Variation Data in the NCI-60 Cancer Cell Lines from Whole Genome Microarrays Accessible through CellMiner

Array-based comparative genomic hybridization (aCGH) is a powerful technique for detecting gene copy number variation. It is generally considered robust and convenient because it measures DNA rather than RNA. In the current study, we combine copy number estimates from four different platforms (Agilent 44 K, NimbleGen 385 K, Affymetrix 500 K and Illumina Human1Mv1_C) to compute a reliable, high-resolution and easily interpreted measure of copy number changes in the 60 cancer cell lines of the NCI-DTP (the NCI-60). We then relate the results to gene expression. We explain how to access the database using our CellMiner web-tool and provide an example of how easily the data can be compared with transcript expression, whole exome sequencing, microRNA expression and response to 20,000 drugs and other chemical compounds. We then demonstrate how the data can be analyzed integratively with transcript expression data for the whole genome (26,065 genes). Comparison of copy number and expression levels shows an overall moderate positive correlation (median r = 0.247), with significantly higher correlations (median r = 0.408) for the known tumor suppressor genes. That observation is consistent with the hypothesis that gene loss is an important mechanism for tumor suppressor inactivation. An integrated analysis of concurrent DNA copy number and gene expression change is presented. Limiting attention to focal DNA gains or losses, we identify novel candidate tumor suppressors with matching alterations in transcript level.


Introduction
The NCI-60 is a set of 60 widely used cancer cell lines derived from 9 tissues of origin including breast, central nervous system, colon, lung, prostate, ovary and kidney, as well as leukemia and melanomas [1]. We, and others, have previously made available molecular data on multiple platforms for the NCI-60 [2][3][4][5][6][7], making it a unique resource for both pharmacogenomics [8,9] and systems biology [10,11]. These cell lines retain gene expression patterns from their original cancer tissues-of-origin, as demonstrated by co-clustering [4], and comparison to clinical samples [12]. The ability to compare drug response and genomic data for these cell lines is unmatched by any other clinical or cancer cell databases [8,11,13,14].
Prior studies of DNA copy number using aCGH from multiple cancerous cell lines and clinical samples have enhanced understanding of DNA variability at the cellular level [15], as well as yielding translational insights [16]. aCGH provides a measurement of genomic instability [17], a hallmark of carcinogenesis [18]. Associations between gene copy number and expression have also been studied, in some cases yielding implications regarding mechanisms of cancer progression [19,20].
Data on multiple platforms profiling the NCI-60 are accessible through our CellMiner web application [21]. Recently, we have introduced web-based tools that allow the non-bioinformatician to assess and cross-compare the databases [8]. In the current study, we expand this integrative capacity by presenting high-resolution DNA copy number data for the NCI-60, synthesized by combining data from four platforms (Table S1) and placed in a format consistent with the other data types. We introduce the "Gene DNA copy number" web-tool, designed to allow the non-bioinformatician to query, visualize and download relative DNA copy number data. The output from this tool facilitates integration of the DNA copy number data with our other databases, enhancing their integrative capacity.
Analytically, we provide measurements of relative DNA copy number variation within and between cell lines, compute several measures of genomic instability, and correlate relative DNA copy number with gene expression levels. Proceeding under the hypothesis that cancer focal gains and losses are the result of selective pressure based on their regulatory effect on gene

DNA Isolation
DNA was isolated as described previously [22]. In brief, genomic DNA was purified from cells using the QIAamp DNA Blood Cell Culture Maxi Kit (Qiagen Inc., Valencia, CA) according to the manufacturer's instructions. Quality was assessed by the optical density 260/280 ratio using a spectrophotometer (Beckman-Coulter, Fullerton, CA) and by 0.8% agarose (SeaKem GTG, FMC BioProducts, Rockland, ME) gel electrophoresis in 1x TAE (Roche, Indianapolis, IN).

DNA Copy Number in the NCI-60 Using Four Microarray Platforms
DNA copy numbers for all genes were determined by integrating probes from i) the Human Genome CGH Microarray 44A (Agilent Technologies, Inc., GEO accession GPL11068) with 44 k probes, ii) the H19 CGH 385K WG Tiling v2.0 array (Roche NimbleGen Systems, Inc., GEO accession GPL13786) with 385 k probes, iii) the GeneChip Human Mapping 500 k Array Set (Affymetrix Technologies, Inc., GEO accession GPL3812) with 500 k probes, and iv) the Human1Mv1_C BeadChip array (Illumina, GEO accession GPL6983) with 1,100 k probes. Data for these microarrays can be accessed at CellMiner [21]. In addition, raw data have been deposited in the Gene Expression Omnibus (GEO) under the following accession numbers: Agilent 44 k (GSE48568), Affymetrix 500 k (GSE32264), NimbleGen 385 K (GSE30291) and Illumina 1 M (GSE47620).

Probe Mapping and Intensities
Probes for the Agilent, NimbleGen and Illumina arrays were remapped to the latest HG19 reference using BLAST+ (Version 2.2.25) [23]. For the Affymetrix array, we used the latest annotation downloaded from the Affymetrix NetAffx website [24]. For each platform, we averaged the replicate samples (if available, see Table S1). Probe intensities were determined following the manufacturers' recommendations, as described previously for the Agilent [25], NimbleGen Roche [26], Affymetrix [27], and Illumina [28] microarrays.

Figure 2. Example of integrated analysis using CellMiner. The leftmost plot shows a barplot of copy number values for CDKN2A obtained by querying CellMiner. The middle plot shows the gene expression and the rightmost plot shows the response to mitoxantrone, a drug whose activity is significantly negatively correlated with the copy number status of CDKN2A. Dotted lines indicate some of the cell lines where the copy number alteration is in the same direction as the gene expression change and in the opposite direction to the drug activity. doi:10.1371/journal.pone.0092047.g002
ACGH in the NCI-60 PLOS ONE | www.plosone.org
For all platforms, the log probe intensities for each sample were normalized by mean-centering prior to all subsequent analysis: the mean of the log probe intensities for a sample was subtracted from all probe intensities for that sample.

Figure 3. Whole genome visualization of aCGH results for the NCI-60. The x-axis is the chromosomal location of the probes, colored by chromosome number and ordered by genomic position. The y-axis is the log ratio of the probe intensities. The black horizontal marks indicate the average log 2 copy numbers in each segment, as calculated by Circular Binary Segmentation (see Materials and Methods). The amount of scatter above and below the segments' black marks indicates the level of probe variability. The locations of some cancer-related genes that have focal gains or losses are also indicated. High-resolution images for all the NCI-60 cell lines are available in Figure S1 and at our Website [21]. doi:10.1371/journal.pone.0092047.g003
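The per-sample mean-centering step can be sketched in a few lines of Python. This is a minimal illustration, assuming one sample's log probe intensities are held in a plain list; the function name and the example values are hypothetical.

```python
from statistics import fmean


def mean_center(log_intensities):
    """Subtract the sample's mean log probe intensity from every probe.

    After centering, the sample's log intensities average to zero, so a
    probe value of 0 corresponds to the sample's overall baseline level.
    """
    mu = fmean(log_intensities)
    return [x - mu for x in log_intensities]


# Hypothetical log2 probe intensities for a single sample.
sample = [0.5, -0.2, 1.1, 0.2]
centered = mean_center(sample)
print([round(x, 2) for x in centered])  # [0.1, -0.6, 0.7, -0.2]
```

Centering each sample separately removes sample-wide intensity offsets (for example, differences in labeling efficiency) before segmentation, so that a value of zero can later be read as the reference copy number.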

Segmentation of Regions with Consistent Copy Number
Segmentation refers to the partitioning of each chromosome into contiguous segments such that the copy number is the same within a segment and there is a significant difference in the copy number between adjacent segments. In our analysis, we used Circular Binary Segmentation (CBS) [29]. CBS returns the average probe intensity within each segment as an estimate of the log 2 of copy number within that segment. Thus a mean probe intensity value of zero would correspond to a measured copy number of 2N (i.e. diploid), a value of -1 corresponds to copy number 1N and 1 corresponds to 4N.
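Because CBS reports segment means on the log2 scale, converting them to absolute copy numbers under the diploid-reference assumption stated above is a one-line formula: copy number = 2 × 2^(log2 ratio). A minimal Python sketch (the function names are illustrative, not part of any published pipeline):

```python
import math


def log2_ratio_to_copy_number(log2_ratio, ploidy=2):
    """Absolute copy number for a segment mean, assuming log2 ratio 0 is 2N."""
    return ploidy * 2 ** log2_ratio


def copy_number_to_log2_ratio(copy_number, ploidy=2):
    """Inverse conversion: copy number back to a log2 ratio against 2N."""
    return math.log2(copy_number / ploidy)


for value in (-1.5, -1.0, 0.0, 1.0, 1.5):
    print(value, round(log2_ratio_to_copy_number(value), 2))
```

Applied to the thresholds used elsewhere in this paper, an absolute log2 value of 0.3 corresponds to roughly 2.46 or 1.62 copies, and an absolute log2 value of 1.5 to roughly 5.66 or 0.71 copies.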
Note that the Affymetrix 500 k data have been used previously to detect regions of loss of heterozygosity (LOH); however, the algorithm used in that work to detect copy number variations was PennCNV, which is unsuitable for genome-wide copy number estimation in cancer samples [30]. We have, therefore, re-analyzed the data using Circular Binary Segmentation (CBS).

Combination of Copy Number Estimates from Four Platforms
We used a novel algorithm to combine the segmented copy number estimates from the four platforms for each cell line. We used the segmentation of the copy number to define breakpoints at the junction of two contiguous segments. At a breakpoint, a discrete jump (increase or decrease) of copy number occurs. These points correspond with locations of chromosomal breaks.
We aligned the breakpoints from the four platforms for the same cell line using the following method. Breakpoints from different platforms that are within 100,000 base pairs of each other and have the same direction of copy number change are matched with each other. This groups together breakpoints from different platforms that putatively refer to the same chromosomal break. Breakpoints that are not matched with any breakpoint from another platform are discarded. We then compute an average breakpoint location for each group of matched breakpoints as the mean of the locations of the breakpoints from the different platforms. Finally, we compute the average segment copy number by averaging, over the four platforms, the segmented values between each pair of adjacent averaged breakpoints.
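The breakpoint-matching step can be sketched for one chromosome of one cell line as follows. The text does not say how chains of nearby breakpoints are resolved, so this sketch assumes a greedy pass: breakpoints with the same direction are sorted by position and grouped while they stay within 100 kb of the group's first member, and only groups supported by at least two platforms are kept. The `Breakpoint` record and function names are hypothetical.

```python
from collections import namedtuple

# One breakpoint: source platform, genomic position (bp) and direction of the
# copy number jump (+1 = increase, -1 = decrease). Hypothetical record type.
Breakpoint = namedtuple("Breakpoint", "platform position direction")


def _summarize(group):
    """Average a matched group; drop groups seen on only one platform."""
    if len({b.platform for b in group}) < 2:
        return []
    mean_pos = sum(b.position for b in group) / len(group)
    return [(round(mean_pos), group[0].direction)]


def match_breakpoints(breakpoints, window=100_000):
    """Consensus breakpoints for one chromosome of one cell line."""
    consensus = []
    for direction in (+1, -1):
        same_dir = sorted((b for b in breakpoints if b.direction == direction),
                          key=lambda b: b.position)
        group = []
        for bp in same_dir:
            # Close the current group once a breakpoint lies more than
            # `window` bp from its first member, so that every pair of
            # members is within `window` of each other.
            if group and bp.position - group[0].position > window:
                consensus.extend(_summarize(group))
                group = []
            group.append(bp)
        consensus.extend(_summarize(group))
    return sorted(consensus)


bps = [
    Breakpoint("Agilent", 1_000_000, +1),
    Breakpoint("NimbleGen", 1_040_000, +1),
    Breakpoint("Affymetrix", 1_060_000, +1),
    Breakpoint("Illumina", 5_000_000, -1),   # no partner: discarded
    Breakpoint("Agilent", 5_050_000, +1),    # opposite direction: not matched
]
print(match_breakpoints(bps))  # [(1033333, 1)]
```

Anchoring each group on its first member keeps every group no wider than the 100 kb window; a single-linkage rule (each breakpoint within 100 kb of the previous one) would be an equally plausible reading of the text and could merge longer chains.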
For each gene, we find the segment in which it lies; the copy number for the gene is the average segment copy number for that segment. This procedure assigns copy number estimates in 41 or more cell lines to 23,413 genes.
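Averaging the platforms' segmented values between adjacent consensus breakpoints, and then reading off the value for each gene, could look like the sketch below. It assumes each platform's segments are sorted `(start, end, value)` tuples and samples every platform at the midpoint of each consensus segment; the midpoint rule, the use of the gene start position and all names are assumptions for illustration, not the published implementation.

```python
from bisect import bisect_right
from statistics import fmean


def value_at(segments, pos):
    """Segmented log2 value covering `pos` on one platform, or None.

    `segments` is a list of (start, end, value) tuples sorted by start.
    """
    starts = [s[0] for s in segments]
    i = bisect_right(starts, pos) - 1
    if i >= 0 and pos <= segments[i][1]:
        return segments[i][2]
    return None


def consensus_segments(breakpoints, chrom_start, chrom_end, platform_segments):
    """Average the platforms' values between adjacent consensus breakpoints,
    sampling each platform at the midpoint of the consensus segment."""
    edges = [chrom_start, *breakpoints, chrom_end]
    result = []
    for left, right in zip(edges, edges[1:]):
        mid = (left + right) / 2
        values = [v for segs in platform_segments
                  if (v := value_at(segs, mid)) is not None]
        result.append((left, right, fmean(values) if values else None))
    return result


def gene_copy_number(consensus, gene_start):
    """Log2 copy number of the consensus segment in which the gene starts."""
    starts = [seg[0] for seg in consensus]
    i = bisect_right(starts, gene_start) - 1
    return consensus[i][2] if i >= 0 else None


# Two hypothetical platforms with slightly different breakpoint calls.
p1 = [(0, 999_999, 0.0), (1_000_000, 2_000_000, 1.0)]
p2 = [(0, 1_010_000, 0.2), (1_010_001, 2_000_000, 0.8)]
cons = consensus_segments([1_000_000], 0, 2_000_000, [p1, p2])
print(gene_copy_number(cons, 1_200_000))
```

Platforms that do not cover a consensus segment simply drop out of its average, which mirrors how a gene can receive an estimate even when one array lacks probes in that region.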
The copy number estimates for the genes were compared to copy number estimates from the Cancer Cell Line Encyclopedia (CCLE) [13] using 44 cell lines common to both datasets. We computed the Pearson correlation between our measurement of copy number and the CCLE copy number across the 44 cell lines for each gene.
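The per-gene comparison with CCLE amounts to a Pearson correlation over the shared cell lines, followed by a median across genes. A stdlib-only sketch, assuming both datasets are dictionaries mapping gene to `{cell_line: copy_number}`; the minimum of three shared cell lines is an arbitrary guard added here, not a parameter from the paper, and the example data are made up.

```python
from math import nan, sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx and syy else nan


def per_gene_agreement(ours, other, min_shared=3):
    """Correlate two copy number datasets gene by gene.

    Both inputs map gene -> {cell_line: copy_number}; each gene is correlated
    over the cell lines the two datasets have in common.
    """
    result = {}
    for gene in ours.keys() & other.keys():
        shared = sorted(ours[gene].keys() & other[gene].keys())
        if len(shared) < min_shared:
            continue  # too few common cell lines for a meaningful r
        result[gene] = pearson([ours[gene][c] for c in shared],
                               [other[gene][c] for c in shared])
    return result


ours = {"G1": {"A": 1.0, "B": 2.0, "C": 3.0}, "G2": {"A": 0.0, "B": 1.0}}
ccle = {"G1": {"A": 2.0, "B": 4.0, "C": 6.0, "D": 0.0}, "G2": {"A": 1.0, "B": 0.0}}
agreement = per_gene_agreement(ours, ccle)
print(agreement, median(agreement.values()))
```

Matching on cell line names rather than list positions guards against the two resources ordering or naming their panels differently; in practice the NCI-60 and CCLE identifiers would first need to be harmonized.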

Prominent and Focal Gains and Losses
To identify the regions with the largest, most visually striking gains and losses, we set an arbitrary threshold of 1.5 on the absolute log 2 copy number and joined segments that were less than 500 kilobases away from each other (including any segments between them).
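One way to implement the prominent-region rule on a single chromosome's sorted CBS segments is sketched below. It assumes that only segments of the same sign are joined (the text does not say whether a nearby gain and loss would be merged) and treats the threshold as strict (absolute log2 greater than 1.5); the example segments are hypothetical.

```python
def prominent_regions(segments, threshold=1.5, max_gap=500_000):
    """Merge high-amplitude segments on one chromosome.

    `segments` are (start, end, log2) tuples sorted by start. Segments with
    |log2| > threshold are kept; a kept segment starting less than `max_gap`
    bp after the previous kept region of the same sign extends that region,
    absorbing any lower-amplitude segments in between.
    """
    regions = []
    for start, end, value in segments:
        if abs(value) <= threshold:
            continue
        sign = 1 if value > 0 else -1
        if regions and regions[-1][2] == sign and start - regions[-1][1] < max_gap:
            regions[-1] = (regions[-1][0], end, sign)
        else:
            regions.append((start, end, sign))
    return regions


segs = [
    (0, 100_000, 1.8),
    (100_001, 300_000, 0.2),       # low amplitude, absorbed by the merge
    (300_001, 400_000, 2.0),
    (2_000_000, 2_100_000, -1.7),
]
print(prominent_regions(segs))  # [(0, 400000, 1), (2000000, 2100000, -1)]
```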
For a systematic identification of all focal copy number gains (or losses) for each sample, we used the CBS (segmented) data to find portions of the genome that are higher (or lower) than both their left and right-hand neighbors. We used three criteria for calling a gain or loss focal: i) the segment must have a difference in log 2 copy number of at least 0.3 from both its left and right-hand neighbors, both differences being either positive or negative; ii) the width of the segment must be less than 5 Mb; and iii) there should be more than 10 probes mapping within the segment. Any gene that has (partial or total) overlap with the segment is called focally gained or lost.
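The three focal-call criteria and the gene-overlap rule can be sketched as follows. The sketch assumes segments are sorted `(start, end, log2, n_probes)` tuples for one chromosome and, as an assumption not stated in the text, skips the first and last segments of a chromosome because they lack a neighbor on one side. Gene names and coordinates are hypothetical.

```python
def focal_calls(segments, min_diff=0.3, max_width=5_000_000, min_probes=10):
    """Focal gains (+1) and losses (-1) on one chromosome.

    `segments` are (start, end, log2, n_probes) tuples sorted by start. A
    segment is focal if it differs from BOTH neighbors by at least `min_diff`
    in the same direction, is narrower than `max_width` bp and has more than
    `min_probes` probes. End segments (one neighbor only) are skipped.
    """
    calls = []
    for left, seg, right in zip(segments, segments[1:], segments[2:]):
        start, end, value, n_probes = seg
        if end - start >= max_width or n_probes <= min_probes:
            continue
        d_left, d_right = value - left[2], value - right[2]
        if d_left >= min_diff and d_right >= min_diff:
            calls.append((start, end, +1))
        elif d_left <= -min_diff and d_right <= -min_diff:
            calls.append((start, end, -1))
    return calls


def genes_hit(calls, genes):
    """Genes (name -> (start, end)) with any overlap with a focal call."""
    return {name: direction
            for name, (g_start, g_end) in genes.items()
            for start, end, direction in calls
            if g_start <= end and g_end >= start}


segs = [
    (0, 10_000_000, 0.0, 500),
    (10_000_001, 12_000_000, -0.8, 40),
    (12_000_001, 30_000_000, 0.1, 900),
]
calls = focal_calls(segs)
genes = {"GENE_A": (11_000_000, 11_050_000), "GENE_B": (20_000_000, 20_100_000)}
print(calls, genes_hit(calls, genes))
```

Requiring the same sign for both neighbor differences excludes simple step changes, where a segment is higher than one neighbor but lower than the other.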

Genomic Instability Parameters
Using the segmented copy number data, we calculated two measures of genomic instability: i) the proportion of the genome that has been gained or lost, and ii) the number of gains and losses. The proportion of the genome that is gained or lost was calculated from the segmented array CGH values, as the proportion of probes falling within segments with absolute average intensities greater than 0.3 (a log2 value of 0.3 corresponds to a DNA copy number gain of about 0.46 copies, or a loss of about 0.38 copies, relative to 2N). The number of gains and losses was calculated as the total number of gained or lost regions with absolute average intensities greater than 0.3 and more than 10 probes mapping to the region.
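Both instability measures follow directly from a cell line's segment table. In this minimal sketch (the tuple layout, names and values are hypothetical), each altered segment is counted as one region; that is an assumption, since the text does not say whether adjacent altered segments are merged before counting.

```python
def instability(segments, threshold=0.3, min_probes=10):
    """Genomic instability measures for one cell line.

    `segments` are genome-wide (start, end, log2, n_probes) tuples. Returns the
    proportion of probes lying in segments with |log2| > threshold, and the
    number of such segments that contain more than `min_probes` probes.
    """
    total_probes = sum(s[3] for s in segments)
    altered = [s for s in segments if abs(s[2]) > threshold]
    proportion = sum(s[3] for s in altered) / total_probes
    n_regions = sum(1 for s in altered if s[3] > min_probes)
    return proportion, n_regions


segs = [
    (0, 8_000_000, 0.0, 800),
    (8_000_001, 9_500_000, 0.5, 150),
    (9_500_001, 9_600_000, -0.4, 8),      # altered but too few probes to count
    (9_600_001, 14_000_000, 0.1, 42),
]
print(instability(segs))  # (0.158, 1)
```

Weighting by probe counts, as the text describes, approximates genomic length only when probe density is roughly uniform; weighting by segment length in base pairs would be an alternative.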

Gene Expression Determination and its Correlation to DNA Copy Number
Expression for 26,065 genes was taken as an integrated z-score of measurements from five gene expression platforms, as described previously [31]. Genes with expression z-scores were matched to genes with copy number. This resulted in 18,504 genes with both expression and copy number estimates. Copy numbers for these 18,504 genes were compared to gene expression using Pearson's correlation (Table S3). The histogram of these correlations was plotted using R (version 2.15.2). The median correlations for all the genes, as well as for sets of known oncogenes and tumor suppressors, were calculated.
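Since not every gene has a copy number estimate in all 60 cell lines, correlating over the cell lines where both measurements exist is a natural choice. The sketch below assumes that pairwise-complete handling (the text does not specify it), then takes the median correlation for all matched genes and for named gene sets such as known tumor suppressors. Gene and set names are hypothetical.

```python
import math
from statistics import median


def pearson_complete(x, y):
    """Pearson r over cell lines where both values are present (not None/NaN)."""
    pairs = [(a, b) for a, b in zip(x, y)
             if a is not None and b is not None
             and not (math.isnan(a) or math.isnan(b))]
    n = len(pairs)
    if n < 3:
        return math.nan
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else math.nan


def median_correlations(copy_number, expression, gene_sets):
    """Median copy number vs expression r for all matched genes and per set.

    Both inputs map gene -> values listed in the same cell-line order.
    """
    r = {g: pearson_complete(copy_number[g], expression[g])
         for g in copy_number.keys() & expression.keys()}
    summary = {"all": median(r.values())}
    for name, members in gene_sets.items():
        summary[name] = median(r[g] for g in members if g in r)
    return summary


cn = {"A": [0.0, 1.0, 2.0, None], "B": [1.0, 2.0, 3.0, 4.0],
      "C": [0.0, 1.0, 0.0, 1.0]}
expr = {"A": [1.0, 2.0, 3.0, 9.0], "B": [4.0, 3.0, 2.0, 1.0],
        "C": [0.0, 1.0, 0.0, 1.0], "D": [5.0, 4.0, 3.0, 2.0]}
summary = median_correlations(cn, expr, {"tumor_suppressors": ["A", "B"]})
print(summary)
```

Gene "D" has expression but no copy number, so it is dropped by the gene-matching step, analogous to the reduction from 26,065 expressed genes to the 18,504 genes with both measurements.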

Assessment of Known and Putative Tumor Suppressors
We selected genes meeting four criteria: i) statistically significant correlation between copy number and expression (false discovery rate, FDR < 0.05); ii) the gene being focally gained or lost in at least 3 samples (focal gains and losses as defined above); iii) the number of cell lines with focal losses being at least 3 times the number of cell lines with focal gains; and iv) the gene being more than 2 million base pairs away from any known tumor suppressor. Criterion iv was used to remove "passenger" genes whose selection might be due to genomic proximity.
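The four selection criteria can be expressed as successive filters. The text reports an FDR but does not name the procedure, so this sketch assumes Benjamini-Hochberg adjustment of the copy number versus expression p-values; the per-gene record layout and the example genes and positions are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def candidate_suppressors(genes, known_ts, fdr=0.05, min_focal=3,
                          loss_ratio=3, min_distance=2_000_000):
    """Apply criteria i-iv to genes (name -> dict with keys p, losses, gains,
    chrom, pos); known_ts is a list of (chrom, pos) of known suppressors."""
    names = list(genes)
    q = dict(zip(names, benjamini_hochberg([genes[n]["p"] for n in names])))
    selected = []
    for n in names:
        g = genes[n]
        if q[n] >= fdr:                                  # i) FDR
            continue
        if g["losses"] + g["gains"] < min_focal:         # ii) recurrence
            continue
        if g["losses"] < loss_ratio * g["gains"]:        # iii) losses >= 3x gains
            continue
        if any(chrom == g["chrom"] and abs(pos - g["pos"]) <= min_distance
               for chrom, pos in known_ts):              # iv) not a neighbor
            continue
        selected.append(n)
    return selected


genes = {
    "G1": {"p": 0.001, "losses": 4, "gains": 0, "chrom": "17", "pos": 17_000_000},
    "G2": {"p": 0.001, "losses": 4, "gains": 0, "chrom": "9", "pos": 21_000_000},
    "G3": {"p": 0.9, "losses": 5, "gains": 0, "chrom": "1", "pos": 1_000_000},
    "G4": {"p": 0.001, "losses": 2, "gains": 1, "chrom": "3", "pos": 5_000_000},
}
known_ts = [("9", 22_000_000)]  # hypothetical position of a known suppressor
print(candidate_suppressors(genes, known_ts))  # ['G1']
```

In this toy example G2 fails criterion iv (it lies 1 Mb from the known suppressor), G3 fails the FDR cut and G4 fails the 3:1 loss-to-gain ratio.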

Results
The Array CGH Data can be Accessed and Visualized Using the CellMiner "Gene DNA Copy Number" Web Analysis Tool
To facilitate mining of the NCI-60 DNA copy number data, we introduce an intuitive tool to query and visualize the dataset. This tool is available at our CellMiner web site [21] within the "NCI-60 Analysis Tools" tab (Figure 1A). As shown in Figure 1A, users first select "Cell line signature" in Step 1, and then "Gene DNA copy number". In Step 2, up to 150 genes of interest may be input either by typing the gene names into the "Input the identifier" box or by uploading them as a text or Excel file using the "Upload file" radio button. In Step 3, users enter their e-mail address and click "Get data". Results are sent by e-mail for each gene, with a link to download an Excel file. This file contains four worksheets: i) "DNA copy number", containing tabular mean intensity ratios (of the test DNA compared to presumed normal DNA), estimated DNA copy numbers, and a bar plot of the estimated DNA copy numbers (Figure 1B); ii) "Graphical Output", containing scatter-plots of the individual probe intensities for the gene of interest and a 2 Mb flanking region for each cell line (Figure 1C); iii) "input", containing the normalized data for the probes that fall within the gene of interest (highlighted in yellow) as well as 2 × 10^6 nucleotides of flanking region on each end; and iv) "Footnotes". Figure 1 shows an example for 3 cancer-relevant genes (Figure 1A): CDKN2A, encoding the Cyclin-Dependent Kinase Inhibitor 2A (p16INK4a, p14ARF), which is commonly deleted in cancers; CCNE1, encoding Cyclin E, which is commonly amplified in cancers; and KRAS, encoding the Kirsten Rat Sarcoma Viral Oncogene homolog, which is activated in cancers by mutations and, more rarely, by amplification.
Panels B and C (Figure 1) show that many cell lines exhibit deletion of the CDKN2A locus (left panels), while the ovarian cancer cells OVCAR3 and OVCAR5 show focal amplification of CCNE1 and KRAS, respectively.
A unique feature of the CellMiner website is that the copy number pattern obtained from CellMiner for a gene can be used as input to the Pattern Comparison tool to find correlated gene expression and drug activity patterns. Figure 2 shows the copy number for CDKN2A (p16), the gene with the most highly correlated expression (CDKN2A itself), and the drug whose response is the most negatively correlated (NSC-301739). The strong correlation between DNA copy number and transcript expression indicates that DNA copy number alteration has a marked effect on transcript expression for this gene. The negative correlation between DNA copy number and drug activity identifies the FDA-approved drug mitoxantrone (NSC-301739) as being more active in multiple cancer cell lines with CDKN2A deletion (Figure 2, right panel and dotted lines).

Correlation with the Cancer Cell Line Encyclopedia
There are 44 cell lines common to the NCI-60 and the CCLE. Notably, the combined copy number estimates in the NCI-60 correlate well with the copy number estimates in the CCLE, with a median correlation of 0.833. This is higher than the correlation for copy numbers from any individual platform (Agilent: 0.660, NimbleGen: 0.448, Affymetrix: 0.821, Illumina: 0.804), implying that combining the platforms improves the estimation. The higher correlation for the Affymetrix platform could be due to the fact that the CCLE data were also generated on Affymetrix arrays (Affymetrix SNP 6.0).

Widespread Alterations in DNA Copy Number Composition Occur in the NCI-60 Cell Lines
A global view of the NCI-60 genomic composition was generated using the CBS-segmented aCGH results. Figure 3 displays representative examples of several genome variation types. The complete version for the NCI-60 is available in Figure S1 and at our website [21]. These displays reveal that most cell lines exhibit genomic alterations, including frequent genomic losses and gains, as well as altered ploidy. The types of genomic variation, however, differ widely within the NCI-60. Only a few cell lines, such as CO:HCT_15, show normal (2N) copy number with few altered segments. Some have multiple altered genomic segments with approximately 2N overall copy number (e.g. RE:CAKI_1). Yet others have many altered segments in addition to being shifted from 2N, including BR:MCF7, CNS:SF_268, LE:RPMI_8226, ME:MALME_3M, OV:NCI_ADR_RES and PR:PC_3. The data demonstrate the marked variability in the abnormalities of the NCI-60 genomes.
The high-intensity amplifications (gains) and deletions (losses), with absolute log 2 values greater than 1.5 (i.e. DNA copy numbers greater than 5.66 or less than 0.71), visualized in Figure 3 and Figure S1, are listed with their locations by cell line in Table S2, given their prospective importance. These large gains and losses show chromosome biases, with three chromosomes (9, 3 and 6) having multiple alterations in multiple cell lines and one (chromosome 21) having no marked gains or losses. These data identify chromosome- and cell-specific focal amplifications and deletions.

Global DNA Copy Number Alteration in the NCI-60
To further categorize the genomic copy number variations across the NCI-60, two parameters were derived from the aCGH data (Table 1). The "proportion of genome gained or lost" is the overall fraction of the genome that is gained or lost (compared to 2N); the "number of gained or lost regions" per genome represents the cumulative number of altered segments (gained or lost compared to 2N).
Comparison of the two parameters (proportion and number of gains and losses) showed a highly statistically significant positive correlation (Pearson's r = 0.76, p-value = 1.

Prominent Areas of the Genome with Focal Copy Number Changes, and Their Relationship to Known and Prospective Tumor Suppressors
Next we searched for genomic copy number changes that were "focal" in nature. Our approach was to look for genomic segments with: i) a difference in log 2 copy number of at least 0.3 from both the left and right-hand neighbors (the differences being either both positive or both negative); ii) a width less than 5 Mb; and iii) more than 10 probes mapping within the segment. Table 2 summarizes these focal alterations for known oncogenes and tumor suppressors. Table S3 provides the focal alteration status for all (18,504) genes with both copy number and gene expression (see column S), and their genomic positions (columns Q and R).
The most commonly focally deleted segment occurs in 24 cell lines and contains the CDKN2A tumor suppressor gene (p16INK4a and p14ARF) on chromosome 9 (Figure 1B, 2 and 4A). CDKN2A deletions occur in most of the NCI-60 tissue types, with the highest incidence in renal (6 out of 8 lines) and CNS cells (4 out of 6 lines). CDKN2A deletions are less frequent in breast (1 out of 5) and ovarian (2 out of 7) lines and absent in the colon and prostate lines. The detailed data for CDKN2A are found in Table S3 (column Q). The next most commonly deleted tumor suppressor gene is PTEN on chromosome 10 (Table 2 and Table S3), which is markedly under-represented in 4 cell lines: CNS:SF_539, LE:CCRF_CEM, PR:PC_3 and RE:RXF_393. It is also focally gained in OV:OVCAR_4. Notably, TP53, which is inactivated by mutations in 47 of the NCI-60 cell lines [3,32] (our submitted results), has focal loss in only two cell lines, LE:HL_60 and RE:TK_10 (Table S3), indicating that tumor suppressors differ in their predominant mechanism of inactivation.
Besides known oncogenes and tumor suppressors, one of the most intense amplifications was found in the OV:NCI_ADR_RES cell line on chromosome 7q21.12 (Figure 3, lower left panel, and Figure 4C). This amplification encompasses two ABC transporter efflux pump genes, ABCB1 and ABCB4 (Figure 4C), and is consistent with the high doxorubicin (adriamycin) resistance of this cell line [33,34]. Other than this chromosome 7 focal amplification, the OV:NCI_ADR_RES cell line shows an aCGH profile comparable to that of its parental line OV:OVCAR_8 (Figure S1).

Correlation between Gene Expression and DNA Copy Number
To determine the relationship between DNA copy number and transcript expression levels, we calculated the correlations between the two parameters for all (18,504) genes with both copy number and gene expression. Table 2 and Table S3 give these correlation values, as well as the corresponding p-values and FDRs, for the tumor suppressors and for all genes, respectively. The histogram in Figure 5 shows that the median Pearson's correlation is r = 0.247, providing a global indicator of the influence of gene copy number on expression. The median correlation for the combined data is higher than for any individual platform (Agilent: 0.212, NimbleGen: 0.149, Affymetrix: 0.242, Illumina: 0.226), again implying that the combined data improve copy number estimation over any individual platform.

Table 3 footnotes: (a) Two known tumor suppressors that have significant correlation between gene expression and copy number and an abundance of focal losses compared to gains. (b) Ten out of 22 additional genes identified using the same criteria that are also not proximal (within 2 Mb) to a known tumor suppressor (complete list in Table S4).
The subset of 101 known tumor suppressors had a significantly higher median correlation (r = 0.408, Figure 5) than the whole genome (r = 0.247, Figure 5). The subset of 96 known oncogenes showed only a slightly higher correlation than the overall genome (median r = 0.255; Figure 5). These results demonstrate that gene copy number influences expression of known tumor suppressors to a greater degree than for either the "all genes" or oncogene groups.

Identification of Novel Putative Tumor Suppressor Genes
Since focal changes in DNA copy number of known tumor suppressor genes (Figure 1B and C, Figure 3, Table 2) showed highly significant correlation to their transcript expression levels (Figure 5, Table 2), we used this characteristic to search for additional genes with potential relation to cancer. Our approach was based on the results for the known tumor suppressors CDKN2A and PTEN (Table 3). The selection criteria for novel genes required: i) correlation between DNA copy number and transcript levels significant at an FDR of 0.05, ii) focal gains or losses in at least three cell lines [focal changes were defined as gains or losses smaller than 5 Mb that overlap the gene], and iii) a 3:1 or greater ratio for the number of cell lines with losses compared to gains. In addition, we required that the genes pass a fourth criterion: there should be no known tumor suppressors within 2 Mb (to avoid detecting "neighbors" of known driver tumor suppressors).
We assessed all 18,504 genes that have both gene expression and copy number estimates to identify those that passed the above criteria. Thirty-one genes passed criteria 1-3 (Table S4), and 22 satisfied all four criteria (indicated in column U and highlighted in green). These genes group into 12 "gene clusters" such that genes in the same cluster are adjacent to each other and have copy numbers that are highly correlated with each other across the NCI-60 (Pearson correlation > 0.8), indicating that they are largely lost or gained as a group. The clusters containing these 22 candidate tumor suppressors are at cytobands 11q13.4, 17p12, 17p11.2, 17q23.1, 21q11.2, 21q21.1, 22q11.21, 22q12.2, 22q13.1 and Xp22.31. Table 3 lists ten of the genes that fall within these clusters and have been reported to exhibit tumor suppressor characteristics.
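Grouping adjacent candidate genes whose copy number profiles are highly correlated can be sketched as a single pass in genomic order. This assumes that "adjacent" means consecutive among the candidate genes on the same chromosome and compares each gene only with its predecessor; both are interpretations of the text, and the example profiles are made up.

```python
from math import nan, sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (nan if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return sxy / den if den else nan


def cluster_adjacent(genes, r_min=0.8):
    """Cluster consecutive genes with highly correlated copy number profiles.

    `genes` are (name, chrom, pos, profile) tuples sorted by chrom and pos; a
    gene joins its predecessor's cluster when both are on the same chromosome
    and their profiles correlate above `r_min` across the cell lines.
    """
    clusters, prev = [], None
    for gene in genes:
        name, chrom, _, profile = gene
        if prev is not None and prev[1] == chrom and pearson(prev[3], profile) > r_min:
            clusters[-1].append(name)
        else:
            clusters.append([name])
        prev = gene
    return clusters


genes = [
    ("A", "17", 100_000, [0.0, -1.0, -1.0, 0.0]),
    ("B", "17", 200_000, [0.0, -1.0, -0.9, 0.0]),
    ("C", "17", 300_000, [0.0, 0.0, -1.0, -1.0]),
    ("D", "22", 50_000, [0.0, -1.0, -1.0, 0.0]),
]
print(cluster_adjacent(genes))  # [['A', 'B'], ['C'], ['D']]
```

Gene D has the same profile as A but sits on another chromosome, so it starts its own cluster; correlation alone is not enough to join genes that are not neighbors.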

Discussion
In the current study, we combined data on the NCI-60 cell line panel from four high-resolution array CGH platforms. Combining the four platforms yields a dataset with i) increased probe coverage, ii) higher correlation to the copy number estimates from the CCLE (Cancer Cell Line Encyclopedia), and iii) higher correlation to gene expression, indicating better estimates than any one platform alone.
The dataset adds to the array of molecular data available for the NCI-60, facilitating integrative ("integromic") [4,8,32,35] studies of cancer biology and molecular pharmacology. The data, and analysis tools to facilitate their use, are publicly available at our NIH CellMiner web suite [21] (Figure 1A). We also provide an example of the kind of integrative analysis that can be done. Comparing the DNA copy number for CDKN2A, a known tumor suppressor, to its mRNA expression reveals how strongly this molecular alteration is associated with the gene's expression, as well as its frequent inactivation in the NCI-60 (see Figure 1 and Table S3). Comparing the DNA copy number for CDKN2A to the compound database reveals the FDA-approved drug mitoxantrone (NSC-301739) as being more active in cell lines with CDKN2A deletion (Figure 2).
The patterns of gains and losses in the cell lines encompass a wide spectrum, with different patterns of variation likely representing differences in the molecular malfunctions within the cells (Figure 3, Figure S1 and website [21]). Using the identified areas of relative focal chromosomal gain or loss (size < 5 × 10^6 bp and amplitude > 0.3 in log 2 copy number), we calculate two new measures of genomic instability: the proportion of the genome gained or lost, and the total number of gains and losses for a cell line (Table 1).
Between OV:OVCAR_8 and its adriamycin-resistant derivative OV:NCI_ADR_RES, we find a large number of copy number differences (15 focal gains and 5 losses in OVCAR_8 that are not present in NCI_ADR_RES, and 20 focal gains and 9 losses in NCI_ADR_RES that are not present in OVCAR_8) (Figure 4C, Figure S1 and website [21]). The most striking is the small, focal (< 3 × 10^5 nucleotides), high-intensity amplification in NCI_ADR_RES (Figure 3 and Figure S1) that includes two efflux pump genes, ABCB1 (MDR1) and ABCB4 (Figure 4C). ABCB1 has previously been shown to be up-regulated in NCI_ADR_RES [33] and other multidrug-resistant cell lines [33,34,36-38]. Thus, our results confirm over-expression of ABCB1, add up-regulation of ABCB4 in NCI_ADR_RES (as compared to OVCAR_8), and associate these increases with increased DNA copy number.
Our data demonstrate and catalogue prominent focal gains and losses of cancer-related genes in multiple cell lines (Table 2). Among the tumor suppressors, both the CDKN2A (p16) and PTEN losses are consistent with prior reports of deletion in cancers [39,40]. The oncogene MYC, with focal gains in four cell lines (leukemia HL60 and RPMI_8226, colon carcinoma SW620 and prostate carcinoma DU_145; Table S3), has previously been reported to be amplified [41].
The median positive correlation (r = 0.247) between DNA copy number and transcript expression (Figure 5) is consistent with prior results [2,42-44]. Interestingly, we found a markedly higher correlation for the 101 known tumor suppressors (r = 0.408) than for the 96 oncogenes (r = 0.255). This implies that loss of copy number is a stronger driver of altered gene expression for tumor suppressors than gain of copy number is for oncogenes. To the best of our knowledge, our study is the first attempt to compare focal DNA copy number and transcript expression changes across multiple cancer types.
The known tumor suppressors CDKN2A and PTEN provide criteria of recurrent focal DNA gains or losses that correlate well with transcript expression. They imply that DNA copy number change correlated with expression level change is an indicator of cancer-relatedness in genes. Using these criteria to identify novel putative tumor suppressor genes (Table S4), we find 12 chromosomal segments containing 22 correlated genes (without nearby known oncogenes or tumor suppressors). Of these, five contain at least one gene with a prior literature association with cancer (Table 3). The area with the most genes is on chromosome 17. It contains 10 genes with correlated expression level change, including four with prior association with cancer (NCOR1, FLCN, PEMT and PTRH2). Among these, FLCN is a suspected tumor suppressor gene whose inactivating mutations cause Birt-Hogg-Dubé syndrome, whose symptoms include susceptibility to renal cancers [45].
In summary, we present a novel combination of chromosomal segmentation results from multiple aCGH platforms and provide both an intuitive web-based public resource [21] and a high-resolution, improved-quality view of the genome-wide copy number variation of the NCI-60 cancer cell lines. We identify a catalog of focal copy number gains and losses for both important known tumor suppressors and oncogenes, and novel tumor suppressor gene candidates. Copy number changes for any gene of interest can be interrogated using the web-based CellMiner tools [21], which enable users to connect the largest publicly available drug database with a full array of genomic databases.

Figure S1. Whole genome visualization of aCGH results for all cell lines from the NCI-60. The x-axis is the chromosomal location of the probes, colored by chromosome number and ordered by genomic position. The y-axis is the log ratio of the probe intensities, shown on the left side of the plots, and the estimated DNA copy number, shown on the right side of the plot. The black horizontal lines indicate the average log 2 copy numbers in each segment, as calculated by CBS. The amount of scatter above and below the segments' black lines indicates the level of probe variability. CO:HT29 has data only on the Agilent platform, so it has far fewer plotted points than the other cell lines. (TIF)