Genomic Aberrations in Lung Adenocarcinoma in Never Smokers

Background: Lung cancer in never smokers would rank as the seventh most common cause of cancer death worldwide.
Methods and Findings: We performed high-resolution array comparative genomic hybridization analysis of lung adenocarcinoma in sixty never smokers and identified fourteen new minimal common regions (MCRs) of gain or loss, of which five contained a single gene (MOCS2, NSUN3, KHDRBS2, SNTG1 and ST18). One larger MCR of gain contained NSD1. One focal amplification and nine gains contained FUS. NSD1 and FUS are oncogenes hitherto not known to be associated with lung cancer. FISH showed that the amplicon containing FUS was joined to the next telomeric amplicon at 16p11.2. FUS was over-expressed in 10 tumors with gain of 16p11.2 compared to 30 tumors without that gain. Other cancer genes present in aberrations included ARNT, BCL9, CDK4, CDKN2B, EGFR, ERBB2, MDM2, MDM4, MET, MYC and KRAS. Unsupervised hierarchical clustering with adjustment for false-discovery rate revealed clusters differing by the level and pattern of aberrations and displaying particular tumor characteristics. One cluster was strongly associated with gain of MYC. Another cluster was characterized by extensive losses containing tumor suppressor genes, including RB1 and WRN. Tumors in that cluster frequently harbored a central scar-like fibrosis. A third cluster was associated with gains on 7p and 7q, containing ETV1 and BRAF, and displayed the highest rate of EGFR mutations. SNP array analysis validated copy-number aberrations and revealed that RB1 and WRN were altered by recurrent copy-neutral loss of heterozygosity.
Conclusions: The present study has uncovered new aberrations containing cancer genes. The oncogene FUS is a candidate gene in the 16p region that is frequently gained in never smokers. Multiple genetic pathways defined by gains of MYC, deletions of RB1 and WRN, or gains on 7p and 7q are involved in lung adenocarcinoma in never smokers.


Introduction
Tobacco smoking is the main avoidable cause of lung cancer. However, lung cancer also occurs in never smokers and would rank as the seventh most common cause of cancer death worldwide [1], [2]. In France, lung cancer in never smokers accounted in the year 2000 for 17% and 4% of lung cancer deaths among women and men, respectively [3].
Lung cancer in never smokers occurs more frequently among women, and it favors the adenocarcinoma histological type [4]. One of the most striking distinctions is the observed differential response to drugs that target the epidermal growth factor receptor (EGFR). Compared with smokers, never smokers treated with these agents have higher response rates to treatment [5], [6].
EGFR mutations in lung cancer are more frequent in never smokers and are mutually exclusive with KRAS mutations [7], [8], [9], [10], [11]. Mutations in HER2 also occur preferentially in never smokers [12]. The transversion/transition ratio and the distribution of TP53 and KRAS mutations differ according to smoking status [13], [14], [15], [16]. The complex mutational signatures of lung cancer cells in smokers reflect the cocktail of carcinogens in tobacco smoke and their proclivities for particular bases [17].
While it is well established that specific DNA sequence abnormalities are linked to smoking status, other oncogenomic events are less well known among never smokers. In most genomic studies, the proportion of never smokers is unknown or small compared to that of smokers. Few separate studies of aberrations in never smokers have been performed, mainly in patients from East Asia [18], [19]. Allelic imbalances were infrequent in never smokers with lung adenocarcinoma [20], although in Chinese never smokers their pattern appeared distinct [18]. In Chinese never smokers the most frequent aberration was gain of 16p [19]. In the largest study of the lung adenocarcinoma genome, never smoker status was associated, although not significantly, with amplification of 7p-q and 16p and deletion of 10q and 15q [21]. Preliminary studies also indicate a relationship between smoking history and EML4-ALK fusions [22].
Cataloguing copy-number aberrations may lead to the identification of imbalances encompassing genes that contribute to the development or progression of lung cancer [23]. Here, we sought to expand knowledge of the aberrations occurring in lung adenocarcinoma in never smokers, with the goal of uncovering new aberrations that include cancer genes.

Materials and Methods
Detailed methods on inclusion of patients, processing of samples, EGFR and KRAS sequencing, oligonucleotide aCGH analysis, genomic PCR, fluorescent in situ hybridization studies, gene expression analysis and SNP array analysis are available in supplementary information (Material and Methods S1).

Patients and samples
The project, referred to as the Lung Genes (LG) study, involved 13 centers in France. The 60 patients were never smokers, defined following current consensus guidelines [24], [25] as persons with a lifetime exposure of less than 100 cigarettes. All patients had been treated by surgery. The pathological diagnosis was reviewed, and cases in which doubt remained about a primary site in the lung were excluded.
The research was approved by the Institut National du Cancer review board as part of the Programme National d'Excellence Spécialisé Poumon. Written consent was obtained from study patients for the use of their lung samples.
Genomic DNA and RNA were extracted from frozen tumor sections and the HCC827 cell line, obtained from ATCC. The cell line was authenticated by comparison of its Agilent aCGH profile with the previously published whole genome tiling path aCGH profile [26].

Sequencing of EGFR and KRAS
EGFR exons 18, 19, 20, 21 and KRAS exons 2 and 3 were directly sequenced in both sense and antisense directions from at least two independent amplifications.

Oligonucleotide aCGH analysis
Genomic DNA was analyzed using 244K Whole Human Genome (G4411B) microarrays (Agilent Technologies, Santa Clara, CA, USA). The data are described in accordance with MIAME guidelines and have been deposited in ArrayExpress (http://www.ebi.ac.uk/arrayexpress) under accession number E-TABM-926.
The threshold for gain and loss was abs(log2(ratio)) > 0.25 for a minimum of 5 consecutive probes. Focal amplifications were considered for aberrations showing a log2(ratio) > 1.58 and extending less than 5 Mb. Minimal common regions (MCR) were identified with STAC v1.2 [27] by using both the frequency-confidence and footprint methods at lower and higher stringencies (confidence > 0.95 and > 0.995, respectively). MCR were manually reviewed to validate breakpoints and to discard copy-number variants. For hierarchical clustering, Euclidean distances and Ward's construction method were used. The bootstrap tests were performed using the R environment package Pvclust [28]. Cluster-associated aberrations were identified using ANOVA with P values adjusted for their false-discovery rate using the Benjamini-Hochberg method [29]. The P values (F-test) for the association of clusters with clinicopathological variables were adjusted for multiple testing using Bonferroni correction.
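The gain and loss rule stated above (abs(log2(ratio)) > 0.25 over at least 5 consecutive probes) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `call_aberrations`, the flat list of per-probe values for a single chromosome, and the toy profile are all assumptions made for illustration, and real aCGH data would normally be segmented first. For orientation, a log2(ratio) of 1.58 corresponds to a ratio of about 3.

```python
from typing import List, Tuple

GAIN_LOSS_THRESHOLD = 0.25  # abs(log2(ratio)) cutoff quoted in the text
MIN_PROBES = 5              # minimum number of consecutive probes


def call_aberrations(log2_ratios: List[float],
                     threshold: float = GAIN_LOSS_THRESHOLD,
                     min_probes: int = MIN_PROBES) -> List[Tuple[int, int, str]]:
    """Return (start, end, kind) runs; end is exclusive, kind is 'gain' or 'loss'.

    Hypothetical helper: probes are assumed ordered along one chromosome.
    """
    def state(value: float) -> str:
        if value > threshold:
            return "gain"
        if value < -threshold:
            return "loss"
        return "neutral"

    calls = []
    start = 0
    n = len(log2_ratios)
    while start < n:
        kind = state(log2_ratios[start])
        end = start
        # Extend the run while consecutive probes share the same state.
        while end < n and state(log2_ratios[end]) == kind:
            end += 1
        # Keep only non-neutral runs that are long enough.
        if kind != "neutral" and end - start >= min_probes:
            calls.append((start, end, kind))
        start = end
    return calls


# Toy profile: one 5-probe gain, one 2-probe loss (too short), one 5-probe loss.
profile = [0.0, 0.3, 0.4, 0.5, 0.3, 0.6, 0.1,
           -0.3, -0.4, 0.0, -0.5, -0.6, -0.3, -0.4, -0.7]
calls = call_aberrations(profile)
print(calls)
```

In this toy profile the two-probe loss is discarded because it falls below the five-probe minimum, which is the point of the consecutive-probe requirement: isolated noisy probes do not generate calls.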

Genomic PCR
Quantification of FUS genomic DNA was performed in TaqMan® assays (Applera, Villebon-sur-Yvette, France) using primers and probes that were designed using Primer3 software.

Fluorescence in situ hybridization (FISH) studies
FISH was performed on tumor touch-imprinted slides.

Gene expression analysis
The gene expression analysis encompassed HG-U133 Plus 2.0 Affymetrix array data in a subset of 40 samples belonging to an ongoing study (not published). Expression of probe sets in the 16p11.2 region was compared with the t-test.
Quantification of FUS mRNA expression was performed in predesigned TaqMan® gene expression assays.

SNP array analysis
SNP array genotyping was carried out using the Illumina "HumanCNV370-Quad" array (Illumina, Inc., San Diego, CA) in the subset of 40 samples belonging to an ongoing study (not published). Individual cases with aCGH profiles delineating an aberration were selected for cross-validation by SNP array profiles. The aCGH profile in the region of aberration was compared to the corresponding SNP array profile for each selected case using the Integrated Genome Browser (http://www.bioviz.org/igb/).
For assessment of copy-neutral loss of heterozygosity (LOH), only segments with at least 10 consecutive SNPs showing a LOH and a copy number equal to 2 were considered.
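The copy-neutral LOH criterion above (at least 10 consecutive SNPs showing LOH with a copy number of 2) can be sketched as a simple run-length scan. The following Python fragment is only an illustration under assumed inputs: the function name `copy_neutral_loh_segments`, the per-SNP boolean LOH flags and the integer copy-number calls are hypothetical and do not describe the software actually used in the study.

```python
from typing import List, Tuple

MIN_SNPS = 10  # minimum run length quoted in the text


def copy_neutral_loh_segments(loh_flags: List[bool],
                              copy_numbers: List[int],
                              min_snps: int = MIN_SNPS) -> List[Tuple[int, int]]:
    """Return (start, end) SNP index pairs (end exclusive) of qualifying runs.

    Hypothetical helper: SNPs are assumed ordered along one chromosome.
    """
    if len(loh_flags) != len(copy_numbers):
        raise ValueError("loh_flags and copy_numbers must have the same length")
    segments = []
    run_start = None
    for i, (loh, cn) in enumerate(zip(loh_flags, copy_numbers)):
        if loh and cn == 2:
            # SNP qualifies: open a run if none is in progress.
            if run_start is None:
                run_start = i
        else:
            # Run is broken: keep it only if it is long enough.
            if run_start is not None and i - run_start >= min_snps:
                segments.append((run_start, i))
            run_start = None
    # Close a run that extends to the last SNP.
    if run_start is not None and len(loh_flags) - run_start >= min_snps:
        segments.append((run_start, len(loh_flags)))
    return segments


# Toy data: a 12-SNP copy-neutral LOH run, then LOH that is partly copy-number 1.
loh = [False] * 2 + [True] * 12 + [False] + [True] * 10
cn = [2] * 14 + [2] + [2] * 5 + [1] * 5
segments = copy_neutral_loh_segments(loh, cn)
print(segments)
```

The second LOH stretch in the toy data is rejected because only five of its SNPs have a copy number of 2; LOH accompanied by a copy-number loss is not copy-neutral.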

Results

Partition of tumors into clusters
A non supervised hierarchical clustering analysis revealed two main classes A and B, which could be further subdivided into 2 clusters A1 (n = 16) and A2 (n = 11) for A and into 3 clusters B1 (n = 9), B2 (n = 9) and B3 (n = 14) for B (Figure 1). An assessment of the uncertainty in hierarchical clustering is provided in Figure S2.
Figure 1. Aberrations using aCGH analysis in 60 never smokers with lung adenocarcinoma. Panel A. Heat map of gains (green color) and losses (red color) by chromosome generated by non supervised hierarchical clustering. Small blue or yellow dots indicate gains with log2(ratio) > 1.5 and losses with log2(ratio) < −1.5, respectively. Blue star (*): two outliers (37875 between classes A and B and 37569 between clusters B1 and B2). Panel B. Distribution of gains (green color) and losses (red color) along the genome. doi:10.1371/journal.pone.0015145.g001
Clusters differed by their percentages of aberrant genome (AG) (P < 0.001; Figure S3) and their aberration patterns. Cluster A1 was characterized by few aberrations, which comprised recurring gains on 5p, 7p, 14q and 20q, and losses on 8p (Table S2). In cluster A2 the level of AG (mean 12%, range 2 to 18%) was higher than in cluster A1 (mean 2%, range 0 to 4%). The aberration pattern in cluster A2 was different from the patterns of clusters B1, B2 and B3, indicating that cluster A2 was not a cluster belonging to class B with reduced amplitude in the aberrations. Cluster A2 had more losses (9%) than gains (7%), while cluster B1 had about twice as many gains (13%) as losses (6%). Notably, cluster B1 was characterized by the occurrence in every case of a gain on 8q. Cluster B2 was characterized by more losses (21%) than gains (10%) with a distinctive combination of numerous and frequent losses on 3p, 8p and 13. Cluster B3 was defined by gains on 7p and 7q, together with gains on 17q, 21, and less frequently X. One outlier between class A and class B was characterized by a uniquely high level of AG (64%), which was distributed in both gains (23%) and losses (41%); another outlier between clusters B1 and B2 displayed a gain of the whole chromosome 12.
By ANOVA, gains including oncogenes and losses including tumor suppressor genes were significantly associated after adjustment for their false discovery rate with particular clusters (Table S3). MYC at 8q24.21 was gained in 100% of cases in cluster B1 (adjusted P = 6.00E-05). BRAF was included in a region extending 1.27 Mb at 7q34 that was gained in 64% of cases in cluster B3 (adjusted P = 0.001). Other gains on 7q including ELN, HIP1, CREB3L2 and KIAA1549 were associated with cluster B3. The gains on 7p containing CARD11, ETV1 and IKZF1 were observed in 78% to 92% of cases of cluster B3. Several regions on 13q that included CDX2, BRCA2, RB1 and ERCC5 were lost in 77% to 88% of cases in cluster B2. WRN at 8p12 was the single gene present in a deleted region in 88% of cases in cluster B2 (adjusted P = 0.002).
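The false-discovery-rate adjustment applied to these ANOVA P values is the Benjamini-Hochberg procedure cited in the Methods. As a rough illustration of what "adjusted P" means here, the sketch below implements the standard step-up adjustment in plain Python; the function name `benjamini_hochberg` and the example P values are invented for illustration and are not taken from the study.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values for four tested regions.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
print(adjusted)
```

Each raw P value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values from decreasing as the raw values increase.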
The five clusters differed by their association with a central scar-like fibrosis (P = 0.03 after Bonferroni correction), which was more frequent in cluster B2 (7/9 cases) compared to other clusters (12/50 cases). They did not differ with regard to other clinicopathological characteristics.

Relationships of clusters with abnormalities in EGFR and KRAS
Forty tumors (67%) harbored EGFR mutations (Table S4). The four KRAS mutations were observed in four EGFR wild-type cases.
The prevalence of EGFR mutations differed with clusters (P = 0.004), gains on 7p (P = 0.04) and AG percentages (P < 0.001). EGFR mutations remained associated with clusters after adjustment for AG percentages and gain on 7p (P = 0.05). Cluster B3 was characterized by the highest frequency of gains on 7p (93%), and the highest frequency of EGFR mutations (93%), although these abnormalities did not coincide. Most gains on 7p (80%) and every case with an amplification spanning EGFR were associated with EGFR mutation. Nineteen EGFR mutations were seen in cases with no gain on 7p.
While every gain on 7p included EGFR, only 5 of 14 gains on 12p included KRAS either wild-type (3 cases) or mutated (2 cases). The distribution of mutations or gains involving EGFR or KRAS is displayed in Figure S4. The 10 cases without abnormality involving EGFR or KRAS belonged to clusters A1 (9 cases) or A2 (1 case with 2% AG). Amplifications of MET and ERBB2 occurred with a gain on 7p and an EGFR mutation, respectively.

Minimal common regions
MCRs of gain were identified on 1q, 2p, 5p, 5q, 7p, 7q, 8q, 12p, 12q, 14q, 18p and 20q (Table 1). Their mean width was 879 Kb (range 109 to 2927). The maximum log2(ratio) ranged from 0.53 to 3.13. The twenty-two MCRs contained 152 coding genes, including BCL9, ARNT, MDM4, NSD1, EGFR, MYC and MDM2, as well as 6 miRNAs. The highest frequency of recurring gains (62%) was noted at 5p15.33, which contained TERT and CLPTM1L. The MCR containing EGFR was involved in 43% of cases. A 171 Kb MCR at 20q13.33 contained only mir-646. Nine MCR contained between 1 and 5 coding genes, five MCR between 7 and 9 coding genes, and four MCR more than 10 coding genes. The MCR of gains were compared to previously published regions of gain in four representative studies [21], [30], [31], [32]. As shown in Table 1, out of eight MCR that did not overlap with previously reported gains, one MCR contained a single gene (MOCS2) and two MCR contained only three genes.
MCRs of loss were identified on 1p, 3q, 6q, 8q, 9p, 16q and 20p (Table 2). Their mean width was 560 Kb (range 20 to 1703). The minimum log2(ratio) ranged from −0.43 to −1.19. In four cases it was < −1. The nine MCRs contained 18 coding genes, including CDKN2B, for which the highest frequency of losses (53%) was noted. Five MCRs contained only one coding gene, and three MCRs between 3 and 6 coding genes. As shown in Table 2, six MCR of loss did not overlap with previously reported losses. Four of these MCR contained a single gene (NSUN3, KHDRBS2, SNTG1 and ST18) and one MCR contained four genes.

Copy-neutral loss of heterozygosity
Forty-five regions of interest that had been identified by aCGH (Tables 1, 2 and 3) could be evaluated by SNP analysis in 40 tumors. Thirty-nine regions were cross-validated by the SNP array profiles. An example is shown in Figure S5.
The SNP arrays could be analyzed for detection of copy-neutral LOH in 23 cases. The 17 remaining samples were not informative for LOH. Two hundred and five regions displayed recurring copy-neutral LOH. MCR of recurring copy-neutral LOH with a frequency > 20% are shown in [...]. Among the tumor suppressor genes that were present in losses identified by aCGH, RB1 and WRN were also present within copy-neutral LOH MCRs.

The 16p11.2 region harboring the oncogene FUS
The short arm of chromosome 16 displayed high-level focal amplifications in case 37817. There were two distinct regions of amplification that were separated by > 4 Mb and extended 0.92 Mb and 1.20 Mb at 16p12.1 and at 16p11.2, respectively (Table 3). Each region comprised three peaks, which extended 36 Kb to 185 Kb and were spaced by 140 to 670 Kb. The 16p11.2 amplicons shown in Figure 2 harbored FUS, 12 other coding genes, and one long non-coding RNA gene. Nine additional cases demonstrated gains of a smaller amplitude encompassing FUS.
Real-time quantitative PCR assays in case 37817 showed a strong increase (> 30 times) in FUS copy number compared to AQP8 and AMPD2, which were located in copy-neutral regions.
The 16p11.2 region was explored by FISH by using two BAC clones (RP11-388M20 and RP11-347C12). The former completely covered FUS, while the latter was 745 Kb telomeric to it, in the region 30.109-30.290 Mb (Figure 2). Both BACs were co-hybridized on normal metaphases and nuclei, and the signals were superposed. When co-hybridized on tumor cells from case 37817, two independent gene amplification homogeneously staining region (HSR) patterns appeared (Figure 2), demonstrating that the breakpoint of an unknown chromosomal translocation separated the two amplified segments (the telomeric amplification revealed by RP11-347C12 was not apparent in the aCGH results as this region was not covered by Agilent oligoprobes). Then, the amplicon containing FUS was characterized using RP11-388M20 together with the Vysis break-apart probe. The BAC probe was stained in the same color as the centromeric part of the Vysis probe, but in a color different from that of the telomeric part. The probes were found amplified with a HSR pattern and co-localized in tumor cells, delimiting the previous breakpoint from 30.27 to 30.50 Mb. Furthermore, the co-localization suggested that the two amplicons (30.71-30.90 Mb and 31.09-31.21 Mb) were physically linked, as the 0.2 Mb region between them (30.90-31.09 Mb) was not amplified.
As shown in Figure S6, analysis of gene expression array data showed that four probe sets (1565717_s_at, 200959_at, 215744_at and 217370_x_at) interrogating FUS were significantly overexpressed in the subgroup of 10 tumors harboring a 16p gain compared with 30 tumors without such gain.
Real-time PCR gene expression assays established that FUS mRNA relative levels were 4 times higher in tumor 37817 (mean ΔCT 2.6) than in the NCI-HCC827 cell line (mean ΔCT 4.6), which displayed no gain on 16p.
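The fourfold difference quoted above follows from the two mean ΔCT values if one assumes the usual comparative CT calculation, in which each cycle corresponds to a doubling of template. The source does not name the formula, so the short Python sketch below, including the function name `relative_expression`, is an assumption offered only to make the arithmetic explicit.

```python
def relative_expression(delta_ct_sample: float, delta_ct_reference: float) -> float:
    """Fold difference of sample over reference, assuming ~100% PCR efficiency.

    A lower delta-CT means more transcript, hence reference minus sample.
    """
    return 2.0 ** (delta_ct_reference - delta_ct_sample)


# Mean delta-CT values reported in the text: tumor 37817 vs. the HCC827 cell line.
fold = relative_expression(2.6, 4.6)
print(round(fold, 2))  # 4.0
```

A ΔCT difference of 2 cycles therefore corresponds to 2 squared, that is, a fourfold higher relative FUS mRNA level in the tumor.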

Discussion
We used high-resolution aCGH to analyze aberrations that occurred in lung adenocarcinoma in 60 never smokers. We identified new MCR of gain or loss and new amplifications. Furthermore, unsupervised hierarchical clustering showed that tumors could be classified into clusters exhibiting different levels and patterns of aberrations, which contained cancer genes. Clusters differed by their tumor characteristics.
Fourteen MCR of gain (eight regions) or loss (six regions) did not overlap with regions that were previously reported in four representative studies [21], [30], [31], [32]. Out of our newly described MCR, five contained a single coding gene (MOCS2, NSUN3, KHDRBS2, SNTG1 and ST18) and may be considered as high-priority regions for further studies. Somatic mutations in genes within narrow MCR, including FLT4, MAPK9, SPO11 and KHDRBS2, have been reported in cancers (COSMIC v48 release). Among single genes encompassed by MCR of loss, ST18 was present in a 48 Kb MCR. ST18 was found lost, hypermethylated and its mRNA downregulated in breast cancer [33].
Some newly uncovered aberrations contained oncogenes such as FUS at 16p11.2 and NSD1 at 5q35.2-q35.3, whose association with lung cancer has hitherto not been reported. A gain on 16p has been previously associated with lung cancer in never smokers, although the association was not significant after multiple testing [19], [21]. We note that the association with never smoker status may be confounded by ethnicity or sex [34]. We found that the oncogene FUS was present in a high-level narrow amplification at 16p11.2 in one tumor (37817). It should be noted that nine other tumors displayed gains encompassing FUS, although the gene was first identified from a single patient. Furthermore, in the gene expression analysis the mean FUS expression level was compared between the 10 tumors displaying the 16p gain and 30 tumors without such gain. As FUS was found overexpressed in the subgroup with 16p gain, it was identified as a candidate gene from 10 tumors. Originally described as the result of translocations in myxoid liposarcoma [35], FUS encodes a TET protein that exerts roles in transcription and splicing and functions in several aspects of growth control and DNA repair [36]. Here, the aberration in tumor 37817 consisted of three closely spaced amplicons, suggesting amplification through breakage-fusion-bridge cycles [37]. Furthermore, FISH showed that the amplicon containing FUS was joined with the next telomeric amplicon in a HSR. The whole 16p11.2 region appeared highly rearranged, as shown by the lack of FISH co-localization of the BAC covering FUS with a farther telomeric BAC. Among genes present in the 16p11.2 amplicon, only FUS has until now been reported as altered by somatic simple mutation in cancer (COSMIC v48 release).
Figure 2. Although separated by less than 1 Mb, RP11-347C12 (red) is slightly more telomeric than RP11-388M20 (green), although they are fused for a large part. (b) The same probes on case 37817 cells showing a distinct pattern of amplification. (c) Combination of Vysis FUS probes with RP11-388M20 (red) that shows a co-localization of the three probes on the amplicon even in decondensed HSR. doi:10.1371/journal.pone.0015145.g002
While our data are consistent with FUS as a candidate gene in lung adenocarcinoma in never smokers, they do not prove that FUS is the functional target of the amplification. A systematic analysis of the whole 16p11.2 region using functional assays is essential.
To pinpoint cancer genes, we used a census that is conducted with relatively conservative criteria [38]. It is remarkable that we found many cancer genes that were previously reported in aberrations in lung cancer, including BCL9, ARNT, MDM4, EGFR, MYC, MDM2, CDKN2B, MET, CDK4, and ERBB2. Large aberrations are also consistent with the literature [19], [21], [23], [26], [31]. The gain containing TERT was reported as the most frequent event (78%) in early lung cancer [39]. TERT was included in this study within a MCR of gains with a high frequency (62%). At 5p15.2 TRIO was previously identified in a focal amplification and was found differentially expressed in early-stage lung cancer [40]. At 5p13 GOLPH3 was recently established as a new oncogene that was gained in lung and other cancers [41]. It was frequently gained in our study without being included in a MCR or a focal amplification. At 14q13.2-14q21.1 we found a MCR of gain containing MBIP, NKX2-1, NKX2-8 and PAX9, whose cooperation is involved in lung tumorigenesis [42]. Other MCR that overlapped with previously reported regions were often delineated with better precision. We identified a 390 Kb MCR at 20q13.2, reported by Zhao et al. [32], that contained two genes, of which ZNF217 was found mutated in lung cancer. Another MCR at 20q13.33, reported by Tonon et al. [30], contained only mir-646.
We used hierarchical clustering to determine whether tumors were heterogeneous and whether there were cluster-specific aberrations, which could have been hidden in the study of the whole cohort. Tumors could be classified into five clusters that differed by their AG percentages and aberration patterns. Interestingly, the compendium of cancer genes that were present in cluster-associated recurring aberrations was to a large extent different from the list of MCR-associated cancer genes, except for MYC. Present in a MCR in the whole cohort, MYC was also strongly associated with one cluster (cluster B1), where it was gained in every case. As point mutations in MYC do not occur in lung cancer, the gain of MYC could be important for lung cancer classification in never smokers. The tumor suppressor gene WRN, which encodes a helicase, was the single gene present in a narrow region at 8p12 that was frequently lost in cluster B2. WRN has been reported to undergo epigenetic inactivation through CpG island promoter hypermethylation in about one-third of non-small cell lung cancer [43]. Other losses associated with cluster B2 were located on 13q and included RB1, which is frequently altered in lung cancer [16], and three other tumor suppressor genes. Another gain that was associated with cluster B3 included BRAF, whose mutation has been reported in 3% of non-small cell lung cancer [44]. There were other noteworthy gains on 7p and 7q, however, among which that of ETV1 was the most strongly associated with cluster B3. The results presented here support heterogeneity in the genetic pathways in lung adenocarcinoma in never smokers. This view is strengthened by the association of cluster B2 with scar-like tumor fibrosis, a desmoplastic reaction which is common in localized peripheral lung adenocarcinoma, and of cluster B3 with the highest rate of EGFR mutation (93%) as well as the highest rate of the co-occurrence of EGFR mutations and gains or amplifications on 7p (86%).
EGFR mutations were found in 67% of cases in our study, a high rate similar to those reported in never or former light smokers in two recent studies [45], [46], while mutations in KRAS were infrequent. EGFR mutations were exclusive of KRAS mutations, a consistent observation suggesting that EGFR and KRAS mutations signal through a common pathway. The fact that every gain on 7p included EGFR suggests that the gene is a likely target of those gains. In the absence of a gain on 7p, cases wild-type for both EGFR and KRAS either demonstrated amplification of KRAS or were characterized by low levels of aberrant genome. The targeting of EGFR or KRAS appears a nearly constant finding when tumors display genomic instability. However, it has been shown that the molecular subsets defined by EML4-ALK, EGFR, or KRAS mutations are distinct [47].
MCR of gains outnumbered MCR of loss, although the proportions of gained and lost genome were similar, suggesting a greater dispersion of losses. The predominance of gains is observed in most studies [21], [30], [32]. It is likely that other mechanisms inactivate tumor suppressor genes. Copy-neutral LOH may be such a mechanism. Copy-neutral LOH (also known as uniparental disomy), wherein the retained homolog is duplicated so as to preserve two total copies per cell, is quite common in some cancers [48]. The SNP array analysis revealed recurrent copy-neutral LOH. Among tumor suppressor genes altered by copy-number losses, RB1 and WRN were also present in regions of recurrent copy-neutral LOH. This observation may be meaningful, as copy-neutral LOH can be biologically equivalent to the second hit in the Knudson hypothesis. The variety of different genetic events underlying LOH at the RB1 locus in retinoblastoma seems to occur in lung cancer [49]. On the other hand, at less than 75% tumor DNA in heterogeneous samples, an allelic duplication event and an allelic LOH bear resemblance to each other [50]. A comparison between smokers and never smokers with lung carcinoma is required to determine whether LOH is less frequent in never smokers, as suggested by the early work of Sanchez-Cespedes et al. [20].
In conclusion, new regions of interest, some of which contained cancer genes or few potential candidate genes, were uncovered. Our results do not establish that the new regions were characteristic of never smoker status, but provide interesting insights into genomic imbalances in lung cancer. Amplicons at 16p11.2 were joined in a HSR including FUS, which was overexpressed when the gene was included in 16p11.2 gains. We also showed heterogeneity in lung adenocarcinoma in never smokers, with MYC as important in the classification. Genetic alterations targeting the EGFR signaling pathway appear nearly constant in tumors with genomic instability.
LG participants, all in France
Figure S1 Correlations between percentages of gain and percentages of loss in the whole genome in never smokers with lung adenocarcinoma. R2: Pearson correlation coefficient. Panel A. Correlation among the 5 clusters A1, A2, B1, B2 and B3. Panel B. Correlation among the 4 clusters A2, B1, B2 and B3 after exclusion of cases with low levels of aberrant genome (< 5%) belonging to cluster A1. (TIF)
Figure S2 Cluster dendrogram with approximately unbiased (AU) and bootstrap probability (BP) values (%) in 60 never smokers with lung adenocarcinomas using the R environment package Pvclust. Distance: Euclidean. Cluster method: Ward. BP values (right, green color), AU values (left, red color), and cluster labels (bottom). The AU value may be lower than the BP value when the similarities involve a small proportion of the data. An example is provided by cases 37818 and 37892 belonging to cluster B1, whose region of similarity (8q) was narrow as shown in the heatmap.
Material and Methods S1 Detailed methods on inclusion of patients, processing of samples, EGFR and KRAS sequencing, oligonucleotide aCGH analysis, genomic PCR, fluorescent in situ hybridization studies, gene expression analysis and SNP array analysis.