Identification of Enriched Driver Gene Alterations in Subgroups of Non-Small Cell Lung Cancer Patients Based on Histology and Smoking Status

Background: Appropriate patient selection is needed for targeted therapies that are efficacious only in patients with specific genetic alterations. We aimed to define subgroups of patients with non-small cell lung cancer in whom candidate driver gene alterations are enriched.
Methods: Patients with primary lung cancer who underwent clinical genetic tests at Guangdong General Hospital were enrolled. Driver genes were detected by sequencing, high-resolution melt analysis, qPCR, or multiple PCR and RACE methods.
Results: A total of 524 patients were enrolled in this study, and the differences in driver gene alterations among subgroups were analyzed based on histology and smoking status. In the subgroup of non-smokers with adenocarcinoma, EGFR was the most frequently altered gene, with a mutation rate of 49.8%, followed by EML4-ALK (9.3%), PTEN (9.1%), PIK3CA (5.2%), c-Met (4.8%), KRAS (4.5%), STK11 (2.7%), and BRAF (1.9%). The three most frequently altered genes in the subgroup of smokers with adenocarcinoma were EGFR (22.0%), STK11 (19.0%), and KRAS (12.0%). We found only EGFR (8.0%), c-Met (2.8%), and PIK3CA (2.6%) alterations in the subgroup of non-smokers with squamous cell carcinoma (SCC). PTEN (16.1%), STK11 (8.3%), and PIK3CA (7.2%) were the three most frequently altered genes in smokers with SCC. DDR2 and FGFR2 alterations were present only in smokers with SCC (4.4% and 2.2%, respectively). Among these four subgroups, the differences in EGFR, KRAS, and PTEN mutations were statistically significant.
Conclusion: The distinct features of driver gene alterations in different subgroups based on histology and smoking status are helpful in defining patients for future clinical trials that target these genes. This study also suggests that patients with infrequent alterations of driver genes may be considered as having rare or orphan diseases that should be managed with specific molecularly targeted therapies.


Introduction
Lung cancer is a leading cause of cancer death in both men and women in the United States and throughout the world. Although various chemotherapeutic agents were developed in the late 1980s and 1990s, treatments such as platinum doublet therapy seem to have reached a therapeutic plateau, with an objective response rate of 30-40% and a median survival time of approximately 1 year for patients with stage IIIB or stage IV disease [1]. To further improve treatment outcomes, new strategies targeting molecular genomic abnormalities are under intensive investigation.
Several molecular alterations are known to occur in genes that encode signaling proteins critical for cellular proliferation and survival. These genes have been defined as "driver genes". In lung adenocarcinoma, such driver genes include epidermal growth factor receptor (EGFR), KRAS, BRAF, PIK3CA, and EML4-ALK. Mutations in these genes are responsible for both the initiation and maintenance of malignancy [2,3]. Other driver genes have been defined more recently, including STK11 (also known as LKB1), PTEN, DDR2, and FGFR2 [4][5][6][7][8]. By understanding the functions of these driver genes, it may be possible to develop specific therapies for malignancies with known driver gene mutations.
Tyrosine kinase inhibitors (TKIs) targeting EGFR, including gefitinib and erlotinib, have become the standard first-line therapy for patients with advanced non-small cell lung cancer (NSCLC) whose tumors harbor activating EGFR mutations [9,10]. However, almost all patients eventually develop resistance to EGFR TKIs. A number of mechanisms of resistance, including KRAS mutation, EGFR exon 20 T790M mutation, and MET gene amplification, have been reported. Thus, a comprehensive molecular profile is needed to understand both the sensitivity and resistance to molecular targeted therapy for lung cancer [11].
Given the importance of biomarker selection to targeted cancer therapies, our group initiated the Guangdong General Hospital Lung Cancer Mutation Project (GGHLCMP). The objective of this project is to explore the impact of tobacco consumption and histology type on the incidence of driver gene mutations and to define subgroups of patients in whom candidate driver gene alterations are enriched. Here we report on a spectrum of driver genes, including EGFR, KRAS, c-Met, PIK3CA, BRAF, STK11, PTEN, EML4-ALK fusion gene, DDR2, and FGFR2 in a population of Chinese patients with primary lung cancer.

Ethics Statement and Patient Selection
A total of 1800 patients were referred to Guangdong General Hospital (GGH) for genomic studies between January 2007 and December 2009 (Figure 1). Eligibility criteria included the following: histologic diagnosis of primary lung cancer; availability of demographic data, including age, gender, smoking status, histology and disease stage; availability of survival data; availability of tumor samples for genomic analyses; and provision of informed consent. Lung cancer diagnosis was confirmed by an independent pathologist. Clinical data were collected from the case histories of the patients in the hospital. Non-smokers were defined as patients who had smoked <100 cigarettes in their lifetime; smokers included former and current smokers. Patients with other malignancies or benign lung tumors were excluded. The study was approved by the ethics committee of Guangdong General Hospital. All patients provided written informed consent.

Gene Alterations Detection
Tumor tissue biopsies were snap frozen in liquid nitrogen and stored at -70°C until analysis. We evaluated the tissues by hematoxylin and eosin (HE) staining before genetic testing. Specimens with ≥50% tumor cells were included in this study. A total of 357 samples were obtained from resected tumor specimens and 167 samples were from core biopsies. DNA and RNA were extracted with the Aqua-SPIN Tissue/Cell gDNA Isolation Mini Kit (Biowatson, Shanghai, China) and the RNeasy Mini Kit (Qiagen, Valencia, CA), respectively. The integrity and quantity of RNA and DNA were assessed by gel electrophoresis and Thermo NanoDrop 1000 (Thermo, MA, USA) analysis. cDNA was synthesized using an ABI High-Capacity cDNA Reverse Transcription Kit with RNase Inhibitor (ABI, CA, USA). Mutations in EGFR and KRAS were detected by PCR-based sequencing [12]. PIK3CA and BRAF mutations were detected by high-resolution DNA melting analysis [13]. EML4-ALK fusions were analyzed by multiple PCR and RACE on cDNA [14]. c-Met amplification was determined by qPCR on DNA [15]. PTEN (cDNA), STK11 (cDNA), DDR2, and FGFR2 (DNA) mutations were detected by PCR-based sequencing (Table S1). All testing procedures have been described previously in the cited references.

Statistical Analysis
Differences in alterations between subgroups stratified by histology and smoking status were analyzed by chi-square or Fisher's exact tests, as appropriate. Survival analysis was performed by the Kaplan-Meier method with a log-rank test. Multivariate analyses were conducted using Cox's proportional hazards model (forward Wald selection; P = 0.05 for entry, P = 0.10 for removal). All P values were two-sided, and P<0.05 was considered significant.
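The subgroup comparisons described above can be illustrated with a minimal, standard-library sketch of the Pearson chi-square statistic and a two-sided Fisher's exact test. The function names and the counts below are hypothetical and are not taken from the study data. The study's analyses were presumably run in a statistical package, so this only shows what the two tests compute.

```python
from math import comb


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum all tables that are at most as probable as the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts: rows = two subgroups, columns = mutated / wild type.
stat, dof = chi_square_statistic([[10, 20], [20, 10]])
# Hypothetical small table, where Fisher's exact test is the usual choice.
p_fisher = fisher_exact_two_sided(3, 1, 1, 3)
print(f"chi-square = {stat:.3f} (df = {dof}), Fisher P = {p_fisher:.4f}")
```

Fisher's exact test is generally preferred when expected cell counts are small, which is likely to apply to the rarely altered genes in the smaller subgroups.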

Patient Characteristics
A total of 524 eligible patients were enrolled; the patient characteristics are summarized in Table 1. The male to female ratio was 2.2:1. The mean patient age was 59.3 years (range, 23 to 88 years). A total of 292 patients were never-smokers (55.7%). There were more patients with adenocarcinoma (67.6%) than with squamous cell carcinoma (27.5%). Early stage resectable lung cancer accounted for 41% of the study population. Survival outcome data were cut off on August 1, 2011, and a total of 138 death events occurred.
A number of patients were found to have multiple driver gene alterations (Table 3). A total of 23 patients with EGFR mutations had c-Met amplifications or mutations of STK11, PIK3CA, BRAF or PTEN. Of particular interest, one patient with an EML4-ALK fusion had an activating EGFR mutation, and another patient with an EML4-ALK fusion had a PTEN mutation. Other common dual mutations occurred in patients with BRAF and KRAS, or BRAF and PIK3CA mutations. Only one patient had a triple mutation of EGFR, PIK3CA, and BRAF. EGFR and KRAS, PTEN and KRAS, and PIK3CA and PTEN alterations were mutually exclusive in this study.

Survival Analysis of Patients with Differing Driver Gene Features
Based on EGFR mutation status, we classified patients into two subgroups for survival analysis: EGFR mutation positive and EGFR mutation negative. No difference in survival time was found between the two subgroups in the total population of patients tested for EGFR mutations (χ² = 0.957, P = 0.328; Figure 4). Multivariable Cox regression analysis including EGFR mutation, stage, histology, gender, smoking status, and age indicated that stage was the only independent prognostic factor (χ² = 16.607, P<0.0005; forward Wald selection; P = 0.05 for entry, P = 0.10 for removal). We performed further stratified analysis based on the clinical stage of the patients. Among stage I patients, those with EGFR mutations had a longer survival time than those without EGFR mutations (χ² = 3.947, P = 0.047; Figure 4).
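As a rough cross-check of the log-rank results reported above, the sketch below computes a two-group log-rank chi-square statistic and converts a chi-square value to a P value, assuming 1 degree of freedom (two groups). The function names and the toy follow-up times are hypothetical and are not study data; only the two reported statistics (0.957 and 3.947) come from the text.

```python
from math import erfc, sqrt


def logrank_chi2(times_a, events_a, times_b, events_b):
    """Two-group log-rank chi-square statistic (event flag 1 = death, 0 = censored)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Numbers still at risk in each group just before time t.
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)
        # Deaths occurring exactly at time t.
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the deaths in group A at time t.
            variance += d * n_a * n_b * (n - d) / (n * n * (n - 1))
    return (observed_a - expected_a) ** 2 / variance


def p_value_1df(chi2):
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(chi2 / 2))


# Toy data (hypothetical): follow-up times with every patient having an event.
toy_stat = logrank_chi2([1, 3], [1, 1], [2, 4], [1, 1])
print(f"toy log-rank chi-square = {toy_stat:.4f}")

# Statistics reported in the text, assuming 1 degree of freedom.
for reported in (0.957, 3.947):
    print(f"chi-square = {reported}: P = {p_value_1df(reported):.3f}")
```

Under the 1-degree-of-freedom assumption, the two reported statistics convert to P values that agree with the reported 0.328 and 0.047 to three decimal places. The Cox model statistic is not checked here because its degrees of freedom are not stated.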

Discussion
To our knowledge, this is the first study to investigate an entire profile of both the best known as well as novel driver genes, such as PTEN, DDR2, and FGFR2, in NSCLC patients, while taking into account different histologic types and smoking status in Chinese lung cancer patients. Different studies on driver genes may obtain differing results due to the ethnicity or clinical information of the patients [16][17][18][19][20][21][22]. The large sample size and the complex makeup of our patient population allowed us to compare differences among patient subgroups. The majority of previously published studies investigating driver genes have only focused on a specific subgroup of patients with AC or non-smokers [3,15,21]. The Lung Cancer Mutation Consortium sponsored by NCI is also interested in patients with adenocarcinoma [23]. The management of NSCLC is currently moving from a standard of care based on stage and performance status to more individualized therapies based on clinical, histologic, and molecular factors [24]. Accordingly, our study provides the first clear picture of how driver genes in an NSCLC population can vary with tumor histology and smoking status and finds subgroups of patients in whom alterations in these candidate genes are most enriched; these genes may therefore be targeted for individualized therapies. Furthermore, we analyzed the impact of driver gene alterations on the overall survival of patients.
The specific features of driver genes associated with each subgroup suggest that these subgroups might in fact be different diseases that will, in the future, require different targeted therapies. Epidemiological, molecular and clinical-pathological features have shown that NSCLC in never smokers is a distinct entity [25]. Our subgroup analysis of NSCLC based on histology and smoking status showed that histology and smoking status could significantly influence alterations of driver genes, especially EGFR, KRAS, STK11 and PTEN. EGFR mutations were enriched in non-smoker patients with AC, KRAS in smokers with AC, and STK11 and PTEN in smokers. Although the subtype of non-smoker patients with AC has been intensively investigated because of its sensitivity to EGFR-TKIs, the subtype of smoker patients with AC has rarely been studied. Our study revealed that the mutation rates of KRAS and BRAF in smoker AC patients were the highest among these subtypes. This study also indicates that non-smoker patients with SCC had the highest rate of unknown alterations among the four subgroups. No alterations of KRAS, BRAF, PTEN, ALK, DDR2, or FGFR2 were found in the subgroup of non-smoker patients with SCC. The alteration rates of EGFR, c-Met, and PIK3CA were slightly lower among these subtypes. This suggests that non-smoker patients with SCC may have pathogenic mechanisms that differ from known driver gene alterations. This study also revealed that the subtype of smokers with SCC had special characteristics, as DDR2 and FGFR2 alterations were present only in this subgroup. With the exception of BRAF, which was altered only in AC, all of the other genes were found to be altered in the subtype of smokers with SCC. The PIK3CA mutation and c-Met amplification rates in smokers with SCC were the highest among these subtypes. Although our large cell carcinoma (LCC) sample size was small, we found that the PTEN mutation rate (27.3%) in LCC was the highest among all the subtypes.
Emerging data reveal that tumor histology may relate to the benefits of specific chemotherapies or targeted therapy regimens [24]. Such relationships may also be partially associated with driver gene differences. The makeup of these genes in different subtypes may therefore help to define specific variations of driver genes that could considerably refine the treatment of NSCLC [26].
Our results indicate that, except for EGFR (mutated in 28.4% of patients), all genes examined are infrequently altered (<10%) in Chinese NSCLC patients. Driver mutations occur in genes that encode signaling proteins that are critical for cellular proliferation and survival [1]. Together with our findings, this implies that a new classification method for NSCLC patients may be proposed based on molecular biomarkers. Patients with infrequent alterations of driver genes could be considered to have rare or orphan diseases and should be approached differently in future research and individualized therapy. The subgroup analysis of these driver genes in our study was very helpful for defining patients with different driver gene alterations for further clinical trials.
The alteration rates of some driver genes differ slightly from those in published studies, which may be due to differences in the populations studied [20,21]. One study reported that mutations in EGFR and KRAS were observed in 7 (7%) and 36 (38%) patients, respectively [22]. Others reported that KRAS mutations were detected in 75 of 395 (19%) and 40 of 233 (17%) patients with NSCLC, respectively [16,17]. Interestingly, although the KRAS mutation rate differs among studies, the pattern across patients is the same: KRAS mutations appear to be more frequent in smokers and in AC, and current or former smokers have a higher frequency of KRAS mutations than never smokers [18,19].
Overlapping mutations of driver genes reveal the complexity of future individualized therapy in lung cancer. EGFR and KRAS are the two genes that have been studied most by researchers. Our study found that EGFR mutations could overlap with alterations in the other detected genes except KRAS, DDR2 and FGFR2. RAS and several of its downstream effectors, including BRAF, have been shown to be commonly mutated in a broad range of human cancers, and biological studies have confirmed that RAS pathway activation promotes tumor initiation, progression and metastatic spread in many contexts [27]. Although BRAF mutations were infrequent in NSCLC in our study (1.5%, 7/452), 2 of the 7 patients harbored concurrent EGFR and BRAF mutations. This differs from reports that BRAF mutations are mutually exclusive with EGFR and KRAS mutations [28,29]. An EML4-ALK fusion was found concurrently with an EGFR mutation in 1 patient with AC. This differs from previously reported results that EML4-ALK fusions were mutually exclusive with mutations in the EGFR gene [30,31]. We believe that more overlaps of driver genes will be reported in the future, so clinical practice will need to consider not only sensitizing mutations but also resistance mutations in the same patient when choosing a single targeted therapy or a combined therapy.
We analyzed the prognostic significance of these subgroups and driver gene alterations. We did not find any differences in overall survival among the subgroups based on histology and smoking status. Our study and another study indicate that EGFR mutation in stage I patients may be a favorable prognostic biomarker [32]. These driver genes may therefore be used as predictive biomarkers if specific compounds that target these genes are found.
Our results may be slightly skewed due to the very low variation rate of most of the genes examined, the small sample size of some subgroups, the different detection methods, the different sample sizes of different genes, and the repeated analysis of the same specimens for different genes. As all patient specimens are truly precious, a high-throughput method for detecting driver gene alterations needs to be established as soon as possible.
In summary, our study demonstrates that subtypes of NSCLC defined by histology and smoking status appear to be distinct pathological entities with specific driver gene alterations and could be considered different diseases. The infrequent alterations of driver genes found in some patients imply that these are rare or orphan diseases that should be approached from a different point of view in the future. Features of the characterized subgroups may help to select patients with specific driver gene alterations for future clinical trials and individualized therapy studies.