Cancer Stem Cell Marker Musashi-1 rs2522137 Genotype Is Associated with an Increased Risk of Lung Cancer

Gene single nucleotide polymorphisms (SNPs) have been extensively studied in association with development and prognosis of various malignancies. However, the potential role of genetic polymorphisms of cancer stem cell (CSC) marker genes with respect to cancer risk has not been examined. We conducted a case-control study involving a total of 1000 subjects (500 lung cancer patients and 500 age-matched cancer-free controls) from northeastern China. Lung cancer risk was analyzed in a logistic regression model in association with genotypes of four lung CSC marker genes (CD133, ALDH1, Musashi-1, and EpCAM). Using univariate analysis, the Musashi-1 rs2522137 GG genotype was found to be associated with a higher incidence of lung cancer compared with the TT genotype. No significant associations were observed for gene variants of CD133, ALDH1, or EpCAM. In multivariate analysis, Musashi-1 rs2522137 was still significantly associated with lung cancer when environmental and lifestyle factors were incorporated in the model, including lower BMI; family history of cancer; prior diagnosis of chronic obstructive pulmonary disease, pneumonia, or pulmonary tuberculosis; occupational exposure to pesticide; occupational exposure to gasoline or diesel fuel; heavier smoking; and exposure to heavy cooking emissions. The value of the area under the receiver-operating characteristic (ROC) curve (AUC) was 0.7686. To our knowledge, this is the first report to show an association between a Musashi-1 genotype and lung cancer risk. Further, the prediction model in this study may be useful in determining individuals with high risk of lung cancer.


Introduction
Lung cancer is one of the most commonly diagnosed malignancies and the leading cause of cancer-related death in the world [1]. Cigarette smoking is considered an important risk factor for lung cancer. However, only 10-15% of smokers develop lung cancer, suggesting that individual variation in genetic susceptibility to lung cancer in the general population may play a role. Cancer stem cells (CSCs) are a small minority of cells in a heterogeneous tumor population that drive tumor growth and have been associated with resistance to chemo- and radiation therapies [2][3][4]. It has been demonstrated that lung CSCs play an important role in tumor initiation [5,6]. CSCs also share some similarities with normal stem cells, including self-renewal and differentiation, in addition to their potent tumor-driving capability [7][8][9][10]. CSCs are characterized by expression of particular molecular markers that play an important role in promoting stem cell self-renewal and maintenance [11].
Inhibition of EpCAM by small inhibitory RNA diminishes cell proliferation, migration and invasiveness [37].
Together, these observations have correlated aberrant function of CSC marker molecules with cellular hallmarks of cancer: hyperproliferation and metastatic behaviors. While SNPs have been extensively studied for their association with the risk and prognosis of cancers, little is known about the potential role of SNPs in CSC marker genes with relation to cancer. In this study, we examined the association of lung cancer risk in a Chinese population with polymorphisms of the well-established CSC marker genes CD133, ALDH1, Musashi-1 and EpCAM. A forecasting model was constructed using CSC marker SNPs and epidemiologic factors; the results provide a novel method to predict individuals at increased risk of developing lung cancer.

Study population
We conducted a hospital-based, case-control study involving a total of 1000 subjects from northeastern China (Changchun City, Jilin province). All subjects were local residents of Han descent, consisting of 500 patients clinically diagnosed with lung cancer and 500 cancer-free controls. Patients had histologically confirmed primary lung cancer, had no previous cancer history, and had not received radiotherapy, chemotherapy, or other anti-cancer therapy. Controls were randomly selected normal individuals receiving routine physical examinations in the same hospital. Case matching was performed based on age, gender and place of residence. The study was approved by the Ethics Committee of the First Hospital of Jilin Medical University, and conducted according to the Declaration of Helsinki Principles. All subjects provided written informed consent.

Diagnostic criteria and Data Collection
A standardized interview was conducted by trained interviewers in the hospital or at the homes of participating individuals. Information regarding socio-demographic details, medical history, family history, lifestyle history, and cancer diagnosis was recorded. Risk factor information and peripheral blood lymphocytes were collected at the time of diagnosis for cancer patients or on the day of interview for controls.

CSC marker gene polymorphism selection
We used a candidate gene approach [38][39][40] to select SNPs for this study. Four well-established CSC marker genes (CD133, ALDH1, Musashi-1 and EpCAM) were selected in the study design. Expression of these four proteins has been reported to identify lung CSCs [41,42].
Three predefined criteria were used for CSC SNP selection: (a) minor allele frequency (MAF) ≥5% in the HapMap CHB population; (b) identification via the SNPinfo website (http://snpinfo.niehs.nih.gov) for candidate CSC gene SNP selection; and (c) publications showing clinical correlations with cancer risk/outcome or recurrence. Using these criteria, five CSC candidate SNPs were chosen in our model analysis: rs2286455 in the CD133 gene, rs1342024 and rs13959 in the ALDH1 gene, rs2522137 in the Musashi-1 gene, and rs17036526 in the EpCAM gene (Table 1). Based upon literature information, we excluded polymorphisms previously implicated in COPD or lung cancer. Additionally, we did not select SNPs in genes encoding proteins involved in pathways of cell-cycle control, oxidant response, apoptosis and airways inflammation. Finally, we avoided SNPs that were known to have functional effects in in vitro assays, were non-synonymous, or were located in regulatory regions.
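Criterion (a) can be illustrated with a short calculation. The following is a minimal sketch, assuming MAF is computed from genotype counts at a biallelic locus; the function names and the example counts are hypothetical and are not taken from the HapMap CHB data.

```python
def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Return the minor allele frequency from genotype counts at a biallelic SNP.

    n_aa, n_ab, n_bb are the numbers of individuals carrying genotypes
    AA, AB and BB. Each individual contributes two alleles.
    """
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    freq_a = (2 * n_aa + n_ab) / total_alleles
    # The minor allele is whichever of A or B is less frequent.
    return min(freq_a, 1.0 - freq_a)


def passes_maf_filter(n_aa, n_ab, n_bb, threshold=0.05):
    """Apply a MAF >= threshold filter (5% in the selection criteria above)."""
    return minor_allele_frequency(n_aa, n_ab, n_bb) >= threshold


# Hypothetical counts, for illustration only.
print(passes_maf_filter(81, 18, 1))   # MAF is about 0.10, so the SNP is retained
print(passes_maf_filter(98, 2, 0))    # MAF is about 0.01, so the SNP is dropped
```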

Genotyping and quality control
Genomic DNA was isolated from peripheral blood lymphocytes. MassArray (Sequenom, San Diego, CA) was used to genotype CSC markers using allele specific MALDI-TOF mass spectrometry. Primers and multiplex reactions were designed using the RealSNP.com Website. Concordance among the 3 genomic control DNA samples present in duplicate was 100%. Of the SNPs with genotyping data, the sample call rates were more than 95%.

Statistical analysis
The Hardy-Weinberg equilibrium (HWE) was tested by a goodness-of-fit chi-square (χ²) test that compared expected genotype frequencies with observed genotype frequencies in cancer-free controls. The chi-square test was also used to determine the presence of significant differences in genotype and allele distribution as well as SNP frequency between clinically diagnosed lung cancer patients and controls. A logistic regression model was used to identify independent risk factors for lung cancer. The forward stepwise likelihood ratio method was employed to screen variables in model selection, where the cut-off for variables entering the model was 0.05 and the cut-off for removing variables from the model was 0.10; an optimal model with minimum Akaike information criterion was selected. All categorical variables were set as dummy variables, and the first category of each variable was selected as baseline. The classification ability of the model was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC), and the optimal operating point (OPP) was given afterwards. All analyses were conducted using SPSS v19.0 software (SPSS, Inc., Chicago, IL, USA). All P-values were two-sided, and P-values <0.05 were considered statistically significant.
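The HWE test described above can be sketched as follows. This is an illustrative implementation only, not the SPSS procedure used in the study: it assumes a biallelic SNP, one degree of freedom, and no continuity correction, and the genotype counts in the usage lines are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Goodness-of-fit chi-square test for Hardy-Weinberg equilibrium.

    Returns (chi-square statistic, P-value with 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    # Expected genotype counts under HWE: p^2, 2pq, q^2.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical control genotype counts, for illustration only.
print(hwe_chi_square(100, 200, 100))  # counts match HWE expectations exactly
print(hwe_chi_square(150, 100, 150))  # strong heterozygote deficit
```

A P-value below 0.05 in controls would indicate departure from HWE, which is commonly read as a warning sign for genotyping error or population stratification.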

Distribution of genotype and its characteristics in cancer and control populations
We recruited 500 cases of lung cancer and 500 cancer-free controls between 2010 and 2012. Table 2 shows the distribution and frequency of study-specific risk factors between cancer patients and controls.

Association of CSC marker gene SNPs with lung cancer risk in univariate analysis
We first evaluated lung cancer risk using univariate analysis. Among the CSC marker gene SNPs selected, the Musashi-1 rs2522137 GG genotype had a tendency toward a higher incidence of lung cancer than the rs2522137 TT genotype in both the recessive model (P = 0.004) and the additive model. However, no significant differences were noticed for SNPs in other CSC marker genes in dominant, recessive, additive, or multiplicative models (Table 3).
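The genetic models named above differ only in how a genotype is converted into an exposure variable. The sketch below shows one conventional coding, assuming G is the risk allele of rs2522137, together with an odds ratio and Woolf-type 95% confidence interval from a 2x2 table. The function names are hypothetical and the counts are invented for illustration; they are not the values in Table 3.

```python
import math


def code_genotype(genotype, risk_allele="G", model="recessive"):
    """Code a two-letter genotype (e.g. "TG") under a given genetic model."""
    copies = genotype.count(risk_allele)
    if model == "additive":
        return copies                 # 0, 1 or 2 copies of the risk allele
    if model == "dominant":
        return int(copies >= 1)       # carriers (TG, GG) vs. TT
    if model == "recessive":
        return int(copies == 2)       # GG vs. TT and TG combined
    raise ValueError(f"unknown model: {model}")


def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Return (OR, lower, upper) using a Woolf (log-based) 95% interval."""
    or_ = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    se = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                   + 1 / exposed_controls + 1 / unexposed_controls)
    lower = math.exp(math.log(or_) - 1.96 * se)
    upper = math.exp(math.log(or_) + 1.96 * se)
    return or_, lower, upper


# Recessive coding of three genotypes.
print([code_genotype(g) for g in ("TT", "TG", "GG")])

# Hypothetical 2x2 counts (GG vs. non-GG in cases and controls).
print(odds_ratio(20, 80, 10, 90))
```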

Association of SNPs with lung cancer risk in multivariate analysis
Next, we evaluated independent risk factors of lung cancer using multivariate analysis. By incorporating environmental and lifestyle parameters, we found that in the recessive model, Musashi-1 rs2522137 was still significantly associated with lung cancer. These environmental and lifestyle parameters included lower BMI, family history of cancer, prior diagnosis of COPD, pneumonia or pulmonary tuberculosis, occupational exposure to pesticide, occupational exposure to gasoline or diesel, heavier smoking, and exposure to heavier cooking emissions (Table 4). These data suggest that the Musashi-1 rs2522137 GG genotype is a significant genetic risk factor for lung cancer.

ROC analysis
The classification ability of the multivariate model was further evaluated using the area under ROC curve (AUC) and the optimal operating point (OPP). Figure 1 shows the ROC curve derived from our model; AUC was calculated as 0.7686. Furthermore, the OPP was obtained when the cutoff was set at 0.47. The estimated false positive rates, true positive rates, and Youden index were determined to be 0.28, 0.72, and 0.44, respectively.
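The reported Youden index follows directly from the stated rates: J = TPR - FPR = 0.72 - 0.28 = 0.44. The sketch below reproduces that arithmetic and shows one common way to locate an operating point, assuming the OPP is the cutoff that maximizes J (the selection rule is not stated in the text). The labels and predicted probabilities in the second example are hypothetical.

```python
def youden_index(tpr, fpr):
    """Youden index J = sensitivity + specificity - 1 = TPR - FPR."""
    return tpr - fpr


def best_cutoff(labels, scores):
    """Return (cutoff, J) for the cutoff maximizing the Youden index.

    labels: 1 for cases, 0 for controls.
    scores: predicted probabilities from the model.
    A subject is classified as positive when score >= cutoff.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
        j = tp / positives - fp / negatives
        if best is None or j > best[1]:
            best = (cutoff, j)
    return best


# Rates reported for the operating point in this study.
print(round(youden_index(0.72, 0.28), 2))

# Hypothetical labels and predicted probabilities, for illustration only.
print(best_cutoff([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Ties are resolved in favor of the lowest cutoff here; other tie-breaking rules are equally defensible.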

Correlation between SNPs and lung cancer type
Finally, we looked for correlations between CSC SNPs and lung cancer type (squamous cell, adenocarcinoma, small cell), along with age at onset and gender. We did not observe statistically significant associations between these CSC SNPs and age at onset or gender (Table 5). In the pathology-stratified analysis, however, CD133 SNP rs2286455 was significantly correlated with lung cancer type (P = 0.048) (Table 6). No associations were observed between the remaining SNPs and lung cancer type.

Discussion
Single nucleotide polymorphisms (SNPs) have been extensively examined in practically all cancer types in an effort to identify inherited cancer susceptibility genes and their interaction with environmental factors. Cancer stem cells (CSCs) play an important role in tumor initiation, metastases, and recurrence. We have examined the potential correlation between SNPs present in CSCs and the likelihood of lung cancer. This is particularly important since SNPs in CSC-directing genes could provide a genetic link to cancers that are particularly challenging to treat. While typical cancer therapies may eliminate most of the tumor mass, a small population of CSCs with the potential to repopulate the tumor may remain [5]. It is generally accepted that CSCs are characterized by the unique expression of cell surface molecules called CSC marker genes. CSC markers play an important part in the maintenance of self-renewal and resistance to apoptosis pathway activation in these cells. In this study, we obtained information to support the hypothesis that clinical outcome in lung cancer patients may be influenced by genetic variants of CSC marker genes.
The potential impact of CSC marker gene polymorphisms on lung cancer susceptibility has not been previously explored. In this study we took advantage of a hypothesis-driven candidate gene approach [38][39][40] to identify potentially functional SNPs associated with histologically validated lung cancer. In contrast to genome-wide association (GWA) and quantitative trait locus (QTL) approaches, the candidate gene approach is economical and has rather high statistical power [38]. We focused on four CSC marker genes that have been used to isolate CSCs: CD133, ALDH1, EpCAM, and Musashi-1 [41,42]. Using the candidate gene approach, we selected a panel of SNPs in these CSC gene loci from SNP websites and peer-reviewed literature. SNPs identified to have high allele frequency were genotyped in 500 lung cancer cases along with 500 age-matched controls. Our results have identified the Musashi-1 variant as an independent risk factor for lung cancer. It is also interesting to note that the Musashi-1 rs2522137 genotype was still associated with lung cancer risk in a multivariate regression model that considered several environmental and lifestyle factors. Taken together, this study provides the first evidence to correlate the Musashi-1 rs2522137 SNP variant with lung cancer.
Currently, we know very little about the detailed molecular mechanisms by which Musashi-1 rs2522137 polymorphisms might contribute to lung cancer development. Musashi-1 is an evolutionarily conserved RNA-binding protein that has profound implications in cellular processes, such as stem cell maintenance, nervous system development, and tumorigenesis. Musashi-1 is highly expressed in many cancers, whereas in normal tissues, its expression is restricted only to stem cells. It is now clear that this RNA-binding protein is involved in asymmetric cell division and is required for the maintenance of stem cell identity [43][44][45]. Interestingly, the Musashi-1 mRNA transcript contains an 1811-bp-long 3'-untranslated region (3'-UTR). The 3'-UTR of mature Musashi-1 mRNA is potentially targeted by several tumor suppressor miRNAs, including miR-34a, -101, -128, -137 and -138 [46]. In addition, the Musashi-1 mRNA 3'-UTR contains several AU- and U-rich sequences that are targeted by an evolutionarily conserved RNA-binding protein, HuR [47]. HuR is a member of the Hu/ELAV (embryonic lethal abnormal vision) family, which is highly expressed in tumor tissue and enhances tumorigenesis by interacting with a subset of mRNAs that encode proteins involved in the regulation of cell proliferation, cell survival, angiogenesis, invasion, and metastasis [48,49]. Using the SNPinfo website (http://snpinfo.niehs.nih.gov/), we found that the Musashi-1 3'-UTR also harbors potential target sites for miRNAs hsa-miR-1275, hsa-miR-1285, hsa-miR-483-5p, hsa-miR-486-3p, hsa-miR-612, and hsa-miR-625. It is worth noting that Musashi-1 rs2522137 is located within these miRNA and HuR binding sites. Future studies are needed to delineate whether rs2522137 variants may affect the binding of these regulatory miRNAs and HuR protein. Presumably, the Musashi-1 rs2522137 GG variant may interfere with the binding of miRNAs and HuR factor, thus increasing the stability of Musashi-1 mRNA. If true, this mechanism could provide the basis for the Musashi-1 rs2522137 variant to maintain self-renewing lung cancer stem cells.
SNPs represent inherited genetic variations that are present throughout the lifetime of an individual. It is well known that non-genetic risk factors, such as age, history of lung disease, and smoking history are also very important and can be combined to develop risk-based models of cancer. Robert et al. [50,51] have suggested that SNPs need to be combined with other risk variables to identify individuals who are most susceptible to developing lung cancer. Similarly, the Liverpool Lung Project risk model improves its predictive capability of lung cancer by adding a marker SNP (rs663048) in the SEZ6L gene [52,53]. In this study, we identified a correlation between lung cancer and a specific SNP within a CSC marker gene. Environmental and lifestyle factors included in this analysis, such as occupational exposure to pesticide, occupational exposure to gasoline/diesel, prior diagnosis of pulmonary tuberculosis, and cooking emission, provide a similar correlation relative to an age-matched control population. In summary, this study revealed a significantly increased risk of lung cancer for the CSC marker Musashi-1 rs2522137-GG compared with -TT and -TG SNPs in a Chinese population. The ROC AUC of our model was 0.7686, indicating the potential to identify high-risk individuals in the Chinese population by focusing on information that can be readily obtained in the primary care setting. Finally, this lung cancer risk prediction model discriminated between high- and low-risk individuals. Further studies are needed in larger cohorts of unselected cases and controls to further validate and extend these initial observations.