Polymorphisms in SELE Gene and Risk of Coal Workers' Pneumoconiosis in Chinese: A Case-Control Study

Background Coal workers' pneumoconiosis (CWP) is characterized by chronic pulmonary inflammation and fibrotic nodular lesions that usually lead to progressive fibrosis. Inflammation is the first step in the development of CWP. E-selectin, an adhesion molecule, is involved in the development of various inflammatory diseases. Methods We investigated the association between functional polymorphisms in SELE and the risk of CWP in a Han Chinese population. Three polymorphisms (T1880C/rs5355, T1559C/rs5368, A16089G/rs4786) in SELE were genotyped and analyzed in a case-control study with 697 CWP cases and 694 controls. Genotyping was based on the TaqMan method with the ABI 7900HT Real Time PCR system. Results The SELE rs5368 CT genotype was associated with a significantly increased risk of CWP (OR = 1.28, 95% CI = 1.02–1.60, P = 0.03) relative to the CC genotype. Classification and regression tree (CART) and multifactor dimensionality reduction (MDR) analyses were used to predict the interactions among risk factors of CWP. The MDR analysis found that the best interaction model was the two-factor model containing pack-years smoked and SELE rs5368 genotypes. For non-smokers, the CART analysis showed an increased risk of CWP for carriers of the SELE rs5368 variant genotype compared with the common genotype (OR = 1.51; 95% CI = 1.11–2.05, P = 0.0069). Conclusion The results suggest that the T1559C/rs5368 polymorphism and smoking are involved in the susceptibility to CWP. Further studies are warranted to validate these findings.


Introduction
Coal workers' pneumoconiosis (CWP) is a chronic occupational lung disease caused by the long-term inhalation and deposition of coal mine dust. The dust triggers a persistent inflammatory response and generation of pro-inflammatory and pro-fibrotic mediators, eventually resulting in irreversible lung damage [1,2]. In 2010, 87.42% of the total reported pneumoconiosis cases were attributed to occupation, and over 50% were related to underground mining [3]. Underground mining is associated with exposures to silica, metals, and coal dust generated during tunnel drilling, mining, roof bolting, and transportation. Most cases of CWP are caused by silica exposure.
Inhalation of respirable silica particles leads to lung injury and alveolar macrophage activation, which initiate inflammation. In the inflammation phase and subsequent fibrosis, various cells, including inflammatory cells, alveolar epithelial cells, mast cells, endothelial cells, and mesenchymal cells, form a complex network and interact with each other to promote the development of lung fibrosis by secreting cytokines, inflammatory mediators, and other bioactive substances [4]. Several epidemiological and experimental studies have been conducted to elucidate the pathogenesis of silicosis, but to date the cellular mechanisms that initiate and drive the processes of inflammation and fibrogenesis are not clear [4][5][6][7].
E-selectin, an 11-kD cell surface glycoprotein synthesized by endothelial cells, is an adhesion molecule of the selectin family. It mediates leukocyte-endothelial adhesion in various physiological and pathological settings [8,9]. Its corresponding gene, SELE (CD62E; ELAM1), is located on human chromosome 1q22-q25 [10]. Recent studies have confirmed that E-selectin is involved in pulmonary diseases [11,12]. Here we evaluated the frequency distributions of three SELE single nucleotide polymorphisms (SNPs) to explore the association between these polymorphisms and CWP. From among all functional SNPs in SELE, three were chosen for genotyping: two non-synonymous candidate SNPs, T1880C/rs5355 and T1559C/rs5368, located in the exon region of the E-selectin gene, and one SNP, A16089G/rs4786, located in the 3′UTR of the gene. These variants may provide clues to the pathogenesis of CWP [13][14][15].

Study subjects
In an ongoing study, 697 CWP patients and 694 controls, all male, were recruited from the coal mines of Xuzhou Mining Business Group Co., Ltd. between January 2006 and December 2010, as described previously [16]. The cases and controls were selected from the same mines; subjects were excluded if they had clinical evidence of autoimmune diseases, had received immunosuppressive or immunostimulatory therapy, or had been subjected to radiotherapy. The matching criteria were age, dust exposure period, and job type. The questionnaire was administered by interviewers face-to-face using a double-blind method. This epidemiological questionnaire focused on age, respiratory symptoms, occupational history, and smoking habits. To confirm diagnoses, high-kilovolt chest X-rays and physical examinations were performed based on the China National Diagnostic Criteria for Pneumoconiosis (GBZ 70-2002), which are the same as those of the 1980 International Labour Organization (ILO) in the judgment of opacity profusion [5]. The pneumoconiosis cases were classified as stage I, stage II, or stage III according to the size, profusion, and distribution range of opacities. The chest X-rays were assessed by two independent physicians (Z Song and X Jia). Blood samples of 5 ml were obtained from all subjects and used for routine laboratory tests. This research protocol was approved by the Institutional Review Board of Nanjing Medical University, and all subjects gave their written informed consent before participating in the study.

SNPs selection
To select the most likely functional SNPs influencing the SELE gene, we considered all the non-synonymous SNPs and the SNPs located in the 3′UTR and 5′UTR, as determined in the HapMap Genome Browser release (Phase 1 & 2-full dataset). We applied the following criteria for exonic SNPs: (i) the SNP should be located in an exon, (ii) the SNP should be non-synonymous, and (iii) the minor allele frequency (MAF) should be >5% in the Chinese population. T1880C/rs5355 and T1559C/rs5368 met these criteria and were included. We also searched all the SNPs located in the 3′UTR and 5′UTR of SELE and found that only A16089G/rs4786 has a MAF >5% in the Chinese population. Of the three SNPs selected for genotyping, the T1880C/rs5355 and T1559C/rs5368 variants lead to missense changes in the amino acid sequence, and A16089G/rs4786 likely regulates SELE transcription.

Genotyping
Genomic DNA was extracted from peripheral blood lymphocytes by the conventional phenol-chloroform method. Genotyping data were then acquired by the TaqMan method with the ABI 7900HT Real Time PCR system according to the manufacturer's instructions (Applied Biosystems, Foster City, CA, USA). Genotyping was performed in a blinded fashion, without knowledge of the workers' personal details or case status.
The sequences of the primers and probes for each SNP are available on request. Genomic DNA (50 ng) and 0.5× mix were used for each reaction, and amplification was performed under the following conditions: 50°C for 2 min and 95°C for 10 min, followed by 45 cycles of 95°C for 15 sec and 60°C for 1 min. Negative controls were included in each plate to ensure the accuracy of the genotyping. Ten percent of the samples were randomly selected for confirmation, and the results were 100% concordant.

Statistical analyses
Differences in the distributions of demographic characteristics, selected variables, and frequencies of genotypes of SELE polymorphisms between the CWP cases and controls were evaluated with Student's t-test (for continuous variables) or the χ² test (for categorical variables). Hardy-Weinberg equilibrium (HWE) was tested using a goodness-of-fit χ² test. Unconditional multivariate logistic regression (LR) analyses were performed to obtain odds ratios (ORs) and their 95% confidence intervals (95% CIs), with adjustments for possible confounders. For the stratified analysis, the age and dust-exposure cut-offs were the median ages and dust-exposure years of the recruited patients and controls. Genotypes were coded as wild types (major-allele homozygote) and variants (minor-allele homozygote and heterozygote). The statistical power was calculated using PS software (http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/PowerSampleSize). All statistical tests were two-sided at a significance level of 0.05 and were performed with SPSS software (version 18.0).
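As a rough illustration of two of the calculations described above, the following Python sketch computes a Hardy-Weinberg goodness-of-fit χ² statistic (1 degree of freedom) and a crude odds ratio with a Wald 95% CI. It is not the SPSS procedure used in the study: the function names and all genotype counts are hypothetical, and the odds ratio shown here is unadjusted, whereas the study adjusted for possible confounders by logistic regression.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Goodness-of-fit chi-square test (1 df) for Hardy-Weinberg equilibrium.

    Arguments are observed counts of the three genotypes (AA, AB, BB).
    Returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1 - p                        # estimated frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Crude odds ratio with a Wald confidence interval.

    a: cases with the variant genotype,    b: cases with the common genotype
    c: controls with the variant genotype, d: controls with the common genotype
    Returns (odds_ratio, lower, upper).
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, chosen only to exercise the functions.
print(hwe_chi_square(380, 260, 54))
print(odds_ratio_wald(300, 397, 260, 434))
```

A confidence interval that excludes 1 corresponds to a two-sided P below 0.05 for the crude association.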

CART analysis
A data mining tool, classification and regression tree (CART), with the rpart package of the R statistical software suite (version 2.11.1) was used. The CART method builds a decision tree via recursive partitioning and automatic selection of optimal cut-off points for variants. Binary recursive partitioning is observed in this classification tree model. A decision tree is a simple and effective method for sifting complex biological data for hidden explicit information. A parent node is split into two child nodes. The models aim to partition, recursively, input variables in order to maximize the purity of a terminal node. A decision tree is created by CART for identifying disease factors and susceptibility alleles and for inferring the mode of inheritance [17]. Figure 1 is a graphic display of decision criteria for each split, which predicts group memberships at the terminal nodes.
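The recursive partitioning step can be sketched as follows. This minimal Python example is not the rpart implementation: it searches a single numeric variable for the cut-off that minimizes the weighted Gini impurity of the two child nodes. The choice of Gini impurity, the function names, and the toy data are assumptions made for illustration; the study does not state which splitting criterion was used.

```python
def gini(labels):
    """Gini impurity of a node holding binary labels (1 = case, 0 = control)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_split(values, labels):
    """Find the cut-off t such that splitting into `value <= t` and `value > t`
    gives the lowest weighted impurity. Returns (cut_off, weighted_impurity)."""
    best = (None, float("inf"))
    # The largest value is skipped because it would leave the right child empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (t, weighted)
    return best


# Hypothetical pack-years and case status for six subjects.
pack_years = [0, 0, 10, 20, 30, 40]
is_case = [1, 1, 1, 0, 0, 0]
print(best_split(pack_years, is_case))
```

A full tree repeats this search over every candidate variable at each node and then recurses into the two child nodes until a stopping rule is met.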

MDR analysis
Another data mining tool, the non-parametric multifactor dimensionality reduction (MDR) software (version 1.1.0) [18], was used to identify potential locus-locus and gene-environment interactions with trichotomized genotypes and trichotomized pack-years smoked. The fitness of an MDR model was assessed by estimating the testing accuracy and the cross-validation consistency (CVC). Models having a testing accuracy of ≥0.5 were considered to be true positive. The cross-validations were conducted several times using different random seeds, and the results were averaged to avoid spurious results due to chance divisions of the data [19]. With 10-fold cross-validation, a model was built on 9/10 of the data (training data) and evaluated on the remaining 1/10 of the data (testing data). The CVC measured how many times, out of the 10 divisions of the data, MDR found the same best model. The sign test counted the number of cases, k, out of the 10 cross-validation cases in which the testing accuracy was >0.5. The model with the highest CVC and the highest testing accuracy was selected.
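The core of the MDR procedure can be sketched in a few lines of Python. This is a simplified illustration, not the MDR software itself: each combination of factor levels is pooled into a high-risk or low-risk group according to whether its case:control ratio exceeds the overall ratio in the training data, and the resulting one-dimensional classifier is scored by balanced accuracy. The function names, the threshold rule, and the toy data are assumptions, and the sketch assumes that both cases and controls are present in the data.

```python
from collections import Counter


def mdr_fit(rows, labels):
    """Label each observed factor combination as high risk (1) or low risk (0).

    rows:   tuples of factor levels, e.g. (genotype, smoking category)
    labels: 1 for a case, 0 for a control
    """
    cases, controls = Counter(), Counter()
    for row, y in zip(rows, labels):
        (cases if y == 1 else controls)[row] += 1
    # Overall case:control ratio, used as the risk threshold.
    threshold = sum(cases.values()) / sum(controls.values())
    return {
        cell: 1 if cases[cell] > threshold * controls[cell] else 0
        for cell in set(cases) | set(controls)
    }


def balanced_accuracy(model, rows, labels, default=0):
    """Mean of sensitivity and specificity; unseen cells get `default`."""
    tp = tn = fp = fn = 0
    for row, y in zip(rows, labels):
        pred = model.get(row, default)
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


# Hypothetical data: two factor combinations with opposite risk profiles.
rows = [("CT", "none")] * 10 + [("CC", "heavy")] * 10
labels = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8
model = mdr_fit(rows, labels)
print(model, balanced_accuracy(model, rows, labels))
```

In a cross-validated run, `mdr_fit` would be applied to each training split and `balanced_accuracy` to the matching testing split, with the CVC counting how often the same factor combination is chosen as best.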

Characteristics of the study subjects
The demographic and clinical information is summarized in Table 1. There were no significant differences between the cases and controls in the distribution of age (P = 0.103), exposure years (P = 0.105), and job types (P = 0.534). The distribution of smoking status was similar between cases and controls (P = 0.250), but the smoking amount (pack-years) in CWP cases was significantly less than that of controls (P < 0.001). Of the cases, 59.5%, 31.4%, and 9.0% had pneumoconiosis of stage I, II, and III, respectively. The primary information and allele frequencies observed are listed in Table 2. All genotype distributions of control subjects were consistent with those expected under Hardy-Weinberg equilibrium. The MAFs of the three polymorphisms were consistent with those reported in the HapMap database.
As shown in Table 3, logistic regression analysis revealed that the SELE rs5368 CT genotype, but not the TT genotype, significantly increased the risk of CWP compared with the CC genotype (OR = 1.28, 95% CI = 1.02-1.60 for CT versus CC; and OR = 0.92, 95% CI = 0.60-1.39 for TT versus CC). No significant association with CWP was identified for the other polymorphisms examined. In the stratification analysis (Table 4), a significant association was observed between the SELE rs5368 CT/TT genotypes and stage II CWP (OR = 1.55, 95% CI = 1.12-2.15).
To consider potential interactions of cytokine gene polymorphisms on risk of CWP, we combined these three polymorphisms based on the numbers of variant (risk) alleles (i.e., rs4786A, rs5355T, rs5368T). As shown in Table 5, individuals with multiple risk alleles did not have a higher risk of CWP (Ptrend = 0.731).

Association of multiple factor interaction with CWP risk (CART analysis)
CART analysis was used to predict the interactions among risk factors of CWP. Interactions between three environment factors (exposed years, pack-years smoked, and job type) and the three SNPs were explored. The final tree structure contained four terminal nodes within rs5368 and pack-years smoked. Figure 1 showed that the initial split of the root node was pack-years

Association of multiple factor interaction with CWP risk (MDR analysis)
The MDR method was used to assess potential locus-locus and gene-environment interactions among the three SNPs and three environmental factors. As shown in Table 6, pack-years smoked was the strongest factor for predicting CWP risk, with the highest CVC (100%) and a testing accuracy of 56.49%. We also observed that the two-factor model including smoking and SELE rs5368 was the most accurate model (57.52%), with a perfect CVC of 10, and was statistically significant (P = 0.0107). Models including three or four factors showed decreases in testing accuracy and CVC.

Discussion
Genetic and environmental factors are involved in the development of CWP. In this case-control study, we explored, in a Chinese population, associations between three functional polymorphisms in SELE and the risk of CWP, as well as the interaction between genetic and environmental factors. Of the three SNPs (SELE rs4786, SELE rs5355, and SELE rs5368), the SELE rs5368 CT genotype was associated with an increased risk of CWP (OR = 1.28, 95% CI = 1.02-1.60), and an interaction between rs5368 and smoking was detected. The fact that an association was found with the heterozygous genotype but not with the homozygous genotype may be due to the small sample size. It may also indicate a threshold effect for the production of the E-selectin protein or problems associated with heterodimer activity.
Previous studies on the association between SELE rs5368 and soluble E-selectin (sE-selectin) levels have been controversial. Chen et al. [20] reported a significant association of SELE rs5368 with sE-selectin level in a Chinese population, while Miller et al. [21] found no evidence of association between SELE rs5368 and circulating sE-selectin level. Interestingly, a recent study conducted in Taiwanese individuals aimed to elucidate the association between SELE SNPs and the plasma levels of sE-selectin and matrix metalloproteinase 9 (MMP9). That study found that the minor allele of rs5368 was significantly associated with a higher plasma level of MMP9 [22]. In CWP, collagen degradation does not keep pace with collagen production, resulting in extracellular accumulation of fibrillar collagen. MMPs are responsible for extracellular collagen degradation by recognition of specific cleavage sites that have characteristic imino acid content [23,24]. In addition, MMP9 can further degrade fragments of cleaved fibrillar collagen [25]. Thus, SELE rs5368 may influence the plasma levels of sE-selectin and MMP9 and thereby participate in the pathogenesis of CWP. In addition to drawing on existing research, we used the exonic splicing enhancer (ESE) finder (release 2.0: http://exon.cshl.edu/ESE/), which scans nucleotide sequences to predict putative ESEs responsive to the human Ser/Arg-rich proteins (SR proteins) SF2/ASF, SC35, SRp40, or SRp55 [26]. The ESE finder result for rs5368 shows that the rs5368 C allele forms two motifs that could bind separately to SRp40 and SRp55, while the rs5368 T allele binds only to SRp55 [27]. Thus, SELE rs5368 may influence the splicing of SELE and generate different splice variants.
Cell adhesion molecules are involved in the development of various inflammatory diseases [28]. In general, inhibition or loss of cell adhesion molecules attenuates the inflammatory response in experimental models [29][30][31][32]. The selectin family consists of three cell-surface molecules expressed by leukocytes (L-selectin), vascular endothelium (E-selectin), and platelets (P-selectin). E-selectin expression is induced within several hours after activation with inflammatory cytokines. Inhibition or loss of E-selectin leads to a reduction in neutrophil rolling and acute emigration in models of inflammation, such as the Arthus reaction, dermal inflammation, and peritonitis models [33][34][35].
Yoshizaki et al. [7] showed that, in bleomycin-treated mice, fibrosis was evident in mice lacking adhesion molecules. L-selectin deficiency inhibited lung fibrosis, but P-selectin deficiency and E-selectin deficiency augmented the fibrosis [29]. Another study involving E-selectin−/−, P-selectin−/− mice also revealed an inhibitory role of E-selectin in the development of bleomycin-induced pulmonary fibrosis [36]. Both studies used flow cytometric analysis to compare bleomycin-treated cell adhesion molecule-deficient mice with WT mice and found that loss of cell adhesion molecule function selectively altered the trafficking pattern of fibrogenic Th2 and Th17 cells and anti-fibrogenic Th1 cells to the lung. E-selectin loss may induce dominant Th2 and Th17 cell infiltration. This pattern of leukocytes in the BAL resulted in differential production of cytokines. Lack of E-selectin reduced IFN-γ and increased IL-4, IL-6, IL-17, and TGF-β1 [29,37,38]. IFN-γ, one of the Th1 cytokines, has an antifibrotic effect on pulmonary fibrosis [39]. In addition, attention should be paid to natural killer T (NKT) cells, which are innate memory cells. NKT cells play important roles in the initiation and regulation of the immune response. Horikawa et al. [36] revealed that NKT cell infiltration into the lung was dependent on E-selectin expression. These results provide additional clues to understanding the complexity of the pathogenesis of pulmonary fibrosis.
To evaluate the contribution of genetic and environmental factors to CWP risk in the present study, data mining approaches were used. The first split in CART and the best one-factor model in MDR both indicated that smoking was the predominant risk factor for CWP. In the CART analysis, we found that the SELE rs5368 polymorphism significantly interacted with pack-years smoked; a similar result was obtained by the MDR method. For non-smokers, the CWP risk was attributable more to the SELE rs5368 variant genotype. The use of CART and MDR analyses to explore interactions is a major strength of our study. For fibrosis, a common but complex multifactorial disease, interpreting interactions between genetic factors and the environment is a challenge. Traditional parametric statistical methods, such as LR analysis, are of limited use and could result in an increase in type I errors and decreased power to detect interactions [18]. We therefore used the CART and MDR methods, which can reduce the chance of type I errors and improve statistical power, to identify potential gene-environment interactions [40,41].
Several limitations of this study should be addressed. First, since this was a population-based, case-control study, we could not rule out the possibility of selection bias of subjects. Second, our sample size was only moderate; further studies are required to replicate our results in larger and ethnically different populations. Third, we selected all the eligible SNPs in the UTR regions and non-synonymous SNPs rather than using a tagging SNP approach. Thus, we did not evaluate the entirety of polymorphic variation across SELE. Additional representative SNPs should be included to identify useful markers to predict the risk of CWP. Finally, because three SNPs were tested, an appropriate multiple testing correction, such as the Bonferroni correction, should be applied; without it, the significant association between the SELE rs5368 polymorphism and CWP risk should be interpreted with caution.
In conclusion, the present study indicates that the functional SELE rs5368 polymorphism, which interacts with smoking, is associated with an increased risk of CWP in a Chinese population. Further functional research and validation studies in diverse populations are warranted to confirm our findings.
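The effect of a Bonferroni correction on the reported association can be worked through in a few lines of Python. This sketch takes the three tested SNPs as the number of comparisons and uses the reported P = 0.03 for rs5368 CT versus CC; the function name is hypothetical, and a different count of comparisons would change the result.

```python
def bonferroni_adjust(p_value, n_tests):
    """Bonferroni-adjusted P-value: the raw P times the number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)


alpha = 0.05
n_snps = 3          # number of SNPs tested in the study
p_rs5368 = 0.03     # reported P for rs5368 CT versus CC

adjusted = bonferroni_adjust(p_rs5368, n_snps)
significant = adjusted < alpha
print(adjusted, significant)
```

Under these assumptions the adjusted P-value is about 0.09, which is above 0.05, so the association would not remain significant after correction.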