A Large Scale Gene-Centric Association Study of Lung Function in Newly-Hired Female Cotton Textile Workers with Endotoxin Exposure

Background Occupational exposure to endotoxin is associated with decrements in pulmonary function, but how much variation in this association is explained by genetic variants is not well understood. Objective We aimed to identify single nucleotide polymorphisms (SNPs) that are associated with the rate of forced expiratory volume in one second (FEV1) decline by a large scale genetic association study in newly-hired healthy young female cotton textile workers. Methods DNA samples were genotyped using the Illumina Human CVD BeadChip. Change rate in FEV1 was modeled as a function of each SNP genotype in linear regression model with covariate adjustment. We controlled the type 1 error in study-wide level by permutation method. The false discovery rate (FDR) and the family-wise error rate (FWER) were set to be 0.10 and 0.15 respectively. Results Two SNPs were found to be significant (P<6.29×10−5), including rs1910047 (P = 3.07×10−5, FDR = 0.0778) and rs9469089 (P = 6.19×10−5, FDR = 0.0967), as well as other eight suggestive (P<5×10−4) associated SNPs. Gene-gene and gene-environment interactions were also observed, such as rs1910047 and rs1049970 (P = 0.0418, FDR = 0.0895); rs9469089 and age (P = 0.0161, FDR = 0.0264). Genetic risk score analysis showed that the more risk loci the subjects carried, the larger the rate of FEV1 decline occurred (P trend = 3.01×10−18). However, the association was different among age subgroups (P = 7.11×10−6) and endotoxin subgroups (P = 1.08×10−2). Functional network analysis illustrates potential biological connections of all interacted genes. Conclusions Genetic variants together with environmental factors interact to affect the rate of FEV1 decline in cotton textile workers.


Introduction
According to an official statement from the American Thoracic Society (ATS) [1], cigarette smoking is not the sole meaningful cause of chronic obstructive pulmonary disease (COPD). There is also strong evidence suggesting a causal relationship between COPD and occupational exposures (e.g. endotoxin [2]) or genetic syndromes (e.g. a 1 -antitrypsin deficiency [3]). Generally released from bacterial lysis, endotoxin is ubiquitous in the airborne environment [4]. High endotoxin levels are observed in cotton textile mills processing cotton. Exposure to cotton dust leads to chronic respiratory disease and excessive loss of pulmonary function [5][6][7]. Recently, genome-wide association studies (GWAS) [8][9][10] and large-scale meta-analysis of GWAS [11][12][13] of COPD also offered mechanistic insight into pulmonary function regulation due to individual genetic heterogeneity (e.g. 6p21.32 region) in non-occupationally exposed populations.
Even for occupationally exposed populations, there is evidence that single nucleotide polymorphisms (SNPs) may play an important role in pulmonary function while taking endotoxin effect into account. Our previous study in 20-year longitudinal cohort suggested that Tyr113His and His139Arg polymorphisms in microsomal epoxide hydrolase (mEH) gene [14], rs1800629 in tumor necrosis factor (TNF) gene and rs909253 in lymphotoxin alpha (LTA) gene [15], may modify the association between endotoxin exposure and annual decline of FEV 1 in cotton textile workers, though no statistically significant marginal effects of those polymorphisms were observed.
In this study, we aimed to explore the association between the rate of FEV 1 decline and genetic heterogeneity in previously unexposed newly-hired young healthy female cotton textile workers entering endotoxin-exposed work areas in yarn preparation based on large scale genomic data. To the best of our knowledge, there is no large scale association study of lung function in occupational settings.

Ethics Statement
The institutional review boards of the Harvard School of Public Health, the Putuo District People's Hospital, and the Human Resources Administration of China approved the study. All participants gave written informed consent before the study.

Study Population
The study population was derived from the Newly-Hired Chinese Textile Workers Study, an intensive, repeated measures prospective cohort study in the cotton textile industry consisting of 384 adults in Shanghai, China. All subjects were self-reported Han Chinese ethnicity, females, and lifelong non-smokers, asymptomatic of cardiopulmonary disease and naive to occupational endotoxin exposure before entering cotton textile mill. Detailed information on subject selection, methods for testing pulmonary function, and exposure assessment has been described previously [16][17][18]. A baseline examination took place in March of 1997, and follow-up examinations occurred at 3 months, 12 months, and 18 months later. Only subjects with DNA samples that met quality assurance requirements for genotyping and analysis (n = 301; refer to quality control section for details) were included in this study. Among the 301 subjects, one hundred and sixty-three (54.2%) subjects with average age of 18.4 years (range 16.0-28.8 years) were newly employed, while 138 (45.8%) subjects with average age of 33.1 years (range 16.6-46.7 years) were currently working in converted silk mills. Sixty-two (20.6%) subjects were followed up to 3 months, 42 (14.0%) subjects were followed up to 12 months, and 197 (65.4%) subjects were followed up to 18 months.

Exposure Assessment and Pulmonary Function Tests
Briefly, airborne cotton dust was sampled with a settled vertical elutriator (General Metalworks Corp., Mequon, WI) fixed in work area using 37 mm PVC filters, and endotoxin concentration was analyzed using chromogenic Limulus amoebocyte lysate assay, as previously described [19]. The workers remained in the same area throughout the observation. We recorded the endotoxin concen-tration for all work areas at each follow-up. Spirometric measurements were conducted by a trained technician at cotton mills using an 8-liter water-sealed field spirometer (W.E. Collins Co., Braintree, MA) calibrated twice each day with a 3-liter syringe. Forced expiratory spirograms were carried out before and after work shifts on the first day back to work after a two day rest, and each worker performed up to seven trials to produce three acceptable curves [16]. The present analysis focused on change rate in pre-shift FEV 1 over the 18 month period of follow-up.

DNA Isolation and SNP Genotyping
DNA samples were extracted from whole blood using a DNA purification kit (Gentra Systems, Inc., Research Triangle, NC). Of the 384 DNA samples collected from subjects who participated in the baseline examination, 55 samples were excluded due to inadequate DNA content or low DNA quality. Genotyping was performed using the Illumina HumanCVD Beadchip (IBC array) by laboratory personnel without the knowledge of phenotype information. The IBC array incorporates about 50,000 SNPs to efficiently capture genetic diversity across .2,000 genic regions related to cardiovascular, inflammatory and metabolic phenotypes. Genetic variation within the majority of these regions is captured at density equal to or greater than that afforded by genome-wide genotyping products [20]. Of the 329 samples that were sent for genotyping, 312 were genotyped successfully.

Quality Control
We performed systematic quality control procedures to filter both unqualified samples and SNPs before the association analysis. Briefly, SNPs were removed for a total of 49,094 genotyped ones if they were located in non-autosomal chromosomes (1,126), had a call rate #95% (954), minor allele frequency (MAF) ,0.05 (19,357), P,0.001 for Hardy-Weinberg equilibrium (HWE) test (46). For these 312 samples, one subject was removed with a genotyping call rate ,95%. None was ambiguous for genetic sex. Seven subjects failed the cryptic relatedness check. We also detected outliers (three individuals were excluded) and population stratification using EIGENSTRAT 4.2 [21], a method based on principal component analysis. As a result, a total of 301 subjects and 27,611 SNPs were left for subsequent analysis. Quality control procedures were implemented in PLINK (version 1.07) [22].

Statistical Methods
As presented in Figure S1 in File S1, the FEV 1 declines linearly with increased exposure time, but the rate of FEV 1 decline varies among individuals. We aimed to evaluate the association between SNP and the rate of FEV 1 decline. Firstly, we estimated the rate of monthly FEV 1 declines for each individual in a linear regression model through Equation 1. For the i th individual, FEV 1ij is the observed FEV 1 value at different months (month j = 0, 3, 12 and 18), u 0i is baseline FEV 1 and e ij is the random error at j th months. The estimated slope (ES i ) reflected the rate of FEV 1 decline as a function of month for each individual. The mean and median R 2 of the linear model for all 301 subjects are 0.63 and 0.73 respectively indicating the present model is not bad to describe the monthly decline rate of FEV 1 . ES i was then used as the outcome in the following association analysis.
Secondly, we assessed the associations between ES and each SNP by fitting multiple-variable linear regression models (Equa-tion 2) adjusted for height, age, FEV 1 at baseline and the average level of log-transformed endotoxin exposure. In Equation 2, e i is the random error for i th individual. Two genetic models (dominant and additive) of inheritance for each SNP were considered respectively. In additive model, SNPs with low frequency (#5%) of rare homozygous were excluded for lack of statistical robustness [23].
The bootstrap re-sampling analysis was then performed to control the false positive discovery for significant SNPs [24]. We generated 2,000 bootstrap samples, and SNPs were tested 2,000 times respectively. The results are not likely to be falsely positive if the SNP has more than 1,600 times (80%) bootstrap P value less than the predefined threshold (e.g. 0.005 for 10 SNPs).
Gene-gene and gene-environment interactions were tested using Equation 3. Here, b A and b B are the main effects of factor A and B, respectively. b AB is the interaction.
We also constructed a genetic risk score (GRS) by counting the number of risk genotypes of top 10 SNPs that the individuals carried [25]. The association between the GRS and the rate of FEV 1 decline was also tested in multiple-variable linear regression models to evaluate the join effects of multiple genetic factors. Additionally, stratification analysis of GRS by age and endotoxin level was performed.
Due to the linkage disequilibrium (LD) structure among SNPs, a standard Bonferroni correction would yield a significance level of approximately 1.81610 26 , which is very conservative. And, the Human CVD array has a dense gene-centric design, similar studies have used a less stringent level around 1610 24 previously [26,27]. In this exploratory study, we controlled the false discovery rate (FDR) and the family-wise error rate (FWER) in study-wide level by permutation method (see details in Appendix S1). The FDR was set to be 0.10, meanwhile the FWER were set to be 0.15 (corresponding threshold for each SNP was 6.29610 25 ). We imputed un-genotyped SNPs using Minimac software [28] based on LD information from hg/19 1000 Genomes database (with ASN as the reference set, released November 2010). Gene functional network was built using MetaCore TM online software [29] (GeneGo, Inc., Carlsbad, CA) to illustrate potential biological connections of interacted genes identified in this study. Statistical analysis and data management were carried out through: PLINK (Version 1.07) [22] and R software (version 2.14.0; The R Foundation for Statistical Computing). Manhattan plots were generated using Haploview (version 4.2) [30].

Results
The mean decline of FEV 1 for the overall 301 subjects was 6.79625.09 ml per-month. Individuals were classified to different subgroups by age, height, FEV 1 at baseline and endotoxin level respectively and the rate of FEV 1 decline (ES) in each subgroup was presented in Table 1. The large standard deviation (SD) of ES reflects a large individual variation of ES among subjects, which probably be caused by different environmental exposure, demographic characteristics or genetic background. We fitted a multivariable linear regression model (Equation 2) and predicted the ES for each subject. The SD of the predictive ES was presented in Table S1 in File S1. When deducted the effects of non-genetic factors, the SD of predictive ES ranges from ,4 to ,6. We assumed that the genetic factors might contribute to the remained variation of ES.

Results of Individual SNP Analysis
In additive model, the frequency of rare homozygosity of the top-10 SNPs was less than 2.5% (data not shown) and none was significant. Thus, only SNPs identified by the dominant model are presented ( Table 2, Tables S2 and S3 in File S1). Two risk SNPs (rs1910047 and rs9469089) were significant with P,6.29610 25 . Individuals carrying the TT genotype of rs1910047 had average 4.32 ml decline of FEV 1 per-month, while those carrying the TA or AA genotypes had 18.32 ml decline of FEV 1 per-month. After adjustment of covariates, individuals carrying the AA genotype had more rapid FEV 1 decline (15.17 ml/moth) compared with those carrying the TA or AA genotype (P = 3.70610 25 , FDR = 0.0778). For rs9469089, individuals carrying the GC or CC genotype also had more rapid FEV 1 decline (12.50 ml/month) than those carrying the GG genotype (P = 6.19610 25 , FDR = 0.0967).
Eight SNPs (rs32588, rs4855881, rs10515978, rs601675, rs11761231, rs10129426, rs10201627 and rs1049970) with P,5610 24 were considered suggestive for accelerated changes of FEV 1 . Two SNPs (rs32586 and rs32589) are in perfect LD with rs32588 (R 2 = 1.00). rs3749073 is also in high LD with rs10201627 (R 2 = 0.98) ( Table S4 in File S1). The bootstrap re-sampling analysis was then performed for the top-10 SNPs to internally validate the results. All top-10 SNPs had a bootstrap P value,0.005 more than 1,600 times out of 2,000 bootstrap samples (over 80% significant), indicating that those SNPs were not likely to be false-positive results.
Manhattan plot using P values derived from the dominant model are presented in Figure S2 in File S1. A small genomic control inflation factor (l) of 1.026 indicated little inflation of the large scale genotyping study results due to population stratification ( Figure S3 in File S1). We again did a sensitivity analysis with further adjustment of first eight PCs generated by EIGENSTRAT 4.2. However, none PCs was significant in the model (P values range from 0.24 to 0.82). As presented in Table S5 in File S1, the results do not vary much whenever PCs adjusted or not. After imputation, a cluster of adjacent SNPs in high or low LD with targeted SNP were observed with P value from 10 25 to 10 23 (Appendix S2 and Table S6 in File S2). We also tested those 10 SNPs in additive model ( Table S7 in File S1). All SNPs were significant at 0.005 level, except for rs10515978 (P = 0.0159). Considering 73 subjects are young women who may still have been in the lung function growth period of their development, we did another sensitivity analysis only in those subjects with age $18 years. As a result, the effects of SNPs do not vary much between all samples and only adult samples ( Table S8 in File S1).

Results of Genetic Risk Score Analysis
As presented in Figure 2 and Table S9 in File S1, the more risk loci the subjects carried, the larger the rate of FEV 1 decline occurred (P trend = 3.01610 218 ). Bootstrap re-sampling analysis showed that bootstrap P trend was less than 1610 215 with 92.75% of 2,000 bootstrap samples. Additionally, we performed stratification analysis of GRS by age and endotoxin level (Table S10 in File S1). Though, the rate of FEV 1 decline became larger with the increase of GRS, the association differed among age subgroups or endotoxin subgroups, which indicated that there were interactions between GRS and age (P = 7.11610 26 ) or GRS and endotoxin (P = 1.08610 22 ) adjusted for other covariates (Figure 3).

Discussion
Host genetic factors probably influence susceptibility to the rate of FEV 1 decline after occupational exposure to endotoxin. This exploratory study is the first to provide evidence that genetic factors may play an important role in determining the rate of FEV 1 decline using a large scale gene-centric aproach. We found that two SNPs (rs1910047 and rs9469089) were significantly associated with the rate of FEV 1 decline, as well as eight suggestive associated SNPs. Further analysis indicated that potential genegene interaction and gene-environment interaction were ubiquitous in the genetic architecture of complex traits.
rs1910047 is located at 12p24.21, upstream of two genes: T-box 5 (TBX5, 280 kb upstream) and T-box 3 (TBX3, 7 kb upstream). T-box genes encode transcription factors that can mediate the production of branching signals by the lung mesenchyme in the embryonic mouse lung. TBX4;TBX5 double heterozygous mutants showed decreased lung branching, indicating that T-box genes may play an important role in the process of lung growth [31,32]. In addition, the Tbx5 was reported to be associated with asthma susceptibility in the developing lung in rat models [33]. Thus, it is biological plausible that SNP variation may result in aberrant activity of the TBX5 gene, which may affect lung function, accelerating FEV 1 decline. TBX3 is a downstream target of Wnt/ beta-catenin pathway, which is a key regulator in cell proliferation and differentiation [34]. Recent evidence has suggested that TBX3 is over-expressed in a number of cancers including lung cancer [34], as well as in transformed lung fibroblast cells [35].
rs9469089 is located at 6p21.32, within the first intron of RNF5 (encoding membrane-bound ubiquitin ligase). RNF5 controls the membrane fraction of ATG4B and limits LC3 (ATG8) processing, aberrant of which may limit basal levels of autophagy and influence susceptibility to bacterial infection [36]. Previous studies have collectively suggested that RNF5 negatively regulates the virus-triggered immune response [37,38]. Besides, rs9469089 is about 2 kb downstream of advanced glycosylation end productspecific receptor (AGER), which is consistently reported to be associated with lung function [11][12][13].
rs1049970 which interacted with rs1910047 is a missense variant of the cadherin 5, type 2 (CDH5) gene that may play an important role in endothelial cell biology through control of the cohesion and organization of the intercellular junctions. Deficiency or truncation of Cdh5 induced endothelial apoptosis and abolished transmission of the endothelial survival signal in a mouse Table 4. The results of gene-environment interaction analysis for top-10 SNPs (P,0.05). model [39]. Both two SNPs are involved in cell stability, including proliferation, differentiation and apoptosis. Thus, it is biologically conceivable that these two SNPs, or those in LD with them, may directly regulate targeted genes, together resulting in more rapid FEV 1 decline. It is noteworthy that four SNPs interacted with each other adjacently, producing three pairs of end-to-end interactions in this study (rs10515978_rs32588, rs32588_rs11761231 and rs11761231_rs4855881). rs10515978 is within the intron of lectin, mannose-binding, 1 (LMAN1), mutational inactivation of which was a frequent and early event potentially contributing to colorectal tumorigenesis [40]. rs32588 is within the exon of peroxisome proliferator-activated receptor gamma, coactivator 1 beta (PPARGC1B), which was associated with airway hyperreactivity in asthmatic patients [41]. rs11761231 was reported to be associated with rheumatoid arthritis in Caucasians [42][43][44]. rs4855881 is located at 3p21.31, within the intron of Nacylaminoacyl-peptide hydrolase (APEH). Deletions of which are commonly at the 3p region, and have been found in various types of carcinomas, including lung cancer [45].
Functional network analysis reveals potential biological connections among these four pairs interacted genes. Fourteen nodes as well as other ones directly or indirectly connect to the four functional paths and form the network. Among the fourteen key nodes, genetic variation in the gene encoding p53 can impair the response to cell damage and increase the loss of alveolar epithelial cells [46]. Also, it's reported that p53 may modify the effect of diminishing PM10 exposure on lung function decline [47]. Vascular endothelial growth factor receptor (VEGFR) was reported to mediate the anti-inflammatory, anti-protease, and anti-apoptosis effects of the lung, hence contribute to an attenuation of emphysema and destructive pulmonary function in COPD [48]. Signal transducer and activator of transcription 3 (Stat3) plays an essential role in the pathogenesis of IgG immune complex (IC)-induced acute lung injury [49]. Beoseanrceh morphogenetic protein 4 (BMP4) plays a potential regulatory role of lung fibroblast function [50]. Combined with our gene-gene interaction results, it is possible that the interaction among genes is associated with the rate of FEV 1 decline through some possible functional paths.
rs10129426 is about 4 kb downstream of BCL2-associated athanogene 5 (BAG5), which encodes anti-apoptotic protein that functions through interactions with a variety of cell apoptosis and growth related proteins [51]. According to the UCSC and TFSearch [52] databases, polymorphisms of this gene may influence transcription factor binding (GATA-binding factor 2) [53]. rs10201627 which is located in the micro-RNA binding sites of the 39-UTR of G protein-coupled receptor 55 (GPR55), was predicted to affect miR-26b binding. And it was found that expression of miR-26b was down-regulated in lung cancer tissues [54]. Moreover, rs3749073, as the sole non-synonymous polymorphism (Gly195Val) in the GPR55, is in high LD (r 2 = 0.98) with rs10201627. It was predicted that GPR55 with Val195 appeared to induce less phosphorylation of extracellular signal-regulated kinase (ERK) than Gly195 [55]. It was also reported that GPR55 promoted cancer cell proliferation by ERK [56].
Actually, among these top ten SNPs, five SNPs (rs1910047, rs1049970, rs4855881, rs10129426 and rs10201627) are related to cell proliferation or apoptosis and three SNPs (rs9469089, rs32588 and rs11761231) are related to the immune inflammatory response.
We then again investigated the marginal effects of rs1051740 (Tyr113His), rs1051741, which was in high LD with rs2234922 (His139Arg, not genotyped in our chip), rs1800629 and rs909253 in this study according to our previous studies of long-term pulmonary function decline [14,15]. None of them reached the statistical significance level. However, rs1051740, rs1051741 and Figure 2. The frequency distribution of genetic risk score (GRS) and the coefficient and 95% CI to genetic risk score in linear regression model adjusted for height, age and FEV 1 at baseline, and average log transformed endotoxin level. The more risk loci the subjects carried, the larger the rate of FEV 1 decline occurred (P trend = 3.01610 218 ). doi:10.1371/journal.pone.0059035.g002 rs909253 could still be significant modifiers in the association between endotoxin exposure and the rate of FEV 1 decline. The effect of endotoxin exposure was different among subjects carrying different genotypes of SNP. The P values of heterogeneity for these three SNPs were 4.83610 26 , 6.07610 23 and 1.04610 22 respectively. Although, the subjects in this current study were all newlyhired young healthy female cotton workers, and therefore different from those prospectively followed for 20 years in our previous studies, the results still support what we reported previously. Additionally, four gene-gene interactions and four gene-environment interactions in this study again provided evidence that multiple factors might contribute to the rate of FEV 1 decline interactively.
Overall, the more risk loci carried by the cotton worker, the larger the rate of FEV 1 decline over 18 months. However, younger workers (age ,25 years) had more accelerated decline per-month than relatively older ones (age $25 years) when GRS increased. Moreover, for those carrying same number of risk loci, younger workers had more rapid decline of FEV 1 than relatively older ones. This provided evidence that endotoxin exposure tended to cause more serious damage to lung function of young workers, even with same genetic background.
Because all the workers of the cohort were lifelong young female non-smokers with relatively pure occupational exposure to endotoxin, this population provided a unique opportunity to observe gene effect to the rate of FEV 1 decline when adjusted environmental effect. The results contribute scientific support to do further functional studies of those genes in relation to the pathogenesis of endotoxin-related airways disease.
However, it is important to recognize the limitations of our study. Firstly, there is no similar external population to validate the observed SNP associations. Although we have performed internal statistical validation by bootstrap re-sampling procedures to minimize false positive discovery, we could not exclude the possibility of false discoveries among our findings. Besides, we controlled both FDR and FWER in study-wide level (0.10 and 0.15 respectively), which guaranteed the confidence of what we found. Secondly, the sample size was not large enough to detect some SNPs with modest effects. Thus, study with enlarged sample size were warranted. Finally, airborne endotoxin concentrations were estimated from sampling airborne cotton dust at fixed positions in work areas, rather than from sampling the air in the personal breathing zone of each participant, since there is no personal sampling technology developed for these kinds of exposure. The lack of personal air sampling data may be a possible source of exposure misclassification for this study. Thus, we used average of repeated measurement of endotoxin exposure level for each subject.
In summary, we performed the first large scale exploratory study of genetic variation in the rate of FEV 1 decline for newlyhired young healthy female cotton textile workers exposed to endotoxin. Genetic variants that play a role in immune inflammatory response, cell proliferation and cell apotosis, together with environmental factors interact to affect the rate of FEV 1 decline after initiation of exposure and this decrement happens over short time. Additional replication and gene function studies are necessary to confirm what we found.

(DOCX)
File S2 The association results of imputed SNPs and genotyped SNPs 500 kb around targeted SNPs (Table  S6).

(XLSX)
Appendix S1 Description of the FWER and FDR controlling procedure. (DOCX) Appendix S2 Regional plot of the top-10 SNP using P values derived from multi-variable linear regression model adjusted for height, age and FEV1 at baseline, and average log transformed endotoxin level.

(PDF)
Appendix S3 Sixty-three nodes in the gene-gene interaction functional network. (DOCX)