Susceptibility to Chronic Mucus Hypersecretion, a Genome Wide Association Study

Background Chronic mucus hypersecretion (CMH) is associated with an increased frequency of respiratory infections, excess lung function decline, and increased hospitalisation and mortality rates in the general population. It is associated with smoking, but it is unknown why only a minority of smokers develop CMH. A plausible explanation for this phenomenon is a predisposing genetic constitution. Therefore, we performed a genome wide association (GWA) study of CMH in Caucasian populations. Methods GWA analysis was performed in the NELSON study using the Illumina 610 array, followed by replication and meta-analysis in 11 additional cohorts. In total, 2,704 subjects with and 7,624 subjects without CMH were included, all current or former heavy smokers (≥20 pack-years). Additional studies were performed to test the functional relevance of the most significant single nucleotide polymorphism (SNP). Results A strong association with CMH, consistent across all cohorts, was observed with rs6577641 (p = 4.25×10−6, OR = 1.17), located in intron 9 of the special AT-rich sequence-binding protein 1 locus (SATB1) on chromosome 3. The risk allele (G) was associated with higher mRNA expression of SATB1 in lung tissue (p = 4.3×10−9). Presence of CMH was associated with increased SATB1 mRNA expression in bronchial biopsies from COPD patients. SATB1 expression was induced during differentiation of primary human bronchial epithelial cells in culture. Conclusions Our findings that SNP rs6577641 is associated with CMH in multiple cohorts and is a cis-eQTL for SATB1, together with our additional observation that SATB1 expression increases during epithelial differentiation, provide suggestive evidence that SATB1 is a gene that affects CMH.


Introduction
The secretion of mucus is a natural part of the airway defense against inhaled noxious particles and substances. Chronic mucus hypersecretion (CMH) is a condition of mucus overproduction, defined as sputum production for at least three months in each of two consecutive years in the absence of another explanation; airway obstruction is not a prerequisite [1].
Smoking is a risk factor for CMH: the prevalence of CMH in the general population has been reported to be 7.4% in current smokers, 3.7% in ex-smokers and 2.4% in never smokers [2]. CMH is the key presenting symptom in chronic bronchitis, one of the three main sub-groups of chronic obstructive pulmonary disease (COPD), a complex disease characterized by the presence of incompletely reversible and generally progressive airflow limitation [3]. Moreover, CMH is a risk factor for the development of COPD [4,5].
Worldwide, COPD affected 65 million people in 2004 and more than 3 million people died of COPD in 2005, representing 5% of all deaths. It is predicted that COPD will be the third leading cause of death worldwide in 2030 [6]. COPD markedly reduces quality of life and is responsible for high healthcare costs. For instance, the combined (direct and indirect) yearly costs of COPD and asthma in the United States of America were projected at $68 billion in 2008 [7]. CMH is associated not only with COPD but also with an increased duration and frequency of respiratory infections, excess decline in forced expiratory volume in 1 second (FEV1) and increased hospitalization and mortality rates in the general population [4,5,8,9].
It is not known why only a minority of smokers develop CMH, yet a plausible explanation is a genetic predisposition for CMH, as evidenced by familial aggregation of mucus overproduction and a higher prevalence of CMH in monozygotic than in dizygotic twins [10][11][12]. Little is known about the identity of the genes that predispose to CMH. One publication suggested that CTLA4 is associated with chronic bronchitis in COPD [13].
The aim of our study was to identify genetic factors for CMH, thereby obtaining a better insight into the origins of this disorder.

Ethics Statement
The Dutch Ministry of Health and the Medical Ethics Committee of the hospital approved the study protocol for all Dutch centers. Ethics approval was obtained for all participating studies, and written informed consent was obtained from all participants. For detailed information, see Supplement S1.

Subjects and genotyping
We performed GWA studies in participants of the NELSON study (n = 3,729), a population-based lung cancer screening study of male heavy smokers (≥20 pack-years) [24].
Replication of SNPs with p ≤ 10−4 was attempted in six cohorts participating in 'COPD Pathology: Addressing Critical gaps, Early Treatment & diagnosis and Innovative Concepts' (COPACETIC) and in five non-COPACETIC cohorts. Caucasian subjects with ≥20 pack-years of smoking and available genotype, spirometric and demographic data were included.
An overview of the CMH definitions used in this study is presented in Table 1. A brief description of the included cohorts, with details on the period of data collection, type of population, genotyping platforms and genetic imputation software, is presented in Table 2.

Strategy
We searched for SNPs associated with CMH using a two-stage strategy followed by a replication stage and meta-analysis (Figure 1).

Statistical analysis
General characteristics of CMH cases and controls were compared using Student's t-tests and Mann-Whitney U tests for continuous variables and χ2 tests for dichotomous variables with SPSS 20.0. Sample and SNP quality control (QC), regression analysis and meta-analysis were performed with PLINK 1.07 [25]. QC criteria are described in Supplement S1.
Logistic regression analysis under an additive model was used to identify SNPs associated with CMH. SNPs with a p-value < 10−4 were included for replication. When two SNPs were in strong linkage disequilibrium (r2 ≥ 0.8), only the SNP with the lowest p-value was analyzed further.
SNPs in the COPACETIC cohorts and in LifeLines were analyzed using logistic regression with adjustment for sex and smoking status (ex-/current smoking). In LifeLines, imputed SNPs with an info score (imputation quality score) < 0.3 were removed. SNPs in the non-COPACETIC cohorts were analyzed by the cohort investigators using the same model.
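The selection rule described above (p < 10−4, then keep only the most significant SNP of any pair with r2 ≥ 0.8) can be sketched as a greedy pruning step. This is a minimal illustration under the assumption that pairwise r2 values are already available; the function name, data layout and SNP identifiers are hypothetical and do not reproduce PLINK's own clumping options.

```python
from typing import Dict, List, Tuple

P_THRESHOLD = 1e-4   # selection threshold for replication
R2_THRESHOLD = 0.8   # "strong" linkage disequilibrium


def select_for_replication(
    pvalues: Dict[str, float],
    r2: Dict[Tuple[str, str], float],
) -> List[str]:
    """Keep SNPs with p < P_THRESHOLD, dropping any SNP in strong LD
    (r2 >= R2_THRESHOLD) with a more significant SNP already kept.

    pvalues: SNP id -> association p-value
    r2: (snp_a, snp_b) -> pairwise r^2; missing pairs are treated as r^2 = 0
    """
    # Walk candidates from most to least significant
    candidates = sorted(
        (snp for snp, p in pvalues.items() if p < P_THRESHOLD),
        key=lambda snp: pvalues[snp],
    )
    kept: List[str] = []
    for snp in candidates:
        in_strong_ld = any(
            max(r2.get((snp, other), 0.0), r2.get((other, snp), 0.0)) >= R2_THRESHOLD
            for other in kept
        )
        if not in_strong_ld:
            kept.append(snp)
    return kept


# Hypothetical SNP identifiers and values, for illustration only
example_p = {"snpA": 2e-6, "snpB": 5e-6, "snpC": 3e-5, "snpD": 1e-2}
example_r2 = {("snpA", "snpB"): 0.92, ("snpA", "snpC"): 0.10}
print(select_for_replication(example_p, example_r2))  # ['snpA', 'snpC']
```

Here snpB is dropped because it is in strong LD with the more significant snpA, and snpD fails the p-value threshold.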
Meta-analysis was performed on SNPs across NELSON and the 11 replication cohorts. The Cochran's Q test was used to test for heterogeneity in the meta-analysis.
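The text does not state which meta-analysis model was used; a common choice, assumed here, is an inverse-variance fixed-effect model on the log-odds scale. Cochran's Q then sums the weighted squared deviations of the cohort log-ORs from the pooled estimate and is compared with a χ2 distribution with k−1 degrees of freedom. The dependency-free sketch below illustrates this; the function names are hypothetical and this is not PLINK's implementation.

```python
import math
from typing import Sequence, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1}^{(df-1)/2} (x/2)^(i-1/2) / Gamma(i+1/2)
    tail = math.erfc(math.sqrt(half))
    tail += math.exp(-half) * sum(
        half ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1)
    )
    return tail


def fixed_effect_meta(
    odds_ratios: Sequence[float], log_or_ses: Sequence[float]
) -> Tuple[float, float, float, float]:
    """Inverse-variance fixed-effect meta-analysis of per-cohort odds ratios.

    log_or_ses are the standard errors of log(OR).
    Returns (pooled OR, two-sided p-value, Cochran's Q, p-value for Q)."""
    if len(odds_ratios) < 2:
        raise ValueError("need at least two cohorts")
    betas = [math.log(o) for o in odds_ratios]
    weights = [1.0 / se ** 2 for se in log_or_ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))  # two-sided normal p-value
    # Cochran's Q: ~ chi2(k - 1) when all cohorts share one true effect
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    q_p = chi2_sf(q, len(betas) - 1)
    return math.exp(beta), p, q, q_p
```

A small Q p-value indicates that the cohort estimates differ more than expected by chance, which is the kind of heterogeneity later seen between COPD-only and general population-based cohorts.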
We performed multivariate logistic regression analysis, adjusted for pack-years and lung function, to assess the association of CMH with the risk allele of rs6577641 in the identification cohort.
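The genome-wide scan uses additive coding (one odds ratio per G allele), whereas the genotype-stratified results report separate odds ratios for AG and GG relative to AA. A minimal sketch of the two codings follows; the function names are hypothetical, and in an actual model these predictors would sit alongside covariates such as pack-years and FEV1 % predicted, which are not shown.

```python
from typing import Tuple

RISK_ALLELE = "G"  # risk allele of rs6577641 as reported in the text


def additive_code(genotype: str) -> int:
    """Additive (allele-count) coding: AA -> 0, AG -> 1, GG -> 2."""
    return sum(1 for allele in genotype if allele == RISK_ALLELE)


def genotypic_dummies(genotype: str) -> Tuple[int, int]:
    """Genotype coding with AA as reference: returns (is_AG, is_GG)."""
    g = "".join(sorted(genotype))  # treat "GA" the same as "AG"
    return int(g == "AG"), int(g == "GG")


for gt in ("AA", "AG", "GG"):
    print(gt, additive_code(gt), genotypic_dummies(gt))
# AA 0 (0, 0)
# AG 1 (1, 0)
# GG 2 (0, 1)
```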
Functional relevance of SATB1 and rs6577641, our highest-ranked SNP
We performed four functional studies with the identified top SNP. Details on their methods are given in Supplement S1.
We assessed: 1) whether rs6577641 is an eQTL, by analyzing the association of SATB1 expression levels with rs6577641 genotypes in lung tissue from three independent cohorts recruited from Laval University, the University of British Columbia, and the University of Groningen, as described previously [14]; 2) CMH-associated mRNA expression in airway wall biopsies from 77 COPD participants in the GLUCOLD study [15]; 3) the association of homozygous genotypes for rs6577641 with a) immunohistochemical (IHC) staining for SATB1 and b) the fraction of mucus positivity in bronchial tissue explanted from COPD or lung cancer subjects who underwent lung surgery; 4) SATB1 expression levels during mucociliary differentiation of primary bronchial epithelial cells cultured at an air-liquid interface (ALI) [26].

Populations
Characteristics of the identification and replication populations are presented in Table 3. Subjects with CMH were more often current smokers and had worse lung function, except for populations including subjects with COPD only.

Identification analysis
After QC, 492,700 SNPs and 2,512 individuals (717 CMH cases, 1,795 controls) from the NELSON study remained. Logistic regression analysis was performed including these individuals supplemented with 590 additional healthy controls, adjusting for center. The QQ-plot provided no evidence of population stratification (λ = 1.0185). Seventy-seven SNPs were associated with CMH with a p-value < 10−4. CMH was associated with current smoking in our identification cohort (p < 0.001). Therefore, we performed a second GWA analysis adjusting for center and current/ex-smoking (717 CMH cases, 1,795 controls). The QQ-plot showed no evidence of population stratification (λ = 1.0056). We observed 64 SNPs with a p-value < 10−4. Figure 2 shows QQ-plots (A, C) and genome-wide association signals for CMH ordered by chromosome (Manhattan plots, B and D) of these sequential analyses. We identified 36 SNPs associated with CMH with a p-value < 10−4 in both analyses (Table 4).
Of these, 32 SNPs were included for replication and 4 SNPs were removed because they were in strong linkage disequilibrium (r2 > 0.8) with another associated SNP.

Figure 1. Study design. We performed GWA studies in the NELSON cohort and in additional healthy controls. CMH was analyzed using logistic regression with adjustment for center (Groningen and Utrecht). Since current smoking can affect the presence of CMH, we additionally performed the GWAS in the NELSON cohort correcting for center and smoking. SNPs with a p-value < 10−4 present in both GWA studies were selected for replication. To test the generalizability of associations with CMH in other populations, we compared our results with data from CMH cases and controls with a smoking history of ≥20 pack-years in eleven replication populations, using logistic regression with adjustment for sex and current smoking. Finally, we performed a meta-analysis on shared SNPs across the NELSON identification population and the 11 replication populations. doi:10.1371/journal.pone.0091621.g001
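The λ values quoted for the QQ-plots are genomic inflation factors. The text does not say how they were computed; the sketch below assumes the common median-based estimator, which divides the median observed 1-df χ2 statistic (recovered from two-sided p-values) by its expected median under the null, about 0.455. The function name is hypothetical.

```python
from statistics import NormalDist, median
from typing import Sequence

# Median of a chi-square distribution with 1 df, i.e. (Phi^-1(0.75))^2
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2  # about 0.4549


def genomic_inflation(pvalues: Sequence[float]) -> float:
    """Median-based genomic inflation factor from two-sided p-values in (0, 1]."""
    nd = NormalDist()
    # Convert each p-value back to a 1-df chi-square statistic
    chi2 = [nd.inv_cdf(1.0 - p / 2.0) ** 2 for p in pvalues]
    return median(chi2) / CHI2_1DF_MEDIAN


# Evenly spaced null p-values give lambda close to 1
n = 10001
null_p = [(i + 0.5) / n for i in range(n)]
print(round(genomic_inflation(null_p), 4))  # 1.0
```

Values close to 1, such as 1.0185 and 1.0056, indicate little systematic inflation of test statistics, for example from population stratification.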

Replication of associated SNPs
Genotyping of SNP rs4775569 failed in the COPACETIC populations, and this SNP was removed from further analysis. CMH-associated top SNPs for each cohort are presented in Table 5, with a complete overview in Table 6. When applying Bonferroni correction in the meta-analysis (p = 1.61×10−3 for 31 SNPs), we found a strong association with one SNP: rs6577641, located on chromosome 3 in intron 9 of the special AT-rich sequence-binding protein 1 locus (SATB1) (combined p-value = 4.25×10−6, OR = 1.17, 1.10-1.26).
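The Bonferroni threshold quoted above is the family-wise α of 0.05 divided by the 31 SNPs carried into the meta-analysis. A minimal sketch follows; the function name is hypothetical, and only rs6577641's combined p-value comes from the text.

```python
from typing import Dict, List, Optional


def bonferroni_hits(
    pvalues: Dict[str, float],
    alpha: float = 0.05,
    n_tests: Optional[int] = None,
) -> List[str]:
    """Return the SNPs whose p-value is below alpha / n_tests.

    n_tests defaults to the number of SNPs supplied."""
    m = n_tests if n_tests is not None else len(pvalues)
    threshold = alpha / m
    return sorted(snp for snp, p in pvalues.items() if p < threshold)


print(f"{0.05 / 31:.3g}")  # 0.00161
# The second SNP and its p-value are hypothetical
print(bonferroni_hits({"rs6577641": 4.25e-6, "snp_hypothetical": 4e-3}, n_tests=31))
# ['rs6577641']
```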
The SATB1 SNP rs6577641 had the lowest p-value for association with CMH in the meta-analysis. Figure 3 shows the forest plot of rs6577641 in the identification and replication cohorts and meta-analysis.
We assessed the percentage of subjects with CMH in each genotype group for rs6577641 in NELSON-total and stratified for current and ex-smokers (Figure 4). Multivariate logistic regression analysis, corrected for pack-years and FEV1 % predicted, showed that CMH was significantly associated with the number of G alleles in the 1,385 current smokers (reference = AA: heterozygous mutant (AG) p = 0.001, OR = 1.50; homozygous mutant (GG) p = 0.001, OR = 1.80) but not in the 1,127 ex-smokers (reference = AA: heterozygous mutant (AG) p = 0.380, OR = 1.18; homozygous mutant (GG) p = 0.143, OR = 1.42).

Functional relevance of SATB1 and rs6577641
1) Transcriptional regulation of SATB1 mRNA expression
We analyzed the association of SATB1 expression levels in lung tissue with rs6577641 genotype in three independent data sets from the Universities of Groningen, Laval and British Columbia (UBC) [14]. A cis-acting effect of rs6577641 on SATB1 expression was identified and present in all three datasets (n = 1,095), with the same direction of effect across all three SATB1 probes on the array. The susceptibility G allele increased expression and the protective A allele reduced expression (p = 4.3×10−9 in the meta-analysis across the three datasets and all three SATB1 probes; Table 7).
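The eQTL meta-analysis turns per-dataset results into Z-scores and combines them. The exact combination method is not specified; the sketch below assumes a (weighted) Stouffer Z-score approach, with the direction taken from a correlation sign as the Table 7 footnote describes. Function names are hypothetical. Stouffer's method assumes independent Z-scores, which holds across separate cohorts but not necessarily across probes measured on the same samples.

```python
import math
from statistics import NormalDist
from typing import Optional, Sequence


def signed_z(p_two_sided: float, direction: int) -> float:
    """Turn a two-sided p-value plus an effect direction (+1 or -1, e.g. the
    sign of a Spearman correlation) into a signed Z-score."""
    return direction * NormalDist().inv_cdf(1.0 - p_two_sided / 2.0)


def stouffer(z_scores: Sequence[float], weights: Optional[Sequence[float]] = None) -> float:
    """Weighted Stouffer combination: sum(w_i * z_i) / sqrt(sum(w_i ** 2))."""
    if weights is None:
        weights = [1.0] * len(z_scores)
    num = sum(w * z for w, z in zip(weights, z_scores))
    return num / math.sqrt(sum(w * w for w in weights))


def two_sided_p(z: float) -> float:
    """Two-sided normal p-value for a Z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

For example, a combined Z-score of magnitude 5.87 corresponds to a two-sided p-value of roughly 4.4×10−9.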
2) SATB1 mRNA expression and CMH
We compared SATB1 expression in baseline airway wall biopsies of COPD patients with (n = 38) and without (n = 39) CMH in GLUCOLD [15]. CMH was significantly associated with SATB1 expression levels (corrected for ex-/current smoking; p = 0.0045; Figure 5). After stratification, the same direction of effect was present in ex- and current smokers. However, this association reached statistical significance in current smokers (p = 0.021) but not in ex-smokers (p = 0.132), probably due to a difference in power, as 46 subjects were current smokers versus 33 ex-smokers.
3) Genotype-related protein expression and mucus positivity in bronchial epithelium
SATB1 protein expression has previously been observed in IHC analysis of bronchial epithelial cells [16]. Therefore, we stained SATB1 on paraffin-embedded lung tissue biopsies of individuals from the Groningen population contributing to the eQTL analysis. We observed clear nuclear staining for SATB1 in bronchial epithelial cells. No significant difference in the percentage of strongly positive, positive and weakly positive cells was observed between the protective (AA, n = 9) and risk (GG, n = 14) rs6577641 genotypes (11.8% ± 5.8 versus 12.7% ± 6.9, p = 0.74).
We determined whether the fraction of mucus-positive bronchial epithelium differed between subjects with different homozygous rs6577641 genotypes by performing PAS staining on tissue biopsies from the same cohort. We observed no significant difference between individuals homozygous for the protective (AA, n = 10) and risk (GG, n = 7) alleles (19.7% ± 11.9 versus 14.3% ± 9.6, p = 0.34).
4) SATB1 expression levels during bronchial epithelial cell mucociliary differentiation
We investigated whether SATB1 expression was induced during mucociliary differentiation of primary human bronchial epithelial

Discussion
Since not every ex- or current heavy smoker suffers from chronic mucus hypersecretion (CMH), we aimed to identify genetic variants conferring susceptibility to CMH. To this end, we performed the first GWA study on CMH, the key presenting symptom in chronic bronchitis. CMH was associated with 36 SNPs at the p < 10−4 significance level in the identification cohort. In the meta-analysis combining our identification and replication cohorts, a strong association was observed with rs6577641, a SNP located on chromosome 3 in intron 9 of SATB1. Although the association of rs6577641 with CMH did not reach conventional genome-wide significance, its effect was in the same direction across eleven study populations, and its combined p-value (4.25×10−6) was well below the Bonferroni-corrected threshold (1.61×10−3), showing the robustness of this finding. The detected odds ratio for this SNP corresponds to 17% higher odds of developing CMH per G allele in a population of ex- and current heavy smokers. Multivariate regression analysis, stratified for current and ex-smoking, showed a broadly similar direction and size of the association between CMH and the risk allele of rs6577641 in both strata. It is likely that lack of power is the reason for not reaching the level of significance in ex-smokers.

Table 4. SNPs associated with CMH with a p-value < 10−4, present in GWAS-I and in GWAS-II, in the NELSON identification cohort.
Table 5. Meta-analysis of top SNPs associated with CMH in replication cohorts, in identification and replication cohorts and corresponding direction of effect in all cohorts and associated feature and gene(s). See Table 2; OR is odds ratio; Q is p-value for heterogeneity; # means corresponding SNP is located in an intron in this gene. doi:10.1371/journal.pone.0091621.t005
Table 6. Meta-analysis of top SNPs associated with CMH across replication cohorts and across identification and replication cohorts, corrected for smoking and sex.
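The per-allele odds ratio of 1.17 can be translated into genotype-specific odds ratios and, given a baseline prevalence, into an absolute probability. The sketch below assumes a log-additive model (each G allele multiplies the odds by the same factor); the 20% baseline prevalence is purely hypothetical and not a study estimate, and the function names are illustrative.

```python
from typing import Dict


def genotype_odds_ratios(or_per_allele: float) -> Dict[str, float]:
    """Under a log-additive model each extra G allele multiplies the odds by
    the same factor, so OR(GG vs AA) = OR_per_allele ** 2."""
    return {"AA": 1.0, "AG": or_per_allele, "GG": or_per_allele ** 2}


def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Probability implied by multiplying the baseline odds by an odds ratio."""
    odds = baseline_prob / (1.0 - baseline_prob) * odds_ratio
    return odds / (1.0 + odds)


ors = genotype_odds_ratios(1.17)
print({g: round(v, 3) for g, v in ors.items()})  # {'AA': 1.0, 'AG': 1.17, 'GG': 1.369}
# Hypothetical 20% baseline prevalence of CMH in AA carriers
print(round(apply_odds_ratio(0.20, ors["AG"]), 3))  # 0.226
```

With a 20% baseline, 17% higher odds moves the probability to about 22.6%, a relative increase of roughly 13%. An odds ratio overstates the relative risk when the outcome is common.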

These data strongly suggest that SATB1 plays a role in susceptibility to CMH in subjects with a history of heavy smoking (≥20 pack-years) within the general population. Moreover, rs6577641 has a cis-eQTL effect on SATB1 expression in lung tissue, with the risk allele (G) significantly increasing and the A allele significantly reducing SATB1 expression. Additionally, we found higher SATB1 expression in bronchial biopsies of COPD patients with CMH. We found no differences between the GG and AA genotypes in SATB1 protein expression in airway epithelium by IHC in a small sample from our lung tissue registry. Finally, we demonstrated that SATB1 mRNA expression is induced during mucociliary differentiation in ALI cultures of human bronchial epithelial cells from two donors, supporting our eQTL findings. Interestingly, expression of the mucin gene MUC5AC was also induced during this culture period, with slightly delayed kinetics compared to SATB1. Together these data strongly suggest that SATB1 is induced during differentiation of bronchial epithelial cells and affects chronic mucus hypersecretion.
The forest plot clearly shows that the effect of SNP rs6577641 is smaller in cohorts including COPD patients only (GLUCOLD, Rucphen, COPDGene, ECLIPSE and Norway) than in the other cohorts. Additional meta-analyses of the COPD cohorts and the general population-based cohorts separately confirmed this (COPD cohorts: combined p-value = 0.236, OR = 1.07; general population-based cohorts: combined p-value = 5.18×10−7, OR = 1.26). This suggests genetic heterogeneity of CMH in subjects with and without COPD.
The SNP most significantly associated with CMH, rs6577641, is located in an intron of SATB1. SATB1 is a transcription factor and chromatin (re)organizer important for controlling the expression of many genes in a tissue- or cell-type-specific fashion, for instance in differentiating thymus T-cells [17] or differentiating skin keratinocytes [18]. Expression of SATB1 has been observed in normal human bronchial epithelial cells by immunohistochemistry, and lower levels were observed in non-small cell lung cancer cells [16]. In our study, we also showed the presence of SATB1 in bronchial epithelial cells by IHC staining of lung tissue. However, no significant differences were found between patients homozygous for the protective and risk alleles, for either specific SATB1 staining or PAS staining, the latter specifically detecting mucus. This inability to detect a genotype effect on staining may be due to lack of power, as we found a large variation in SATB1 and PAS staining in the relatively small number of lung tissue samples. Other explanations include possible regulation of SATB1 expression by smoke exposure, which could be a dynamic process not readily detected at the protein level by any single-time-point analysis such as IHC staining of lung biopsies. Alternatively, SATB1 expression levels may vary throughout the lungs, or the technique used here may not be sensitive enough to detect relatively small differences in protein levels. To further explore the association of SATB1 protein and its underlying regulation, it would be of interest to perform longitudinal investigations on lung tissue samples of subjects with and without CMH, or time series of in vitro cultured epithelial cells from donors with a specific genotype and cigarette smoke exposure. This would also allow further studies on epigenetic regulation through methylation, microRNAs or histone modifications.
The lack of association between SATB1 protein and rs6577641 might additionally be due to the location of mucus-positive cells in lung tissue. Mucus is produced both by goblet cells and by submucosal glands, the latter of which we did not investigate further. Normal mucus consists of 97% water and 3% solids, of which 30% are mucins. When mucus production is dysregulated, the concentration of solids in mucus may increase up to 15%. A further step could therefore involve investigating the mucins and other proteins present in mucus; for example, MUC5AC is predominantly produced by goblet cells in the proximal airways, and MUC5B by secretory cells throughout the airways and by submucosal glands.
How does SATB1 expression contribute to CMH? SATB1 is known to be a genome organizer: a tissue-specific chromatin remodeling protein capable of modifying chromatin architecture by forming loops, which bring condensed genomic DNA into contact with regulatory transcription proteins [19]. Thus, SATB1 can control the expression of a series of target genes located within a single locus at a specific chromosomal location [20]. This has, for instance, been elegantly shown in differentiating keratinocytes [18], where Satb1 expression regulates genes located in keratinocyte-specific loci, leading to the adoption of a specific cell fate by the differentiating keratinocytes. Similarly, SATB1 could contribute to CMH by inducing a gene expression program during differentiation of bronchial epithelial cells, leading to the adoption of a cell fate specific for mucus-producing cells in the submucosal glands or of a goblet cell phenotype in the bronchial epithelium. Involvement of Satb1 in pneumocyte differentiation was previously observed by Baguma et al. in mice [21]. We observed induction of SATB1 expression in bronchial epithelial cells differentiating under ALI culture conditions. Further research will need to test whether a specific gene expression profile is induced by SATB1 expression in differentiating bronchial epithelial cells. SATB1 is also highly expressed in thymocytes but absent in mature non-activated T cells [22]. Moreover, Satb1 has been shown in mice to be essential in T helper 2 (Th2) cells for the regulation of genes encoding interleukins 4, 5 and 13 [19]. In Satb1-deficient mice, thymocyte development stopped after the CD4+/CD8+ stage, with deregulation of many genes [23]. Conversely, excessive SATB1 production might lead to an excess of IL-13-producing Th2 cells, which may contribute to increased mucus production.
Therefore, a putative role of SATB1 in T-cells for the CMH phenotype should not be disregarded.
A strength of our study is that we were able to replicate our findings in different populations, ranging from cohorts consisting of individuals with severe airflow limitation to cohorts mainly consisting of healthy smokers. There are some limitations. For example, the presence of CMH was not based on actual measurements of the amount of sputum produced but on questionnaires that were not completely identical across study cohorts. Underreporting of CMH occurs because those experiencing CMH become accustomed to these symptoms, believe they are smoking-related, or are embarrassed to admit to cough and sputum.

Table 7. Meta-analysis of the effect of rs6577641 on mRNA expression levels of SATB1 in the lung*.
Probe | Gene Symbol | Affymetrix Probe ID | Z-score Groningen | Z-score Laval | Z-score UBC | Z-score Meta-Analysis
*To assess the effect of the SNP rs6577641 on gene expression, a Kruskal-Wallis test was performed. This test generates a p-value but does not give the direction of the effect. To assess the direction of the effect, a Spearman correlation test was performed. Next, a Z-score was calculated for each center, and a meta-analysis was performed for each of the three SATB1 probes across all centers. Finally, a meta-analysis of all three SATB1 probes was performed across all centers. This generated a Z-score of −5.87 and a corresponding p-value of 4.3×10−9, indicating that the susceptibility G allele of the SNP rs6577641 increases SATB1 expression. doi:10.1371/journal.pone.0091621.t007

We demonstrated that SATB1 mRNA expression is induced during mucociliary differentiation in ALI cultures of HBE cells in a small dataset (n = 2 donors). However, these data seem reliable, as they are supported by eQTL data from lung tissue. Despite these limitations, we consistently found evidence for association of SATB1 with CMH in the populations studied, showing the robustness of our finding. Moreover, we corroborated this finding by functional studies in lung tissue, airway wall biopsies of COPD patients and epithelial cultures. More extensive research is needed to investigate which factors induce SATB1 expression in airway epithelium. In summary, we performed identification analyses and meta-analyses using data from almost 7,000 participants to identify genes involved in susceptibility to CMH. It is remarkable that we found a genetic association for CMH, given that this phenotype is partly subjectively determined and not well delineated. Moreover, despite cohort differences in the definition of CMH and in the severity of airflow limitation, we found consistent effects of SNP rs6577641 on CMH.
This confirms that the CMH phenotype, despite the fact that it is self-reported, is a robust phenotype irrespective of the presence or absence of airflow limitation. The association of rs6577641 on chromosome 3 at the SATB1 locus with CMH was supported by functional studies including gene expression findings, demonstrating SATB1 to be associated with CMH.
Chronic mucus hypersecretion is a bothersome symptom for many people. Its prevalence increases with age; it affects quality of life and exacerbations of symptoms due to respiratory infections, and it ultimately increases mortality. The involvement of SATB1 in CMH offers opportunities to better understand the process leading to CMH and, in the future, to develop tailored medicines.