Different Genes Interact with Particulate Matter and Tobacco Smoke Exposure in Affecting Lung Function Decline in the General Population

Background: Oxidative stress related genes modify the effects of ambient air pollution or tobacco smoking on lung function decline. The impact of interactions might be substantial, but previous studies mostly focused on main effects of single genes.
Objectives: We studied the interaction of both exposures with a broad set of oxidative-stress related candidate genes and pathways on lung function decline and contrasted interactions between exposures.
Methods: For 12679 single nucleotide polymorphisms (SNPs), change in forced expiratory volume in one second (FEV1), FEV1 over forced vital capacity (FEV1/FVC), and mean forced expiratory flow between 25 and 75% of the FVC (FEF25-75) was regressed on interval exposure to particulate matter <10 µm in diameter (PM10) or packyears smoked (a), additive SNP effects (b), and interaction terms between (a) and (b) in 669 adults with GWAS data. Interaction p-values for 152 genes and 14 pathways were calculated by the adaptive rank truncation product (ARTP) method, and compared between exposures. Interaction effect sizes were contrasted for the strongest SNPs of nominally significant genes (pinteraction<0.05). Replication was attempted for SNPs with MAF>10% in 3320 SAPALDIA participants without GWAS.
Results: On the SNP-level, rs2035268 in gene SNCA accelerated FEV1/FVC decline by 3.8% (pinteraction = 2.5×10−6), and rs12190800 in PARK2 attenuated FEV1 decline by 95.1 ml (pinteraction = 9.7×10−8) over 11 years, while interacting with PM10. Genes and pathways nominally interacting with PM10 and packyears exposure differed substantially. Gene CRISP2 presented a significant interaction with PM10 (pinteraction = 3.0×10−4) on FEV1/FVC decline. Pathway interactions were weak. Replications for the strongest SNPs in PARK2 and CRISP2 were not successful.
Conclusions: Consistent with a stratified response to increasing oxidative stress, different genes and pathways potentially mediate PM10 and tobacco smoke effects on lung function decline. Ignoring environmental exposures would miss these patterns, but achieving sufficient sample size and comparability across study samples is challenging.


Introduction
Lung function is an important determinant of respiratory health and life expectancy [1,2,3,4]. Its longitudinal course is affected by different environmental exposures such as active tobacco smoking, environmental tobacco smoke exposure [5], possibly workplace exposures to dusts and fumes [6,7,8,9], as well as ambient air pollution [10]. Both air pollution and tobacco smoke are known to contain free radicals and to induce their direct formation at the tissue level, causing damage to cell walls, proteins and DNA, and chronic tissue inflammation and remodeling in the long run [11,12]. Upon exposure, different protein systems including those scavenging reactive oxygen species (ROS) are up-regulated, and the level of response is influenced by variation in underlying genes. Likewise, polymorphisms in oxidative stress related candidate genes like glutathione S-transferases (GSTs), microsomal epoxide hydrolase (EPHX), or heme-oxygenase 1 (HMOX-1) have been associated with lung function decline and chronic obstructive pulmonary disease (COPD), a disease characterized by accelerated, progressive lung function loss [13,14,15,16]. But most of these candidate genes have not been consistently replicated across studies and populations according to a recent review [15]. Similarly, genomewide association studies (GWAS) of lung function partially struggled with replication [17,18]. Further, in GWAS on lung function level or COPD prevalence [17,18,19,20,21,22], association signals in known oxidative-stress genes were not strong [23].
Reasons for non-replication could be genetic heterogeneity across populations, or also sub-phenotypes of disease [24]. However, it is also possible that differences in environmental factors, and hence presence of gene-environment interaction play a role. To the best of our knowledge, only one published genomewide interaction study examining the effect of farming exposure on childhood asthma has taken into account gene-environment interaction in respiratory disease to date [25]. This gap in the scientific literature is probably due to increased sample size requirements when assessing gene-environment interactions with classical analysis methods [26,27]. However, their importance in respiratory disease has previously been shown in candidate gene studies focusing on single genes and SNPs therein [28,29,30], as well as follow-up studies of GWAS [31,32].
Analysis methods such as pathway- or gene-set analyses [33] can at least partly overcome sample size restrictions by reducing the dimensionality of the data, and thus offer a promising alternative study approach. Based on biological knowledge of genes and their organization into molecular pathways, the longitudinal course of lung function might be better explained by accumulating interaction signals between environmental exposures and multiple SNPs of the same gene, or different genes involved in the same canonical pathway contributing to a functional entity in the organism.
We thus aimed to investigate to which extent oxidative-stress related genes and pathways interact significantly with interval exposure to ambient particulate matter of mean diameter <10 µm (PM10) or active tobacco smoking on natural lung function decline using genome-wide data from non-asthmatic adults of the Swiss Study on Air Pollution and Lung and Heart Diseases in Adults (SAPALDIA). SNP-level interaction signals were integrated onto upper biological levels to identify significantly interacting genes and pathways. The impact of PM10 exposure on lung function decline was contrasted to tobacco smoking by comparing patterns of associations at the gene- and pathway-level, as well as interaction effect sizes for the strongest interacting SNP within genes.

Ethics Statement
All participants gave written informed consent. The study was approved by the Overall Regional Ethics Commission for Clinical Medicine (Swiss Academy of Medical Sciences, Basel, Switzerland) and the responsible cantonal ethics committees of each study centre (Ethics Commissions of the cantons Aarau, Basel, Geneva, Grisons, Ticino, Valais, Vaud, and Zürich).

Study Population
SAPALDIA is a population-based cohort study established in 1991 to assess the effects of long-term exposure to ambient air pollution on respiratory health, with a first follow-up examination in 2002. Participants were residents from 8 communities throughout Switzerland aged 18-60 years at baseline. Details of the study design and methodology were published elsewhere [10,34,35].
The current work is based on up to 669 non-asthmatic participants with genome-wide data fulfilling quality control criteria and complete data on sex, age, height, PM10 and smoking exposure (see Figure S1). Participants without genome-wide data served as replication sample.

Spirometric Measurements
Spirometry was performed without bronchodilation. Identical spirometry protocols and devices (Sensormedics model 2200, Yorba Linda, USA) were used in 1991 and 2002 [36,37]. Participants were in an upright sitting position and performed three to eight forced expiratory lung function maneuvers according to American Thoracic Society quality criteria [38]. At least two acceptable measurements of forced vital capacity (FVC) and forced expiratory volume in the first second (FEV1) were obtained. Forced expiratory flow between 25 and 75% of the FVC (FEF25-75) was recorded.
In the present study we studied the decline of FEV1, the ratio FEV1/FVC and FEF25-75 between 1991 and 2002 as measures of airway obstruction, calculated by subtracting the first measurement from the second (measurement at SAPALDIA2 minus measurement at SAPALDIA1).

Health Questionnaire Data
Smoking information was assessed by questionnaire. At each examination, never smokers were defined as having smoked less than 20 packs of cigarettes or 360 g of tobacco in their life, ex-smokers as having quit smoking at least 30 days before the interview, and current smokers as those who reported active smoking [39]. Packyears smoked between baseline and follow-up examination were used for comparison with interval PM10 exposure, and were calculated by dividing the number of cigarettes per day by 20 (giving the number of cigarette packs) and multiplying the result by the years of exposure.
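The packyears definition above can be illustrated with a minimal sketch. The function name and the example values are hypothetical and not taken from the SAPALDIA data; the arithmetic follows the stated rule (cigarettes per day divided by 20, multiplied by years of exposure).

```python
def packyears(cigarettes_per_day: float, years: float) -> float:
    """Packyears = (cigarettes per day / 20) * years of exposure.

    Hypothetical helper; 20 cigarettes are taken as one pack, as in the text.
    """
    if cigarettes_per_day < 0 or years < 0:
        raise ValueError("inputs must be non-negative")
    return cigarettes_per_day / 20.0 * years


# Illustrative values only: half a pack per day over an 11-year interval.
interval_packyears = packyears(cigarettes_per_day=10, years=11)
```

Under this definition, one pack per day over the 11 years between surveys corresponds to 11 packyears.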

Air pollution Exposure
Similarly to calculating packyears, interval PM10 exposure was defined by summing individual average home outdoor exposure to PM10 over each year of follow-up, giving estimates in (µg/m³) × years. Annual average exposures were calculated by using exposure estimates from Gaussian dispersion models on a 200 m × 200 m grid throughout Switzerland for years 1990 and 2000, and interpolating historical trends from fixed air pollution monitoring stations. Participants were assigned individual annual exposure estimates via their geo-referenced residence addresses, taking account of residence changes during follow-up. Details on exposure modeling are given elsewhere [40].
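As a rough sketch of this cumulative exposure measure, the snippet below sums annual home outdoor PM10 estimates over the follow-up years while allowing for a change of residence. The data structures, address labels, and concentrations are hypothetical; the actual exposure assignment relied on the dispersion model described in [40].

```python
def interval_pm10(residence_by_year, pm10_by_address_year):
    """Sum annual mean home outdoor PM10 over follow-up, in (µg/m³) × years.

    residence_by_year: {year: address_id}, one entry per follow-up year.
    pm10_by_address_year: {(address_id, year): annual mean PM10 in µg/m³}.
    Both structures are illustrative assumptions.
    """
    return sum(pm10_by_address_year[(address, year)]
               for year, address in residence_by_year.items())


# Hypothetical participant who moves from address "A" to "B" after two years.
residences = {1991: "A", 1992: "A", 1993: "B"}
pm10 = {("A", 1991): 30.0, ("A", 1992): 28.5, ("B", 1993): 20.0}
cumulative_exposure = interval_pm10(residences, pm10)
```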

SNP Genotyping and Imputation
Blood for DNA-analysis was drawn in 2002 in participants giving consent to genetic analyses [34].
Genome-wide genotyping was done on the Illumina Human 610quad BeadChip in the framework of the EU-funded GABRIEL study [41], a large consortium aiming to uncover genetic and environmental causes of asthma. The current work focused on the non-asthmatic portion of participants. A total of 567,589 successfully genotyped autosomal SNPs were imputed to 2.5 million using MACH v 1.0 software [42] and the HapMap v22 CEPH reference panel of Utah residents with ancestry from northern and western Europe [43].
Strict quality control (QC) was applied by excluding samples with <97% genotyping success rate, non-European origin, cryptic relatedness or sex-inconsistencies, as well as SNPs with Hardy-Weinberg equilibrium p-value <10−4, call rate <97%, minor allele frequency (MAF) <5% or low imputation quality (Rsq <0.5). A total of 2,168,681 SNPs withstood QC, and genome-wide data was finally available in 669 non-asthmatic individuals with environmental exposure data.
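The SNP-level exclusion criteria can be expressed as a simple filter. This is a minimal sketch, not the pipeline used in the study: the record layout and field names are assumptions, and the thresholds are those listed above (Hardy-Weinberg p-value below 10−4, call rate below 97%, MAF below 5%, imputation Rsq below 0.5).

```python
from dataclasses import dataclass


@dataclass
class SnpQc:
    """Per-SNP QC metrics; field names are hypothetical."""
    hwe_p: float      # Hardy-Weinberg equilibrium p-value
    call_rate: float  # fraction of samples successfully genotyped
    maf: float        # minor allele frequency
    rsq: float        # imputation quality (MACH Rsq)


def passes_qc(snp: SnpQc) -> bool:
    """A SNP is kept only if it fails none of the exclusion criteria."""
    return (snp.hwe_p >= 1e-4 and snp.call_rate >= 0.97
            and snp.maf >= 0.05 and snp.rsq >= 0.5)


# Illustrative records: the second fails on minor allele frequency.
kept = [s for s in (SnpQc(0.3, 0.99, 0.16, 0.9), SnpQc(0.3, 0.99, 0.02, 0.9))
        if passes_qc(s)]
```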
Replication genotyping was attempted for two interacting SNPs (rs360563 in gene CRISP2, and rs12190800 in PARK2) with MAF >10%. Genotyping was done using the iPLEX Gold MassARRAY (SEQUENOM, San Diego, USA) on the whole SAPALDIA study population including the analysis sample, as the costs for manual sample selection outweigh those of additional genotyping. The replication sample consisted of 3320 successfully genotyped participants with complete data for covariates and all three lung function parameters (see Figure S1).

Definition of Oxidative-stress Genes and Pathways
Oxidative stress related genes were defined as either coding proteins that directly scavenge or endogenously produce ROS, their immediate regulators, or key genes in cascades triggered by oxidative stress. They were identified by searching the Gene Ontology database [44] with the term "response to oxidative stress" and GeneCards with "oxidative stress" in the pathway field of the advanced search option (http://www.genecards.org/index.php?path=/Search/Advanced/, accessed November 2010). Resulting gene lists were further enriched by literature reviews [45,46,47,48,49]. By feeding the gene lists into Ingenuity Pathway Analysis (Ingenuity® Systems, www.ingenuity.com), 14 molecular pathways related to oxidative stress and environmental exposures of interest were identified (Table 1).
Gene regions were defined by retrieving transcription start and end positions in the 'gene track' of the UCSC browser (http://genome.ucsc.edu/) [50], genome build 18 (March 2006), and adding 20 kilo-bases to each end. Referring to dbSNP version 126, available SNP data was matched to gene regions. Data was available for 152 autosomal genes (Table 1), of which 46 mapped once to a pathway, 33 twice, and 37 three times or more. Thirty-six genes did not map to one of the 14 pathways, but were related to oxidative stress based on their function. Details on gene size, SNP-coverage and pathway mapping are given in Table S1. Gene specific allele dosage files in MACH format were used for analysis.

Table 1. Mapping of candidate oxidative-stress genes to molecular pathways of interest (columns: Pathway, Genes; pathways listed include Arachidonic Acid Metabolism, Aryl Hydrocarbon Receptor Signaling, and fMLP Signaling in Neutrophils).

Statistical Analysis
Characterization of study population. The distribution of sex, age, baseline lung function parameters, their change during follow-up as well as packyears exposure during follow-up was tabulated according to categories of smoking status (never, former and current smokers) and interval PM10 exposure (high versus low exposure, defined by the median value) (Table 2). To assess a potential impact of loss to follow-up on our results, our study population consisting of up to 669 non-asthmatic adults with high quality genome-wide data and complete information on model covariates was compared to non-asthmatic participants examined at follow-up without genome-wide data (n = 3833), and to those completing only baseline examination (n = 1299) by means of descriptive tables and tests for independent samples (see Table S2).

Gene- and pathway-environment interaction analysis. The interaction of genetic variation and exposure to PM10 or tobacco smoke on lung function decline was assessed in different stages. First, SNP level analyses on decline in FEV1, FEV1/FVC and FEF25-75 were done for each gene separately using multiple linear regression in ProbABEL v0.1.3 (http://www.genabel.org) [51] with robust sandwich-estimation of standard errors. Models specified an additive SNP-effect, main effects for packyears smoked and interval PM10 exposure between surveys, and an interaction term between the SNP-variable and either exposure. They adjusted for sex, age and height at follow-up, packyears smoked up to baseline, principal components of population ancestry, and study area.
We used a slightly modified version of the Adaptive Rank Truncation Product (ARTP) method described by Yu and colleagues [52] to calculate gene- and pathway-level p-values. Briefly, according to the method, SNPs are sorted in ascending order of their interaction p-values (strongest signal first), and SNP-interaction p-values are multiplied up to several pre-specified truncation points which depend on the number of SNPs in the gene. The statistical significance of these products is derived using the empirical distribution of products observed in the original and permutated datasets. For each gene, the strongest product p-value across all truncation points is readjusted using again its empirical distribution, to result in the gene-level p-value. Using the gene-level p-values in observed and permutated datasets, the same procedure can be applied to calculate pathway-level p-values. Details on the ARTP method, the applied modifications and truncation point definitions are presented in Figure S2 and Methods S1.
SNP-level analyses were run 10000 times, always after having newly permuted gene-specific SNP-allele-dosages across participants. SNP-level interaction p-values of the observed and permutated datasets were used for calculating gene- and pathway-level p-values. According to Yu et al. [52], results from simulation studies suggest the ARTP-method yields type I error rates close to 5%. We thus additionally corrected for 152 tests at the gene and 14 tests at the pathway level in a first look. In a second line of investigation, a non-stringent nominal threshold of α = 0.05 was chosen for further exploring gene- and pathway-level interaction signals due to our restricted sample size.
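The gene-level step of the ARTP procedure can be sketched in a few lines. This is a simplified illustration of the basic scheme by Yu and colleagues [52] and does not include the modifications described in Methods S1; the function names, the truncation points, and the simulated p-values are assumptions made for the example. Products are handled on the log scale, and each empirical p-value is the fraction of datasets (observed plus permuted) with a statistic at least as small.

```python
import math
import random
from bisect import bisect_right


def _empirical_p(values):
    """For each value, the fraction of all values that are <= it (small = strong)."""
    ordered = sorted(values)
    return [bisect_right(ordered, v) / len(values) for v in values]


def artp_gene_pvalues(snp_pvals, truncation_points):
    """snp_pvals: list of rows; row 0 = observed data, rows 1..B = permuted data.

    Returns one adjusted minimum per dataset. Element 0 is the gene-level
    p-value; the full list could feed the same routine at the pathway level.
    """
    sorted_rows = [sorted(row) for row in snp_pvals]
    per_point = []
    for k in truncation_points:
        # Log of the product of the k smallest SNP p-values in each dataset.
        log_products = [sum(math.log(p) for p in row[:k]) for row in sorted_rows]
        per_point.append(_empirical_p(log_products))
    # Strongest truncation point per dataset, then readjust across datasets.
    minima = [min(col[b] for col in per_point) for b in range(len(sorted_rows))]
    return _empirical_p(minima)


# Toy example: a clearly interacting gene against 99 simulated null datasets.
random.seed(0)
observed = [1e-8, 1e-7, 1e-6, 0.2, 0.5, 0.9]
permuted = [[random.uniform(0.01, 1.0) for _ in range(6)] for _ in range(99)]
gene_p = artp_gene_pvalues([observed] + permuted, truncation_points=(1, 3, 6))
```

With B permuted datasets the smallest attainable p-value is 1/(B+1), which is why a large number of permutations (10000 in the analysis above) is needed before a Bonferroni-level threshold can be reached.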
Comparing the impact of PM10 versus tobacco smoking. Emerging patterns of interaction were compared between exposures at the pathway- and gene-level. In pathways with nominally significant interactions, gene-level p-values were plotted against each other to identify the relative contributions to the pathway signal.
For the SNP with the strongest interaction signal in each nominally significant gene, regression analyses were repeated with exposure centered to the median. Effect estimates were scaled to represent an exposure contrast of one interquartile range (IQR), and interaction effect sizes were compared between PM10 and tobacco smoke exposure. For SNP rs2035268 in gene SNCA, which was one of the top interaction signals in FEV1/FVC decline, genotype specific estimates for PM10 and packyears exposure were calculated to exemplify the effect modification by genotype. For this purpose, imputed allele dosages were coded as genotypes as follows: dosage <0.5, genotype TT; 0.5 ≤ dosage <1.5, genotype GT; and dosage ≥1.5, genotype GG. Reparametrization of exposure variables into genotype specific ones was employed to avoid model over-specification and unstable estimation in small genotype strata (rs2035268: MAF 5%).
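The dosage-to-genotype coding for rs2035268 amounts to two cut-points on the imputed allele dosage. The sketch below assumes the dosage counts copies of the G allele on a 0 to 2 scale; the function name is hypothetical.

```python
def dosage_to_genotype(dosage: float) -> str:
    """Code an imputed G-allele dosage (0..2) as a genotype of rs2035268.

    Cut-points follow the text: <0.5 -> TT, 0.5 to <1.5 -> GT, >=1.5 -> GG.
    """
    if not 0.0 <= dosage <= 2.0:
        raise ValueError("allele dosage must lie between 0 and 2")
    if dosage < 0.5:
        return "TT"
    if dosage < 1.5:
        return "GT"
    return "GG"


# Illustrative dosages only.
genotypes = [dosage_to_genotype(d) for d in (0.02, 0.5, 1.49, 1.97)]
```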
Statistical power. Power calculations were done using QUANTO software [53] version 1.2 specifying a gene-environment study on independent individuals. Details of the power calculation are given in Methods S1. The most important aspect of the calculation was that a two-sided significance threshold of 5% was used (i.e. no multiple testing correction was included), since all 12679 SNP-estimates were further processed for deriving gene- and pathway-level p-values without filtering by association strength. In our first analysis with 650 subjects, we had at least 75% power to detect a SNP*environment interaction that accounts for 1% of the total variance, and power increased to 99% when the SNP*environment interaction accounts for 5% of the total variance. In the replication analysis with n = 3320, estimated power was 99% in both cases. Statistical power is expected to be higher for the gene and pathway level analysis, but that increase in power could not be quantified since p-values for interaction at the gene (or pathway) level are obtained from individual p-values for interactions with SNPs belonging to the gene (or pathway), and the effect of interaction may vary among SNPs.

Characteristics of Study Population
Regarding the distribution of sex, age and lung function according to categories of smoking and PM10 exposure, our study sample on average presented decreasing lung function values and accelerated lung function decline with increased smoking (Table 2). The percentage of females decreased with smoking exposure. Compared to participants assessed only at baseline, our study sample had slightly better lung function values, substantially fewer current smokers, was slightly less exposed to PM10 and tobacco smoke, and was older and leaner (see Table S2).

Gene-level Analysis
In the gene-level analysis, nominally interacting genes differed between PM10 and packyears exposure across the parameters of lung function decline (Table 3). Genes interacting with PM10 exposure partially overlapped for FEV1/FVC and FEF25-75 decline (genes CRISP2, ERCC1, LPO, MPO, and SNCA). After correcting for performing 152 gene-level tests (α Bonferroni = 0.05/152 = 3.29×10−4), the interaction between gene cysteine-rich secretory protein 2 (CRISP2) located on chromosome 6p12.3 and interval PM10 exposure on FEV1/FVC decline remained significant (p interaction = 3.0×10−4). A marginally significant interaction was seen for gene SNCA on chromosome 4q21 with the same outcome and exposure (p interaction = 4.0×10−4). Interactions observed for packyears exposure did not withstand multiple testing corrections.
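The Bonferroni thresholds follow directly from the number of tests. The short sketch below reproduces the gene-level value quoted above and, as an assumption about how the same correction applies to the 14 pathways, the corresponding pathway-level value; the function name is illustrative.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


gene_threshold = bonferroni_threshold(0.05, 152)     # about 3.29e-4
pathway_threshold = bonferroni_threshold(0.05, 14)   # about 3.57e-3

# Gene-level interaction p-values reported for FEV1/FVC decline with PM10.
crisp2_significant = 3.0e-4 < gene_threshold
snca_significant = 4.0e-4 < gene_threshold
```

This makes explicit why CRISP2 passes the corrected threshold while SNCA is described as only marginally significant.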
P-values of interaction for all tested genes are given in Table S3.

Comparison of Interactions with PM10 Versus Packyears Exposure
The comparison of interaction effect sizes for PM10 and packyears exposure was based on regression estimates for only the strongest interacting SNP within each nominally significant gene. Table 4 presents estimates for FEV1/FVC decline, where significant and marginally significant gene-level interactions have been detected for genes CRISP2 and SNCA, respectively. Estimates for decline in FEV1 and FEF25-75 are presented in Table S4 and Table S5.
The C-allele of SNP rs360563 in gene CRISP2 accelerated FEV1/FVC decline by 1.1% per IQR change in PM10 exposure over 11 years (Table 4). Similarly, the G-allele of SNP rs2035268 in SNCA was associated with an accelerated decline by 3.8% per allele and IQR change in exposure. Genotype specific exposure estimates were calculated for rs2035268. Within genotypes GT and GG of SNP rs2035268, a change in IQR of PM10 was associated with a significant acceleration of FEV1/FVC decline by 3.9%, as opposed to a small and non-significant acceleration by 0.2% in baseline genotype TT (Table 5). In contrast, a change in IQR of packyears smoked was associated with a significant acceleration by 1.1% in the baseline TT genotype stratum, but not in the GT/GG strata.
For FEV1 and FEF25-75 decline, interaction effect sizes for the strongest interacting SNPs in nominally significant genes tended to be considerably larger with packyears compared to PM10 exposure. Further, packyears exposure frequently presented significant main effects besides the interaction with SNPs (Table S4 and Table S5).
In models including only main effects but no interaction between SNPs and exposure, an IQR of 9.8 packyears was significantly associated with accelerated decline in FEV1/FVC by 1%, and in FEV1 by 50 ml (data not shown). Respective estimates for PM10 were non-significant. SNP main effects remained non-significant and their beta estimates largely unaffected by the exclusion of interaction terms.

Replication of Significant Associations
Replication genotyping was done for CRISP2 SNP rs360563 (MAF of 49.8%) and rs12190800 in PARK2 (MAF 16%), but their interaction with PM10 exposure on FEV1/FVC and FEV1 decline could not be confirmed in the remainder of the SAPALDIA population (p interaction = 0.63 and 0.50 respectively, n = 3320 for both). MAFs in the replication sample corresponded to those in the discovery sample, and both SNPs were in Hardy-Weinberg equilibrium.

Discussion
To the best of our knowledge, this is the first study assessing gene-environment interactions on lung function decline using analysis methods that accumulate interaction effects along a broadly defined set of candidate genes and pathways. Our results suggest that different oxidative stress genes could be involved in mediating the adverse effects of ambient air pollution and tobacco smoke exposure on lung function decline.
We can currently only hypothesize about the reason for observing different patterns of interaction between the two environmental exposures. A possible explanation would be that ambient particulate matter pollution and tobacco smoke, although sharing many constituents, also differ in their composition, which possibly affects the overall and relative relevance of the different pathways. A probably more important explanation is that levels of oxidative stress imposed by ambient PM10 exposure are much lower than those induced by active tobacco smoking. Experimental studies have shown that different levels of oxidative stress trigger dose-dependent, specific activations of pathways on the cellular level in response to the oxidant burden [54]. Li and colleagues delineated a stratified oxidative stress model while studying the biological effects of particulate matter exposure on human and mouse cell lines exposed to solutions of diesel exhaust particles (DEP) and concentrated ambient air particles (CAP) sampled in a highly polluted area [55,56]. According to their observations, at the lower end of exposure pivotal ROS-scavenging enzymes like heme oxygenase-1 are induced, representing the activation of protective cell-mechanisms. Intermediate exposure levels induce inflammation, while high exposure levels impact on mitochondrial permeability, and result in cytotoxicity and apoptosis. CAP mostly represented the lower to mid-level of exposure, inducing oxidative-stress enzymes and inflammation, but not apoptosis (as observed with DEP). In contrast, tobacco smoke exposure is known to induce the whole spectrum of cellular reactions, from oxidative stress response and inflammation [57,58] up to DNA-damage [58], apoptosis [59,60,61,62] as well as cellular necrosis [61].

Table 3. Nominally significant gene-environment interactions by outcome and exposure.
Although, in light of the limited sample size, we cannot provide statistical evidence of exposure-specific interaction patterns with genes and pathways in our current study, it is interesting to see that many of the top-ranking genes interacting with packyears exposure are involved in signal transduction or apoptosis (Table 3: BCL2, CASP6, MAP2K1, NFKB1, TGFBR2, TP53). Only two such genes showed interaction signals with PM10 exposure (CHUK, RAC1), and many of the others related to scavenging or production of ROS (CRISP2, CYP1A2, EPX, GLRX, GLRX2, GPX5, LPO, MPO, PRDX3). These observations are consistent with the stratified oxidative stress model. The observation of larger interaction effect sizes at the level of SNPs for FEV1 and FEF25-75 decline, as well as the frequent presence of significant main effects, further support higher oxidative stress levels induced by tobacco smoke than by PM10 exposure.
Another important observation was that the effect of genetic variation related to oxidative stress appeared to be mediated predominantly by the interactions with environmental exposures, as hardly any SNP main-effects were observed. This is in line with the findings of genome-wide studies on lung function performed to date [17,18,21], where oxidative-stress related candidate genes did not produce strong signals. But their design was cross-sectional and, importantly, these analyses focused on SNP-main effects. Exposure specific gene-effects might thus be missed as they can cancel out when averaged over the whole population (which happens in a gene main effect analysis). Disregarding gene-environment interaction might also explain part of the missing heritability in complex disease genetics.
Our study had several limitations. First, the limited number of non-asthmatic adults with available genome-wide data restricted our power to detect associations at the gene and pathway levels. In this context, we faced the problem of finding studies with genome-wide genotyping and comparable data on both phenotypes and environmental exposures. This issue is particularly pressing regarding ambient air pollution exposure. As a consequence, small sample size did not allow us to identify further strong interaction signals to follow up, while the observed ones could not be replicated in the remainder of the study population. Our gene and pathway level results are thus of a more exploratory nature. Limited power is also known to inflate effect estimates when the strongest association signals are selected for further follow-up (the so-called "winner's curse" [63]); thus our interaction effect estimates on the SNP-level are likely overestimated for both exposures. But the relative difference in effect size between exposures is probably less affected by this phenomenon. In case of differential overestimation, the true difference would likely be larger, as observed PM10 effects were smaller and therefore would be more affected than packyears effects. Further, follow-up participants were healthier than those completing only baseline examination. Our results are thus applicable to an adult general population sample of good health. Environmental exposure and genetic susceptibility might possibly have affected health and thus participation of our study subjects. But in this case, true effects would likely be underestimated in our present study [64]. Finally, SNP-coverage was low for certain genes (see Table S1), and the well-known gene-deletions in glutathione S-transferases are difficult to tag by SNP-genotyping as they represent copy number variations. This makes it difficult to interpret respective results. On the other hand, a comparison of imputed SNP data for rs360563 (gene CRISP2) with genotypes measured during replication in the initial study sample showed a high concordance indicating high imputation quality (see Table S6). The absence of strong interactions on the pathway level is likely due to our primary focus on function while selecting candidate genes, which limited pathway coverage. But genes in a pathway may also differently interact with exposure, or compensate for each other. Further, regulatory genomic regions could be located farther away than the chosen flanking segments of 20 kilobases. Detecting interactions in pathways is thus more challenging.
Strengths of our study were the population based design comprising non-asthmatic adults of a wide age-spectrum, the detailed data on individual tobacco smoke and particularly PM10 exposure, and the high quality of longitudinal lung function data. Finally, the application of analysis methods which exploit interaction signals below the significance threshold of a pure SNP-level analysis provided new insight into a possible differential involvement of genes according to exposure specific oxidative stress levels.

Figure 1. (A) … dysfunction pathway interacting with PM10 and packyears exposure between surveys on FEV1/FVC decline. (B) Genes of the methane metabolism pathway interacting with PM10 and packyears exposure between surveys on FEF25-75 decline. (C) Genes of the apoptosis signaling pathway interacting with PM10 and packyears exposure between surveys on FEV1 decline. doi:10.1371/journal.pone.0040175.g001

Table 4. Effect estimates of the strongest interacting SNP from each nominally significant gene on FEV1/FVC decline (n = 650).
Columns: Exposure; Chrom; Position; Gene; SNP type; All1; All2; Freq All1; Beta interaction (SE), P; Beta SNP (SE), P; Beta exposure (SE), P.

Conclusions
Applying a gene- and pathway-level analysis, we observed that PM10 and packyears exposure potentially interact with different genes on lung function decline, consistent with a stratified response to different oxidative stress levels. Our study thus points to the importance of considering interactions with environmental factors in the search for molecular pathways underlying lung function decline in response to exogenous inhalants. But it is also a good example of the challenges faced by gene-environment interaction studies today: While studies with partial genome-wide data, and hence often small sample size, can beneficially use the remainder of the study population as highly comparable replication sample, their potential to identify sufficient variants to follow-up is limited. In contrast, large studies or study consortia are more powerful in the discovery stage, but suffer from data heterogeneity as finding suitable replication studies with comparable phenotypic, genetic and environmental exposure data is difficult. This results in a challenging trade-off between sample size and data homogeneity.

Supporting Information
Figure S1 Follow-up of participants and selection of study population. (TIF)

Figure S2 Scheme of analysis steps in the ARTP method. The ARTP method developed by Yu and colleagues [52] assumes that an analysis at the SNP-level has been performed on the originally observed data, followed by a reanalysis on permutated datasets, i.e. p-values of association for original and permutated datasets are available for each SNP. The ARTP procedure then entails the following 4 steps: 1. Order p-values from single SNP analysis in ascending order, 2. Calculate products of ranked p-values at different truncation points depending on gene length (light and dark green arrows in the graph), 3. Adjust product p-values using permutation distribution (1st and 2nd yellow arrow from the right), 4. Select the minimum of the adjusted products (red arrow) and readjust (1st yellow arrow from the left). The readjusted product minimum represents the gene-level p-value. For each permutated dataset, an adjusted product minimum can be calculated as well. The procedure can then be repeated using the resulting, original and permutation gene-level p-values to yield p-values of the pathway. (TIF)

Table S1 Characteristics of selected oxidative-stress related genes and mapping to candidate pathways. (XLS)

Table S2 Comparison of study sample to non-asthmatic participants lost to follow-up, and those followed-up w/o genome-wide data. P A-C: p-value for comparisons of characteristics between baseline and analysis sample; P B-C: p-value for comparisons of characteristics between follow-up and analysis sample; a: Chi-squared tests for proportions, two sample T-tests for means, and ranksum-test for medians; b: n = 650 with complete baseline and follow-up data; c: in ever-smokers only. (XLS)

Methods S1 Details on the ARTP method specifications and power calculations. (DOC)

Text S1 Overview of the SAPALDIA study team as of July 2011. (DOC)

Data S1 Outcome and exposure specific regression results of all 12679 SNPs. Effect estimates are derived from multiple linear regression models specifying SNP main effects, interval PM10 and packyears exposure (centered to the median) and an interaction between SNP and either PM10 or packyears exposure. An additive genetic model was assumed. Adjustments were made for sex, age and height at follow-up, packyears smoked up to baseline, population ancestry, and study area. Beta-estimates are in units of milliliters for FEV1, percentages for FEV1/FVC, and milliliters per second for FEF25-75. Betas represent declines per effect allele and/or for an exposure contrast of one interquartile range (IQR) over 11 years. All estimates are taken from the same interaction model. Positive values mean that the respective decline is attenuated, opposed to acceleration with negative values. Rows are sorted according to chromosome and position. All1: allele 1 (effect allele), All2: allele 2 (baseline allele); FreqAll1: frequency of allele 1, MAF: minor allele frequency. n: number of observations in the model. Beta_int/se_int/p_int: beta estimate/standard error/p-value of the SNP*environment interaction term. (XLS)