Genome-Wide Association Studies of Asthma in Population-Based Cohorts Confirm Known and Suggested Loci and Identify an Additional Association near HLA

Rationale Asthma has substantial morbidity and mortality and a strong genetic component, but identification of genetic risk factors is limited by availability of suitable studies. Objectives To test if population-based cohorts with self-reported physician-diagnosed asthma and genome-wide association (GWA) data could be used to validate known associations with asthma and identify novel associations. Methods The APCAT (Analysis in Population-based Cohorts of Asthma Traits) consortium consists of 1,716 individuals with asthma and 16,888 healthy controls from six European-descent population-based cohorts. We examined associations in APCAT of thirteen variants previously reported as genome-wide significant (P < 5×10⁻⁸) and three variants reported as suggestive (P < 5×10⁻⁷). We also searched for novel associations in APCAT (Stage 1) and followed up the most promising variants in 4,035 asthmatics and 11,251 healthy controls (Stage 2). Finally, we conducted the first genome-wide screen for interactions with smoking or hay fever. Main Results We observed association in the same direction for all thirteen previously reported variants and nominally replicated ten of them. One variant that was previously suggestive, rs11071559 in RORA, now reaches genome-wide significance when combined with our data (P = 2.4×10⁻⁹). We also identified two genome-wide significant associations: rs13408661 near IL1RL1/IL18R1 (P Stage1+Stage2 = 1.1×10⁻⁹), which is correlated with a variant recently shown to be associated with asthma (rs3771180), and rs9268516 in the HLA region (P Stage1+Stage2 = 1.1×10⁻⁸), which appears to be independent of previously reported associations in this locus. Finally, we found no strong evidence for gene-environment interactions with smoking or hay fever status. Conclusions Population-based cohorts with simple asthma phenotypes represent a valuable and largely untapped resource for genetic studies of asthma.


Introduction
Asthma, characterized by episodic breathlessness, chest tightness, coughing and wheezing, is estimated to affect 300 million people worldwide [1] and is associated with morbidity and economic costs that are comparable with other common chronic diseases [2,3]. As with many common diseases, asthma risk is determined by both genetic and environmental factors, and estimates of heritability (the proportion of variability in risk within a population due to inherited factors) range from 35 to 90% in twin studies [4,5,6]. However, only a handful of genetic variants have thus far been validated as associated with asthma risk at stringent levels of statistical significance.
The first genome-wide association (GWA) study for asthma [7] was conducted in 2007, with 994 child-onset asthmatics and 1,243 non-asthmatics. This study implicated a locus near ORMDL3, a gene not previously suspected to have a role in asthma susceptibility. Although the initial association did not reach a widely used threshold for genome-wide significance in discovery samples (P < 5×10⁻⁸), the association was subsequently replicated in numerous independent studies, particularly in European-derived and Hispanic populations [8,9,10,11]. Subsequent GWA studies of asthma identified variants in PDE4D [12] and DENND1B [13] associated with asthma at genome-wide significant levels, although the proposed variant in DENND1B showed considerable heterogeneity of associations between different populations. In a GWA study of severe asthma, two other loci narrowly missed genome-wide significance [14]: one in the HLA region, and one near RAD50 (a locus previously shown to be associated with total IgE levels) [15]. A GWA study of eosinophil counts with follow-up testing in asthma case-control studies identified variants near IL1RL1/IL18R1 and IL33 as being associated with asthma [16]. More recently, a meta-analysis of 23 GWA studies by the GABRIEL consortium [17] studied 10,365 physician-diagnosed asthmatics and 16,110 participants without asthma. They observed associations with asthma, particularly childhood-onset asthma, at multiple loci (ORMDL3/GSDMB, IL1RL1/IL18R1, HLA-DQ, IL33, SMAD3, and IL2RB). Several additional loci in this study narrowly missed the genome-wide significance threshold. Most recently, GWA studies in largely non-European ancestries [18,19] confirmed some of the loci discovered in Europeans, and identified additional associations at PYHIN1, USP38-GAB1, an intergenic region of 10p14, and a gene-rich region of 12q13.
A very recent GWA study of severe asthma in Europeans found strong evidence for two previously established loci (ORMDL3/GSDMB and IL1RL1/IL18R1) in patients with severe asthma but did not identify any novel loci [20].
Many of the above findings have come from GWA studies on samples specifically established for the investigation of allergy and asthma, often with rich phenotypic information on asthma and related diseases. While such detailed phenotypic information is extremely valuable, collection of such samples is resource-intensive, potentially limiting the sample size. As for all other polygenic traits, sample sizes of current GWA studies of asthma are a limiting factor in the search for genetic risk factors [21]. Expanding GWA studies to include additional cohorts that may not have such detailed phenotypic information could increase the power to detect associations.
One route to increasing power may be found in the numerous population-based cohorts with existing genome-wide genotype data that also have basic information on doctor-diagnosed asthma, but have not been comprehensively analyzed for associations with asthma. The idea of reanalyzing existing GWA data is similar to the approach adopted by consortia studying quantitative traits that are routinely measured in many studies, such as height and weight [22,23]. However, unlike anthropometric measures, asthma is a disease where diagnostic criteria may not always be consistent [24,25,26]. As such, it is uncertain whether the rather minimal phenotype of a self-report of doctor-diagnosed asthma is sufficient to be useful in genetic studies of asthma.
In this paper, we test whether a self-report of doctor-diagnosed asthma in population-based cohorts could be used to replicate findings previously reported by other asthma GWA studies or identify novel genetic associations with asthma. To achieve this aim, we formed the Analysis in Population-based Cohorts for Asthma Traits (APCAT) genetics consortium, which currently includes 18,604 adults of European ancestry (1,716 cases and 16,888 controls) from six population-based cohorts with GWA data. To study further the top hits emerging from the meta-analyses of APCAT (Stage 1), we utilized in silico replication data (Stage 2) from 15,286 adults of European ancestry (4,035 cases and 11,251 controls). We were able both to replicate known signals and to identify new associations, indicating that genetic studies of asthma in population-based cohorts are likely to be useful complements to more focused studies of asthma.

Results
The APCAT genetics consortium includes 1,716 individuals with asthma diagnosed (ever) by a physician, and 16,888 non-asthmatic controls, from six population-based cohorts (Table 1). Genome-wide genotyping was conducted on available platforms and, after standard quality control (see Methods), subsequently imputed to ~2.5 million autosomal SNPs using the HapMap CEU reference panel (Table S1). Association studies with asthma used an additive genetic model and were adjusted for sex, ancestry-informative principal components and (in non-birth cohorts) for age; we also controlled for relatedness in family-based cohorts. We meta-analyzed the study-specific results using a fixed effect model and considered ~2.2 million SNPs with imputation quality >0.30 and minor allele frequency >5%. We applied genomic control at the individual study level and again after meta-analysis to correct for inflation of test statistics due to any systematic bias. The individual study genomic control inflation factors were modest (λGC ≤ 1.02 for all studies; Table S1). We also explored whether stratifying the individual cohorts by smoking status or by the presence of allergic symptoms prior to meta-analysis substantially affected our results. We observed strong correspondences between the unstratified and stratified analyses (Figure S1), and therefore primarily report on the unstratified results as a simpler main analysis with slightly improved power.
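The genomic control step mentioned above can be illustrated with a short Python sketch. It assumes the conventional definition of the inflation factor, λGC = median of the observed 1-df chi-square statistics divided by the expected null median (about 0.455), with statistics rescaled only when λGC exceeds 1. The function names and toy statistics are hypothetical and are not taken from the APCAT analysis pipeline.

```python
from statistics import NormalDist, median

# Expected median of a 1-df chi-square statistic under the null:
# the square of the 75th percentile of a standard normal (about 0.455).
NULL_MEDIAN_CHI2 = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(chi2_stats):
    """Estimate lambda_GC as observed median / expected null median."""
    return median(chi2_stats) / NULL_MEDIAN_CHI2


def apply_genomic_control(chi2_stats):
    """Divide every statistic by lambda_GC; no rescaling if lambda_GC <= 1.

    Returns the corrected statistics and the factor that was applied.
    """
    lam = max(genomic_inflation(chi2_stats), 1.0)
    return [x / lam for x in chi2_stats], lam


if __name__ == "__main__":
    # Hypothetical per-SNP chi-square statistics, for illustration only.
    toy_stats = [0.02, 0.11, 0.47, 0.52, 1.30, 2.75, 6.10]
    corrected, lam = apply_genomic_control(toy_stats)
    print(f"applied lambda_GC = {lam:.3f}")
```

In a two-level scheme such as the one described here, the same correction would be applied once to each study's statistics and once more to the meta-analysis statistics.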

Analysis of genetic variants previously associated with asthma
To test whether the population-based cohorts and the phenotype of self-reported doctor-diagnosed asthma could be used to detect associations, we analyzed thirteen SNPs in nine genomic loci that had shown genome-wide significant associations in previously published GWA studies of European ancestry individuals. Encouragingly, the APCAT results (Table 2) are directionally consistent for all thirteen SNPs, and nominally replicate (one-tailed P < 0.05) ten of these thirteen SNPs. We were unable to assess the PYHIN1 variant reported as associated with asthma in African-Americans [18] as this variant is monomorphic in European populations. For variants discovered in Japanese individuals [19,27], it is harder to interpret replication in our samples because of differences in LD. Nevertheless, we examined the associations in APCAT for the reported lead SNPs and additional SNPs in LD in the HapMap JPT sample. For two out of the four loci, the associations showed directional consistency, with one association nominally replicated (rs1701704 in IKZF4, P = 0.00132; see Table S2).
We also examined three SNPs where previous evidence was strongly suggestive (P < 5×10⁻⁷ for association with asthma) but not genome-wide significant (Table 2), and one of them, rs11071559 in RORA, is strongly supported in our data (Reported P = 1.1×10⁻⁷; P APCAT = 0.0031). The combined evidence of association at RORA (P = 2.4×10⁻⁹) now surpasses the genome-wide significance threshold, and therefore rs11071559 represents a new genome-wide significant association with asthma. We note that the RORA locus is 6.4 Mb away from a known asthma variant (rs744910 near SMAD3), but these two loci are independent, because there is low linkage disequilibrium between these two variants (pairwise linkage disequilibrium, r² = 0.01 in HapMap CEU panel) and because the association estimates for these two variants are virtually unchanged when conditioned on each other (Table S3).

Search for novel asthma risk loci in APCAT
Having established the validity and utility of our population-based studies, we next examined the results of a genome-wide meta-analysis of the studies within APCAT (Figure S2), with the goal of identifying novel associations with asthma. An inspection of the quantile-quantile plots (Figure S2) and the low genomic inflation factor (λ = 1.01) of the meta-analyses indicate that there is little evidence of confounding by population stratification or other technical artifacts.
To test whether some of the top results from the APCAT analysis could represent valid associations with asthma, we considered the most strongly associated 14 SNPs from independent loci within the APCAT results (Stage 1 in Table 3). We obtained in silico replication data for these 14 top SNPs from several additional studies of asthma, including two population-based cohorts. These replication studies consisted of the 1958 British Birth Cohort (B58C), the Australian Asthma Genetics Consortium (AAGC), the second survey of the European Community Respiratory Health Survey (ECRHS2) and the European Prospective Investigation of Cancer in Norfolk (EPIC-Norfolk) (Table S1). The results from the replication studies (Stage 2) and a meta-analysis of these results with the APCAT data (Stage 1 + Stage 2) are summarized in Table 3.
The most strongly replicated SNP is rs13408661 near the IL1RL1 and IL18R1 genes, with the combined P value reaching genome-wide significance (P Stage1 = 3.9×10⁻⁶; P Stage2 = 3.2×10⁻⁵; P Stage1+Stage2 = 1.1×10⁻⁹; Figure 1). This SNP lies approximately 31.1 kb from rs3771166 (pairwise r² = 0.157) identified by GABRIEL [17] and 2.6 kb from rs1420101 (r² = 0.053) identified by an earlier study [16] as genetic risk factors for asthma. Conditioning rs13408661 on either rs3771166 or rs1420101 did not substantially reduce the signal of association in APCAT (Table S3), implying that rs13408661 represents a different signal of association at this locus. Interestingly, a previous study focusing on previously reported risk loci for asthma in a subsample of the Australian samples in this report [8] had also suggested that an association with rs10197862, which is tightly correlated with rs13408661 (r² = 0.932), was independent of other associated variants in the region. Quite recently, a study of an ethnically diverse sample of Americans [18] identified a genome-wide significant association at rs3771180/rs10173081, which are also tightly correlated with rs13408661 (r² = 0.907 and 1, respectively). Therefore, the combined data, including our conditional analyses, indicate that rs13408661 truly represents an additional genome-wide significant signal of association with asthma at this locus. Our second strongest signal is at rs9268516 (P Stage1 = 1.2×10⁻⁷; P Stage2 = 1.0×10⁻³; P Stage1+Stage2 = 1.1×10⁻⁸; Figure 2) in the HLA region on Chromosome 6, approximately 246 kb away from the rs9273349 variant (r² = 0.324) identified by GABRIEL [17]. Conditional analysis (Table S3) indicates that the association at rs9273349 cannot completely explain the association at rs9268516, suggesting either the presence of multiple signals at this locus or the presence of a third variant (partially correlated with both rs9273349 and rs9268516) that explains the association at both of these variants.
None of the other variants had compelling evidence of replication in our Stage 2 data (Table 3), although rs7861480 near IFNE showed a trend in the same direction (P = 0.064). We also examined the evidence for association of these 14 SNPs or their proxies in the GABRIEL data (estimates downloaded from http://www.cng.fr/gabriel/results.html), excluding B58C and ECRHS (which participated in Stage 2 of the APCAT study) and the occupational asthma cohorts. Consistent with our Stage 2 results, we saw directionally consistent evidence of association at rs13408661 (P = 7.0×10⁻⁶) in the IL1RL1/IL18R1 locus and rs9268516 in the HLA region (P = 0.069); rs7861480 near IFNE showed directional consistency but was not significantly associated with asthma in GABRIEL (P = 0.19).
We also estimated the variance explained in APCAT by the most strongly associated SNP at each of the previously reported loci. Under a liability threshold model [28], and assuming a prevalence of 9% (the prevalence in APCAT), these SNPs together explain ~1.6% of the population variance in asthma risk (Table S4). Of course, additional variance is explained by multiple signals at each locus (such as at the IL1RL1/IL18R1 and HLA regions), variants at additional loci (such as at RORA), and variants yet to be discovered by additional genetic studies.

Search for interactions with smoking and allergic status in APCAT
To our knowledge, there has been no genome-wide search for gene-environment interactions with smoking status or allergic status, two important modifiers for risk of developing asthma. We scanned ~2.2 million SNPs for gene-environment effects for smoking exposure in APCAT by comparing, for each SNP, the estimated association statistics with asthma in current smokers to the estimates in never smokers. The strongest evidence for interaction with smoking status did not reach genome-wide significance [estimated pooled odds ratio for interaction between rs1007026 (nearest genes are MOCS1 and DAAM2) and smoking status = 1.89; 95% confidence interval 1.43 to 2.49; Figure S3]. Similarly, we scanned for interactions with allergic status in APCAT by comparing, for each SNP, the estimated association statistics for asthma in individuals with hay fever (the most commonly available measure of allergic risk factors in the APCAT studies) to the estimates in individuals without hay fever. The locus with the strongest evidence of interaction with allergic status did not reach genome-wide significance either [estimated pooled odds ratio for interaction between rs17136561 (located in SLC22A23, which overlaps with PSMG4 and TUBB2B) and hay fever status = 1.64; 95% confidence interval 1.33 to 2.02, P = 2.3×10⁻⁶; Figure S3]. We did not pursue either of these loci further.
Figure 1. Regional association and forest plots for rs13408661 in the IL1RL1/IL18R1 locus. For the regional plot, the lead SNP is indicated by a purple diamond, and the degree of linkage disequilibrium (r²) of other SNPs in the region to the lead SNP is indicated by the color scale. Genes are shown below, and estimated recombination rate is indicated by the blue lines. Note that the regional plot is based on Stage 1 (pooled) estimates only. For the forest plot, the estimated odds ratio and 95% confidence interval for each individual study is shown by the boxes (scaled to sample size) and lines; pooled estimates and 95% confidence intervals are indicated by diamonds. doi:10.1371/journal.pone.0044008.g001
We also investigated whether the associations of the previously known or suggestive loci, and the signals emerging from this paper (rs13408661 near the IL1RL1/IL18R1 genes and rs9268516 in the HLA region), differed by smoking or allergic status. The directions of association for asthma in the never smokers and in non-allergic individuals (i.e. the healthy subgroups) were generally consistent with the unstratified analysis, but with weaker signals, as expected with reduced sample sizes (Table S5). A formal test for heterogeneity between smoking strata indicated no significant differences. Similarly, there was no significant heterogeneity for the allergic strata except possibly for rs2284033 in IL2RB (P heterogeneity = 0.038), where opposite directions of association with asthma were observed in the two allergic strata, and for rs11071559 in RORA (P heterogeneity = 0.059), where the signal for asthma association appears to be seen predominantly in the allergic individuals.
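The stratum comparisons used in this section (current versus never smokers, hay fever versus no hay fever) can be sketched as a test on the difference between two log odds ratios. The Python sketch below assumes the two strata are independent and uses a normal approximation; this is a standard approach and not necessarily the exact procedure used in APCAT. The function name and input values are hypothetical.

```python
import math
from statistics import NormalDist


def stratum_interaction(beta_a, se_a, beta_b, se_b):
    """Compare per-stratum log odds ratios for one SNP.

    beta_a, beta_b: log odds ratios for asthma in stratum A and stratum B.
    se_a, se_b: their standard errors (strata assumed independent).
    """
    diff = beta_a - beta_b
    se = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se
    # Two-sided P value from the standard normal distribution.
    p_value = 2.0 * NormalDist().cdf(-abs(z))
    return {
        "odds_ratio": math.exp(diff),          # interaction odds ratio
        "ci_low": math.exp(diff - 1.96 * se),  # 95% confidence interval
        "ci_high": math.exp(diff + 1.96 * se),
        "p_value": p_value,
    }


if __name__ == "__main__":
    # Hypothetical estimates: stronger association in stratum A than in B.
    result = stratum_interaction(beta_a=0.45, se_a=0.08, beta_b=0.05, se_b=0.07)
    print(result)
```

With only two strata, this difference test is equivalent to a 1-df heterogeneity test between the stratum-specific estimates.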

Participants and Studies
Cases and controls for the discovery study were drawn from six population-based studies of individuals of European ancestry: FINRISK [29], Framingham Heart Study [30], Health 2000 [31], Helsinki Birth Cohort [32], Northern Finland Birth Cohort of 1966 [33] and Young Finns Study [34]. All cohorts were genotyped using commercially available genotyping arrays, and SNPs that passed QC filters were used to impute up to 2.5 million SNPs using HapMap CEU as the reference. Participants for the in silico replication were drawn from the 1958 British Birth Cohort (B58C) [7], the Australian Asthma Genetics Consortium (AAGC) [35], the European Community Respiratory Health Survey follow-up (ECRHS) [36] and EPIC-Norfolk [37]. Study characteristics are given in Table 1 and Supplementary Methods; genotyping and imputation details are given in Table S1. The most strongly associated SNPs from APCAT were checked for validity after re-analysis of data from the GABRIEL (A Multidisciplinary Study to Identify the Genetic and Environmental Causes of Asthma in the European Community) study [17] (excluding B58C, ECRHS2 and cohorts with occupational asthma) made available at www.cng.fr/gabriel (Table 3).

Phenotype definition and stratification
Cases were defined as individuals who had given an affirmative questionnaire response to the question "Have you ever been diagnosed with asthma?" (exact wording varied among questionnaires; see Supplementary Methods). The remaining subjects served as healthy controls if they did not affirmatively respond to any of the following: self-reported asthma without a physician diagnosis, chronic obstructive pulmonary disease, emphysema, chronic bronchitis, chronic cough associated with wheeze, other lung disease, or FEV1 < 70% of predicted. Individuals with reports of chronic obstructive pulmonary disease, emphysema, chronic bronchitis, or other lung diseases were also excluded from the cases.
We also conducted two stratified analyses of asthma: smoking-stratified and allergy-stratified. Allergic status was defined using an affirmative response to the question "Have you ever had hay fever or other allergic nasal symptoms?" (exact wording varied among questionnaires), as this was the most uniformly available information on allergy in APCAT. Participants were divided into three smoking categories: never smokers, ex-smokers (smoked regularly more than a year ago) and current smokers (currently smoking or smoked regularly in the past year).

Statistical analyses
The association statistic for each SNP was oriented towards the forward strand and was calculated assuming an additive genetic model, adjusting for sex, ancestry-informative principal components, and (in non-birth cohorts) age. In the family-based Framingham Heart Study, the association analysis was done controlling for family structure using the GWAF package in R [38]. The data for FINRISK, Health 2000, the Helsinki Birth Cohort, and the Young Finns Study were analyzed together with an adjustment term for cohort. The data from this combined dataset, along with data from the Framingham Heart Study and the Northern Finland Birth Cohort of 1966, were meta-analyzed using the fixed effect inverse-variance method implemented in METAL [39], and the results were verified in R. Genomic control was applied at the individual study level as well as at the meta-analysis stage. Approximately 2.2 million SNPs with imputation quality (info) >0.30 and minor allele frequency (MAF) >5% were analyzed.
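For readers unfamiliar with the fixed effect inverse-variance method, the following Python sketch shows the pooling step for a single SNP. It is an illustration of the general technique, not METAL's code; the function name and the per-study estimates are hypothetical.

```python
import math
from statistics import NormalDist


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed effect meta-analysis for one SNP.

    betas: per-study log odds ratios, all oriented to the same allele.
    ses:   per-study standard errors.
    Returns the pooled log odds ratio, its standard error and a
    two-sided P value.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    p_value = 2.0 * NormalDist().cdf(-abs(z))
    return pooled_beta, pooled_se, p_value


if __name__ == "__main__":
    # Hypothetical estimates from three studies, for illustration only.
    beta, se, p = fixed_effect_meta([0.18, 0.25, 0.10], [0.06, 0.09, 0.12])
    print(f"pooled OR = {math.exp(beta):.2f}, SE = {se:.3f}, P = {p:.2e}")
```

Each study is weighted by the inverse of its variance, so larger and more precise studies dominate the pooled estimate.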

SNP selection for in silico replication
We selected a locus (defined as a region 500 kb wide) for further follow-up if it contained either a single SNP with P < 10⁻⁶ or multiple SNPs with P < 10⁻⁵. The "sentinel SNP" was defined as the SNP with the most significant P value and was included in the replication list.
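The selection rule above can be sketched in Python as a greedy pass over SNPs sorted by P value. The text does not specify how the 500 kb regions were positioned, so this sketch assumes a window centred on each candidate sentinel (±250 kb); the function name, field names and example SNPs are likewise hypothetical.

```python
def select_sentinels(snps, window=500_000, p_single=1e-6, p_multi=1e-5):
    """Pick one sentinel SNP per qualifying locus.

    snps: list of dicts with keys "rsid", "chrom", "pos" and "p".
    A locus qualifies if its lead SNP has P < p_single, or if it holds
    at least two SNPs with P < p_multi.
    Assumption: each locus is a window centred on its lead SNP.
    """
    remaining = sorted(snps, key=lambda s: s["p"])
    sentinels = []
    while remaining and remaining[0]["p"] < p_multi:
        lead = remaining[0]  # most significant SNP not yet assigned
        in_locus = [
            s for s in remaining
            if s["chrom"] == lead["chrom"]
            and abs(s["pos"] - lead["pos"]) <= window // 2
        ]
        n_suggestive = sum(1 for s in in_locus if s["p"] < p_multi)
        if lead["p"] < p_single or n_suggestive >= 2:
            sentinels.append(lead["rsid"])
        # Remove the whole locus so it cannot be selected twice.
        assigned = {id(s) for s in in_locus}
        remaining = [s for s in remaining if id(s) not in assigned]
    return sentinels


if __name__ == "__main__":
    # Hypothetical SNPs, for illustration only.
    toy = [
        {"rsid": "rsA", "chrom": 1, "pos": 1_000_000, "p": 5e-7},
        {"rsid": "rsB", "chrom": 1, "pos": 1_100_000, "p": 3e-6},
        {"rsid": "rsC", "chrom": 2, "pos": 5_000_000, "p": 4e-6},
        {"rsid": "rsD", "chrom": 2, "pos": 5_200_000, "p": 8e-6},
        {"rsid": "rsE", "chrom": 3, "pos": 9_000_000, "p": 2e-6},
    ]
    print(select_sentinels(toy))
```

In the toy data, rsE is an isolated SNP with P between the two thresholds, so its locus meets neither criterion.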

Discussion
We completed a genome-wide association study of a simple asthma phenotype (self-report of ever having been diagnosed by a physician with asthma) in 18,604 participants from six population-based studies comprising the APCAT consortium. We performed follow-up analyses of the top signals from APCAT in 15,286 additional individuals. These results provided strong evidence for an additional associated variant in the HLA region, a known asthma locus, and confirmed recent reports of multiple associations at the IL1RL1/IL18R1 locus. We also examined the evidence for association in APCAT of SNPs with previous genome-wide or suggestive evidence of association with asthma, and show that the results from our population-based studies validate and in one case newly establish genome-wide significant associations with asthma. Finally, we found no evidence of genes modifying the relation between smoking and asthma or the relationship between hay fever and asthma.
The present study has several strengths. First, it demonstrates the usefulness of a large untapped resource to complement genetic studies of asthma: population-based studies with genome-wide genotype data and a simple asthma phenotype, namely self-reported information on doctor-diagnosed asthma. Second, we present the first comprehensive search for genetic interactions with smoking status and with hay fever, two important modifiers of asthma development. Finally, we provide evidence for new genome-wide significant associations with asthma: one novel signal where there was prior suggestive evidence of association (RORA), one independent novel signal at a previously associated locus (the HLA region), and one previously associated locus where we demonstrate multiple independent signals (IL1RL1/IL18R1).
It is important to recognize some limitations of the present study. First, the constituent studies in APCAT have limited information on asthma, such as age of asthma onset or severity of symptoms, which prevents a more detailed investigation of associated loci. Second, although controls were carefully selected to exclude individuals with other respiratory diseases or abnormalities on spirometry (see Methods) that may share pleiotropic risk alleles with asthma, the choice of controls in population-based cohorts is potentially subject to misclassification bias due to the inclusion of undiagnosed cases among the controls. We note that this problem is common to many study designs for diseases with variable age at onset such as asthma, but does not prevent the identification of new disease markers. Third, the power to detect novel associations in population-based studies is restricted by the low prevalence of disease (our prevalence was ~9%) compared to a case-control study of an equivalent total sample size and an equal number of cases and controls. However, we note that the large number of additional available population-based studies similar to the ones in APCAT still represents a large untapped pool of genotyped cases. Fourth, we only examined SNPs with frequencies above 5%, so we have not tested rarer variants for association with asthma. Finally, the use of self-reported diagnosis of asthma in the APCAT cohorts, even though it was doctor-diagnosed, may also have led to misclassification within cases.
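The power cost of an unbalanced case-control ratio can be made concrete with the commonly used effective sample size approximation, N_eff = 4 / (1/N_cases + 1/N_controls). This formula is an assumption brought in for illustration and is not part of the APCAT analysis; the function name is hypothetical, and only the case and control counts come from the study.

```python
def effective_sample_size(n_cases, n_controls):
    """Size of a balanced case-control study with comparable power.

    Uses the approximation N_eff = 4 / (1/n_cases + 1/n_controls).
    """
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


if __name__ == "__main__":
    # APCAT Stage 1 counts as reported in the text.
    n_eff = effective_sample_size(1716, 16888)
    print(f"APCAT effective sample size: {n_eff:.0f} of 18,604 participants")
```

Under this approximation, the 18,604 APCAT participants carry roughly the information of a balanced study of about 6,200 individuals, which is why pooling many such cohorts is needed to accumulate cases.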
Despite these limitations, we were able to independently validate 10 of 13 SNPs previously reported as being associated with asthma through GWA studies, which indicates that population-based studies with simple asthma phenotypes can indeed complement ongoing genetic studies in asthma. For most (but not all) of these variants, the estimated effect sizes in APCAT are smaller than reported, which could be due to the "winner's curse" phenomenon [40], slightly greater misclassification in our cohorts, or other differences between this study and previous reports. However, the fact that most of the APCAT odds ratios fall within the 95% confidence intervals of the original reports and that the P values typically become more significant when combined with the APCAT data suggests that the power gained by adding population-based studies still outweighs the effects of any misclassification there may be. When the reported estimates are combined with our data, the variant rs11071559 in RORA reaches genome-wide significance. In the subsequent stratified analysis, the RORA variant was perhaps more strongly associated with asthma in individuals with hay fever. This gene belongs to a subfamily of nuclear orphan receptors suggested to negatively regulate inflammatory response [41]. Interestingly, RORA-deficient mice have diminished capacity to mediate allergic inflammatory response [41,42,43]. A recent paper found that RORA is critical for the development of nuocytes, which are part of the innate immune response and contribute to asthma response, in mice [44]. Another recent study that looked at asthma candidate genes found that RORA is differentially expressed during lung development in both mouse and human [45].
By analyzing the results of the meta-analysis of APCAT studies, we identified and successfully replicated associations of rs13408661 in the IL1RL1/IL18R1 region and of rs9268516 in the HLA region; the association with rs13408661 is in agreement with recent findings [8,18]. These two loci had originally been reported to contain other strongly associated variants [16,17], which we now show are distinct from the ones we identify. Observing multiple signals at a locus could either be due to multiple true causal variants, as seen in genetic studies for height and other polygenic traits [22,46], or due to a single causal variant that is not well tagged by either signal. While fine mapping of data in these regions would be required to resolve this problem conclusively, we note that the variants we identified were only modestly correlated with published variants and remain significant even after conditioning on known risk variants. Thus, our data provide two genome-wide significant signals of association at known loci that are distinct from the first variants originally reported to be associated with asthma at genome-wide significance, one of which (HLA) is not accounted for by the known associations.
Finally, we present the first comprehensive search for genetic interactions with smoking status and with hay fever, two potentially important modifiers of asthma development. In our studies, we found no convincing evidence for SNPs that interacted with either smoking or hay fever. A larger meta-analysis would be required to confirm the absence or existence of such gene-environment effects for asthma.
In conclusion, these results strongly suggest that GWA studies of population-based cohorts with simple asthma phenotypes are an effective approach to find novel asthma-associated variants, replicate signals identified in other studies, and provide estimates that are representative of the demography and disease spectrum. Such cohorts are an untapped resource that can be utilized to complement genetic studies of asthma. We anticipate that many more population-based studies could be leveraged to assist with discovery of asthma susceptibility loci, which could potentially lead to more effective or targeted therapies and preventions.