A family-based genome-wide association study of chronic rhinosinusitis with nasal polyps implicates several genes in the disease pathogenesis

Background The pathogenesis of chronic rhinosinusitis with nasal polyps is largely unknown. Previous studies have given valuable information about genetic variants associated with this disease but much is still unexplained. Our goal was to identify genetic markers and genes associated with susceptibility to chronic rhinosinusitis with nasal polyps using a family-based genome-wide association study. Methods 427 patients (293 males and 134 females) with CRSwNP and 393 controls (175 males and 218 females) were recruited from several Swedish hospitals. SNP association values were generated using DFAM (implemented in PLINK) and Efficient Mixed Model Association eXpedited (EMMAX). Analyses of pathway enrichment, gene expression levels and expression quantitative trait loci were then performed in turn. Results None of the analysed SNPs reached genome wide significant association of 5.0 x 10−8. Pathway analyses using our top 1000 markers with the most significant association p-values resulted in 138 target genes. A comparison between our target genes and gene expression data from the NCBI Gene Expression Omnibus database showed significant overlap for 36 of these genes. Comparisons with data from expression quantitative trait loci showed the most skewed allelic distributions in cases with chronic rhinosinusitis with nasal polyps compared with controls for the genes HLCS, HLA-DRA, BICD2, VSIR and SLC5A1. Conclusion Our study indicates that HLCS, HLA-DRA, BICD2, VSIR and SLC5A1 could be involved in the pathogenesis of chronic rhinosinusitis with nasal polyps. HLA-DRA has been associated with chronic rhinosinusitis with nasal polyps in previous studies and HLCS, BICD2, VSIR and SLC5A1 may be new targets for future research.


Introduction
Chronic rhinosinusitis (CRS) as defined by the European Position Paper on Rhinosinusitis and Nasal Polyps (EPOS) [1] is classified into chronic rhinosinusitis with nasal polyps (CRSwNP) and chronic rhinosinusitis without nasal polyps (CRSsNP). CRSwNP is a disease characterized by benign outgrowths from the middle meatus of the nasal cavity and chronic sinonasal inflammation. CRSwNP is a common chronic disease and depending on the geographical area, 2-4% of the population is afflicted [2][3][4]. The disease causes individual suffering and a decreased quality of life [5,6]. Risk factors include asthma, male sex and increasing age. The disease often requires a combination of surgical and medical treatment. However, CRSwNP often recurs even after therapy.
The aetiology of the disease is unknown. Several environmental factors have been suggested and previous studies have also shown an increased prevalence among relatives [7,8] and a higher rate of positive family history of CRSwNP among those affected [9][10][11], confirming a genetic susceptibility to the disease. Compared to the general population, having an afflicted family member increases the risk of disease five times [7]. Genetic studies on patients with CRSwNP could help to explain the pathogenesis of the disease and over time identify new drug targets leading to a more effective, individually tailored, therapy.
Genetic association can be explored using candidate gene studies or genome-wide association studies (GWAS). Candidate gene studies usually investigate a small number of singlenucleotide polymorphisms (SNPs) or other types of genetic variation, in order to find or reject associations between the genetic variants and the disease in question. These studies rely on previous knowledge and hypotheses regarding which SNPs to suspect and investigate. In comparison, a GWAS investigates hundreds of thousands of SNPs across the whole genome and is therefore not reliant on previous knowledge or hypotheses regarding the pathogenesis of the investigated disease or trait. A large number of GWASs have been performed for various complex diseases such as diabetes and asthma which has led to the finding of novel genetic pathways [12]. There is currently no published GWAS performed only on patients with CRSwNP but there is a pooling-based GWAS done on patients with CRS (both CRSsNP and CRSwNP) [13] as well as several studies of candidate genes [14]. These studies have implicated several genes and pathways such as the cystic fibrosis transmembrane conductance regulator gene (CFTR) [15,16] and, among others, genes involved in immunity [17][18][19][20], inflammation [21,22], tissue remodelling [20] and arachidonic acid metabolism [23] Even though GWA studies have been successful in detecting genetic variants associated with many common diseases, the inability to explain most of the estimated heritability makes linkage analysis an alternative to detect possible rare variants. To this date, no published linkage analysis have been performed on subjects with CRSwNP. However, one such study has been performed on 8 subjects with CRS (not specified whether any of them had CRSwNP) which found a linkage signal on chromosome 7q [24]. A combination of a GWAS and linkage analysis such as a family-based GWAS could be a more potent way of identifying both common and rare variants associated with CRSwNP [25].

Aim
The aim of this study is to identify SNPs and genes associated with CRSwNP susceptibility using a family-based approach.

Materials and methods Material
367 patients with CRSwNP (250 men and 117 women, mean age 52.3 years) from three Swedish ear, nose and throat clinics were recruited. These subjects were all known patients at their respective clinics, all of them fulfilled the EPOS criteria for CRSwNP [1] and had at least intermittently been on either intranasal or systemic corticosteroids, most of them had undergone at least one operation for the condition. In order to increase power level for genome-wide analysis, patients with associated diseases such as asthma or aspirin-exacerbated respiratory disease (AERD) were not excluded. A total of 453 first-degree relatives (218 men, 235 women, mean age 49.4 years) were also recruited.
The study was carried out in accordance with the Declaration of Helsinki and was approved by the Ethics Committee at the University of Gothenburg, Sweden. Written consent was obtained from all participants.

Investigation
Nasal endoscopy was performed on all participants using a 2.7 mm rigid endoscope (KARL STORZ) and the participants were subsequently phenotyped as either having CRSwNP or being free from this disease. Additional data about asthma and corticosteroid medication (used to counter symptoms from either the upper or lower airways) was obtained via a structured interview. Peripheral blood samples were collected from each individual. DNA was extracted from whole blood using an in-house protocol at KBiosciences (LGC Genomics, Hoddeston UK). The HiSeq Illumina platform was used for genotyping. 144 of the samples were run on Illumina Omni Express bead chips and the remaining 676 on the Illumina Core Exome array.
SNPs with low linkage disequilibrium among each other (LD) (r 2 <0.2) were selected for population structure analysis. As a reference, samples from the 1000 Genomes project, Phase 3 were retrieved. Principal component analysis was then performed using GCTA, a genomewide complex trait analysis software [26], and the first three principal components were investigated. As a measure of non-European admixture in each sample, we calculated the Euclidean distance E from that sample to the mass centre of the CEU reference group. Individuals with E > 5 SD E were removed (2 samples).
Only autosomal markers shared in both genotyping platforms were retained. Finally, principal components analysis was performed to check for batch effects. Visual inspection of sample scores along the first three principal components showed no differences between batches.

Association testing
Two methods were used to generate SNP association values. First, we used DFAM, implemented in PLINK, which combines the transmission disequilibrium test (TDT), the sibling TDT and an allelic test for unrelated cases and controls in a single Cochran-Mantel-Haenszel test for each marker [27]. The second method was EMMAX (Efficient Mixed Model Association eXpedited), implemented in Golden Helix SNP & Variation Suite v8.3.4 [28]. This method involves computing an empirical relatedness matrix of the samples, and using this relatedness as a covariate in linear regression for each marker [29]. We performed this test using additive, dominant and recessive models, and the smallest p-value from the three models was assigned to each SNP. A commonly used level of significance in conventional GWA-studies on unrelated subjects is 5x10 -8 [30] but many variants that do not reach this level of significance can still be true disease-influencing variation. The decision was therefore made to perform post-GWAS analyses for the 1000 SNPs with the lowest p-values for each method rather than adhering to a strict threshold for the level of significance.

Pathway enrichment analysis
The top 1000 markers with the most significant association p-values were combined into intervals of SNPs in high LD (defined by pairwise r 2 >0.25). This process was performed separately for DFAM and EMMAX association results.
INRICH software was then used to detect possible pathway enrichment within these regions [31]. To calculate the significance of such overlaps, the process was repeated 50,000 times with random genomic regions, however matched in size and SNP density. INRICH analysis was performed separately using DFAM or EMMAX results and Gene Ontology (GO) or Kyoto Encyclopedia of Genes and Genomes (KEGG)-based gene-sets. The 20 gene-sets with the highest enrichment p-values were retrieved from each of these setups. INRICH produced a list of genes which were located close to the top GWAS 'hits' in the genome and that share functional annotations. All genes retrieved in this way from the four INRICH analyses (DFAM +GO, EMMAX+GO, DFAM+KEGG, EMMAX+KEGG) were combined together, creating a list of target NP genes implicated in this study.

Gene expression data
Publicly-available gene expression data, collected by Plager et al. [32], was retrieved from NCBI Gene Expression Omnibus (GEO) database ( [33]; series accession number GSE23552). Per authors' recommendation, two samples (aCRSm1 and aCRSm2) were excluded, leaving 20 case samples (all from patients with CRSwNP) and 17 control samples from either allergic rhinitis patients or healthy individuals. Expression levels between the case and control groups were compared using the GEO2R interface. All genes corresponding to probes with significant difference in expression levels (Benjamini-Hochberg FDR < 0.05) comprise the differentiallyexpressed gene set.

Expression quantitative trait loci (eQTL) analysis
Two datasets were used for eQTL analysis: Blood eQTL from Westra et al. [34] and Multiple Tissue Human Expression Resource (MuTHER) project [35]. These datasets are produced by microarray genotyping and expression profiling of selected tissue samples. SNP variations are then associated with gene expression patterns, resulting in a list of regulatory SNPs for each gene.
In MuTHER project, the regulatory effects of each SNP were determined in adipose, skin tissues and lymphoblastoid cell lines (LCL). For each SNP-gene pair we have retained either LCL or skin data, corresponding to the tissue with more significant regulatory effect (i.e. lower p-value in skin samples meant that LCL data was discarded for that SNP-gene pair). We also excluded all SNPs with p-values > 0.05 or absolute effect size (regression coefficient β) of < 0.01. FDR of 0.5 was used as a cut-off for the Blood dataset, with no additional limits on effect size.
To check for directed eQTL enrichment, all eQTL SNPs for each gene of interest were extracted and classified according to the direction of their regulatory effect (up-regulating or down-regulating). The frequency of the allele bearing the reported regulatory effect was then determined in our GWAS cases and controls using PLINK [27]. The marker was then assigned to a bin depending on whether the regulatory allele shows higher frequency in cases or in controls. In this way, a 2x2 contingency table was constructed for each gene, where all SNPs fall into one of four bins (up-regulating + less frequent in cases; up-regulating + more frequent in cases; down-regulating + less frequent in cases; down-regulating + more frequent in cases). Fisher's test was used to test whether the regulatory effect and frequency difference are dependent.
However, Fisher's test does not account for the effect of LD between SNPs. Therefore, the empirical significance was calculated using an iterative procedure. Genes were ordered according to the number of eQTL SNPs remaining after all filters; genes found in the differentiallyexpressed NP set (as described in the previous section) were removed; for each gene of interest with n SNPs, 500 genes with the same number n of SNPs are retrieved; if less than 500 genes have the required number of SNPs, genes with n+1 (then n+2, n+3. . .) SNPs are also retrieved, and n SNPs are randomly selected for analysis in those genes. Each gene is analysed in the same manner as the target gene. Resulting empirical distribution of p-values is used to determine the empirical significance for the gene of interest.
The workflow of the analysis is shown in Fig 1.

Genetic analysis
Six samples were removed due to heterozygosity > 3 SDs above or below mean, three samples were removed due to > 100 Mendelian errors and one sample due to a mismatch between genotyped and indicated sex. The final dataset after quality-control consists of 782 individuals and 233 409 SNPs. Additional data about the subjects who passed quality control are presented in Table 1.  Table 2 lists the 30 SNPs with the strongest associations from the DFAM analysis and Table 3 lists the 30 SNPs with the strongest association values from the EMMAX analysis. None of tested SNPs reached a significance of 5x10 -8 , the top ranking SNP from the DFAM analysis was rs4629180 with a p-value of 1.47x10 -6 , the top ranking SNP from the EMMAX analysis was rs2491026 with a p-value of 0.00014.
From the pathway enrichment analysis we extracted the top 20 gene-sets from each of the four INRICH analyses resulting in a combined list of 138 target CRSwNP genes from this study (S1 Table).
Out of our 138 target genes from the INRICH analysis, 36 genes showed a significant difference in mRNA expression levels between nasal polyp tissue and normal tissue (from Plager et al. [32]) (Table 4) Finally, the eQTL analysis for the Blood eQTL dataset showed significantly skewed distributions of eQTLs in cases with CRSwNP compared to controls for HLCS (empirical p-value 0.014) HLA-DRA (empirical p-value 0.02) and BICD2 (empirical p 0.046) ( Table 5). The same analysis performed with MuTHER eQTL dataset showed significantly skewed eQTL distribution in cases for VSIR (empirical p-value 0.006), HLCS (empirical p-value 0.014) and BICD2 (empirical p-value 0.016). SLC5A1 also had a skewed distribution with an empirical p-value of 0.052, these results are provided in Table 6.

Discussion
None of the SNPs in this study reached the suggested genome-wide significance level of 5x10 -8 . However, by using pathway enrichment and post-GWAS analyses, we identified five interesting The present study is the first GWAS performed only on subjects with CRSwNP. Using family relationship data and non-transmitted genetic variation for control, as in the TDT, is both a strength and a potential weakness. In most chromosomal regions, we expect the related individuals to be more similar compared with completely unrelated controls. Therefore, when there are differences between related individuals these are more likely to be due to the disease than to general differences in a population (population stratification). In practice this is a strength because it would be expected to lead to less false positive results. However, due to the increased risk of CRSwNP among relatives [7] and the increased prevalence of polyps with higher age [4], some of the relatives who we have defined as not having polyps could develop CRSwNP later in life and therefore be falsely classified as controls in this study. This could possibly have led to missed markers and genes of potential importance. However, most of the relatives in this study are middle-aged or older (mean age 49.4 years) and the prevalence of nasal polyps among them is 13% which makes it unlikely that more than a few percent of the relatives are falsely classified as phenotype negative. One could also argue that there could be subjects with asymptomatic polyps among the 55 relatives who had nasal polyps during endoscopy. However, 21 of them knew they had polyps beforehand and only 34 were unaware of this. The heritability of CRSwNP makes it much more likely that first-degree relatives of Family-based GWAS and CRSwNP patients with CRSwNP would inherit variants associated with CRSwNP than inherit variants associated with asymptomatic nasal polyps. In the event that there are many relatives with polyps but without the predisposition to develop CRS it would indeed influence the GWAS findings, however excluding relatives with polyps would most likely influence the result in a more negative way and severely limit the study. The HLCS gene was the most significant gene from the eQTL analysis in the Blood dataset and the second most significant in the MuTHER dataset. HLCS is under-expressed in nasal polyp tissue with significantly increased frequency of down-regulating alleles among cases in the eQTL analysis from both the MuTHER and the Blood datasets. This gene encodes the enzyme Holocarboxylase synthetase, which is important for biotin metabolism. Holocarboxylase synthetase itself has not been implicated in CRSwNP but a study from 2013 showed that the enzyme catalyses biotinylation of heat shock protein 72 thereby inducing the expression of the gene RANTES (regulated on activation normal T-expressed and presumably secreted) [37]. RANTES is implicated in multiple studies of CRSwNP; for example a study by Chao et al. found a positive correlation between plasma RANTES protein levels and severity of disease among patients with CRSwNP [38]. RANTES protein has also been detected in nasal polyps using immunological staining [39]. Although it seems counter-intuitive that HLCS is underexpressed when RANTES protein levels are higher in polyp tissue compared with controls, the up-regulation of RANTES might be a counter reaction to initially too low levels during an early disease phase. Further studies are needed to increase the knowledge about the role of HLCS in the pathogenesis of CRSwNP.
HLA-DRA is over-expressed in polyp tissue and has a significantly skewed distribution of eQTLs from the Blood dataset where cases have an increased number of down-regulating alleles. HLA-DRA is one of the major histocompatibility complex, class II genes.

Family-based GWAS and CRSwNP
Polymorphisms in this gene have been associated with the presence of nasal polyps in asthmatic patients [36]. Additionally, polymorphisms in other HLA class II genes have been linked to CRSwNP [18,40]. Using HLA typing on a series of 29 patients with nasal polyps, with or without asthma, Moloney and Oliver found a significant increase in the haplotype AI/B8 in patients with both nasal polyps and asthma [41]. BICD2 is over-expressed in nasal polyps and up-regulating alleles more common in controls and down regulating slightly more common in cases. The gene product is bicaudal D Family-based GWAS and CRSwNP homolog 2, which has been shown to induce microtubule movement [42]. It is also linked to dominant congenital spinal muscular atrophy [43]. The relationship between gene-expression and eQTLs is reversed for HLA-DRA and BICD2 where cases have an increased number of down-regulating eQTLs even though the gene is over-expressed in polyp tissue. Over-production in the diseased state could possibly be the result of the body compensating for an under-production in the pre-disease state, which hypothetically could have contributed to the development of the disease.
VSIR (V-set immunoregulatory receptor) is over-expressed in polyp tissue and up-regulating alleles from the MuTHER dataset are more common in our cases with CRSwNP compared with unaffected individuals. The gene codes for the protein V-type immunoglobulin domaincontaining suppressor of T-cell activation, a member of the Ig superfamily. An experimental study has suggested that it could facilitate tumour invasiveness by regulating cell surface membrane-type 1 matrix metalloproteinase [44]. Lines et al. published an article showing that it also acts a negative checkpoint regulator that suppresses T cell activation [45]. It has not been implicated in CRSwNP in previous studies.
Even though the empirical p-value is slightly higher than 0.05 (empirical p-value 0.052) this study also implicates SLC5A1 as borderline significant. SLC5A1 is under-expressed in polyp tissue and also has an increased frequency of down-regulating alleles in our CRSwNP cases. The gene-product, solute carrier family 5 (sodium/glucose cotransporter) member 1 (SGLT1), is part of a family of sodium-dependent glucose transporters. Once again, this gene has not been associated with CRSwNP but one article suggests a positive substrate cross-regulation of SGLT1 and CFTR [46]. The CFTR gene is highly associated with cystic fibrosis, which often has CRSwNP as one of its clinical features [47]. Furthermore, a study from Varon et al. found an association between CRSwNP (without any other clinical features of cystic fibrosis) and mutations in the CFTR locus [48].
In order to reach a power level necessary for genome-wide analysis we decided to include all patients with CRSwNP regardless of any other diseases which could be associated to this condition. Another reason for this is that the only largely accepted subgrouping of CRS is the division into either CRSsNP or CRSwNP. This is in all likelihood a gross simplification of the pathophysiological and genetic mechanisms behind these conditions and CRS as defined by EPOS is probably a result of a large number of different sub-diseases, each with their own genetic and/or environmental background of which little is known at the present. In this study we did not exclude participants based on potential subgroups since there is still uncertainty surrounding a division into subgroups other than CRSsNP and CRSwNP and we chose to focus on the phenotype CRSwNP itself. Similarly, we chose not to record allergy history due to the controversy surrounding allergy as a possible association to CRSwNP [49]. However, two possible genetic subgroups of CRSwNP; CRSwNP with concomitant asthma and aspirin-exacerbated respiratory disease (AERD) warrant attention in our minds.
Since 51.5% of the participants with nasal polyps also had asthma it is possible that some of the associations could be due to an association with asthma. However, this is less likely since 17.3% of the subjects in the healthy group without polyps also had asthma. The high number of participants with both CRSwNP and asthma could have diluted the association analysis if their SNP-profiles differ significantly from subjects with CRSwNP but without asthma and potentially have made us miss markers of importance in the association analysis. This situation is likely to be countered by the gene enrichment and pathway analyses. HLCS, BICD2, and SLC5A1 have not been connected to asthma, VSIR was implicated with lung function decline in non-asthmatic patients in a genome-wide study published in 2012 but this association could not be confirmed by replication [50]. HLA-DRA is associated to asthma [51] but also, as mentioned above, associated to the presence of nasal polyps in a cohort of asthmatic patients [36].
AERD is another condition linked to CRSwNP, a meta-analysis published in 2015 found a large variation in the prevalence of AERD among patients with CRSwNP among the included studies, the overall prevalence was 9.7% [52]. One of the included studies is from the same geographical region as 96% of our test subjects and found that the prevalence of AERD among subjects with CRSwNP was 6/82 but the numbers were thought to be too small for any meaningful statistical analysis [53]. Although some of our participants probably suffer from AERD, the overall number is in all likelihood too small to influence the association analysis and post-GWAS analyses significantly. HLCS, BICD2, SLC5A1 and VSIR have not been associated with AERD in previous studies. HLA-DRA has not been linked to AERD but other HLA class II genes have [54], however, the same study that linked HLA-DRA to the presence of nasal polyps among asthmatic patients found two HLA-DRA polymorphisms to be potential markers for nasal polyp development in aspirin-tolerant asthma compared to the AERD subgroup [36].
With these caveats in mind, this study is the first of its kind. It is currently the only GWAS performed on CRSwNP and the only study that explores linkage and family-based genomewide association with regards to this condition. Despite the issue of accurate phenotyping discussed above, this study suggests four novel genes as potential targets of interest for future research as well as once again implicate HLA-DRA.

Conclusion
This study suggests that HLA-DRA as well as four additional genes; HLCS, VSIR, BICD2 and SLC5A1, which have not been previously identified as associated with chronic rhinosinusitis with nasal polyps, could be important for the development of this disease.