Evaluation of Replication of Variants Associated with Genetic Risk of Otitis Media

The first Genome Wide Association Study (GWAS) of otitis media (OM) found evidence of association in the Western Australian Pregnancy Cohort (Raine) study, but lacked replication in an independent OM population. The aim of this study was to investigate association at these loci in our family-based sample of chronic otitis media with effusion and recurrent otitis media (COME/ROM). Autosomal SNPs were selected from the Raine OM GWAS results. SNPs from the Raine cohort GWAS genotyped in our GWAS of COME/ROM had P-values ranging from P = 0.06–0.80. After removal of SNPs previously genotyped in our GWAS of COME/ROM (N = 21) and those that failed Fluidigm assay design (N = 1), 26 SNPs were successfully genotyped in 716 individuals from our COME/ROM family population. None of the SNP associations replicated in our family-based population (unadjusted P = 0.03–0.93). Replication in an independent sample would confirm that these represent novel OM loci, and that further investigation is warranted.


Introduction
Inflammation of the middle ear, known as otitis media (OM), is a highly prevalent disease in the pediatric population worldwide. Children at higher risk of OM may develop recurrent or chronic otitis media (COME/ROM), a condition that may lead to multiple antibiotic treatments and at least one tympanostomy tube insertion surgery. Environmental factors are known to play a role in risk of OM; however, it is also known that there is a large genetic component to OM risk.
The first family study of COME/ROM showed a higher rate of OM than general population rates [1], indicating familial aggregation of OM and potential genetic contribution to risk. Estimated heritability of OM (the proportion of risk attributable to additive genetic factors) has been shown to vary from 0.2 to 0.73 in various populations [2][3][4][5][6][7]. Although the estimated heritability of OM is high, there have been few studies to discover putative candidate loci or genes. Identification of OM candidate genes may provide new understanding of the etiology of OM, increase precision for disease prediction as well as provide novel targets for prevention and treatment. This impact could reduce the amount of antibiotics prescribed and the number of pediatric surgeries.
Currently, few studies are able to examine the genetics of OM. These include a linkage study from the University of Pittsburgh (UPitt) [8], a GWAS from the University of Western Australia (UWA) [9], and linkage and GWAS from the University of Minnesota (UMN) [10,11]. The first GWAS of OM was conducted in Western Australia using the Western Australian Pregnancy (Raine) cohort to determine variants associated with acute and chronic OM. The Raine cohort was a longitudinal birth cohort, and participants were defined as OM cases if clinical exam indicated the presence of inflamed, retracted, or scarred tympanic membrane; middle ear effusion; or tympanostomy tube surgery in the first three years of life. Parental self-report of at least three episodes of acute otitis media (AOM) by three years could also determine affected status. The Raine cohort analysis found no genome-wide significant association, but did identify two genes (CAPN14 and GALNT14) as strong candidates from the GWAS, and the BPIFA cluster of genes in a gene-based analysis. The most significantly associated SNPs were followed up in UWA's family study of OM (WAFSOM). A total of 20 SNPs within seven genes (CAPN14, GALNT14, GALNT13, BMP5, NELL1, TGFB3, and BPIFA1) were genotyped. None of these were found to be significantly associated with OM in the WAFSOM population. Although these associations did not replicate in WAFSOM, many of the candidates have biological plausibility, including CAPN14, GALNT14, and the BPIFA gene cluster, and warrant further investigation in an independent population of OM.
The second GWAS of OM was conducted at the University of Virginia using the UMN family population to determine variants associated with COME/ROM [11]. Children were recruited to this study if they had tympanostomy tube surgery for COME/ROM, and their family members were also recruited. Participants were considered affected if they had at least two pieces of positive evidence of COME/ROM from an ear examination from an ENT, a tympanometric test, medical record, and self-reported history [10]. As in the Raine cohort OM GWAS, no SNP exceeded genome-wide significance for association with COME/ROM. The most significantly associated SNP in this GWAS was rs1110060 in Kinesin family member 7 (KIF7). Significant replication of rs10497394 on chromosome 2 was found in the University of Pittsburgh (UPitt) family population of OM. This SNP is found in an intergenic region between CDCA7 and SP3, and is thought to play a role in regulation by altering binding of a transcription factor, epigenetic mark, or lamina associated domain, though functional studies are needed to determine the causal nature of this region.
A hallmark of gene discovery is replication of results. Fortunately, there is strong collaboration between groups in the genetics of OM. Critical to replication is the coordinated clinical phenotyping of study populations, as a failure to replicate could be due to inconsistent phenotypes used for recruitment or to lack of power due to sample size. This ongoing collaboration in the genetics of OM has achieved replication of a SNP on chromosome 2 from UMN's GWAS in UPitt's family population of OM. With evidence of association in the Raine cohort GWAS of OM, it was logical to investigate replication of the most significant SNPs from the Raine GWAS in the UMN family population.

Ethics Statement
This study was conducted with Institutional Review Board approval at the University of Minnesota and the University of Virginia, and adhered to the tenets of the Declaration of Helsinki.

UMN Family-based Population
The UMN family-based population of COME/ROM has been described in detail previously [1,10]. Briefly, probands (children who had tympanostomy tube surgery for COME/ROM) and their family members were recruited. Affected status of family members was determined using four data sources: ear examination, tympanogram, medical records, and self-reported history. Individuals from 143 families with phenotypic data and DNA available were enrolled in genetic studies. The sample includes 44 families with 5-10 members, 55 families with four members, 36 trios, and 8 families with fewer than three members. Recruitment for the UMN family-based population continued after the GWAS was conducted, so the study population had increased in number by the time of the Fluidigm genotyping project. The same recruitment strategy and affected status definition were used for the continued recruitment. Demographic information for the family-based population genotyped in this replication project is described in Table 1. After initial quality control measures, this sample includes 41 families with more than five members, 56 families with four members, 48 trios, and 13 families with fewer than three members.
Definition of OM in the Raine cohort GWAS was completed using clinical examination and parental self-report from the first three years of life. Participants were defined as cases if they had an inflamed, retracted, or scarred tympanic membrane (TM), middle ear effusion (MEE), or tympanostomy tubes, and over half of cases were defined by parental report of ≥3 episodes of AOM by the age of 3 years.

SNP Identification for Replication
We identified 46 autosomal SNPs to follow up in this replication study. The Raine cohort GWAS reported a list of their top 25 statistically significant SNPs from their GWAS, and a list of the top 25 statistically significant SNPs from a subset of participants with full covariate data. Three SNPs overlapped these two lists and one was located on the X chromosome, resulting in a total of 46 autosomal SNPs.
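The count of 46 follows from combining the two reported lists: 25 + 25 SNPs, minus the 3 shared between lists, minus the 1 on the X chromosome. The sketch below reproduces that bookkeeping with set operations. The SNP identifiers are hypothetical placeholders, not the Raine SNPs, and the sketch assumes the X-linked SNP is not one of the three shared SNPs.

```python
def count_autosomal_candidates(list_a, list_b, x_linked):
    """Merge two top-SNP lists, then drop SNPs on the X chromosome."""
    combined = set(list_a) | set(list_b)   # shared SNPs are counted once
    return len(combined - set(x_linked))


# Hypothetical placeholder IDs (not real rs numbers):
# 25 SNPs per list, 3 shared (snp22..snp24), 1 X-linked (snp46).
full_cohort = [f"snp{i}" for i in range(25)]           # snp0..snp24
covariate_subset = [f"snp{i}" for i in range(22, 47)]  # snp22..snp46
x_linked = ["snp46"]

n_candidates = count_autosomal_candidates(full_cohort, covariate_subset, x_linked)
print(n_candidates)  # 46
```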

Investigation of Replication: Data Mining and Genotyping
From these 46 SNPs, we searched our GWAS of COME/ROM data to determine if any associations had already been tested. To investigate the remaining SNPs from the Raine GWAS results, we genotyped the SNPs in the UMN family population using the Fluidigm SNPtype assay platform (http://www.fluidigm.com/snptype-assays.html). Briefly, 100 ng/µL of each DNA sample was combined with Biotium 2X Fast Probe Master Mix, SNPtype Sample Loading Reagent (Fluidigm), and the reference dye ROX (Invitrogen Inc.). Each SNPtype assay was mixed with 2X Assay Loading Reagent (Fluidigm). Both sample mixes and assay mixes were then loaded onto Fluidigm 96.96 Dynamic Genotyping Arrays, in which nanofluidic circuitry loads and mixes the 96 loci with 96 samples in 9216 reaction chambers. Thermocycling was performed on Fluidigm's Stand-alone Thermal Cycler and fluorescence detection was performed on the EP1 genotyping system (Fluidigm). SNP rs2704219, which did not pass quality control measures, was re-genotyped after a pre-amplification step performed with the following thermocycling conditions: 95°C for 15 minutes, then 14 cycles of 5 seconds at 95°C and 4 minutes at 60°C. Pre-amplified DNA was then diluted 1:100 in suspension buffer and genotyped as previously described.

Quality Control Measures and Association Test
Table 2 notes. We report our OR and allele frequency using the Risk Allele corresponding to the Risk Allele listed in the Raine cohort GWAS [9]. *Indicates SNP is intergenic and therefore reports the nearest gene. 'Indicates the SNP was a top genotyped SNP from a subset of Raine study participants with full covariate data. doi:10.1371/journal.pone.0104212.t002

Table 3. Top SNPs from Raine cohort GWAS that were genotyped in our family population of COME/ROM. We report our OR and allele frequency using the Risk Allele corresponding to the Risk Allele listed in the Raine cohort GWAS [9]. *Indicates SNP is intergenic and therefore reports the nearest gene. **Indicates SNP is coding. 'Indicates the SNP was a top genotyped SNP from a subset of Raine study participants with full covariate data. doi:10.1371/journal.pone.0104212.t003

Quality control measures, association testing, and imputation methods used in the UMN GWAS of COME/ROM have been previously described [11]. Briefly, removal of SNPs was based upon filtering for poor genotype clusters, low minor allele frequency (MAF < 0.01), and genotypes inconsistent with Hardy-Weinberg proportions (P < 10⁻⁵). Samples and SNPs with excessive Mendelian errors (>0.6%) and/or low genotype call rates (<95%) were removed. We used the imputation method implemented in the software package MACH using HapMap3 with the CEU reference population [12,13]. This method uses Markov models to identify stretches of shared chromosomes between individuals, and then to infer intervening genotypes by contrasting study samples with densely typed HapMap samples. Genotype data from these 26 SNPs was analyzed in 596 individuals using the nuclear family-based Transmission Disequilibrium Test (TDT) [14]. P-values reported in Tables 2-3 are reported without adjustment for multiple comparisons.
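For readers unfamiliar with the TDT, the following is a minimal sketch of the basic McNemar-type statistic, which compares how often heterozygous parents transmit versus do not transmit an allele to affected offspring. It illustrates the general idea only; it is not the nuclear family-based implementation cited in [14], and the function name and counts are hypothetical.

```python
import math


def tdt(transmitted, untransmitted):
    """Basic TDT chi-square (1 df) and its p-value.

    transmitted / untransmitted: number of times heterozygous parents
    did / did not pass the allele of interest to an affected child.
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative transmissions")
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts for illustration only (not study data).
chi2, p = tdt(60, 40)
print(f"chi2 = {chi2:.2f}, unadjusted P = {p:.4f}")
```

Under the null hypothesis of no linkage or association, transmitted and untransmitted counts are expected to be equal, so the statistic grows only when transmission is skewed.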
Power was calculated using the TDT Power Calculator [15], varying SNP MAF and genetic relative risk (GRR). We assume the number of family members is 164 and Bonferroni correction for 26 markers, corresponding to a nominal error rate of α* = 0.05/26 = 0.0019, in all power calculations.
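The Bonferroni arithmetic can be checked with a short sketch. The helper name is hypothetical; the inputs (a family-wise error rate of 0.05, 26 genotyped markers, 46 candidate SNPs, and a smallest unadjusted P of 0.03) are taken from the text.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


threshold_26 = bonferroni_threshold(0.05, 26)  # markers genotyped on Fluidigm
threshold_46 = bonferroni_threshold(0.05, 46)  # all autosomal candidate SNPs

print(round(threshold_26, 4))  # 0.0019
print(round(threshold_46, 4))  # 0.0011

# The smallest unadjusted P-value reported for the genotyped SNPs is 0.03,
# which exceeds both thresholds.
smallest_p = 0.03
print(smallest_p < threshold_26, smallest_p < threshold_46)  # False False
```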

Results and Discussion
Demographic data on the UMN family-based population genotyped in this replication project and the Raine cohort are shown in Table 1. Initially, we searched our GWAS data for any SNPs that were already genotyped. We found no significant associations among these 21 SNPs (Table 2), with P-values ranging from P = 0.06 (rs4512966 in COL4A2 and rs2839520, intergenic near UBASH3A) to P = 0.80 (rs4575213 in PRKG1). To investigate the remaining SNPs from the Raine GWAS results, we genotyped the SNPs (N = 27) in the UMN family population using the Fluidigm SNPtype assay platform, including one SNP (rs10242197) which had been genotyped in our GWAS, for quality control purposes. One SNP (rs10776851 in CAMSAP1) did not pass Fluidigm assay design and one SNP (rs2704219 in ALDH1A2) did not pass quality control measures; however, rs2704219 was successfully re-genotyped using a different Fluidigm protocol that pre-amplifies samples before genotyping. A total of 26 SNPs were genotyped in 596 subjects and available for analyses.
Data were analyzed using the family-based association test TDT [14]. No significant association of any SNP with OM status was found after Bonferroni correction (significance threshold at P = 0.05/46 = 0.001) (Table 3), with nominal unadjusted P-values ranging from P = 0.03 (rs11097383 in GRID2) to P = 0.93 (rs330787 in FBXO11 and rs9911978 in RPTOR). Comparing the OR observed in our GWAS versus the Raine cohort GWAS, we observed 19/46 SNPs (41.3%) with the same direction of effect. The failure to replicate the Raine cohort OM associations may be due to lack of power to detect association with SNPs of modest effect, differences in phenotypic criteria, and/or differences in study design. Using our family-based population, we have approximately 80% power to detect associations with SNPs with effect OR ≥ 2 when MAF ranges from 0.2 to 0.5 (Table 4). Also, inclusion criteria for case classification in the Raine cohort represent a less severe end of the OM spectrum than those used in the UMN family population. Lastly, differences in study designs, i.e. family-based vs. case-control, affect detection power, and may also reflect differences in underlying genetic architecture. Notably, associations with OM found using the nested case-control design of the Raine cohort were unable to be replicated in another family-based study, WAFSOM [9].
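The direction-of-effect comparison can be sketched as follows, assuming both studies report odds ratios for the same risk allele, as stated in the table notes. The function names and the example odds ratios are hypothetical and are not values from Tables 2-3; an OR of exactly 1 is treated here as having no direction.

```python
def same_direction(or_a, or_b):
    """True when both odds ratios fall on the same side of 1."""
    return (or_a - 1.0) * (or_b - 1.0) > 0


def concordance(or_pairs):
    """Count and fraction of SNPs whose two ORs share a direction."""
    matches = sum(same_direction(a, b) for a, b in or_pairs)
    return matches, matches / len(or_pairs)


# Hypothetical (discovery OR, replication OR) pairs for illustration.
example_pairs = [(1.4, 1.1), (1.3, 0.9), (1.5, 1.0), (1.2, 1.3)]
matches, fraction = concordance(example_pairs)
print(matches, fraction)  # 2 0.5

# The proportion reported in the text: 19 of 46 SNPs.
print(f"{19 / 46:.1%}")  # 41.3%
```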
Due to the complexity of this disease, it is important that study sites use comparable definitions of affected status so that replication studies and meta-analyses can be successfully conducted. Though this is a significant task to achieve, the International Consortium of the Genetics of Otitis Media (OTIGEN) is striving to standardize inclusion criteria for OM studies to better understand the genetic components increasing risk for OM. Additionally, for these population studies to be more informative, recruitment of diverse populations is necessary. The majority of OM population studies thus far have recruited participants with European ancestry, but diverse populations will enable researchers to narrow regions of linkage disequilibrium (LD) containing the underlying causal variants. A further difficulty is that OM-free controls are very hard to find, since most children will have at least one episode of OM in the first three years of life. Trio and family studies could provide a solution to the problem of OM control availability and misclassification.
In this study, we attempted to replicate associations from the Raine cohort GWAS of OM in our family population of COME/ROM recruited at the University of Minnesota. Though we did not find any of these SNPs to be significant in our population, this study illustrates the need for larger, diverse populations recruited with standardized inclusion criteria for cases.