Testing an optimally weighted combination of common and/or rare variants with multiple traits

Zhenchuan Wang; Qiuying Sha; Shurong Fang; Kui Zhang; Shuanglin Zhang

doi:10.1371/journal.pone.0201186

Abstract

Recently, joint analysis of multiple traits has become popular because it can increase statistical power to identify genetic variants associated with complex diseases. In addition, there is increasing evidence indicating that pleiotropy is a widespread phenomenon in complex diseases. Currently, most of existing methods test the association between multiple traits and a single genetic variant. However, these methods by analyzing one variant at a time may not be ideal for rare variant association studies because of the allelic heterogeneity as well as the extreme rarity of rare variants. In this article, we developed a statistical method by testing an optimally weighted combination of variants with multiple traits (TOWmuT) to test the association between multiple traits and a weighted combination of variants (rare and/or common) in a genomic region. TOWmuT is robust to the directions of effects of causal variants and is applicable to different types of traits. Using extensive simulation studies, we compared the performance of TOWmuT with the following five existing methods: gene association with multiple traits (GAMuT), multiple sequence kernel association test (MSKAT), adaptive weighting reverse regression (AWRR), single-TOW, and MANOVA. Our results showed that, in all of the simulation scenarios, TOWmuT has correct type I error rates and is consistently more powerful than the other five tests. We also illustrated the usefulness of TOWmuT by analyzing a whole-genome genotyping data from a lung function study.

Citation: Wang Z, Sha Q, Fang S, Zhang K, Zhang S (2018) Testing an optimally weighted combination of common and/or rare variants with multiple traits. PLoS ONE 13(7): e0201186. https://doi.org/10.1371/journal.pone.0201186

Editor: Qizhai Li, University of the Chinese Academy of Sciences, CHINA

Received: March 25, 2018; Accepted: July 10, 2018; Published: July 26, 2018

Copyright: © 2018 Wang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting Information files.

Funding: Research reported in this publication was supported by the National Human Genome Research Institute of the National Institutes of Health under Award Number R15HG008209 to QS. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The Genetic Analysis workshops are supported by NIH grant R01 GM031575 from the National Institute of General Medical Sciences. Preparation of the Genetic Analysis Workshop 17 Simulated Exome Data Set was supported in part by NIH R01 MH059490 and used sequencing data from the 1000 Genomes Project (www.1000genomes.org). This research used data generated by the COPDGene study, which was supported by NIH grants U01HL089856 and U01HL089897. The COPDGene project is also supported by the COPD Foundation through contributions made by an Industry Advisory Board comprised of Pfizer, AstraZeneca, Boehringer Ingelheim, Novartis, and Sunovion. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introductions

Many large cohort studies collected many correlated traits that can reflect underlying mechanism of complex diseases. For example, the UK10K cohort study collected 64 correlated phenotypic traits [1]. Usually, complex diseases are characterized by multiple endophenotypes. For example, hypertension can be characterized by systolic and diastolic blood pressure [2]; metabolic syndrome is evaluated by four component traits: high-density lipoprotein (HDL) cholesterol, plasma glucose and Type 2 diabetes, abdominal obesity, and diastolic blood pressure [3]; and schizophrenia can be diagnosed by eight neurocognitive domains [4]. Multiple correlated traits can be influenced by a gene simultaneously. Therefore, by joint analysis of multiple traits, we can not only gain more statistical power to detect pleiotropic variants [5–12], but also can better understand the genetic architecture of the disease of interest [13].

Several statistical methods have been developed to test the association between multiple traits and a single common variant. These methods can be roughly divided into three groups: dimension reduction methods [10, 13–15], regression methods [16–18], and combining test statistics from univariate analysis [9, 19–23]. However, due to the allelic heterogeneity and the extreme rarity of rare variants, the methods by analyzing one variant at a time for common variant association studies may not be ideal for rare variant association studies [24]. Recent genetic association studies show that complex diseases are affected by both common and rare variants [25–31]. Next-generation sequencing technology allows sequencing of the whole genome of large number of individuals, and makes rare variant association studies viable [32, 33]. Currently, statistical methods for rare variant association studies with a single trait have been developed. These methods summarize genotype information from multiple rare variants and can be divided into three groups: burden tests [24, 34–37], quadratic tests [38–41], and combined tests [42–45].

As we pointed out above, it is essential to develop statistical methods to test the association between multiple traits and multiple variants (common and/or rare variants). Very recently, a few statistical methods for this purpose are appeared [11, 46–50]. Casale et al. [47] proposed a set-based association test based on the linear mixed-model. This method enables jointly analyzing multiple correlated traits in rare variant association studies while accounting for population structure and relatedness. Wang et al. [11] proposed a multivariate functional linear model approach to test association between multiple traits and rare variants in a genomic region. In this approach, the genetic effects of variants are treated as smooth functions of genomic positions of these variants. Gene association with multiple traits (GAMuT) proposed by Broadaway et al. [46] provide a nonparametric test of independence between a set of traits and a set of genetic variants. This method compares the similarities of multiple traits with the similarities of genotypes at variants in a genomic region. Multivariate Rare-Variant Association Test (MURAT) proposed by Sun et al. [48] tests association between multiple correlated quantitative traits and a set of rare variants based on a linear mixed model. This method assumes that the effects of the variants follow a multivariate normal distribution with a zero mean and a specific covariance structure. Wu and Pankow [50] extended the commonly used sequence kernel association test (SKAT) [40] for a single trait to multiple traits and proposed multiple sequence kernel association test (MSKAT). Wang et al. [11] proposed an adaptive weighting reverse regression (AWRR) method. This method uses the score test based on the reverse regression, in which the summation of adaptively weighted genotypes is treated as the response variable and multiple traits are treated as independent variables.

In this article, we developed a new statistical method by testing an optimally weighted combination of variants with multiple traits (TOWmuT) to test the association between multiple traits and a weighted combination of variants (rare and/or common) in a genomic region. TOWmuT is based on the score test under a linear model, in which the weighted combination of variants is treated as the response variable and multiple traits including covariates are treated as independent variables. The statistic of TOWmuT is the maximum of the score test statistic over weights. The weights at which the score test statistic reaches its maximum are called the optimal weights. TOWmuT is applicable to different types of traits and can include covariates. Using extensive simulation studies, we compared the performance of TOWmuT with single-TOW [39], GAMuT [46], MSKAT [50], AWRR [11] and MANOVA [7]. Our results showed that, in all the simulation scenarios, TOWmuT is either the most powerful test or comparable to the most powerful test among the six tests. We also illustrated the usefulness of TOWmuT by analyzing a real whole-genome genotyping data from a lung function study.

Methods

We consider a sample with n unrelated individuals. Each individual has K potentially correlated quantitative or qualitative traits (1 for cases and 0 for controls for a qualitative trait) and has been genotyped at M variants in a genomic region. Let denote the k^th trait value of the i^th individual and denote the genotype score of the i^th individual at the m^th variant, where is the number of minor alleles that the i^th individual carries at the m^th variant. We first centralize and as and , where and . Let Y_i =(y_i1,…,y_iK)^T, X_i =(x_i1,…,x_iM)^T, Y =(Y₁,…,Y_n)^T, and X =(X₁,…,X_n)^T. For the i^th individual, we consider a linear combination of the variants , where w =(w₁,…,w_M)^T are weights and their values will be decided later.

Without covariates

We first describe our method without covariates. Consider the linear model (1) The score test statistic to test the null hypothesis H₀:β₁ = ⋯ = β_K = 0 is given by (2) where , , and .

To simplify the computation of Eq (2), we replace X^TX/n with the diagonal of X^TX/n and let A = diag(X^TX/n). This simplification was also used in the past by Pan [51] and Sha et al. [39]. Then σ² becomes and T_score becomes . We define the test statistic of TOWmuT as (3) Let W = A^1/2w, then , where λ_max(•) indicates the largest eigenvalue of a matrix. Let W⁰ denote the eigenvector of A^−1/2X^TY(Y^TY)⁻¹Y^TXA^−1/2 corresponding to the largest eigenvalue, then w⁰ = A^−1/2W⁰ is the optimal weights. Actually, we do not need to calculate w⁰ in order to calculate T_TOWmuT. If we let C = XA⁻¹X^T, then (4)

We use a permutation test to evaluate the p-value of T_TOWmuT. In details, we randomly shuffle the traits in each permutation. Note that C and (Y^TY)⁻¹ do not change in each permutation. Suppose that we perform B times of permutations. Let denote the value of T_TOWmuT based on the b^th permuted data, where b = 0 represents the original data. Then, the p-value of T_TOWmuT is given by (5)

With covariates

Assume that there are p covariates and z_i1,…z_ip denote the p covariates of the i^th individual. Consider the linear model (6) In the appendix, we showed that under model (6), the score test statistic with covariates to test the null hypothesis H₀:β₁ = ⋯ = β_K = 0 is given by (7) where , , , , , and denote the residuals of y_ik and x_im under (8) We can see the score test statistic with covariates (9) That is, replacing y_ik and x_im by their residuals and in the score test statistic without covariates T_score, it becomes the score test statistic with covariates .

Therefore, we define TOWmuT statistic with covariates as (10)

In summary, to apply TOWmuT with covariates, we adjust both trait value y_ik and genotypic score x_im for the covariates by applying linear regressions in (8) and apply TOWmuT without covariates to the residuals and .

Comparison of methods

We compare the performance of our proposed method with the following methods: Multivariate Analysis of Variance (MANOVA) [9], MSKAT [50], GAMuT [46], AWRR [11] and single-TOW [39]. In the following, we briefly introduce each of those methods using the notations in the method section.

MANOVA: Consider a multivariate multiple linear regression model: Y = Xβ+ε, where Y denotes the n×K matrix of phenotypes; X denotes the n×M matrix of genotypes; β is a M×K matrix of coefficients; ε is the n×K matrix of random errors with each row of ε to be i.i.d. MVN(0,Σ), where Σ is the covariance matrix of ε. To test H₀:β = 0, the likelihood ratio test is equivalent to the Wilk’s Lambda test statistic of MANOVA, that is, . Here Λ denote the ratio of the likelihood function under H₀ to the likelihood function under H₁, l(β,Σ) is the log-likelihood function, and , where is the maximum likelihood estimator (MLE) of β, and |•| denotes the determinant of a matrix. The test statistic has an asymptotic distribution.

MSKAT: MSKAT extends the commonly used SKAT [40] for single trait analysis to test for the joint association of rare variant set with multiple continuous traits.

GAMuT: GAMuT compares the similarity in multivariate phenotypes to the similarity in rare-variant genotypes in a genomic region by a machine-learning framework called kernel distance covariance.

AWRR: by collapsing genotypes using adaptive weights, AWRR uses the score test to test association based on the reverse regression, in which collapsed genotypes are treated as the response variable and multiple traits are treated as independent variables.

Single-TOW: Let denote the test statistic of TOW to test the association between the kth trait and the genotypes at the variants in a genomic region. The test statistic of single-TOW is given by T_single–TOW = min_1≤k≤K p_k, where p_k is the p-value of for k = 1,…,K. The p-value of T_single–TOW is estimated using a permutation procedure.

Simulations

In our simulation studies, we use the empirical Mini-Exome genotype data provided by the genetic analysis workshop 17 (GAW17) to generate genotypes. This dataset contains genotypes of 697 unrelated individuals on 3205 genes. Same as the simulation studies in Sha et al. [39] and Fang et al. [52], we choose four genes in the empirical Mini-Exome genotype data. These four genes are ELAVL4 (gene1), MSH4 (gene2), PDE4B (gene3), and ADAMTS4 (gene4). Each gene contains 10, 20, 30, and 40 variants, respectively. Then, we merge the four genes to form a super gene (Sgene) with 100 variants. We generate genotypes based on the genotypes of 697 individuals in the Sgene since the distribution of the minor allele frequencies (MAFs) in the Sgene are similar to the distribution of MAFs in all of the 3205 genes (Figure A in S1 File). To generate a qualitative trait, we use a liability threshold model based on a quantitative trait [44]. An individual is classified as affected if the individual’s trait is at least one standard deviation larger than the mean of the trait. This leads to a prevalence of 16% for the simulated disease in the general population. In the following, we only describe how to generate a quantitative trait.

We assume that all causal variants are rare (MAF < 0.01). We randomly choose n_c rare variants as causal variants, where n_c is determined by the percentage of causal variants among rare variants. We use n_r and n_p to denote the number of risk rare variants and protective rare variants, respectively, where n_r + n_p = n_c. Let and denote the genotypic scores of the q^th risk rare variant and the j^th protective rare variant for the i^th individual, respectively. We assume that genotypes impact on L traits. Let h and h_l denote the heritability of all the n_c rare causal variants for the L traits and the l^th trait among the L traits, respectively. We generate L random numbers t₁,…,t_L from a uniform distribution between 0 and 1. Then, the heritability of l^th trait among the L traits is . Given the heritability of the l^th trait h_l, we generate n_c random numbers from a uniform distribution between 0 and 1. The heritability of the m^th causal variant for the l^th trait is given by .

In our simulation studies, we consider two covariates Z₁ and Z₂, where Z₁ is a continuous covariate generated from a standard normal distribution, and Z₂ is a binary covariate taking values 0 and 1 with a probability of 0.5. We generate K traits by considering the factor model [10, 13, 21] (11) where y = (y₁,…,y_K)^T; e = (1,…,1)^T, λ = (λ₁,…,λ_K) is the vector involved genotypes; f = (f₁,…,f_R)^T ~ MVN(0,Σ), Σ = (1−ρ)I + ρA, A is a matrix with elements of 1, I is the identity matrix, and ρ is the correlation between f_i and f_j; R is the number of factors; γ is a K by R matrix; c is a constant number; ε = (ε₁,…,ε_K)^T is a vector of residuals; and ε₁,…,ε_K are independent, ε_k ~ N(0,1) for k = 1,…,K.

As in Wang et al. [10], we consider the following six models with different number of factors and different number of traits affected by genotypes. In these models, the within-factor correlation is c² and the between-factor correlation is ρ₁ = ρc².

Model 1: There is only one factor and genotypes impact on 6 traits with the same effect size. This is equivalent to set R = 1 and γ = (1,…,1)^T. In details,

(12)

Model 2: There are five factors and genotypes impact on 6 traits. We set R = 5 and γ = diag(D₁,D₂,D₃,D₄,D₅), where for i = 1,…,5. In details,

(13)

Model 3: There are two factors and genotypes impact on 6 traits. That is, R = 2 and γ = diag(D₁,D₂), where for i = 1,2. In details,

(14)

Model 4: There are five factors and genotypes impact on one trait. That is, R = 5 and γ = diag(D₁,D₂,D₃,D₄,D₅), where for i = 1,…,5. In details,

(15)

Model 5: There are only two factors and genotypes impact on one trait. That is, R = 2 and γ = diag(D₁,D₂), where for i = 1,2. In details,

(16)

Model 6: There is K factors and genotypes impact on 6 traits. That is, R = K, γ = I, and c = 1. In details,

(17)

Results

To evaluate the type I error rates of the proposed test TOWmuT, we set λ_k = 0 for k = 1,…,K in all of the 6 models. We consider different models, different sample sizes, different significance levels, and different types of traits. In our simulations we consider 10 traits (K = 10). In each simulation scenario, we estimate the p-values of TOWmuT using 1000 permutations and evaluate the type I error rates of TOWmuT using 10,000 replicated samples. For 10,000 replicated samples, the 95% confidence interval (CI) for the estimated type I error rates of nominal level 0.05 is (0.046, 0.054) and the 95% CI at the nominal level of 0.01 is (0.008, 0.012). Tables 1 and 2 summarize the estimated type I error rates of TOWmuT. From these two tables, we can see that 70 out of 72 (greater than 95%) estimated type I error rates are within the 95% CIs and the two estimated type I error rates not within the 95% CIs (0.05555 and 0.01295) are very close to the bound of the corresponding 95% CI, which indicates that TOWmuT is valid.

Download:

Table 1. The estimated type I error rates of TOWmuT for 10 quantitative traits under each model with covariates.

https://doi.org/10.1371/journal.pone.0201186.t001

Download:

Table 2. The estimated type I error rates of TOWmuT for the mixture of five quantitative traits and five qualitative traits under each model with covariates.

https://doi.org/10.1371/journal.pone.0201186.t002

For power comparisons, we consider different models, different types of traits, different percentages of protective variants, different values of heritability, different values of between-factor correlation, and different values of within-factor correlation. In each of the simulation scenarios, we estimate the p-values of TOWmuT, AWRR and single-TOW using 1,000 permutations and we estimate the p-values of MANOVA, GAMuT, and MSKAT using their asymptotic distributions. We evaluate the powers of all of the six tests using 1,000 replicated samples at a significance level of 0.05.

Fig 1 gives the power comparisons of the six tests (Single-TOW, MSKAT, AWRR, MANOVA, GAMuT, and TOWmuT) for the power as a function of the total heritability based on the six models for 10 quantitative traits. This figure shows that (1) TOWmuT is consistently the most powerful one among the six tests; (2) MANOVA is the second most powerful when genotypes impact on multiple traits (models 1–3 and 6) while AWRR is the second most powerful when genotypes impact on a single trait (models 4–5); (3) MSKAT is consistently less powerful than other multivariate tests probably because SKAT gives larger weights than that of TOW to only those variants with MAF in the range (0.01,0.035) and there are only 8% variants with MAF in the range (0.01,0.035) in Sgene which our simulations are based on; and (4) MSKAT and GAMuT have similar powers in all six models.

Download:

Fig 1. Power comparisons of the six tests (Single-TOW, MSKAT, AWRR, MANOVA, GAMuT and TOWmuT) for the power as a function of total heritability for 10 quantitative traits with covariates.

The sample size is 1000. The between-factor correlation is 0.3 and the within-factor correlation is 0.7. The percentage of the causal variants is 0.2. All causal variants are risk variants.

https://doi.org/10.1371/journal.pone.0201186.g001

Fig 2 gives the power comparisons of the five tests (Single-TOW, AWRR, MSKAT, GAMuT, and TOWmuT) for the power as a function of the total heritability for the mixture of 5 quantitative traits and 5 qualitative traits. We only compare the powers of five tests because MANOVA has inflated type I error rate in this case. This figure shows that (1) TOWmuT is consistently the most powerful one among the five tests; (2) AWRR is second most powerful when genotypes impact on multiple traits (models 1–3 and 6) while MSKAT and GAMuT are second most powerful when genotypes impact on a single trait (models 4–5); (3) MSKAT and GAMuT have similar powers in all six models; and (4) single-TOW is consistently less powerful than other four multivariate tests because we keep correlations between traits similar to that in Fig 1 such that correlations between original quantitative traits are larger than that in Fig 1.

Download:

Fig 2. Power comparisons of the five tests (Single-TOW, AWRR, GAMuT, MSKAT and TOWmuT) for the power as a function of heritability for the mixture of half quantitative traits and half qualitative traits with covariates.

The sample size is 1000. Covariance matrix of 10 traits is similar to that of 10 quantitative traits with between-factor correlation being 0.3 and the within-factor correlation being 0.7. The percentage of the causal variants is 0.2. All causal variants are risk variants.

https://doi.org/10.1371/journal.pone.0201186.g002

We also compare the powers of the six tests for the power as a function of the within-factor correlation for models 1–5 and between-factor correlation for model 6 for 10 quantitative traits (Figure B in S1 File). As shown in this figure, the power of single-TOW is robust to the between-factor correlation or the within-factor correlation since the minimum p-value-based approach is largely unaffected by the trait correlation [50]. However, with the increasing of the between-factor correlation or within-factor correlation, the power of other five tests essentially increases. Other patterns of the power comparisons are similar to those of in Fig 1.

Power comparisons of the six tests for the power as a function of the percentage of protective variants for 10 quantitative traits are given by Figure C in S1 File. This figure shows that the power of all six tests are robust to the percentage of protective variants, therefore, all of these methods are robust to the directions of the genetic effects. Other patterns of the power comparisons are similar to those of in Fig 1.

Application to the COPDGene

Chronic obstructive pulmonary disease (COPD) is a common disease in elderly patients that causes significant morbidity and mortality [53]. The Genetic Epidemiology of COPD Study (COPDGene) [54] was designed to identify genetic factors associated with COPD. In this COPDGene study, a total of more than 10,000 subjects have been enrolled including 2/3 non-Hispanic Whites (NHW) and 1/3 African-Americans (AA). In this analysis, we only include 5,430 NHW with no missing phenotypes. Each of the 5,430 NHW has been genotyped at 630,860 SNPs. Based on the literature studies of COPD [9, 55, 56], we chose BMI, Age, Pack-Years (PackYear) and Sex as covariates and selected seven quantitative COPD-related phenotypes. These seven phenotypes are FEV1 (% predicted FEV1), Emphysema (Emph), Emphysema Distribution (EmphDist), Gas Trapping (GasTrap), Airway Wall Area (Pi10), Exacerbation frequency (ExacerFreq), and Six-minute walk distance (6MWD) [9]. The correlation structure of the seven COPD-related phenotypes is given in Figure D in S1 File.

To evaluate the performance of our proposed method on a real data set, we applied six methods (TOWmuT, MANOVA, MSKAT, GAMuT, AWRR, and single-TOW) to the COPDGene of NHW population to test the association between each of 50-SNP blocks and the seven quantitative COPD-related phenotypes. To identify significant 50-SNP blocks associated with the phenotypes, we used Bonferroni correction to decide the significance level. The total number of 50-SNP blocks is 12617, therefore, the Bonferroni corrected significance level is 0.05/12617 ≈ 4×10⁻⁶. Table 3 summarized the significant blocks identified by at least one method. There were total six significant blocks in Table 3. All of the six blocks have been previously reported to be in association with COPD or lung functions [57–60]. PDSS1 and ABI1 are located between LOC107984176 and LOC105376467, which are Intergenic regions and contain the SNPs associated with pulmonary function [60, 61]. From Table 3, we can see that TOWmuT identified four blocks; AWRR identified two blocks; MANOVA, MSKAT and GAMuT identified one block; single-TOW did not identify any blocks. From these results, we can see that TOWmuT identified the most of significant 50-SNP blocks among the six methods, which is consistent with the results of our simulation studies.

Download:

Table 3. Significant blocks identified by at least one method (p-values less than 4×10⁻⁶) and the corresponding p-values in the analysis of COPDGene.

https://doi.org/10.1371/journal.pone.0201186.t003

Discussion

In this article, we developed TOWmuT to perform joint analysis of multiple traits in gene-based association studies. The motivations to develop this method are based on the following: (1) for complex diseases, multiple correlated traits are usually measured in genetic association studies; (2) there is increasing evidence demonstrating that pleiotropy is a widespread phenomenon in complex diseases [5]; and (3) there is a shortage of gene-based approaches for multiple traits. We used extensive simulation studies to compare the performance of TOWmuT with MANOVA, MSKAT, AWRR, GAMuT and Single-TOW. Our simulation results showed that TOWmuT has correct type I error rates and is consistently more powerful than other five methods we compared. Furthermore, the results from real data analysis showed that the proposed method has great potential in gene-based association study for complex diseases with multiple phenotypes such as COPD.

Recently, it has become a major focus of investigation to identify a small number of rare causal variants that contribute to complex diseases [62]. Several methods to pinpoint the causal variants have been developed for testing the association with a single trait. These methods include backward elimination (BE) method [63], hierarchical model method [63], and adaptive combination of p-values method [64]. To extend the TOWmuT method to identify a small number of causal variants which are associated with multiple traits, we can use the BE method. In each step, we remove one variant that has the smallest contribution to the association between multiple traits and the set of variants and then we evaluate the p-value for testing association between multiple traits and the remaining variants by TOWmuT. Causal variants are the set of variants corresponding to the smallest p-value.

The computation time required for running TOWmuT depends on the number of traits, the sample size, the number of permutations, and the number of variants in a genomic region. The running time of TOWmuT with 1000 permutations on a data set with 5000 individuals, seven traits, and 10 variants in a genomic region on a laptop with 4 Intel Cores @ 3.30GHz and 4 GB memory is about 0.14s. To perform real data analysis at a genome-wide level, we can first select genomic regions that show evidence of association based on a small number of permutations (e.g. 1,000), and then use a large number of permutations to test the selected regions.

Appendix

We use the same notations in the method section. Let Y = (Y_1,…,Y_n)^T, Z_i = (1z_i1,…,z_ip)^T, Z = (Z_1,…,Z_n)^T, and x = (x_1,…,x_n)^T. Under the linear model (18) the log-likelihood (up to a constant) is given by (19) where α = (α₀,…,α_p)^T, β = (β₁,…,β_K)^T, and ε₁,…,ε_n are independent and ε_i ~ N(0,σ²). Then, (20) (21) Let and denote the maximum likelihood estimates of α and σ² under null hypothesis H₀:β = 0. Then, and , where P = Z(Z^TZ)⁻¹Z^T. Let θ = (α^T,β^T)^T. The score and information matrix are and , where U = Y^T(I−P)x = Y^T(I−P)Xw. The score test statistic is given by (22) where V = Y^T(I−P)Y. Note that (I−P)² = I−P. We have , , , and , where and is the residual of x_im under the linear regression model (8); and is the residual of y_ik under the linear regression model (8). Therefore, (23)

Supporting information

S1 File. Supplementary information.

https://doi.org/10.1371/journal.pone.0201186.s001

(PDF)

Acknowledgments

The Genetic Analysis workshops are supported by NIH grant R01 GM031575 from the National Institute of General Medical Sciences. Preparation of the Genetic Analysis Workshop 17 Simulated Exome Data Set was supported in part by NIH R01 MH059490 and used sequencing data from the 1000 Genomes Project (www.1000genomes.org).

This research used data generated by the COPDGene study, which was supported by NIH grants U01HL089856 and U01HL089897. The COPDGene project is also supported by the COPD Foundation through contributions made by an Industry Advisory Board comprised of Pfizer, AstraZeneca, Boehringer Ingelheim, Novartis, and Sunovion.

Superior, a high-performance computing infrastructure at Michigan Technological University, was used in obtaining results presented in this publication.

References

1. The UK10K Consortium, Walter K, Min JL, Huang J, Crooks L, Memari Y, et al. The UK10K project identifies rare variants in health and disease. Nature. 2015;526(7571):82–90. pmid:26367797.
- View Article
- PubMed/NCBI
- Google Scholar
2. Newton-Cheh C, Johnson T, Gateva V, Tobin MD, Bochud M, Coin L, et al. Genome-wide association study identifies eight loci associated with blood pressure. Nature genetics. 2009;41(6):666–76. pmid:19430483.
- View Article
- PubMed/NCBI
- Google Scholar
3. Zabaneh D, Balding DJ. A genome-wide association study of the metabolic syndrome in Indian Asian men. PloS one. 2010;5(8):e11961. pmid:20694148.
- View Article
- PubMed/NCBI
- Google Scholar
4. Gur RE, Nimgaonkar VL, Almasy L, Calkins ME, Ragland JD, Pogue-Geile MF, et al. Neurocognitive endophenotypes in a multiplex multigenerational family study of schizophrenia. Am J Psychiatry. 2007;164(5):813–9. pmid:17475741.
- View Article
- PubMed/NCBI
- Google Scholar
5. Solovieff N, Cotsapas C, Lee PH, Purcell SM, Smoller JW. Pleiotropy in complex traits: challenges and strategies. Nat Rev Genet. 2013;14(7):483–95. pmid:23752797.
- View Article
- PubMed/NCBI
- Google Scholar
6. Stephens M. A unified framework for association analysis with multiple related phenotypes. PloS one. 2013;8(7):e65245. pmid:23861737.
- View Article
- PubMed/NCBI
- Google Scholar
7. Yang Q, Wang Y. Methods for analyzing multivariate phenotypes in genetic association studies. J Probab Stat. 2012;2012:652569. pmid:24748889.
- View Article
- PubMed/NCBI
- Google Scholar
8. Zhou X, Stephens M. Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nature methods. 2014;11(4):407–9. pmid:24531419.
- View Article
- PubMed/NCBI
- Google Scholar
9. Liang X, Wang Z, Sha Q, Zhang S. An Adaptive Fisher's Combination Method for Joint Analysis of Multiple Phenotypes in Association Studies. Sci Rep. 2016;6:34323. pmid:27694844.
- View Article
- PubMed/NCBI
- Google Scholar
10. Wang Z, Sha Q, Zhang S. Joint Analysis of Multiple Traits Using "Optimal" Maximum Heritability Test. PloS one. 2016;11(3):e0150975. pmid:26950849.
- View Article
- PubMed/NCBI
- Google Scholar
11. Wang Z, Wang X, Sha Q, Zhang S. Joint Analysis of Multiple Traits in Rare Variant Association Studies. Annals of human genetics. 2016;80(3):162–71. pmid:26990300.
- View Article
- PubMed/NCBI
- Google Scholar
12. Zhu H, Zhang S, Sha Q. Power Comparisons of Methods for Joint Association Analysis of Multiple Phenotypes. Hum Hered. 2015;80(3):144–52. pmid:27344597.
- View Article
- PubMed/NCBI
- Google Scholar
13. Aschard H, Vilhjalmsson BJ, Greliche N, Morange PE, Tregouet DA, Kraft P. Maximizing the power of principal-component analysis of correlated phenotypes in genome-wide association studies. Am J Hum Genet. 2014;94(5):662–76. pmid:24746957.
- View Article
- PubMed/NCBI
- Google Scholar
14. Ferreira MA, Purcell SM. A multivariate test of association. Bioinformatics. 2009;25(1):132–3. pmid:19019849.
- View Article
- PubMed/NCBI
- Google Scholar
15. Klei L, Luca D, Devlin B, Roeder K. Pleiotropy and principal components of heritability combine to increase power for association analysis. Genetic epidemiology. 2008;32(1):9–19. pmid:17922480.
- View Article
- PubMed/NCBI
- Google Scholar
16. Korte A, Vilhjalmsson BJ, Segura V, Platt A, Long Q, Nordborg M. A mixed-model approach for genome-wide association studies of correlated traits in structured populations. Nature genetics. 2012;44(9):1066–71. pmid:22902788.
- View Article
- PubMed/NCBI
- Google Scholar
17. O'Reilly PF, Hoggart CJ, Pomyen Y, Calboli FC, Elliott P, Jarvelin MR, et al. MultiPhen: joint model of multiple phenotypes can increase discovery in GWAS. PloS one. 2012;7(5):e34861. pmid:22567092.
- View Article
- PubMed/NCBI
- Google Scholar
18. Zhang Y, Xu Z, Shen X, Pan W, Alzheimer's Disease Neuroimaging I. Testing for association with multiple traits in generalized estimation equations, with application to neuroimaging data. Neuroimage. 2014;96:309–25. pmid:24704269.
- View Article
- PubMed/NCBI
- Google Scholar
19. O'Brien PC. Procedures for comparing samples with multiple endpoints. Biometrics. 1984;40(4):1079–87. pmid:6534410.
- View Article
- PubMed/NCBI
- Google Scholar
20. Yang Q, Wu H, Guo CY, Fox CS. Analyze multivariate phenotypes in genetic association studies by combining univariate association tests. Genetic epidemiology. 2010;34(5):444–54. pmid:20583287.
- View Article
- PubMed/NCBI
- Google Scholar
21. van der Sluis S, Posthuma D, Dolan CV. TATES: efficient multivariate genotype-phenotype analysis for genome-wide association studies. PLoS Genet. 2013;9(1):e1003235. pmid:23359524.
- View Article
- PubMed/NCBI
- Google Scholar
22. Kim J, Bai Y, Pan W. An Adaptive Association Test for Multiple Phenotypes with GWAS Summary Statistics. Genetic epidemiology. 2015. pmid:26493956.
- View Article
- PubMed/NCBI
- Google Scholar
23. Zhu X, Feng T, Tayo BO, Liang J, Young JH, Franceschini N, et al. Meta-analysis of correlated traits via summary statistics from GWASs with an application in hypertension. Am J Hum Genet. 2015;96(1):21–36. pmid:25500260.
- View Article
- PubMed/NCBI
- Google Scholar
24. Li B, Leal SM. Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am J Hum Genet. 2008;83(3):311–21. pmid:18691683.
- View Article
- PubMed/NCBI
- Google Scholar
25. Bodmer W, Bonilla C. Common and rare variants in multifactorial susceptibility to common diseases. Nat Genet. 2008;40(6):695–701. pmid:18509313.
- View Article
- PubMed/NCBI
- Google Scholar
26. Kang HM, Sul JH, Service SK, Zaitlen NA, Kong SY, Freimer NB, et al. Variance component model to account for sample structure in genome-wide association studies. Nature genetics. 2010;42(4):348–54. pmid:20208533.
- View Article
- PubMed/NCBI
- Google Scholar
27. Pritchard JK. Are rare variants responsible for susceptibility to complex diseases? Am J Hum Genet. 2001;69(1):124–37. pmid:11404818.
- View Article
- PubMed/NCBI
- Google Scholar
28. Pritchard JK, Cox NJ. The allelic architecture of human disease genes: common disease-common variant …or not? Hum Mol Genet. 2002;11(20):2417–23. pmid:12351577.
- View Article
- PubMed/NCBI
- Google Scholar
29. Stratton MR, Rahman N. The emerging landscape of breast cancer susceptibility. Nat Genet. 2008;40(1):17–22. pmid:18163131.
- View Article
- PubMed/NCBI
- Google Scholar
30. Teer JK, Mullikin JC. Exome sequencing: the sweet spot before whole genomes. Hum Mol Genet. 2010;19(R2):R145–51. pmid:20705737.
- View Article
- PubMed/NCBI
- Google Scholar
31. Walsh T, King MC. Ten genes for inherited breast cancer. Cancer Cell. 2007;11(2):103–5. pmid:17292821.
- View Article
- PubMed/NCBI
- Google Scholar
32. Andres AM, Clark AG, Shimmin L, Boerwinkle E, Sing CF, Hixson JE. Understanding the accuracy of statistical haplotype inference with sequence data of known phase. Genetic epidemiology. 2007;31(7):659–71. pmid:17922479.
- View Article
- PubMed/NCBI
- Google Scholar
33. Metzker ML. Sequencing technologies—the next generation. Nat Rev Genet. 2010;11(1):31–46. pmid:19997069.
- View Article
- PubMed/NCBI
- Google Scholar
34. Madsen BE, Browning SR. A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genet. 2009;5(2):e1000384. pmid:19214210.
- View Article
- PubMed/NCBI
- Google Scholar
35. Morgenthaler S, Thilly WG. A strategy to discover genes that carry multi-allelic or mono-allelic risk for common diseases: a cohort allelic sums test (CAST). Mutat Res. 2007;615(1–2):28–56. pmid:17101154.
- View Article
- PubMed/NCBI
- Google Scholar
36. Price AL, Kryukov GV, de Bakker PI, Purcell SM, Staples J, Wei LJ, et al. Pooled association tests for rare variants in exon-resequencing studies. Am J Hum Genet. 2010;86(6):832–8. pmid:20471002.
- View Article
- PubMed/NCBI
- Google Scholar
37. Zawistowski M, Gopalakrishnan S, Ding J, Li Y, Grimm S, Zollner S. Extending rare-variant testing strategies: analysis of noncoding sequence and imputed genotypes. Am J Hum Genet. 2010;87(5):604–17. pmid:21070896.
- View Article
- PubMed/NCBI
- Google Scholar
38. Neale BM, Rivas MA, Voight BF, Altshuler D, Devlin B, Orho-Melander M, et al. Testing for an unusual distribution of rare variants. PLoS Genet. 2011;7(3):e1001322. pmid:21408211.
- View Article
- PubMed/NCBI
- Google Scholar
39. Sha Q, Wang X, Wang X, Zhang S. Detecting association of rare and common variants by testing an optimally weighted combination of variants. Genetic epidemiology. 2012;36(6):561–71. pmid:22714994.
- View Article
- PubMed/NCBI
- Google Scholar
40. Wu MC, Lee S, Cai T, Li Y, Boehnke M, Lin X. Rare-variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet. 2011;89(1):82–93. pmid:21737059.
- View Article
- PubMed/NCBI
- Google Scholar
41. Yang X, Wang S, Zhang S, Sha Q. Detecting association of rare and common variants based on cross-validation prediction error. Genetic epidemiology. 2017;41(3):233–43. pmid:28176359.
- View Article
- PubMed/NCBI
- Google Scholar
42. Derkach A, Lawless JF, Sun L. Robust and powerful tests for rare variants using Fisher's method to combine evidence of association from two or more complementary tests. Genetic epidemiology. 2013;37(1):110–21. pmid:23032573.
- View Article
- PubMed/NCBI
- Google Scholar
43. Lee S, Teslovich TM, Boehnke M, Lin X. General framework for meta-analysis of rare variants in sequencing association studies. Am J Hum Genet. 2013;93(1):42–53. pmid:23768515.
- View Article
- PubMed/NCBI
- Google Scholar
44. Sha Q, Zhang S. A rare variant association test based on combinations of single-variant tests. Genetic epidemiology. 2014;38(6):494–501. pmid:25065727.
- View Article
- PubMed/NCBI
- Google Scholar
45. Greco B, Hainline A, Arbet J, Grinde K, Benitez A, Tintle N. A general approach for combining diverse rare variant association tests provides improved robustness across a wider range of genetic architectures. Eur J Hum Genet. 2015. pmid:26508571
- View Article
- PubMed/NCBI
- Google Scholar
46. Broadaway KA, Cutler DJ, Duncan R, Moore JL, Ware EB, Jhun MA, et al. A Statistical Approach for Testing Cross-Phenotype Effects of Rare Variants. Am J Hum Genet. 2016;98(3):525–40. pmid:26942286.
- View Article
- PubMed/NCBI
- Google Scholar
47. Casale FP, Rakitsch B, Lippert C, Stegle O. Efficient set tests for the genetic analysis of correlated traits. Nature methods. 2015;12(8):755–8. pmid:26076425.
- View Article
- PubMed/NCBI
- Google Scholar
48. Sun J, Oualkacha K, Forgetta V, Zheng HF, Brent Richards J, Ciampi A, et al. A method for analyzing multiple continuous phenotypes in rare variant association studies allowing for flexible correlations in variant effects. Eur J Hum Genet. 2016. pmid:26860061.
- View Article
- PubMed/NCBI
- Google Scholar
49. Wang Y, Liu A, Mills JL, Boehnke M, Wilson AF, Bailey-Wilson JE, et al. Pleiotropy analysis of quantitative traits at gene level by multivariate functional linear models. Genetic epidemiology. 2015;39(4):259–75. pmid:25809955.
- View Article
- PubMed/NCBI
- Google Scholar
50. Wu B, Pankow JS. Sequence Kernel Association Test of Multiple Continuous Phenotypes. Genetic epidemiology. 2016;40(2):91–100. pmid:26782911.
- View Article
- PubMed/NCBI
- Google Scholar
51. Pan W. Asymptotic tests of association with multiple SNPs in linkage disequilibrium. Genetic epidemiology. 2009;33(6):497–507. pmid:19170135.
- View Article
- PubMed/NCBI
- Google Scholar
52. Fang S, Zhang S, Sha Q. Detecting association of rare variants by testing an optimally weighted combination of variants for quantitative traits in general families. Annals of human genetics. 2013;77(6):524–34. Epub 2013/08/24. pmid:23968488.
- View Article
- PubMed/NCBI
- Google Scholar
53. Nazir SA, Erbland ML. Chronic obstructive pulmonary disease: an update on diagnosis and management issues in older adults. Drugs Aging. 2009;26(10):813–31. pmid:19761275.
- View Article
- PubMed/NCBI
- Google Scholar
54. Regan EA, Hokanson JE, Murphy JR, Make B, Lynch DA, Beaty TH, et al. Genetic epidemiology of COPD (COPDGene) study design. COPD. 2010;7(1):32–43. pmid:20214461.
- View Article
- PubMed/NCBI
- Google Scholar
55. Chu JH, Hersh CP, Castaldi PJ, Cho MH, Raby BA, Laird N, et al. Analyzing networks of phenotypes in complex diseases: methodology and applications in COPD. BMC Syst Biol. 2014;8:78. pmid:24964944.
- View Article
- PubMed/NCBI
- Google Scholar
56. Han MK, Kazerooni EA, Lynch DA, Liu LX, Murray S, Curtis JL, et al. Chronic obstructive pulmonary disease exacerbations in the COPDGene study: associated radiologic phenotypes. Radiology. 2011;261(1):274–82. pmid:21788524.
- View Article
- PubMed/NCBI
- Google Scholar
57. Cho MH, Boutaoui N, Klanderman BJ, Sylvia JS, Ziniti JP, Hersh CP, et al. Variants in FAM13A are associated with chronic obstructive pulmonary disease. Nature genetics. 2010;42(3):200–2. pmid:20173748.
- View Article
- PubMed/NCBI
- Google Scholar
58. Pillai SG, Ge D, Zhu G, Kong X, Shianna KV, Need AC, et al. A genome-wide association study in chronic obstructive pulmonary disease (COPD): identification of two major susceptibility loci. PLoS Genet. 2009;5(3):e1000421. pmid:19300482.
- View Article
- PubMed/NCBI
- Google Scholar
59. Figarska SM, Vonk JM, Boezen HM. NFE2L2 polymorphisms, mortality, and metabolism in the general population. Physiol Genomics. 2014;46(12):411–7. pmid:24790085.
- View Article
- PubMed/NCBI
- Google Scholar
60. Lutz SM, Cho MH, Young K, Hersh CP, Castaldi PJ, McDonald ML, et al. A genome-wide association study identifies risk loci for spirometric measures among smokers of European and African ancestry. BMC Genet. 2015;16:138. pmid:26634245.
- View Article
- PubMed/NCBI
- Google Scholar
61. Imboden M, Bouzigon E, Curjuric I, Ramasamy A, Kumar A, Hancock DB, et al. Genome-wide association study of lung function decline in adults with and without asthma. Journal of allergy and clinical immunology. 2012;129(5):1218–28. pmid:22424883
- View Article
- PubMed/NCBI
- Google Scholar
62. Capanu M, Ionita-Laza I. Integrative analysis of functional genomic annotations and sequencing data to identify rare causal variants via hierarchical modeling. Front Genet. 2015;6:17. pmid:26005447.
- View Article
- PubMed/NCBI
- Google Scholar
63. Ionita-Laza I, Capanu M, De Rubeis S, McCallum K, Buxbaum JD. Identification of rare causal variants in sequence-based studies: methods and applications to VPS13B, a gene involved in Cohen syndrome and autism. PLoS Genet. 2014;10(12):e1004729. pmid:25502226.
- View Article
- PubMed/NCBI
- Google Scholar
64. Lin WY. Beyond Rare-Variant Association Testing: Pinpointing Rare Causal Variants in Case-Control Sequencing Study. Sci Rep. 2016;6:21824. pmid:26903168.
- View Article
- PubMed/NCBI
- Google Scholar

[ref1] 1. The UK10K Consortium, Walter K, Min JL, Huang J, Crooks L, Memari Y, et al. The UK10K project identifies rare variants in health and disease. Nature. 2015;526(7571):82–90. pmid:26367797.
View Article
PubMed/NCBI
Google Scholar

[2] View Article

[3] PubMed/NCBI

[4] Google Scholar

[ref2] 2. Newton-Cheh C, Johnson T, Gateva V, Tobin MD, Bochud M, Coin L, et al. Genome-wide association study identifies eight loci associated with blood pressure. Nature genetics. 2009;41(6):666–76. pmid:19430483.
View Article
PubMed/NCBI
Google Scholar

[6] View Article

[7] PubMed/NCBI

[8] Google Scholar

[ref3] 3. Zabaneh D, Balding DJ. A genome-wide association study of the metabolic syndrome in Indian Asian men. PloS one. 2010;5(8):e11961. pmid:20694148.
View Article
PubMed/NCBI
Google Scholar

[10] View Article

[11] PubMed/NCBI

[12] Google Scholar

[ref4] 4. Gur RE, Nimgaonkar VL, Almasy L, Calkins ME, Ragland JD, Pogue-Geile MF, et al. Neurocognitive endophenotypes in a multiplex multigenerational family study of schizophrenia. Am J Psychiatry. 2007;164(5):813–9. pmid:17475741.
View Article
PubMed/NCBI
Google Scholar

[14] View Article

[15] PubMed/NCBI

[16] Google Scholar

[ref5] 5. Solovieff N, Cotsapas C, Lee PH, Purcell SM, Smoller JW. Pleiotropy in complex traits: challenges and strategies. Nat Rev Genet. 2013;14(7):483–95. pmid:23752797.
View Article
PubMed/NCBI
Google Scholar

[18] View Article

[19] PubMed/NCBI

[20] Google Scholar

[ref6] 6. Stephens M. A unified framework for association analysis with multiple related phenotypes. PloS one. 2013;8(7):e65245. pmid:23861737.
View Article
PubMed/NCBI
Google Scholar

[22] View Article

[23] PubMed/NCBI

[24] Google Scholar

[ref7] 7. Yang Q, Wang Y. Methods for analyzing multivariate phenotypes in genetic association studies. J Probab Stat. 2012;2012:652569. pmid:24748889.
View Article
PubMed/NCBI
Google Scholar

[26] View Article

[27] PubMed/NCBI

[28] Google Scholar

[ref8] 8. Zhou X, Stephens M. Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nature methods. 2014;11(4):407–9. pmid:24531419.
View Article
PubMed/NCBI
Google Scholar

[30] View Article

[31] PubMed/NCBI

[32] Google Scholar

[ref9] 9. Liang X, Wang Z, Sha Q, Zhang S. An Adaptive Fisher's Combination Method for Joint Analysis of Multiple Phenotypes in Association Studies. Sci Rep. 2016;6:34323. pmid:27694844.
View Article
PubMed/NCBI
Google Scholar

[34] View Article

[35] PubMed/NCBI

[36] Google Scholar

[ref10] 10. Wang Z, Sha Q, Zhang S. Joint Analysis of Multiple Traits Using "Optimal" Maximum Heritability Test. PloS one. 2016;11(3):e0150975. pmid:26950849.
View Article
PubMed/NCBI
Google Scholar

[38] View Article

[39] PubMed/NCBI

[40] Google Scholar

[ref11] 11. Wang Z, Wang X, Sha Q, Zhang S. Joint Analysis of Multiple Traits in Rare Variant Association Studies. Annals of human genetics. 2016;80(3):162–71. pmid:26990300.
View Article
PubMed/NCBI
Google Scholar

[42] View Article

[43] PubMed/NCBI

[44] Google Scholar

[ref12] 12. Zhu H, Zhang S, Sha Q. Power Comparisons of Methods for Joint Association Analysis of Multiple Phenotypes. Hum Hered. 2015;80(3):144–52. pmid:27344597.
View Article
PubMed/NCBI
Google Scholar

[46] View Article

[47] PubMed/NCBI

[48] Google Scholar

[ref13] 13. Aschard H, Vilhjalmsson BJ, Greliche N, Morange PE, Tregouet DA, Kraft P. Maximizing the power of principal-component analysis of correlated phenotypes in genome-wide association studies. Am J Hum Genet. 2014;94(5):662–76. pmid:24746957.
View Article
PubMed/NCBI
Google Scholar

[50] View Article

[51] PubMed/NCBI

[52] Google Scholar

[ref14] 14. Ferreira MA, Purcell SM. A multivariate test of association. Bioinformatics. 2009;25(1):132–3. pmid:19019849.
View Article
PubMed/NCBI
Google Scholar

[54] View Article

[55] PubMed/NCBI

[56] Google Scholar

[ref15] 15. Klei L, Luca D, Devlin B, Roeder K. Pleiotropy and principal components of heritability combine to increase power for association analysis. Genetic epidemiology. 2008;32(1):9–19. pmid:17922480.
View Article
PubMed/NCBI
Google Scholar

[58] View Article

[59] PubMed/NCBI

[60] Google Scholar

[ref16] 16. Korte A, Vilhjalmsson BJ, Segura V, Platt A, Long Q, Nordborg M. A mixed-model approach for genome-wide association studies of correlated traits in structured populations. Nature genetics. 2012;44(9):1066–71. pmid:22902788.
View Article
PubMed/NCBI
Google Scholar

[62] View Article

[63] PubMed/NCBI

[64] Google Scholar

[ref17] 17. O'Reilly PF, Hoggart CJ, Pomyen Y, Calboli FC, Elliott P, Jarvelin MR, et al. MultiPhen: joint model of multiple phenotypes can increase discovery in GWAS. PloS one. 2012;7(5):e34861. pmid:22567092.
View Article
PubMed/NCBI
Google Scholar

[66] View Article

[67] PubMed/NCBI

[68] Google Scholar

[ref18] 18. Zhang Y, Xu Z, Shen X, Pan W, Alzheimer's Disease Neuroimaging I. Testing for association with multiple traits in generalized estimation equations, with application to neuroimaging data. Neuroimage. 2014;96:309–25. pmid:24704269.
View Article
PubMed/NCBI
Google Scholar

[70] View Article

[71] PubMed/NCBI

[72] Google Scholar

[ref19] 19. O'Brien PC. Procedures for comparing samples with multiple endpoints. Biometrics. 1984;40(4):1079–87. pmid:6534410.
View Article
PubMed/NCBI
Google Scholar

[74] View Article

[75] PubMed/NCBI

[76] Google Scholar

[ref20] 20. Yang Q, Wu H, Guo CY, Fox CS. Analyze multivariate phenotypes in genetic association studies by combining univariate association tests. Genetic epidemiology. 2010;34(5):444–54. pmid:20583287.
View Article
PubMed/NCBI
Google Scholar

[78] View Article

[79] PubMed/NCBI

[80] Google Scholar

[ref21] 21. van der Sluis S, Posthuma D, Dolan CV. TATES: efficient multivariate genotype-phenotype analysis for genome-wide association studies. PLoS Genet. 2013;9(1):e1003235. pmid:23359524.
View Article
PubMed/NCBI
Google Scholar

[82] View Article

[83] PubMed/NCBI

[84] Google Scholar

[ref22] 22. Kim J, Bai Y, Pan W. An Adaptive Association Test for Multiple Phenotypes with GWAS Summary Statistics. Genetic epidemiology. 2015. pmid:26493956.
View Article
PubMed/NCBI
Google Scholar

[86] View Article

[87] PubMed/NCBI

[88] Google Scholar

[ref23] 23. Zhu X, Feng T, Tayo BO, Liang J, Young JH, Franceschini N, et al. Meta-analysis of correlated traits via summary statistics from GWASs with an application in hypertension. Am J Hum Genet. 2015;96(1):21–36. pmid:25500260.
View Article
PubMed/NCBI
Google Scholar

[90] View Article

[91] PubMed/NCBI

[92] Google Scholar

[ref24] 24. Li B, Leal SM. Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am J Hum Genet. 2008;83(3):311–21. pmid:18691683.
View Article
PubMed/NCBI
Google Scholar

[94] View Article

[95] PubMed/NCBI

[96] Google Scholar

[ref25] 25. Bodmer W, Bonilla C. Common and rare variants in multifactorial susceptibility to common diseases. Nat Genet. 2008;40(6):695–701. pmid:18509313.
View Article
PubMed/NCBI
Google Scholar

[98] View Article

[99] PubMed/NCBI

[100] Google Scholar

[ref26] 26. Kang HM, Sul JH, Service SK, Zaitlen NA, Kong SY, Freimer NB, et al. Variance component model to account for sample structure in genome-wide association studies. Nature genetics. 2010;42(4):348–54. pmid:20208533.
View Article
PubMed/NCBI
Google Scholar

[102] View Article

[103] PubMed/NCBI

[104] Google Scholar

[ref27] 27. Pritchard JK. Are rare variants responsible for susceptibility to complex diseases? Am J Hum Genet. 2001;69(1):124–37. pmid:11404818.
View Article
PubMed/NCBI
Google Scholar

[106] View Article

[107] PubMed/NCBI

[108] Google Scholar

[ref28] 28. Pritchard JK, Cox NJ. The allelic architecture of human disease genes: common disease-common variant …or not? Hum Mol Genet. 2002;11(20):2417–23. pmid:12351577.
View Article
PubMed/NCBI
Google Scholar

[110] View Article

[111] PubMed/NCBI

[112] Google Scholar

[ref29] 29. Stratton MR, Rahman N. The emerging landscape of breast cancer susceptibility. Nat Genet. 2008;40(1):17–22. pmid:18163131.
View Article
PubMed/NCBI
Google Scholar

[114] View Article

[115] PubMed/NCBI

[116] Google Scholar

[ref30] 30. Teer JK, Mullikin JC. Exome sequencing: the sweet spot before whole genomes. Hum Mol Genet. 2010;19(R2):R145–51. pmid:20705737.
View Article
PubMed/NCBI
Google Scholar

[118] View Article

[119] PubMed/NCBI

[120] Google Scholar

[ref31] 31. Walsh T, King MC. Ten genes for inherited breast cancer. Cancer Cell. 2007;11(2):103–5. pmid:17292821.
View Article
PubMed/NCBI
Google Scholar

[122] View Article

[123] PubMed/NCBI

[124] Google Scholar

[ref32] 32. Andres AM, Clark AG, Shimmin L, Boerwinkle E, Sing CF, Hixson JE. Understanding the accuracy of statistical haplotype inference with sequence data of known phase. Genetic epidemiology. 2007;31(7):659–71. pmid:17922479.
View Article
PubMed/NCBI
Google Scholar

[126] View Article

[127] PubMed/NCBI

[128] Google Scholar

[ref33] 33. Metzker ML. Sequencing technologies—the next generation. Nat Rev Genet. 2010;11(1):31–46. pmid:19997069.
View Article
PubMed/NCBI
Google Scholar

[130] View Article

[131] PubMed/NCBI

[132] Google Scholar

[ref34] 34. Madsen BE, Browning SR. A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genet. 2009;5(2):e1000384. pmid:19214210.
View Article
PubMed/NCBI
Google Scholar

[134] View Article

[135] PubMed/NCBI

[136] Google Scholar

[ref35] 35. Morgenthaler S, Thilly WG. A strategy to discover genes that carry multi-allelic or mono-allelic risk for common diseases: a cohort allelic sums test (CAST). Mutat Res. 2007;615(1–2):28–56. pmid:17101154.
View Article
PubMed/NCBI
Google Scholar

[138] View Article

[139] PubMed/NCBI

[140] Google Scholar

[ref36] 36. Price AL, Kryukov GV, de Bakker PI, Purcell SM, Staples J, Wei LJ, et al. Pooled association tests for rare variants in exon-resequencing studies. Am J Hum Genet. 2010;86(6):832–8. pmid:20471002.
View Article
PubMed/NCBI
Google Scholar

[142] View Article

[143] PubMed/NCBI

[144] Google Scholar

[ref37] 37. Zawistowski M, Gopalakrishnan S, Ding J, Li Y, Grimm S, Zollner S. Extending rare-variant testing strategies: analysis of noncoding sequence and imputed genotypes. Am J Hum Genet. 2010;87(5):604–17. pmid:21070896.
View Article
PubMed/NCBI
Google Scholar

[146] View Article

[147] PubMed/NCBI

[148] Google Scholar

[ref38] 38. Neale BM, Rivas MA, Voight BF, Altshuler D, Devlin B, Orho-Melander M, et al. Testing for an unusual distribution of rare variants. PLoS Genet. 2011;7(3):e1001322. pmid:21408211.
View Article
PubMed/NCBI
Google Scholar

[150] View Article

[151] PubMed/NCBI

[152] Google Scholar

[ref39] 39. Sha Q, Wang X, Wang X, Zhang S. Detecting association of rare and common variants by testing an optimally weighted combination of variants. Genetic epidemiology. 2012;36(6):561–71. pmid:22714994.
View Article
PubMed/NCBI
Google Scholar

[154] View Article

[155] PubMed/NCBI

[156] Google Scholar

[ref40] 40. Wu MC, Lee S, Cai T, Li Y, Boehnke M, Lin X. Rare-variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet. 2011;89(1):82–93. pmid:21737059.
View Article
PubMed/NCBI
Google Scholar

[158] View Article

[159] PubMed/NCBI

[160] Google Scholar

[ref41] 41. Yang X, Wang S, Zhang S, Sha Q. Detecting association of rare and common variants based on cross-validation prediction error. Genetic epidemiology. 2017;41(3):233–43. pmid:28176359.
View Article
PubMed/NCBI
Google Scholar

[162] View Article

[163] PubMed/NCBI

[164] Google Scholar

[ref42] 42. Derkach A, Lawless JF, Sun L. Robust and powerful tests for rare variants using Fisher's method to combine evidence of association from two or more complementary tests. Genetic epidemiology. 2013;37(1):110–21. pmid:23032573.
View Article
PubMed/NCBI
Google Scholar

[166] View Article

[167] PubMed/NCBI

[168] Google Scholar

[ref43] 43. Lee S, Teslovich TM, Boehnke M, Lin X. General framework for meta-analysis of rare variants in sequencing association studies. Am J Hum Genet. 2013;93(1):42–53. pmid:23768515.
View Article
PubMed/NCBI
Google Scholar

[170] View Article

[171] PubMed/NCBI

[172] Google Scholar

[ref44] 44. Sha Q, Zhang S. A rare variant association test based on combinations of single-variant tests. Genetic epidemiology. 2014;38(6):494–501. pmid:25065727.
View Article
PubMed/NCBI
Google Scholar

[174] View Article

[175] PubMed/NCBI

[176] Google Scholar

[ref45] 45. Greco B, Hainline A, Arbet J, Grinde K, Benitez A, Tintle N. A general approach for combining diverse rare variant association tests provides improved robustness across a wider range of genetic architectures. Eur J Hum Genet. 2015. pmid:26508571
View Article
PubMed/NCBI
Google Scholar

[178] View Article

[179] PubMed/NCBI

[180] Google Scholar

[ref46] 46. Broadaway KA, Cutler DJ, Duncan R, Moore JL, Ware EB, Jhun MA, et al. A Statistical Approach for Testing Cross-Phenotype Effects of Rare Variants. Am J Hum Genet. 2016;98(3):525–40. pmid:26942286.
View Article
PubMed/NCBI
Google Scholar

[182] View Article

[183] PubMed/NCBI

[184] Google Scholar

[ref47] 47. Casale FP, Rakitsch B, Lippert C, Stegle O. Efficient set tests for the genetic analysis of correlated traits. Nature methods. 2015;12(8):755–8. pmid:26076425.
View Article
PubMed/NCBI
Google Scholar

[186] View Article

[187] PubMed/NCBI

[188] Google Scholar

[ref48] 48. Sun J, Oualkacha K, Forgetta V, Zheng HF, Brent Richards J, Ciampi A, et al. A method for analyzing multiple continuous phenotypes in rare variant association studies allowing for flexible correlations in variant effects. Eur J Hum Genet. 2016. pmid:26860061.
View Article
PubMed/NCBI
Google Scholar

[190] View Article

[191] PubMed/NCBI

[192] Google Scholar

[ref49] 49. Wang Y, Liu A, Mills JL, Boehnke M, Wilson AF, Bailey-Wilson JE, et al. Pleiotropy analysis of quantitative traits at gene level by multivariate functional linear models. Genetic epidemiology. 2015;39(4):259–75. pmid:25809955.
View Article
PubMed/NCBI
Google Scholar

[194] View Article

[195] PubMed/NCBI

[196] Google Scholar

[ref50] 50. Wu B, Pankow JS. Sequence Kernel Association Test of Multiple Continuous Phenotypes. Genetic epidemiology. 2016;40(2):91–100. pmid:26782911.
View Article
PubMed/NCBI
Google Scholar

[198] View Article

[199] PubMed/NCBI

[200] Google Scholar

[ref51] 51. Pan W. Asymptotic tests of association with multiple SNPs in linkage disequilibrium. Genetic epidemiology. 2009;33(6):497–507. pmid:19170135.
View Article
PubMed/NCBI
Google Scholar

[202] View Article

[203] PubMed/NCBI

[204] Google Scholar

[ref52] 52. Fang S, Zhang S, Sha Q. Detecting association of rare variants by testing an optimally weighted combination of variants for quantitative traits in general families. Annals of human genetics. 2013;77(6):524–34. Epub 2013/08/24. pmid:23968488.
View Article
PubMed/NCBI
Google Scholar

[206] View Article

[207] PubMed/NCBI

[208] Google Scholar

[ref53] 53. Nazir SA, Erbland ML. Chronic obstructive pulmonary disease: an update on diagnosis and management issues in older adults. Drugs Aging. 2009;26(10):813–31. pmid:19761275.
View Article
PubMed/NCBI
Google Scholar

[210] View Article

[211] PubMed/NCBI

[212] Google Scholar

[ref54] 54. Regan EA, Hokanson JE, Murphy JR, Make B, Lynch DA, Beaty TH, et al. Genetic epidemiology of COPD (COPDGene) study design. COPD. 2010;7(1):32–43. pmid:20214461.
View Article
PubMed/NCBI
Google Scholar

[214] View Article

[215] PubMed/NCBI

[216] Google Scholar

[ref55] 55. Chu JH, Hersh CP, Castaldi PJ, Cho MH, Raby BA, Laird N, et al. Analyzing networks of phenotypes in complex diseases: methodology and applications in COPD. BMC Syst Biol. 2014;8:78. pmid:24964944.
View Article
PubMed/NCBI
Google Scholar

[218] View Article

[219] PubMed/NCBI

[220] Google Scholar

[ref56] 56. Han MK, Kazerooni EA, Lynch DA, Liu LX, Murray S, Curtis JL, et al. Chronic obstructive pulmonary disease exacerbations in the COPDGene study: associated radiologic phenotypes. Radiology. 2011;261(1):274–82. pmid:21788524.
View Article
PubMed/NCBI
Google Scholar

[222] View Article

[223] PubMed/NCBI

[224] Google Scholar

[ref57] 57. Cho MH, Boutaoui N, Klanderman BJ, Sylvia JS, Ziniti JP, Hersh CP, et al. Variants in FAM13A are associated with chronic obstructive pulmonary disease. Nature genetics. 2010;42(3):200–2. pmid:20173748.
View Article
PubMed/NCBI
Google Scholar

[226] View Article

[227] PubMed/NCBI

[228] Google Scholar

[ref58] 58. Pillai SG, Ge D, Zhu G, Kong X, Shianna KV, Need AC, et al. A genome-wide association study in chronic obstructive pulmonary disease (COPD): identification of two major susceptibility loci. PLoS Genet. 2009;5(3):e1000421. pmid:19300482.
View Article
PubMed/NCBI
Google Scholar

[230] View Article

[231] PubMed/NCBI

[232] Google Scholar

[ref59] 59. Figarska SM, Vonk JM, Boezen HM. NFE2L2 polymorphisms, mortality, and metabolism in the general population. Physiol Genomics. 2014;46(12):411–7. pmid:24790085.
View Article
PubMed/NCBI
Google Scholar

[234] View Article

[235] PubMed/NCBI

[236] Google Scholar

[ref60] 60. Lutz SM, Cho MH, Young K, Hersh CP, Castaldi PJ, McDonald ML, et al. A genome-wide association study identifies risk loci for spirometric measures among smokers of European and African ancestry. BMC Genet. 2015;16:138. pmid:26634245.
View Article
PubMed/NCBI
Google Scholar

[238] View Article

[239] PubMed/NCBI

[240] Google Scholar

[ref61] 61. Imboden M, Bouzigon E, Curjuric I, Ramasamy A, Kumar A, Hancock DB, et al. Genome-wide association study of lung function decline in adults with and without asthma. Journal of allergy and clinical immunology. 2012;129(5):1218–28. pmid:22424883
View Article
PubMed/NCBI
Google Scholar

[242] View Article

[243] PubMed/NCBI

[244] Google Scholar

[ref62] 62. Capanu M, Ionita-Laza I. Integrative analysis of functional genomic annotations and sequencing data to identify rare causal variants via hierarchical modeling. Front Genet. 2015;6:17. pmid:26005447.
View Article
PubMed/NCBI
Google Scholar

[246] View Article

[247] PubMed/NCBI

[248] Google Scholar

[ref63] 63. Ionita-Laza I, Capanu M, De Rubeis S, McCallum K, Buxbaum JD. Identification of rare causal variants in sequence-based studies: methods and applications to VPS13B, a gene involved in Cohen syndrome and autism. PLoS Genet. 2014;10(12):e1004729. pmid:25502226.
View Article
PubMed/NCBI
Google Scholar

[250] View Article

[251] PubMed/NCBI

[252] Google Scholar

[ref64] 64. Lin WY. Beyond Rare-Variant Association Testing: Pinpointing Rare Causal Variants in Case-Control Sequencing Study. Sci Rep. 2016;6:21824. pmid:26903168.
View Article
PubMed/NCBI
Google Scholar

[254] View Article

[255] PubMed/NCBI

[256] Google Scholar

Figures

Abstract

Introductions

Methods

Without covariates

With covariates

Comparison of methods

Simulations

Results

Application to the COPDGene

Discussion

Appendix

Supporting information

S1 File. Supplementary information.

Acknowledgments

References