Family-Based Bivariate Association Tests for Quantitative Traits

Lei Zhang; Aaron J. Bonham; Jian Li; Yu-Fang Pei; Jie Chen; Christopher J. Papasian; Hong-Wen Deng

doi:10.1371/journal.pone.0008133

Abstract

The availability of a large number of dense SNPs, high-throughput genotyping and computation methods promotes the application of family-based association tests. While most of the current family-based analyses focus only on individual traits, joint analyses of correlated traits can extract more information and potentially improve the statistical power. However, current TDT-based methods are low-powered. Here, we develop a method for tests of association for bivariate quantitative traits in families. In particular, we correct for population stratification by the use of an integration of principal component analysis and TDT. A score test statistic in the variance-components model is proposed. Extensive simulation studies indicate that the proposed method not only outperforms approaches limited to individual traits when pleiotropic effect is present, but also surpasses the power of two popular bivariate association tests termed FBAT-GEE and FBAT-PC, respectively, while correcting for population stratification. When applied to the GAW16 datasets, the proposed method successfully identifies at the genome-wide level the two SNPs that present pleiotropic effects to HDL and TG traits.

Citation: Zhang L, Bonham AJ, Li J, Pei Y-F, Chen J, Papasian CJ, et al. (2009) Family-Based Bivariate Association Tests for Quantitative Traits. PLoS ONE 4(12): e8133. https://doi.org/10.1371/journal.pone.0008133

Editor: Philip Awadalla, University of Montreal, Canada

Received: August 9, 2009; Accepted: October 6, 2009; Published: December 2, 2009

Copyright: © 2009 Zhang et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: The study was partially supported by grants from the National Institutes of Health (R01AR050496, R21 AG027110, R01 AG026564, P50 AR055081 and R21 AA015973). The investigators of this work also benefited from grants from the National Science Foundation of China, Huo Ying Dong Education Foundation, Shanghai Leading Academic Discipline Project (S30501), HuNan Province, Xi'an Jiaotong University, and the Ministry of Education of China. Computing support was partially provided by the High Performance Computing Cluster Center at Xi'an Jiaotong University. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Recent technological advances in genotyping along with the capacity to detect increasingly large numbers of single nucleotide polymorphisms (SNPs) have created great demand for developing new strategies to identify genes that underlie phenotypic variation. The availability of high-throughput SNP genotype data is prompting the development of genetic association analyses, including family based association tests (FBAT).

For family data sets, such as the Framingham Heart Study [1], multiple phenotypes are usually recorded. While most current analyses focus on traits individually, explicitly modeling genetic and environmental correlations among traits can in theory extract more information and consequently provide greater power. In linkage studies, it has been shown that joint analyses of two correlated traits may substantially improve power for localizing genes that jointly influence complex traits, and for evaluating their effects [2]–[7]. In association studies, however, only a few methods are available [8]–[10]. Among them, Lange et al. [10] proposed a method based on multivariate generalized estimating equations (GEEs), termed FBAT-GEE. FBAT-GEE makes no assumptions about phenotypic distributions and thus can be applied to phenotypes with arbitrary distributions. For quantitative traits, Lange et al. [9] also proposed a method based on generalized principal component analysis (PCA), termed FBAT-PC, which is more powerful than FBAT-GEE.

Both FBAT-GEE and FBAT-PC are protected against population stratification through a transmission disequilibrium test (TDT)-like strategy. Despite its merit, this property comes at the cost of a substantial loss of statistical power, because genotypes at each single marker need to be decomposed in order to correct for population stratification and test association simultaneously. The loss of power may become problematic in the context of genomewide association studies (GWAS), where a finding is judged positive only if it reaches a genomewide significance level.

Alternatively, the issue of population stratification can be handled at the population level by studying population structures from genotype data of multiple markers [11]–[17]. Among these approaches, principal component analysis (PCA) based methods [12], [14], [16], [17] summarize individual genetic background information. PCA-based methods have proven to be both computationally fast and statistically effective. Extensions of PCA to family data have also been proposed by Zhu et al. [14] and by us previously [18]. Thus, with the availability of large numbers of genotyped markers, a more flexible scheme that would enhance the feasibility of applying FBAT would be to correct for population stratification from multiple markers rather than from any single marker.

In this study, under the framework of the variance-components (VCs) model [19], [20], we develop a method for tests of association by joint analysis of two correlated quantitative traits in families. Specifically, individual genotype scores and phenotypes are adjusted by principal component analysis to guard against potential population stratification. A score test is proposed based on the residuals of the genotypes and of the phenotypes. Statistical properties of the proposed method are investigated through extensive simulations under a variety of conditions, and its performance is compared with that of existing univariate and bivariate methods.

Methods

Multivariate Variance-Components Pedigree Model

We describe the problem in the variance-components (VCs) framework [8], [19], [20], which is a powerful tool for modeling phenotypic variation for continuous traits in families. We describe the model in the context of multivariate phenotypes, although we consider only the bivariate situation in our analysis.

Assume that there are N nuclear families with n_i individuals in the i-th family (i = 1, …, N). Let y_ij = (y_ij₁, …, y_ij_m)′ be the vector of m (m = 2 for bivariate) phenotypes for individual j (j = 1, …, n_i) in family i. Further, let Y_i = (y_i₁′, …, y_{in_i}′)′ be the vector of phenotypes for all members in family i. Under the additive mode of inheritance, the genotype score g_ij is defined as 0, 1 and 2 for genotypes “11”, “12” and “22”, respectively. In the variance-components model, genetic components contributing to phenotypes are decomposed into fixed effects, e.g., the effects at the specified locus, and random effects, e.g., the effects of unknown polygenes. Specifically, the phenotype vector y_ij can be described by the following multivariate variance-components model

$$ y_{ij} = \mu + x_{ij}\beta_x + g_{ij}\beta_g + \alpha_{ij} + \varepsilon_{ij} \qquad (1) $$

where μ denotes the vector of grand means for the m phenotypes; x_ij is an m×t design matrix relating t covariates, e.g., age, sex, and known environmental factors, to the m phenotypes, and β_x is the vector of corresponding covariate effects; in equation (1), g_ij is an m×m design matrix for genotype scores whose m diagonal elements equal the genotype score g_ij and whose other elements are 0, and β_g is the vector of corresponding additive major gene effects. Finally, α_ij and ε_ij are vectors of length m denoting, respectively, the additive polygenic effects and the residual effects.

Here, the covariate effects and the major gene effects are modeled as fixed, whereas the polygenic effects and the residuals are modeled as random. Let α_ij and ε_ij follow multivariate normal distributions

$$ \alpha_{ij} \sim \mathrm{MVN}(0, A), \qquad \varepsilon_{ij} \sim \mathrm{MVN}(0, E) $$

where A and E are the m×m variance-covariance matrices accounting for polygenic and environmental variation among the traits, respectively, so that

$$ y_{ij} \sim \mathrm{MVN}(\mu + x_{ij}\beta_x + g_{ij}\beta_g,\ \Sigma) \qquad (2) $$

The covariance matrix of y_ij, Σ, has elements

$$ \Sigma_{kl} = G_{kl} + A_{kl} + E_{kl}, \qquad k, l = 1, \dots, m $$

where G, A, and E are the m×m variance-covariance matrices accounting for major gene, polygenic and environmental variation, respectively.

The phenotype vector for family i, Y_i, then follows a multivariate normal distribution with the mean vector

$$ E(Y_i) = \mu_i + X_i\beta_x + G_i\beta_g \qquad (3) $$

and the covariance matrix

$$ \Omega_i = \Pi_i \otimes G + 2\Phi_i \otimes A + I \otimes E \qquad (4) $$

where μ_i is the mean vector with length mn_i for family i; X_i is the design matrix for covariates, and G_i is the design matrix for genotypes at the tested locus; Π_i is the n_i×n_i identity-by-descent (IBD) matrix at the tested locus (estimated from the genotype data) and Φ_i is the n_i×n_i kinship coefficient matrix (determined by the relationships among individuals). Both can be calculated from pedigree structures and available genotype data. Finally, I is the n_i×n_i identity matrix and ⊗ is the Kronecker-product operator for matrices.
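As a rough illustration of the covariance structure in equation (4), the family covariance matrix can be assembled from Kronecker products. The sketch below is not the authors' program. It assumes NumPy, the usual variance-components form Ω_i = Π_i ⊗ G + 2Φ_i ⊗ A + I ⊗ E, individual-major ordering of Y_i, a hypothetical helper name `family_covariance`, and made-up values for G, A and E.

```python
import numpy as np

def family_covariance(pi_hat, phi, G, A, E):
    """Assumed form: Omega_i = Pi (x) G + 2*Phi (x) A + I (x) E for one family."""
    pi_hat = np.asarray(pi_hat, dtype=float)
    phi = np.asarray(phi, dtype=float)
    n = pi_hat.shape[0]
    return (np.kron(pi_hat, G)
            + np.kron(2.0 * phi, A)
            + np.kron(np.eye(n), E))

# Hypothetical two-trait values (unit total variance per trait).
G = np.array([[0.02, 0.02], [0.02, 0.02]])     # major gene, correlation +1
A = np.array([[0.40, 0.16], [0.16, 0.40]])     # polygenic, rho_a = 0.4
E = np.array([[0.58, 0.232], [0.232, 0.58]])   # environmental, rho_e = 0.4

# Family of two unrelated parents (rows 0, 1) and one child (row 2).
# Kinship coefficients: 0.5 with oneself, 0.25 parent-child, 0 between parents.
phi = np.array([[0.50, 0.00, 0.25],
                [0.00, 0.50, 0.25],
                [0.25, 0.25, 0.50]])
# Proportion of alleles shared IBD at the locus (parent-child is always 0.5).
pi_hat = np.array([[1.0, 0.0, 0.5],
                   [0.0, 1.0, 0.5],
                   [0.5, 0.5, 1.0]])

omega = family_covariance(pi_hat, phi, G, A, E)   # shape (m*n_i, m*n_i) = (6, 6)
```

With this ordering, each 2×2 diagonal block of `omega` is the within-individual covariance G + A + E, and the parent-child blocks contain only the genetic parts.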

Correcting for Population Stratification

When population stratification exists, the model described above may not provide a valid test. We previously proposed to extend principal component analysis to family data [18]. The method is briefly outlined as follows: founders in each family are selected to form an unrelated sample, on which principal component analysis is performed with the available genotype data. The calculated principal components are used to estimate these founders' genetic background and to adjust their genotype scores and phenotypes, as described by Price et al. [12]. Principal components for non-founders in each family are inferred from those of their founder relatives through a TDT strategy. The inferred principal components are then used to adjust the non-founders' genotypes and phenotypes. The approach is also extended to scenarios where parental information is missing. Denoting the adjusted genotypes and phenotypes with an asterisk (*), we rewrite equation (3) as

$$ E(Y_i^*) = \mu_i + X_i\beta_x + G_i^*\beta_g \qquad (5) $$
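The adjustment step can be sketched as follows. This is an illustrative outline under stated assumptions, not the procedure of reference [18]: it assumes NumPy, hypothetical helper names (`founder_pcs`, `fit_adjustment`, `residualize`), toy simulated founders from two populations, and, for the child, a simple midparent average of principal component scores standing in for the TDT-based inference, whose exact rule is not given in this text.

```python
import numpy as np

def founder_pcs(geno, k):
    """Top-k principal component scores of founders (rows) from 0/1/2 genotypes."""
    z = geno - geno.mean(axis=0)
    sd = z.std(axis=0)
    z = z[:, sd > 0] / sd[sd > 0]                 # drop monomorphic markers
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :k] * s[:k]

def fit_adjustment(values, pcs):
    """Least-squares coefficients (intercept first) of values on the PCs."""
    design = np.column_stack([np.ones(len(pcs)), pcs])
    coef, *_ = np.linalg.lstsq(design, values, rcond=None)
    return coef

def residualize(values, pcs, coef):
    """Remove the fitted PC trend; used for genotype scores and phenotypes alike."""
    design = np.column_stack([np.ones(len(pcs)), pcs])
    return values - design @ coef

# Toy founders: two populations with shifted allele frequencies and trait means.
rng = np.random.default_rng(0)
n_per_pop, n_markers = 100, 500
freq_a = rng.uniform(0.1, 0.9, n_markers)
freq_b = np.clip(freq_a + rng.normal(0, 0.15, n_markers), 0.05, 0.95)
geno = np.vstack([rng.binomial(2, freq_a, (n_per_pop, n_markers)),
                  rng.binomial(2, freq_b, (n_per_pop, n_markers))])
label = np.repeat([0.0, 1.0], n_per_pop)
pheno = 1.0 * label + rng.normal(0, 1, 2 * n_per_pop)

pcs = founder_pcs(geno, k=1)
coef = fit_adjustment(pheno, pcs)
pheno_adj = residualize(pheno, pcs, coef)         # adjusted founder phenotypes

# Assumption: a child's PC score is the average of its two parents' scores.
child_pcs = 0.5 * (pcs[0] + pcs[n_per_pop])
```

The same `fit_adjustment` and `residualize` pair would be applied to the genotype score at the tested marker, so that both sides of the association test are free of the ancestry trend.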

Tests of Association

With the assumptions of independent families and of multivariate normally distributed phenotypes within families, the likelihood function of the adjusted genotype and phenotype data is written as

$$ L = \prod_{i=1}^{N} (2\pi)^{-mn_i/2}\,|\Omega_i|^{-1/2} \exp\left\{ -\tfrac{1}{2}\,[Y_i^* - E(Y_i^*)]'\,\Omega_i^{-1}\,[Y_i^* - E(Y_i^*)] \right\} $$

Evidence of association is assessed by testing the null hypothesis H₀: β_g = 0 (no association) against the alternative hypothesis H₁: β_g ≠ 0 (association). In general, the hypothesis can be tested by a likelihood ratio test (LRT), in which the maximal likelihoods under the null and alternative hypotheses are estimated for each marker. However, the LRT is computationally intensive when large numbers of markers are involved, making it prohibitive for large-scale scans. Here we propose a multivariate score test as an extension of that proposed by Chen & Abecasis [21]. The set of parameters in the likelihood function is {μ, β_x, β_g, G, A, E}. We first fit the model under the null hypothesis (without β_g), from which we obtain the maximum likelihood estimates (MLEs) of E(Y_i*) and Ω_i, denoted by Ê(Y_i*) and Ω̂_i, respectively, for family i. Under the null hypothesis, the score vector with respect to β_g is defined as

$$ U = \sum_{i=1}^{N} G_i^{*\prime}\,\hat\Omega_i^{-1}\,[Y_i^* - \hat E(Y_i^*)] $$

and the corresponding variance-covariance matrix is

$$ V = \sum_{i=1}^{N} G_i^{*\prime}\,\hat\Omega_i^{-1}\,G_i^* $$

The score test statistic is then defined as

$$ T = U' V^{-1} U $$

which asymptotically follows a χ² distribution with degrees of freedom (df) equal to the rank of V, which is m unless there are linear dependencies between the phenotypes. The statistic T is valid regardless of the presence of population stratification.
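A minimal numerical sketch of such a score statistic is given below. It assumes the generalized-least-squares form U = Σ G_i*′ Ω̂_i⁻¹ (Y_i* − fitted null mean), V = Σ G_i*′ Ω̂_i⁻¹ G_i*, T = U′ V⁻¹ U, with NumPy and hypothetical names (`bivariate_score_test`, `genotype_design`); the null-model fit that would supply the means and covariances is not shown, and the toy inputs are invented.

```python
import math
import numpy as np

def genotype_design(adjusted_scores, m):
    """Stack g_ij* times the m x m identity for each individual: shape (m*n_i, m)."""
    col = np.asarray(adjusted_scores, dtype=float).reshape(-1, 1)
    return np.kron(col, np.eye(m))

def bivariate_score_test(y_list, mean_list, g_list, omega_list):
    """Return (T, df) from per-family data; all inputs are lists over families."""
    m = g_list[0].shape[1]
    U = np.zeros(m)
    V = np.zeros((m, m))
    for y, mu, g, omega in zip(y_list, mean_list, g_list, omega_list):
        omega_inv_g = np.linalg.solve(omega, g)   # Omega^{-1} G_i*
        U += omega_inv_g.T @ (y - mu)             # uses symmetry of Omega
        V += g.T @ omega_inv_g
    T = float(U @ np.linalg.pinv(V) @ U)
    df = int(np.linalg.matrix_rank(V))
    return T, df

# Toy input: one family of three, two traits, identity covariance.
g = genotype_design([-1.0, 0.0, 1.0], m=2)
beta = np.array([0.5, -0.2])
y = g @ beta                                      # residual phenotypes
T, df = bivariate_score_test([y], [np.zeros(6)], [g], [np.eye(6)])

# For df = 2 the chi-square tail probability has the closed form exp(-T/2).
p_value = math.exp(-T / 2.0) if df == 2 else None
```

Because the null model is fitted once and only U and V change from marker to marker, the per-marker cost is a handful of small matrix products, which is the practical advantage over refitting a likelihood ratio test at every SNP.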

Data Simulations

Statistical properties and performances of the proposed method were investigated via extensive simulation studies. For genotype data, we simulated 998 SNP markers, with the allele frequency for each marker being drawn from the Uniform distribution U(0.1, 0.9). We also simulated two additional SNPs, both with MAF 0.3, as the causal and the test SNPs, respectively. Two hundred nuclear families were simulated by sampling parental genotypes according to allele frequencies, and then by randomly selecting two parents to produce children. Unless otherwise specified, the number of children per family was drawn from a Poisson distribution with mean 2.
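The family-generation step can be sketched with the standard library alone. The code below is an illustration under simplifying assumptions, not the authors' simulator: markers are treated as independent (no linkage or LD between the causal and test SNPs), families with zero children are kept as drawn, and all function names are hypothetical.

```python
import math
import random

def poisson(rng, lam):
    """Draw from Poisson(lam) by Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def founder_genotype(rng, freq):
    """Two alleles (0/1) drawn independently with allele-1 frequency freq."""
    return (int(rng.random() < freq), int(rng.random() < freq))

def child_genotype(rng, father, mother):
    """Each parent transmits one of its two alleles with probability 1/2."""
    return (rng.choice(father), rng.choice(mother))

def simulate_family(rng, freqs, mean_children=2.0):
    father = [founder_genotype(rng, f) for f in freqs]
    mother = [founder_genotype(rng, f) for f in freqs]
    n_children = poisson(rng, mean_children)
    children = [[child_genotype(rng, fa, mo) for fa, mo in zip(father, mother)]
                for _ in range(n_children)]
    return father, mother, children

rng = random.Random(1)
# 998 background SNPs with U(0.1, 0.9) frequencies plus two SNPs at 0.3.
freqs = [rng.uniform(0.1, 0.9) for _ in range(998)] + [0.3, 0.3]
families = [simulate_family(rng, freqs) for _ in range(200)]

# Additive genotype score (0, 1 or 2) of the father of family 0 at the last SNP.
score = sum(families[0][0][-1])
```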

Two quantitative traits were simulated according to equations (3) and (4). For each trait, the causal SNP was assumed to explain a specific proportion of phenotypic variation, set to 2% by default, and the background polygenic effects were assumed to explain an additional 40% of phenotypic variation. The polygenic (ρ_a) and environmental (ρ_e) correlations between the traits were set to 0.4 unless otherwise specified.
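As a worked example of what a 2% locus effect implies, one common parameterization (an assumption here; the text does not state how the effect size was derived) takes an additive SNP in Hardy-Weinberg equilibrium with allele frequency p, per-allele effect β, and total phenotypic variance σ_P² = 1. The proportion of variance explained, h², then gives the per-allele effect for p = 0.3:

$$ h^2 = \frac{2p(1-p)\,\beta^2}{\sigma_P^2} \quad\Longrightarrow\quad \beta = \sqrt{\frac{h^2\,\sigma_P^2}{2p(1-p)}} = \sqrt{\frac{0.02}{2(0.3)(0.7)}} \approx 0.218 $$

Under the same assumption, the remaining variance splits into 40% polygenic and 58% environmental.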

When needed, population stratification was generated by mixing equal numbers of families from two discrete populations, A and B. Marker allele frequencies in the two populations were generated using the Balding-Nichols model [22]. Specifically, for each SNP, an ancestral allele frequency p was drawn from the Uniform distribution U(0.1, 0.9). The allele frequencies for populations A and B were then drawn independently from a Beta distribution with parameters p(1−F_ST)/F_ST and (1−p)(1−F_ST)/F_ST, where F_ST measures the genetic distance between the two populations. We set F_ST to 0.05 to simulate moderate population stratification. The two populations were generated separately with different phenotypic means (μ_A and μ_B) and different causal and test SNP MAFs (p_A = 0.2 and p_B = 0.4 for both causal and test SNPs). The values of μ_A and μ_B were selected such that 20% of the phenotypic variance of each trait in the combined population was explained by stratification.
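The allele-frequency draw described above can be sketched in a few lines. The function name `balding_nichols` and its defaults are hypothetical; only the Beta parameters come from the text.

```python
import random

def balding_nichols(rng, n_snps, fst, n_pops=2, low=0.1, high=0.9):
    """Return a list of (ancestral_freq, [population freqs]) for each SNP."""
    result = []
    for _ in range(n_snps):
        p = rng.uniform(low, high)                # ancestral allele frequency
        a = p * (1.0 - fst) / fst
        b = (1.0 - p) * (1.0 - fst) / fst
        pops = [rng.betavariate(a, b) for _ in range(n_pops)]
        result.append((p, pops))
    return result

freqs = balding_nichols(random.Random(42), n_snps=1000, fst=0.05)
```

With these parameters each population frequency has mean p and variance F_ST·p(1−p), so a larger F_ST spreads the two populations further apart around the ancestral frequency.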

We evaluated the statistical properties, including type I error rates and power, of the proposed method. In all the simulations, the null hypothesis was produced by setting the LD measure r² between the causal and the test sites to 0.0, whereas the alternative hypothesis was produced by setting a certain value of r² between the two sites. Unless otherwise specified, the value of r² under the alternative was set to 1.0 to produce a perfect association between the two sites.
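To make the role of r² concrete, the sketch below converts a target r² between the causal and test sites into two-locus haplotype frequencies, from which founder haplotypes could be sampled. It is an assumption-laden illustration (positive LD coefficient D, hypothetical function name `haplotype_freqs`), not a description of the authors' simulator.

```python
def haplotype_freqs(p_causal, p_test, r2):
    """Haplotype frequencies keyed by (causal allele, test allele), with D >= 0."""
    d_max = min(p_causal * (1 - p_test), (1 - p_causal) * p_test)
    d = (r2 * p_causal * (1 - p_causal) * p_test * (1 - p_test)) ** 0.5
    if d > d_max + 1e-12:
        raise ValueError("requested r^2 is not attainable for these frequencies")
    d = min(d, d_max)                             # guard against rounding
    return {
        (1, 1): p_causal * p_test + d,
        (1, 0): p_causal * (1 - p_test) - d,
        (0, 1): (1 - p_causal) * p_test - d,
        (0, 0): (1 - p_causal) * (1 - p_test) + d,
    }

null_freqs = haplotype_freqs(0.3, 0.3, 0.0)       # independent sites
alt_freqs = haplotype_freqs(0.3, 0.3, 1.0)        # perfect LD
```

With equal allele frequencies and r² = 1, only the two concordant haplotypes remain, so the test SNP carries exactly the same information as the causal SNP.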

We also studied the effects of several factors, including locus effect, correlation coefficient between traits, the level of LD, and family structure, on the performance of the proposed method. For comparison, we included several popular univariate and bivariate methods in the analysis. As univariate methods, we selected the commonly used QTDT proposed by Abecasis et al. [23] and the univariate score test proposed by us previously [18]. As bivariate methods, we selected two popular approaches, FBAT-GEE [10] and FBAT-PC [9], which are implemented in the programs FBAT [24] and PBAT [25], respectively. Throughout the study, we denote the proposed test as T, the univariate score test as UT, and the remaining methods as QTDT, FBAT (FBAT-GEE) and PBAT (FBAT-PC).

GAW16 Datasets

As an application, we analyzed the Genetic Analysis Workshop 16 (GAW16) Problem 3 simulated data sets with the proposed method. The access and analyses of the GAW16 simulated data sets have been approved by the Institutional Review Board (IRB) of the University of Missouri-Kansas City (UMKC). The GAW16 data sets consist of 6476 subjects from the Framingham Heart Study (FHS), where each subject has real genotypes at approximately 550,000 (549,872) SNPs and simulated phenotypes.

Subjects are distributed among three generations and singletons. After dividing large families into smaller nuclear families and applying some quality controls to the data (for example, as the proposed test cannot analyze half-sibs, we deleted half-sibs from the data), we finally identified 5456 family members from a total of 1815 nuclear families.

Four correlated quantitative traits, termed HDL, LDL, TG and CHOL, are simulated to mimic the lipid pathway underlying the development of cardiovascular disease [26]. We focused on the traits HDL and TG. The genetic components underlying each of the two traits consist of several SNPs with major effects and 1,000 SNPs with polygenic effects. Two major SNPs (rs3200218 and rs8192719) have pleiotropic effects on both traits in the simulation. Phenotype data are simulated at three pseudo-visits 10 years apart to mimic a longitudinal study, and 200 simulated data sets are replicated at each visit. The dataset from the first replicate of the first visit was analyzed, as suggested by the workshop. Both phenotypes were adjusted for age and sex.

Results

Type I Error Rates

We first estimate type I error rates under a variety of polygenic (ρ_a) and environmental (ρ_e) correlations in a homogeneous population, as listed in Table 1. Two modes of linkage are considered: 1) the marker is tightly linked to but not associated with the QTL (Linkage); and 2) the marker is neither linked to nor associated with the QTL (No linkage). All methods have type I error rates close to the target levels, regardless of the existence of linkage.

Table 1. Type I error rates at various levels of residual correlations under homogeneous population.

https://doi.org/10.1371/journal.pone.0008133.t001

Table 2 lists the type I error rates when families from two populations are admixed. All methods again have correct error rates, implying their ability to protect against population stratification.

Table 2. Type I error rates at various levels of residual correlations under admixed population.

https://doi.org/10.1371/journal.pone.0008133.t002

We also estimate the type I error rates when parents in each family are missing, as presented in Table 3. In the table, the number of children per family varies from 2 to 4, with the total number of children fixed at 480. Again, all investigated methods have correct type I error rates regardless of the presence of linkage or population stratification, indicating that the proposed method, like the others, is applicable to studies with missing parental information.

Table 3. Type I error rates when parents are missing.

https://doi.org/10.1371/journal.pone.0008133.t003

Power Estimates

The power of the various methods at different values of ρ_a and ρ_e is listed in Table 4. For all three bivariate approaches, power increases as the residual correlations (ρ_a and/or ρ_e) decrease from +1.0 to −1.0. For example, under a homogeneous population, the power of the proposed method is 87.6% when both ρ_a and ρ_e are +0.8, and increases to 100.0% when both correlations decrease to −0.8. In additional simulations where the major gene correlation between the traits is constrained to −1.0 rather than +1.0, we observe the opposite trend: power increases as ρ_a and/or ρ_e increase from −1.0 to +1.0 (data not shown). Our simulation results therefore indicate that the power of the bivariate approaches increases when the effects of the major gene and those of the residuals (polygenic and environmental) act in increasingly dissimilar directions, which agrees with previous linkage studies [3].

Table 4. Power estimates at various levels of residual correlations.

https://doi.org/10.1371/journal.pone.0008133.t004

Among the three bivariate methods, the proposed one has the highest power under all the parameter settings. The power improvement is remarkably large. For example, when both ρ_a and ρ_e are +0.8 under homogeneous population, the power for the proposed method is 90.5%, whereas it is only 48.8% for FBAT and 58.0% for PBAT. For the other two methods, PBAT has a higher power than FBAT.

Comparing the bivariate and univariate approaches, the proposed method has higher power than UT under most conditions and higher power than QTDT under all conditions. UT has higher power than QTDT, consistent with our previous study [18]. FBAT and PBAT have higher power than QTDT unless the traits are highly positively correlated.

Power when parental information is missing is presented in Table 5. The trends in power are similar to those observed when parental information is available. The proposed method again has the highest power, and the bivariate tests have higher power than the univariate tests in most situations.

Table 5. Power estimates at various levels of residual correlations when parents are missing.

https://doi.org/10.1371/journal.pone.0008133.t005

We also study the effects of two factors, the level of LD and family structure, on power. As presented in Table 6, the power of all methods increases with r². As the number of children per family increases, the power of the proposed method and of UT decreases slightly, whereas that of the other methods increases. Again, the proposed method has the highest power in all situations.

Table 6. The effects of LD level and family structures on power estimates.

https://doi.org/10.1371/journal.pone.0008133.t006

Analysis of GAW16 Datasets

As an application, we analyze the GAW16 simulated datasets described in the Methods section. On a desktop computer with a single 2.8 GHz Intel Xeon CPU, the scan of 550K SNPs with the proposed method takes approximately 20 hours in total. Figure 1A presents the quantile-quantile (QQ) plot (left) and log-QQ plot (right), and Figure 1B plots raw p-values over the 22 autosomes. Overall, the p-values are distributed uniformly between 0 and 1. The most significant signal from the proposed method is observed at SNP rs10820738, with a p-value of 9.55e-17. UT has a p-value of 4.42e-17 at this SNP. The second most significant signal corresponds to one of its flanking SNPs, rs2297398 (7.7 kb apart), with a p-value of 1.88e-15 from the proposed method. The SNP rs10820738 explains the largest share of phenotypic variation (1.0%) for the HDL trait in the GAW16 simulation, but none for the TG trait. The third most significant SNP is rs3200218, with a p-value of 3.47e-12. This SNP is one of the two SNPs that have pleiotropic effects on both traits. The p-value at the other major pleiotropic SNP, rs8192719, is 3.07e-6, which does not reach the genomewide significance level. However, one of its flanking SNPs, rs7249735 (12.8 kb apart), has a p-value of 2.77e-8, which is significant at the genomewide level. The proposed method thus detects both pleiotropic signals at the genomewide level, the second through a flanking SNP. We also list the p-values at these two SNPs from the other methods in Table 7. Most of them do not reach the genomewide significance level, which further illustrates the advantage of the proposed method.
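The genomewide threshold used for this scan is not stated in the text. As a hedged check, the sketch below assumes a Bonferroni correction at α = 0.05 over the 549,872 genotyped SNPs and compares the reported p-values against it; the threshold choice and variable names are assumptions.

```python
n_snps = 549_872
alpha = 0.05
threshold = alpha / n_snps            # assumed Bonferroni threshold, about 9.1e-8

# p-values reported in the text for the proposed method.
reported = {
    "rs10820738": 9.55e-17,
    "rs2297398": 1.88e-15,
    "rs3200218": 3.47e-12,
    "rs8192719": 3.07e-6,
    "rs7249735": 2.77e-8,
}
significant = {snp: p < threshold for snp, p in reported.items()}
```

Under this assumed threshold the classification matches the narrative: rs8192719 itself falls short, while its flanking SNP rs7249735 and the other three SNPs pass.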

Figure 1. Application of the proposed method to the GAW16 simulated datasets.

The GAW16 simulated HDL and TG traits were analyzed. Figure 1A, the quantile-quantile (QQ) plot (left), and log-QQ plot (right); Figure 1B, raw p-values of the genome-wide scan.

https://doi.org/10.1371/journal.pone.0008133.g001

Table 7. P-values at pleiotropic SNPs.

https://doi.org/10.1371/journal.pone.0008133.t007

Discussion

In this study, we have presented a bivariate test of association for quantitative traits in families, by the use of the multivariate variance-components model. In particular, the proposed method uses principal component analysis to correct for population stratification. Simulation studies have shown that the proposed method not only outperforms the analysis focused on individual traits when pleiotropic effect is present, but also has increased power compared with the existing bivariate methods, while correcting for population stratification.

The strength of bivariate analyses is influenced by correlations between traits. Our simulation results show that bivariate analyses are more powerful when the major genes and the residual factors act in more dissimilar ways. For example, with the major-gene correlation constrained to +1.0, bivariate approaches are most powerful when both ρ_a and ρ_e are equal to −0.8, corresponding to an approximate phenotypic correlation of −0.76. When a pleiotropic effect is present, bivariate analysis is more powerful than univariate analysis unless the residual correlation is high and in the same direction as the correlation of the major gene effects, in agreement with findings in linkage studies [27]. On the other hand, when no pleiotropic effect is present and there is weak or no correlation between traits, bivariate analysis is less powerful than univariate analysis [28]. In our analysis of the SNP rs10820738, which has no pleiotropic effect in the GAW16 simulation, the p-value from the proposed bivariate method, 9.55e-17, is slightly higher than that from the univariate method UT, 4.42e-17. In practical applications, where the existence of a pleiotropic effect is unknown a priori, bivariate analysis is not necessarily more powerful than univariate analysis, even when the traits are strongly correlated. Results of bivariate analysis should thus be interpreted with caution, and combining the results of bivariate and univariate analyses is warranted.
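The quoted phenotypic correlation can be reproduced with a short calculation. It assumes that both traits share the default variance fractions from the simulation design (2% major gene, 40% polygenic) and that the remaining 58% is environmental, which is an inference rather than a stated value. The phenotypic correlation ρ_P is then the variance-weighted sum of the component correlations:

$$ \rho_P = h_g^2\,\rho_g + h_a^2\,\rho_a + h_e^2\,\rho_e = 0.02(+1.0) + 0.40(-0.8) + 0.58(-0.8) = -0.764 \approx -0.76 $$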

Thus, our simulations demonstrate that bivariate approaches are more powerful than univariate analyses unless major-gene effects and residual effects are very highly correlated in the same direction, which agrees with the conclusion of Amos et al. [27].

Our method, developed by extending the variance-components model, offers a powerful and robust way to perform bivariate association analysis in the presence of linkage in general pedigrees. The variance-components model is advantageous for detecting QTL in two respects. First, it combines the analysis of linkage and association, which increases the power to detect a QTL when the marker is not itself the QTL but is associated with it. Second, prior evidence of linkage can be incorporated to indicate the strength of LD between the QTL and the marker [29].

Another strength of our method is that it adjusts individual genotype scores by principal component analysis rather than by a TDT-like strategy to control for population stratification. The resulting test statistic provides much greater power than existing TDT-based methods, whose low power may make them impractical for genome scans. For example, under the moderate setting where both polygenic and environmental correlation coefficients were set to 0.4 and the locus effect was set to 2%, we observed 164 and 24 significant results over 1,000 replicates for the proposed method and UT, respectively, but only 6, 8, and 4 for FBAT-GEE, FBAT-PC, and QTDT, respectively, at the genome-wide level of 1.0e-6.

An interesting observation from our simulations is that family structure influences the power of the investigated methods in different ways. For FBAT-GEE, FBAT-PC and QTDT, which control population stratification through TDT, a small number of large families provides more power than a large number of small families. This is not surprising, since with these methods the parental information is used to control for stratification, and consequently only the information from children contributes to the test statistics. In contrast, for the proposed method the power decreases slightly as the number of children per family increases. This appears to be because a large number of small independent families provides more information on allele distributions than a small number of larger families.

In this manuscript, we focus on data from nuclear families. However, the proposed method is applicable to extended pedigrees as well. The multivariate variance-components model described here can be applied directly to extended pedigrees. As for correcting for population stratification, extending principal component analysis coupled with the TDT strategy to extended pedigrees is also straightforward. For example, all founders can be collected to form an unrelated sample. In cases where there is no founder, one sib can be randomly selected into the unrelated sample, as we described in reference [18]. PCA is then performed on the unrelated sample, and both the genotype and the phenotype of each subject in the unrelated sample are adjusted accordingly. For subjects not in the unrelated sample, principal components can be calculated from those of their parents or of the sib that is in the unrelated sample. The process is carried out recursively until all non-founders are adjusted. Some specialized algorithms, e.g., the one described in [30], can be adopted in a relatively simple manner. However, the performance of the method on extended pedigrees is unknown and deserves further study.

In summary, we have developed a novel method for family-based bivariate association tests. Our method is more powerful than currently available bivariate methods. The proposed method is computationally efficient and can complete a typical GWAS scan within hours. The program implementing the proposed method is available from the authors upon request.

Acknowledgments

We acknowledge the Genetic Analysis Workshop 16 (GAW16) and the Framingham Heart Study (FHS) for permission to use the data. We also acknowledge Yanfang Guo and Feng Zhang for helpful discussions. Computing support was partially provided by the High Performance Computing Cluster Center at Xi'an Jiaotong University.

Author Contributions

Conceived and designed the experiments: LZ HWD. Performed the experiments: LZ. Analyzed the data: LZ AJB YFP. Contributed reagents/materials/analysis tools: AJB JL YFP. Wrote the paper: LZ AJB JL YFP JC CJP HWD.

References

1. Cupples LA, Arruda HT, Benjamin EJ, D'Agostino RB Sr, Demissie S, et al. (2007) The Framingham Heart Study 100K SNP genome-wide association study resource: overview of 17 phenotype working group reports. BMC Med Genet 8 (Suppl 1): S1.
2. Almasy L, Dyer TD, Blangero J (1997) Bivariate quantitative trait linkage analysis: pleiotropy versus co-incident linkages. Genet Epidemiol 14: 953–958.
3. Amos CI, Elston RC, Bonney GE, Keats BJ, Berenson GS (1990) A multivariate method for detecting genetic linkage, with application to a pedigree with an adverse lipoprotein phenotype. Am J Hum Genet 47: 247–254.
4. de Andrade M, Mendell NR (2005) Summary of contributions to GAW Group 12: multivariate methods. Genet Epidemiol 29 (Suppl 1): S91–S95.
5. Jiang C, Zeng ZB (1995) Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics 140: 1111–1127.
6. Li Y, Sung WK, Liu JJ (2007) Association mapping via regularized regression analysis of single-nucleotide-polymorphism haplotypes in variable-sized sliding windows. Am J Hum Genet 80: 705–715.
7. Williams JT, Van Eerdewegh P, Almasy L, Blangero J (1999) Joint multipoint linkage analysis of multivariate qualitative and quantitative traits. I. Likelihood formulation and simulation results. Am J Hum Genet 65: 1134–1147.
8. Jung J, Zhong M, Liu L, Fan R (2008) Bivariate combined linkage and association mapping of quantitative trait loci. Genet Epidemiol 32: 396–412.
9. Lange C, van Steen K, Andrew T, Lyon H, DeMeo DL, et al. (2004) A family-based association test for repeatedly measured quantitative traits adjusting for unknown environmental and/or polygenic effects. Stat Appl Genet Mol Biol 3: Article17.
10. Lange C, Silverman EK, Xu X, Weiss ST, Laird NM (2003) A multivariate family-based association test using generalized estimating equations: FBAT-GEE. Biostatistics 4: 195–206.
11. Devlin B, Roeder K (1999) Genomic control for association studies. Biometrics 55: 997–1004.
12. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, et al. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38: 904–909.
13. Wall JD, Pritchard JK (2003) Assessing the performance of the haplotype block model of linkage disequilibrium. Am J Hum Genet 73: 502–515.
14. Zhu X, Li S, Cooper RS, Elston RC (2008) A unified association analysis approach for family and unrelated samples correcting for stratification. Am J Hum Genet 82: 352–365.
15. Chen HS, Zhu X, Zhao H, Zhang S (2003) Qualitative semi-parametric test for genetic associations in case-control designs under structured populations. Ann Hum Genet 67: 250–264.
16. Zhang S, Zhu X, Zhao H (2003) On a semiparametric test to detect associations between quantitative traits and candidate genes using unrelated individuals. Genet Epidemiol 24: 44–56.
17. Zhu X, Zhang S, Zhao H, Cooper RS (2002) Association mapping, using a mixture model for complex traits. Genet Epidemiol 23: 181–196.
18. Zhang L, Li J, Pei Y-F, Liu Y, Deng H-W (2009) Tests of association for quantitative traits in nuclear families using principal components to correct for population stratification. Annals of Human Genetics In Press.
19. Schaid DJ (1996) General score tests for associations of genetic markers with disease using cases and their parents. Genet Epidemiol 13: 423–449.
20. Amos CI (1994) Robust variance-components approach for assessing genetic linkage in pedigrees. Am J Hum Genet 54: 535–543.
21. Chen WM, Abecasis GR (2007) Family-based association tests for genomewide association scans. Am J Hum Genet 81: 913–926.
22. Balding DJ, Nichols RA (1995) A method for quantifying differentiation between populations at multi-allelic loci and its implications for investigating identity and paternity. Genetica 96: 3–12.
23. Abecasis GR, Cardon LR, Cookson WO (2000) A general test of association for quantitative traits in nuclear families. Am J Hum Genet 66: 279–292.
24. Laird NM, Horvath S, Xu X (2000) Implementing a unified approach to family-based tests of association. Genet Epidemiol 19 (Suppl 1): S36–42.
25. Lange C, DeMeo D, Silverman EK, Weiss ST, Laird NM (2004) PBAT: tools for family-based association studies. Am J Hum Genet 74: 367–369.
26. Kannel WB, Dawber TR, Kagan A, Revotskie N, Stokes J 3rd (1961) Factors of risk in the development of coronary heart disease–six year follow-up experience. The Framingham Study. Ann Intern Med 55: 33–50.
27. Amos CI, Laing AE (1993) A comparison of univariate and multivariate tests for genetic linkage. Genet Epidemiol 10: 671–676.
28. Zhang L, Pei YF, Li J, Papasian CJ, Deng HW (2009) Univariate/multivariate genome-wide association scans using data from families and unrelated samples. PLoS One 4: e6502.
29. Fulker DW, Cherny SS, Sham PC, Hewitt JK (1999) Combined linkage and association sib-pair analysis for quantitative traits. Am J Hum Genet 64: 259–267.
30. Abecasis GR, Cookson WO, Cardon LR (2000) Pedigree tests of transmission disequilibrium. Eur J Hum Genet 8: 545–551.

