Comparison of Methods to Account for Relatedness in Genome-Wide Association Studies with Family-Based Data

doi:10.1371/journal.pgen.1004445

Table 1.

Summary of methods/software packages investigated.

Figure 1.

Comparison of kinship estimates (pruned SNPs) using different software packages.

Plots above the diagonal show a comparison of kinship measures, with correlations between the kinship measures indicated below the diagonal. EM_BN = EMMAX (Balding-Nichols), EM_IBS = EMMAX (IBS method), FLMM_C = FaST-LMM using covariance matrix, FLMM_R = FaST-LMM using realised relationship matrix, GA = GenABEL, GMA_C = GEMMA using centred genotypes, GMA_S = GEMMA using standardised genotypes, KING_H = KING with homogeneous population assumption, KING_R = KING with robust estimation.
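
As a rough illustration of the kind of comparison summarised in this figure, the sketch below computes the correlation between two sets of kinship estimates for the same individuals. It assumes a Pearson correlation taken over the off-diagonal (pairwise) entries only; the caption does not state which correlation measure or which entries were used, and the function names and toy matrices are hypothetical.

```python
from math import sqrt


def off_diagonal(kinship):
    """Return the upper-triangle (i < j) entries of a square kinship matrix."""
    n = len(kinship)
    return [kinship[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def kinship_correlation(k1, k2):
    """Correlate pairwise kinship estimates from two methods (assumed Pearson)."""
    return pearson(off_diagonal(k1), off_diagonal(k2))


# Toy example: the second matrix is a rescaled copy of the first, as can
# happen when two packages report kinship on different scales.
k_a = [[0.5, 0.25, 0.0], [0.25, 0.5, 0.125], [0.0, 0.125, 0.5]]
k_b = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.25], [0.0, 0.25, 1.0]]
r = kinship_correlation(k_a, k_b)
```

Because correlation is unaffected by linear rescaling, two methods can correlate perfectly while still reporting kinships on different scales, which is why the figure pairs the correlations with scatter plots.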

Figure 2.

Genomic control factors obtained using different software packages and different strategies for modelling kinships.

PLINK = analysis in PLINK with no adjustment made for relatedness. Other methods/software packages are listed in Table 1 (see Table 2 for abbreviated names of methods). Pedigree = theoretical kinships based on known pedigree relationships used to adjust for relatedness. Thinned = kinships based on 1900 ‘thinned’ SNPs used to adjust for relatedness. Pruned = kinships based on 50,129 ‘pruned’ SNPs used to adjust for relatedness. Full = kinships based on 545,433 SNPs used to adjust for relatedness.
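
For readers unfamiliar with genomic control factors, the sketch below uses the standard definition: the median of the observed 1-degree-of-freedom chi-square association statistics divided by the median of that distribution under the null (about 0.455). This is an assumption about the calculation, since the caption does not spell it out, and the function names are hypothetical. Only the Python standard library is used.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 df (about 0.455),
# obtained as the square of the 75th percentile of a standard normal.
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def p_to_chi2_1df(p):
    """Convert a two-sided p-value to the matching 1-df chi-square statistic."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p-value must lie in (0, 1]")
    z = _STD_NORMAL.inv_cdf(p / 2.0)  # lower-tail quantile; the sign is irrelevant
    return z * z


def genomic_control_lambda(p_values):
    """Genomic control factor: median observed statistic / expected null median."""
    stats = [p_to_chi2_1df(p) for p in p_values]
    return median(stats) / CHI2_1DF_MEDIAN


# Evenly spaced p-values mimic a well-calibrated test, so lambda is close to 1.
uniform_p = [(i + 0.5) / 1001 for i in range(1001)]
lambda_null = genomic_control_lambda(uniform_p)

# Systematically smaller p-values mimic unadjusted relatedness, so lambda > 1.
lambda_inflated = genomic_control_lambda([p * p for p in uniform_p])
```

A value near 1 indicates that the test statistics are well calibrated, while values above 1 indicate inflation of the kind expected when relatedness is ignored.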

Table 2.

Genomic control inflation factors achieved in real data or in a single replicate of the simulated data sets.

Figure 3.

Power and type 1 error of different methods.

Power (left-hand plots) is defined as the proportion of replicates (out of 1000) in which both simulated disease loci are detected, where ‘detection’ means that any SNP within 40 kb of the simulated disease locus reaches the specified p-value threshold. Type 1 error (right-hand plots) is defined as the proportion of null SNPs (out of 20,000 = 20 null SNPs × 1000 simulation replicates) that reach the specified p-value threshold. Horizontal dashed lines indicate the target p-value thresholds (i.e. the expected type 1 error rates).
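
The two definitions in this caption can be sketched in code as follows. This is a minimal illustration, not the authors' implementation: the data layout (tuples of chromosome, base-pair position and p-value), the function names, and the choices to treat ‘within 40 kb’ and ‘reaching the threshold’ as inclusive comparisons are all assumptions.

```python
WINDOW_BP = 40_000  # 40 kb detection window, as stated in the caption


def locus_detected(snp_results, locus, threshold, window=WINDOW_BP):
    """True if any SNP within `window` bp of the locus has p <= threshold.

    snp_results: iterable of (chromosome, position, p_value) tuples.
    locus: (chromosome, position) of a simulated disease locus.
    """
    chrom, pos = locus
    return any(
        c == chrom and abs(bp - pos) <= window and p <= threshold
        for c, bp, p in snp_results
    )


def power(replicates, loci, threshold):
    """Proportion of replicates in which every simulated locus is detected."""
    if not replicates:
        raise ValueError("need at least one replicate")
    hits = sum(
        all(locus_detected(rep, locus, threshold) for locus in loci)
        for rep in replicates
    )
    return hits / len(replicates)


def type1_error(null_p_values, threshold):
    """Proportion of null-SNP p-values, pooled over replicates, at or below threshold."""
    if not null_p_values:
        raise ValueError("need at least one null p-value")
    return sum(p <= threshold for p in null_p_values) / len(null_p_values)


# Toy example with two simulated loci and two replicates (hypothetical values).
loci = [("1", 1_000_000), ("2", 5_000_000)]
rep1 = [("1", 1_020_000, 1e-6), ("2", 4_990_000, 1e-5)]  # both loci detected
rep2 = [("1", 1_020_000, 1e-6), ("2", 5_100_000, 1e-9)]  # second hit is 100 kb away
estimated_power = power([rep1, rep2], loci, threshold=1e-4)
estimated_type1 = type1_error([0.001, 0.2, 0.5, 0.04], threshold=0.05)
```

Requiring both loci to be detected in the same replicate makes this a joint power, which is stricter than the power to detect either locus on its own.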

Figure 4.

Manhattan plots for the real phenotype using FaST-LMM exact and alternative software packages.

The points marked in red denote the confirmed significant region from Fakiola et al. (2013). FLMM_E = FaST-LMM using exact calculation, MQLS1972 = MQLS using 1972 genotyped individuals, RT1972 = ROADTRIPS using 1972 genotyped individuals, FBATaff = FBAT using transmissions to affecteds only, FBATboth = FBAT using transmissions to both affecteds and unaffecteds. Results from all other LMM methods were indistinguishable from FLMM_E and so are not shown.

Table 3.

Concordance between top SNPs identified by different methods.

Table 4.

Genomic control factors achieved in naive analysis of a single replicate of the simulated longitudinal data sets.
