Iterative Usage of Fixed and Random Effect Models for Powerful and Efficient Genome-Wide Association Studies
(a) Four methods were used to reanalyze 107 traits in 199 Arabidopsis thaliana samples genotyped at 250,000 SNPs: a naïve method (t-test), GLM, MLM, and FarmCPU. The first three PCs were included in the GLM and MLM to control for population structure; FarmCPU did not use any PCs. The horizontal axis shows the 107 traits grouped into four categories: resistance, developmental, ionomics, and flowering time. The vertical axis shows the number of associated SNPs at three significance levels (0.01, 0.05, and 0.1) after Bonferroni multiple-testing correction. The previous results were replicated with the naïve and MLM methods. The naïve method, which does not control for population structure or kinship, generates many associated SNPs, and the associations due to genetic linkage to known genes are indistinguishable from the background noise. In contrast, the MLM method controls the inflation of P values well; however, the associations due to genetic linkage to known genes are also weakened and become indistinguishable from the background. The GLM method gives results between those of the naïve and MLM methods. Interestingly, FarmCPU revealed multiple genetic loci for each flowering time trait. (b) Enrichment analysis using flowering time genes was performed on the 23 flowering time traits to evaluate the four statistical methods. Random hits are expected to have an enrichment coefficient of 1. For the first hit, the enrichment coefficients are 2.4, 2.4, 3.8, and 8.9 for t-test, GLM, MLM, and FarmCPU, respectively. For the top ten hits, the enrichment coefficients are 1.7, 2.3, 2.8, and 4.0 for t-test, GLM, MLM, and FarmCPU, respectively.
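
The counts on the vertical axis of panel (a) depend on how the Bonferroni cutoff is applied, so a minimal sketch may help. It assumes the standard Bonferroni rule, in which a SNP is declared associated when its P value is at most the significance level divided by the number of tests. The function names are hypothetical and are not taken from FarmCPU or any other package, and the figure of 250,000 tests is simply the marker count quoted in the caption; the number of SNPs actually tested may differ after filtering.

```python
def bonferroni_thresholds(n_tests, alphas=(0.01, 0.05, 0.1)):
    """Return the per-SNP P-value cutoff for each significance level.

    Standard Bonferroni rule: cutoff = alpha / number of tests.
    """
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return {alpha: alpha / n_tests for alpha in alphas}


def count_associated_snps(p_values, alphas=(0.01, 0.05, 0.1)):
    """Count SNPs whose P value passes each Bonferroni cutoff.

    Assumption: every entry in p_values is one tested SNP for one trait,
    so the number of tests equals len(p_values).
    """
    cutoffs = bonferroni_thresholds(len(p_values), alphas)
    return {
        alpha: sum(1 for p in p_values if p <= cutoff)
        for alpha, cutoff in cutoffs.items()
    }


if __name__ == "__main__":
    # Cutoffs for the marker count quoted in the caption (an assumption).
    for alpha, cutoff in bonferroni_thresholds(250_000).items():
        print(f"alpha = {alpha}: P <= {cutoff:.1e}")

    # Toy P values for illustration only; these are not results from the study.
    toy_p_values = [1e-9, 0.004, 0.008, 0.015, 0.5]
    print(count_associated_snps(toy_p_values))
```

Because the three cutoffs are nested, the count at 0.01 can never exceed the count at 0.05, which in turn can never exceed the count at 0.1.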