Iterative Usage of Fixed and Random Effect Models for Powerful and Efficient Genome-Wide Association Studies
Flowering time at 16°C was measured in 199 Arabidopsis thaliana individuals genotyped with 250,000 SNPs. Seven statistical methods were used to conduct the association studies: (a) t-test (naïve method), which tests the additive genetic effect of each marker, one marker at a time, with the marker as the only explanatory variable; (b) GLM; (c) MLM; (d) CMLM; (e) FaST-LMM-Select; (f) MLMM; and (g) FarmCPU. All methods except the t-test, MLMM, and FarmCPU included the first three PCs derived from the genetic markers as covariates. FarmCPU identified five associated SNPs after Bonferroni multiple-test correction, including three located within 50,000 base pairs of known genes such as FLC. MLMM identified two associated SNPs after Bonferroni multiple-test correction, which overlapped with the five associated SNPs identified by FarmCPU. With all other methods, the signals at these genes are indistinguishable from background noise.
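The naïve single-marker test and the Bonferroni cutoff described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `bonferroni_threshold` and `marker_t_statistic` are hypothetical, the additive 0/1/2 genotype coding is an assumption, and the family-wise error rate of 0.05 used in the usage note is an assumed value that the text does not state.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff under Bonferroni correction.

    alpha is the family-wise error rate (an assumed value; the text does
    not state it) and n_tests is the number of markers tested.
    """
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def marker_t_statistic(genotypes, phenotypes):
    """t statistic for the slope of phenotype regressed on one marker.

    The marker is the only explanatory variable, as in the naive method.
    Genotypes are assumed to be coded additively (for example 0, 1, 2).
    """
    n = len(genotypes)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need at least 3 paired observations")

    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        # Monomorphic marker: no genotype variation, so no testable effect.
        return 0.0

    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_g

    # Residual sum of squares with n - 2 degrees of freedom.
    rss = sum((y - intercept - slope * g) ** 2 for g, y in zip(genotypes, phenotypes))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    if std_err == 0:
        # Perfect fit: the statistic is unbounded (or zero for a flat line).
        return math.copysign(math.inf, slope) if slope != 0 else 0.0
    return slope / std_err
```

With an assumed family-wise error rate of 0.05 and the 250,000 SNPs mentioned above, `bonferroni_threshold(0.05, 250_000)` gives the per-marker cutoff. Converting the t statistic to a p-value requires the t distribution with n - 2 degrees of freedom, which is left out here to keep the sketch free of third-party dependencies.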