The authors have declared that no competing interests exist.

Conceived and designed the experiments: JL XL CYC. Performed the experiments: JL. Analyzed the data: JL XL. Contributed reagents/materials/analysis tools: TYW JJW CCK EST TA YYT. Wrote the paper: JL XL TYW JJW CCK EST YYT CYC.

Measurement error of a phenotypic trait reduces the power to detect genetic associations. We examined the impact of sample size, allele frequency and effect size in the presence of measurement error for quantitative traits. The statistical power to detect genetic association with phenotype mean and variability was investigated analytically. The non-centrality parameter for a non-central F distribution […] ^{−5}) for the two cataract grading scales, while replication results for genetic variants of blood pressure displayed no significant differences between averaged blood pressure measurements and single blood pressure measurements. We have developed a framework for researchers to quantify power in the presence of measurement error, which will be applicable to studies of phenotypes in which the measurement is highly variable.

In genome-wide association studies (GWAS), associations between a large number of single nucleotide polymorphisms (SNPs) and a trait measurement are computed, and SNPs with strong associations are then tested for replication in a separate cohort. Non-differential measurement error in both genotyping and phenotyping reduces the power to identify true associations in discovery cohorts and hence increases the type II error rate. This decreases the efficiency of GWAS, since discovery findings are then less likely to be replicated in subsequent studies. Errors in genotype have been reduced through technological advances and stringent quality controls in SNP genotyping. Measurement and misclassification errors in case-control studies and measurement errors in exposure variables have been well studied

Performing power and sample size calculations allows researchers to manage the cost of genotyping effectively. With recent discoveries made using web-based questionnaires for data collection

In this study, we first quantified the power to identify genetic variants that affect the means and variability of quantitative traits in GWAS of unrelated individuals in the presence of measurement error, where measurement error was defined as the additional variation introduced to a “true” underlying phenotype. Second, we demonstrated the impact of measurement error on the GWAS analysis pipeline in population-based studies. We present real data analyses based on two phenotypes, age-related cataract and blood pressure, to illustrate the impact of measurement error on GWAS discovery and on genetic replication studies.

We used the following model to describe the phenotype:^{th} individual,

The marker locus satisfies the Hardy-Weinberg equilibrium (HWE). Hence the genotype frequencies are computed based on

SNP effects are additive. Without loss of generality, we let

With measurement error

The power for linear regression can be determined using the non-central F distribution, with non-centrality parameter (NCP)

Without measurement errors,
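The relationship between the NCP and power can be sketched numerically. The block below is a minimal illustration rather than the authors' R code. The symbols `n`, `maf`, `beta`, `sigma_e` and `sigma_d` are assumed names for sample size, minor allele frequency, additive effect, residual SD and measurement-error SD. The NCP is taken to be the usual regression form, n·2p(1−p)β² divided by the total residual variance, and the non-central F with 1 and n−2 degrees of freedom is replaced by its large-sample chi-square (1 df) approximation so that only the standard library is needed.

```python
from statistics import NormalDist


def ncp_means(n, maf, beta, sigma_e=1.0, sigma_d=0.0):
    """Non-centrality parameter for the additive SNP regression test.

    Assumed form: n * Var(g) * beta^2 / (sigma_e^2 + sigma_d^2), where
    Var(g) = 2 * maf * (1 - maf) under Hardy-Weinberg equilibrium and
    sigma_d is the SD of the measurement error (0 means no error).
    """
    var_g = 2.0 * maf * (1.0 - maf)
    return n * var_g * beta ** 2 / (sigma_e ** 2 + sigma_d ** 2)


def power_from_ncp(ncp, alpha=5e-8):
    """Approximate power of a 1-df test with the given NCP.

    Uses the large-sample chi-square(1) approximation to F(1, n - 2):
    the test statistic behaves like (Z + sqrt(ncp))^2 with Z ~ N(0, 1).
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)
    shift = ncp ** 0.5
    return (1.0 - nd.cdf(z_crit - shift)) + nd.cdf(-z_crit - shift)


if __name__ == "__main__":
    # Hypothetical design: 5,000 samples, MAF 0.3, effect of 0.1 SD per allele.
    for sigma_d in (0.0, 0.5, 1.0):
        lam = ncp_means(5000, 0.3, 0.1, sigma_d=sigma_d)
        print(sigma_d, round(lam, 2), power_from_ncp(lam, alpha=1e-5))
```

Under these assumptions, a measurement error of 1 SD halves the NCP, which is why the sample size must roughly double to recover the original power.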

Following the framework described by Visscher and Posthuma ^{th} individual,

The SNP has an effect on the phenotype variance but not on the trait mean.

The phenotype follows a standard normal distribution in the absence of heterogeneous variance.

We assume that

Genotype | Frequency | Genotype Indicator | […]^{2}) | […]^{4}) |

AA | […] | 0 | 1 | 3 |

AB | […] | 1 | […] | […] |

BB | […] | 2 | […] | […] |
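The values 1 and 3 in the AA row are what one would expect for the second and fourth moments of a standard normal phenotype. As a reminder, and using \sigma_g as an assumed symbol (not taken from the original table) for the phenotype SD within genotype group g, the moments of a zero-mean normal variable are:

```latex
y \mid g \sim N(0, \sigma_g^{2})
\;\Longrightarrow\;
E(y^{2} \mid g) = \sigma_g^{2}, \qquad
E(y^{4} \mid g) = 3\sigma_g^{4}, \qquad
\operatorname{Var}(y^{2} \mid g)
  = E(y^{4} \mid g) - \{E(y^{2} \mid g)\}^{2}
  = 2\sigma_g^{4}.
```

Setting \sigma_g = 1 recovers E(y^{2}) = 1 and E(y^{4}) = 3. This is also why a test on the squared phenotype is more sensitive to added noise: its variance grows with the fourth power of the phenotype SD.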

To verify our findings and assess the power of genetic association testing in the presence of measurement error, we carried out simulation studies under various scenarios. First, we simulated the genotypes
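A simulation of this kind might look like the sketch below. It is not the authors' simulation code. The function name, parameter names and default values are hypothetical, genotypes are drawn under HWE as two independent allele draws, and the critical value 3.841 is the chi-square (1 df) cut-off at α = 0.05, used here as a large-sample stand-in for the F(1, n−2) critical value.

```python
import random


def simulate_power(n, maf, beta, sigma_d, reps=200, crit=3.841, seed=1):
    """Empirical power of the additive regression test with noisy phenotypes.

    Each replicate draws genotypes under HWE, a phenotype with unit residual
    variance, adds N(0, sigma_d^2) measurement error, and tests the slope.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        g = [int(rng.random() < maf) + int(rng.random() < maf) for _ in range(n)]
        y = [beta * gi + rng.gauss(0.0, 1.0) + rng.gauss(0.0, sigma_d) for gi in g]
        g_bar = sum(g) / n
        y_bar = sum(y) / n
        sxx = sum((gi - g_bar) ** 2 for gi in g)
        if sxx == 0:  # monomorphic sample: the test cannot be run
            continue
        sxy = sum((gi - g_bar) * (yi - y_bar) for gi, yi in zip(g, y))
        syy = sum((yi - y_bar) ** 2 for yi in y)
        ss_reg = sxy ** 2 / sxx              # regression sum of squares
        ss_res = syy - ss_reg                # residual sum of squares
        f_stat = ss_reg / (ss_res / (n - 2))
        hits += f_stat > crit
    return hits / reps


if __name__ == "__main__":
    # Hypothetical scenario: same design with and without 1 SD of error.
    print(simulate_power(400, 0.3, 0.25, sigma_d=0.0))
    print(simulate_power(400, 0.3, 0.25, sigma_d=1.0))
```

Comparing the empirical rejection rate with an analytical power value for the same parameters is one way to check a derivation such as the one described above.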

We defined the cost of phenotype measurement error as the percentage increase in sample size required to maintain constant analytical power as measurement error increases. Following the framework of Edwards et al.

Similarly, for comparison of phenotype variances, we used
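A rough feel for these costs can be obtained from a small-effect approximation, sketched below. This is not the authors' exact formula: it assumes the SNP effect is negligible relative to the phenotype variance, so that the NCP scales as 1/(1 + σ_d²) for the comparison of means and as 1/(1 + σ_d²)² for the comparison of variances (because the variance of a squared normal variable is proportional to the square of its variance). The names `cost_means`, `cost_variances`, `sigma_d` and `sigma_y` are hypothetical.

```python
def cost_means(sigma_d, sigma_y=1.0):
    """Approximate % increase in sample size for the comparison of means.

    sigma_d is the measurement-error SD and sigma_y the SD of the true
    phenotype; the required n scales with total variance / true variance.
    """
    return 100.0 * sigma_d ** 2 / sigma_y ** 2


def cost_variances(sigma_d, sigma_y=1.0):
    """Approximate % increase in sample size for the comparison of variances.

    For a squared standardized phenotype the noise enters through the
    squared variance ratio, so the required n scales with (1 + ratio)^2.
    """
    ratio = 1.0 + sigma_d ** 2 / sigma_y ** 2
    return 100.0 * (ratio ** 2 - 1.0)


if __name__ == "__main__":
    for sd in (0.1, 0.5, 1.0):
        print(sd, cost_means(sd), cost_variances(sd))
```

At a measurement error of 0.5 SD this approximation gives 25% and 56.25%, close to but not identical with the 25.0% and 54.7% reported in the Results, since the full calculation also depends on the effect size and allele frequency.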

The Singapore Malay Eye Study (SiMES) and Singapore Chinese Eye Study (SCES) are population-based cross-sectional epidemiological studies of eye diseases among residents of Singapore. Details of the study design and methodology have been published elsewhere

In the SiMES cohort, nuclear cataract was assessed using two methods: 1) the Lens Opacities Classification System III (LOCS III)

In the Chinese cohort, blood pressure was measured according to a protocol used in the Multi-Ethnic Study of Atherosclerosis _{e} and DBP_{e}) and used for association testing in comparison with the “true” values (SBP and DBP).

Genotyping of 3,072 and 1,952 samples in SiMES and SCES, respectively, was performed using Illumina Human610-Quad BeadChips (Illumina Inc.). A total of 620,901 SNPs were genotyped in each cohort. An additional 635 samples in SCES were genotyped using Illumina Human OmniExpress BeadChips with a total of 729,698 SNPs. Detailed quality control procedures for samples and SNPs were described elsewhere ^{−6}) and (3) MAF <1%. Detailed quality control procedures for SCES samples genotyped on OmniExpress chips are provided in the supplementary materials (

For genome-wide analysis of nuclear cataract in SiMES, we used the nuclear cataract value from the worse eye, where a larger value indicates higher severity. Each phenotype was standardized by subtracting the mean and dividing by the SD of the phenotype. Association testing was performed on the standardized nuclear cataract phenotype for comparison of means and on the squared standardized nuclear cataract phenotype for comparison of variances. For genetic replication analysis, we analyzed nine variants which showed significant associations with BP in East Asians _{e}, SBP and SBP_{e} in each cohort. In brief, linear regression analysis was performed assuming an additive model, adjusted for age, age-squared and body mass index (BMI), with medication-corrected BP as the dependent variable. To account for batch effects between data from separate chips in SCES, meta-analysis was performed using an inverse-variance fixed-effects model, and a Bonferroni-adjusted cut-off of 0.05/9 = 0.0055 was used to declare significance.
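The inverse-variance fixed-effects combination of per-chip estimates can be illustrated as follows. This is a generic sketch of the standard method, not the pipeline actually used in the study. The function name and the example betas and standard errors are hypothetical, and the p-value uses a normal approximation.

```python
from statistics import NormalDist

# Bonferroni threshold for nine replication variants, as stated in the text.
BONFERRONI_CUTOFF = 0.05 / 9


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed-effects meta-analysis.

    betas and ses are per-batch effect estimates and standard errors.
    Returns the pooled beta, its standard error and a two-sided p-value.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = (1.0 / total) ** 0.5
    z = beta / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return beta, se, p


if __name__ == "__main__":
    # Hypothetical estimates from two genotyping batches.
    beta, se, p = fixed_effects_meta([0.09, 0.12], [0.03, 0.05])
    print(beta, se, p, p < BONFERRONI_CUTOFF)
```

Batches with smaller standard errors receive more weight, so the pooled estimate is pulled towards the larger batch.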

The PLINK software (version 2.0)

Measurement error is displayed in terms of the number of SD of the true phenotype (without errors). The top panel represents comparison of means and three configurations were considered with the rest of the parameters following the default configuration:

To verify our findings, we compared the analytical power with the simulated power. The mean (SD) of absolute difference between the analytical power and simulation power for comparison of means and variances was 0.00169 (0.00195) and 0.00418 (0.00398) respectively. The maximum absolute difference for comparison of means and variances was 0.00941 and 0.0197 respectively.

For small effect sizes,

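Measurement error (SD of true phenotype) | Increase in sample size, comparison of means (%) | Increase in sample size, comparison of variances (%) |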

0.1 | 1.0 | 2.0 |

0.2 | 4.0 | 8.0 |

0.3 | 9.0 | 18.3 |

0.4 | 16.0 | 33.7 |

0.5 | 25.0 | 54.7 |

0.6 | 36.0 | 82.6 |

0.7 | 49.1 | 118.5 |

0.8 | 64.1 | 163.9 |

0.9 | 81.1 | 220.5 |

1.0 | 100.1 | 290.4 |

The following parameter values were used:

A total of 2,349 samples from SiMES with both genotype data and phenotype data from Wisconsin System and LOCS III grading were included for genome-wide testing. The measurements of nuclear cataract in SiMES varied substantially for some individuals (^{−5}) from both grading scales in the GWAS of nuclear cataract in a comparison of phenotypic means. None of the SNPs overlapped.

(A) Standardized phenotype for comparison of means, (B) Bland-Altman plot of difference in standardized phenotype (Wisconsin System – LOCS III) against the average of the two, (C) Standardized and squared phenotype for comparison of variances, and (D) Bland-Altman plot of difference in standardized and squared phenotype (Wisconsin System – LOCS III) against the average of the two.
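The quantities behind a Bland-Altman comparison of two standardized gradings can be computed as in the sketch below. It is a generic illustration with hypothetical function names and made-up example values, using the conventional limits of agreement at the mean difference ± 1.96 SD of the differences.

```python
from statistics import mean, stdev


def standardize(values):
    """Subtract the mean and divide by the sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def bland_altman(a, b):
    """Return per-subject averages and differences (a - b), the mean
    difference (bias) and the 95% limits of agreement."""
    diffs = [x - y for x, y in zip(a, b)]
    avgs = [(x + y) / 2.0 for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return avgs, diffs, bias, (bias - 1.96 * sd, bias + 1.96 * sd)


if __name__ == "__main__":
    # Hypothetical gradings of the same five eyes on two scales.
    scale_1 = standardize([2.1, 3.4, 2.8, 4.0, 3.1])
    scale_2 = standardize([2.0, 3.9, 2.5, 4.4, 2.9])
    avgs, diffs, bias, limits = bland_altman(scale_1, scale_2)
    print(bias, limits)
```

Plotting `diffs` against `avgs` gives the Bland-Altman panel; wide limits of agreement indicate that the two scales disagree substantially for some individuals.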

SNP | Chr | Position (bp) | Effect allele | MAF | Effect size | P value |

Wisconsin System | ||||||

rs11184985 | 1 | 107,115,133 | C | 0.37 | 0.13 | 7.82×10^{−6} |

rs12133448 | 1 | 107,100,064 | A | 0.40 | −0.13 | 5.94×10^{−6} |

rs1401830 | 1 | 107,068,638 | A | 0.37 | 0.13 | 9.09×10^{−6} |

rs777965 | 3 | 105,954,655 | A | 0.24 | 0.17 | 3.26×10^{−7} |

rs9985272 | 3 | 176,362,024 | A | 0.10 | −0.22 | 3.73×10^{−6} |

rs6879319 | 5 | 117,214,194 | G | 0.37 | −0.14 | 4.18×10^{−6} |

rs17066166 | 6 | 137,585,624 | T | 0.17 | 0.18 | 5.05×10^{−6} |

rs12931881 | 16 | 83,436,787 | A | 0.15 | 0.20 | 1.04×10^{−6} |

LOCS III | ||||||

rs4676323 | 2 | 107,164,560 | G | 0.13 | 0.19 | 7.81×10^{−6} |

rs1981845 | 5 | 53,734,292 | A | 0.29 | −0.14 | 8.52×10^{−6} |

rs17072293 | 6 | 143,564,955 | G | 0.04 | −0.40 | 4.43×10^{−7} |

rs6977512 | 7 | 39,471,584 | T | 0.26 | 0.15 | 5.52×10^{−6} |

rs917454 | 7 | 32,196,702 | G | 0.38 | 0.14 | 1.94×10^{−6} |

rs2160766 | 8 | 129,207,845 | T | 0.09 | 0.24 | 2.13×10^{−6} |

rs10760430 | 9 | 128,205,909 | A | 0.32 | −0.15 | 4.48×10^{−6} |

rs11255087 | 10 | 7,441,387 | G | 0.03 | −0.44 | 2.70×10^{−6} |

rs2724188 | 12 | 98,372,331 | A | 0.24 | 0.15 | 5.94×10^{−6} |

rs309427 | 15 | 82,932,421 | G | 0.03 | −0.43 | 3.71×10^{−6} |

rs13038799 | 20 | 61,200,607 | C | 0.03 | −0.44 | 1.86×10^{−6} |

rs3021272 | 22 | 38,730,950 | G | 0.03 | −0.45 | 5.58×10^{−8} |

rs4145526 | 22 | 14,577,021 | C | 0.03 | −0.42 | 3.25×10^{−6} |

For genetic replication analysis, a total of 2,490 SCES samples with BP phenotype, age, gender, BMI information and genotype data were included. The Pearson correlation between DBP and DBP_{e} was high (_{e} was also high (

(A) Standardized phenotype for DBP, (B) Bland-Altman plot of difference in standardized phenotype (DBP – DBP_{e}) against the average of the two, (C) Standardized phenotype for SBP, and (D) Bland-Altman plot of difference in standardized phenotype (SBP – SBP_{e}) against the average of the two.

Index SNP | Chr | Position | Gene | EA | MAF | DBP Beta | DBP P value | DBP_{e} Beta | DBP_{e} P value | SBP Beta | SBP P value | SBP_{e} Beta | SBP_{e} P value |

rs1458038 | 4 | 81,383,747 | FGF5 | T | 0.43 | 0.037 | 0.163 | 0.007 | 0.804 | 0.041 | 0.097 | 0.025 | 0.319 |

rs1173771 | 5 | 32,850,785 | NPR3-C5orf23 | G | 0.32 | 0.031 | 0.283 | 0.046 | 0.127 | 0.019 | 0.469 | 0.027 | 0.326 |

rs11191548 | 10 | 104,836,168 | CYP17A1-NT5C2 | T | 0.25 | −0.0009 | 0.975 | 0.011 | 0.731 | 0.018 | 0.518 | 0.018 | 0.541 |

rs381815 | 11 | 16,858,844 | PLEKHA7 | T | 0.14 | 0.050 | 0.186 | 0.037 | 0.342 | 0.097 | 5.8×10^{−3} | 0.086 | 0.016 |

rs633185 | 11 | 100,098,748 | FLJ32810-TMEM133 | C | 0.48 | 0.101 | 1.6×10^{−4} | 0.111 | 5.3×10^{−5} | 0.087 | 3.9×10^{−4} | 0.089 | 3.9×10^{−4} |

rs17249754 | 12 | 88,584,717 | ATP2B1 | G | 0.32 | 0.043 | 0.142 | 0.015 | 0.624 | 0.084 | 2.1×10^{−3} | 0.090 | 1.2×10^{−3} |

rs1378942 | 15 | 72,864,420 | CYP1A1-ULK3 | A | 0.18 | 0.017 | 0.635 | 0.022 | 0.536 | 0.033 | 0.311 | 0.030 | 0.370 |

rs2521501 | 15 | 89,238,392 | FURIN-FES | T | 0.09 | 0.081 | 0.118 | 0.057 | 0.288 | 0.094 | 0.051 | 0.126 | 9.9×10^{−3} |

rs1327235 | 20 | 10,917,030 | JAG1 | G | 0.45 | 0.042 | 0.120 | 0.048 | 0.084 | 0.017 | 0.503 | 0.009 | 0.719 |

^{−3}. Significance level was set at 0.05/9 = 0.0055.

We derived power calculations that take measurement error into account and can be used for study design. Using simulations, we verified our calculations and concluded that researchers may perform adequate power and sample size calculations for GWAS in the presence of phenotype measurement error. Recently, Yang et al. discovered variants related to phenotypic variability of BMI in a GWAS setting

We used real datasets to demonstrate the impact of using different measurements of the same trait for GWAS. In the GWAS of nuclear cataract, our results displayed almost no overlap between the top SNPs associated with the two measurements. This finding was consistent with the results from Barendse

(A) By effect size, the parameter values used were

The impact on statistical power is much smaller in the presence of measurement error (of quantitative traits) than in the presence of misclassification errors (of case-control status) for GWAS. We note that the decrease in power became substantial only once the measurement error exceeded 0.6 and 0.4 SD of the phenotype for comparison of means and variances, respectively. Measurements prone to large errors have now mostly been improved through technological advancements, or by taking multiple measurements and averaging them. While measurement error is not easily quantifiable in practice, we provide a framework to estimate measurement error using repeated measurements (
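One common way to estimate measurement error from repeated measurements is a variance-components calculation, sketched below. This is a standard approach given for illustration and is not necessarily the authors' proposed framework. It assumes every subject has the same number k ≥ 2 of replicates and that errors are independent of the true value. The function name and the example data are hypothetical.

```python
from statistics import mean, variance


def estimate_measurement_error(repeats):
    """Estimate error variance, true variance and their SD ratio.

    repeats is a list of per-subject lists, each holding k >= 2 replicate
    measurements. The within-subject variance estimates the error variance;
    the variance of subject means, minus error variance / k, estimates the
    variance of the true phenotype.
    """
    k = len(repeats[0])
    error_var = mean([variance(r) for r in repeats])
    between = variance([mean(r) for r in repeats])
    true_var = max(between - error_var / k, 0.0)
    # Measurement error expressed in SD units of the true phenotype.
    ratio = (error_var / true_var) ** 0.5 if true_var > 0 else float("inf")
    return error_var, true_var, ratio


if __name__ == "__main__":
    # Hypothetical pilot data: four subjects, two readings each.
    pilot = [[1.0, 1.2], [2.0, 1.8], [3.1, 2.9], [4.0, 4.2]]
    print(estimate_measurement_error(pilot))
```

The returned ratio is on the same scale as the measurement error axis used in the power results, that is, the number of SD of the true phenotype.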

In the National Cooperative Gallstone Study, it was reported that 7% of the variation in observed triglyceride values and 17% of the variation in observed cholesterol values were attributable to errors

Our measurement error model has the same power as a classical measurement error model, where the error is in the independent variable instead of the dependent variable. The impact of measurement error under the classical measurement error model has been well studied in the area of econometrics and statistics

To reduce measurement error, simple methods such as trimming and winsorizing have been used to screen outliers
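Trimming and winsorizing can be sketched as follows. The cut-offs are chosen here by a simple rank rule on the sorted data. The function names, the 5% and 95% defaults and the example values are assumptions for illustration, not settings taken from the study.

```python
import math


def _bounds(values, lower, upper):
    """Return the values at the lower and upper rank cut-offs."""
    s = sorted(values)
    n = len(s)
    lo = s[math.ceil(lower * (n - 1))]
    hi = s[math.floor(upper * (n - 1))]
    return lo, hi


def winsorize(values, lower=0.05, upper=0.95):
    """Clip extreme values to the cut-offs; the sample size is unchanged."""
    lo, hi = _bounds(values, lower, upper)
    return [min(max(v, lo), hi) for v in values]


def trim(values, lower=0.05, upper=0.95):
    """Drop values outside the cut-offs; the sample size shrinks."""
    lo, hi = _bounds(values, lower, upper)
    return [v for v in values if lo <= v <= hi]


if __name__ == "__main__":
    # Hypothetical phenotype values with one gross outlier.
    data = list(range(1, 20)) + [100]
    print(winsorize(data))
    print(trim(data))
```

Winsorizing keeps every individual in the analysis, whereas trimming reduces the sample size, which itself costs power.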

In this work, we chose to compute power based on the simple linear regression framework and additive allele effects. We recognize that there are other tests available for testing association in GWAS

Our results have important implications in practice. Power and sample size calculations for GWAS that do not account for potential measurement error may over-estimate the power or, equivalently, under-estimate the sample size required. We recommend that sample size and power computations account for measurement error in GWAS of traits that have low repeatability, or that differ between grading scales or instruments, by more than 0.6 and 0.4 SD of the true phenotype for comparison of means and variances, respectively. A pilot study with multiple measurements is recommended to estimate the measurement error using our proposed method, so that the sample size calculation is accurate before the GWAS is carried out. Finally, we note that the statistical power incorporating measurement errors is straightforward to compute using any software that provides the non-central F distribution, and the R code is available on request from the authors.
