Measurement error of a phenotypic trait reduces the power to detect genetic associations. We examined the impact of sample size, allele frequency and effect size in the presence of measurement error for quantitative traits. The statistical power to detect genetic association with phenotype mean and variability was investigated analytically. The non-centrality parameter for a non-central F distribution was derived and verified using computer simulations. We obtained equivalent formulas for the cost of phenotype measurement error. Effects of differences in measurements were examined in a genome-wide association study (GWAS) of two grading scales for cataract and a replication study of genetic variants influencing blood pressure. The mean absolute difference between the analytic power and simulation power for comparison of phenotypic means and variances was less than 0.005, and the absolute difference did not exceed 0.02. To maintain the same power, a measurement error of one standard deviation (SD) in a standard normally distributed trait required a one-fold increase in sample size for comparison of means, and a three-fold increase in sample size for comparison of variances. GWAS results revealed almost no overlap in the significant SNPs ($p < 10^{-5}$) for the two cataract grading scales, while replication results for genetic variants of blood pressure displayed no significant differences between averaged blood pressure measurements and single blood pressure measurements. We have developed a framework for researchers to quantify power in the presence of measurement error, which will be applicable to studies of phenotypes in which the measurement is highly variable.
Citation: Liao J, Li X, Wong T-Y, Wang JJ, Khor CC, Tai ES, et al. (2014) Impact of Measurement Error on Testing Genetic Association with Quantitative Traits. PLoS ONE 9(1): e87044. doi:10.1371/journal.pone.0087044
Editor: Dmitri Zaykin, National Institute of Environmental Health Sciences, United States of America
Received: August 30, 2013; Accepted: December 17, 2013; Published: January 24, 2014
Copyright: © 2014 Liao et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: SiMES and SCES are funded by National Medical Research Council (grants 0796/2003, IRG07nov013, IRG09nov014, STaR/0003/2008 and CG/SERI/2010) and Biomedical Research Council (grants 09/1/35/19/616), Singapore. Ching-Yu Cheng is supported by an award from NMRC (CSA/033/2012) and E-Shyong Tai is supported by an award from NMRC (CSA/008/2009). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
In genome-wide association studies (GWAS), associations between a large number of single nucleotide polymorphisms (SNPs) and a trait measurement are computed, and SNPs with strong associations are then tested for replication in a separate cohort. Non-differential measurement error in both genotyping and phenotyping reduces the power, and hence increases the type II error, to identify true associations in discovery cohorts. This decreases the efficiency of GWAS, because findings from the discovery stage are less likely to be replicated in subsequent studies. Errors in genotype have been reduced through technological advances and stringent quality controls in SNP genotyping. Measurement and misclassification errors in case-control studies and measurement errors in exposure variables have been well studied. However, to the best of our knowledge, there is only one paper evaluating the implications of measurement error in a continuous outcome in genetic analysis.
Performing power and sample size calculations allows researchers to manage the cost of genotyping effectively. With recent discoveries made using web-based questionnaires for data collection, one may question the trade-off between sample size and accuracy of phenotype measurement needed to achieve a minimal level of statistical power. Using the asymptotic non-centrality parameter of the $\chi^2$ distribution, researchers have arrived at power and sample size formulas that account for misclassification error in case-control studies. Online programs PAWE-PH and PAWE-3D were also developed and used to demonstrate that in case-control GWAS, there is a substantial reduction in statistical power when diagnostic error increases, especially for lower allele frequencies and genotype relative risks. Barendse recommended checks at the phenotype collection stage, but did not offer theoretical solutions in terms of power and sample size calculation.
In this study, we first quantified the power to identify genetic variants that affect the means and variability of quantitative traits in GWAS of unrelated individuals in the presence of measurement error, where measurement error was defined as the additional variation introduced to a “true” underlying phenotype. Second, we demonstrated the impact of measurement error on the pipeline of GWAS analysis in population-based studies. We presented real data analyses based on two phenotypes, age-related cataract and blood pressure, to illustrate the impact of measurement error on GWAS discovery and on genetic replication studies.
Materials and Methods
Power to Detect Differences in Means
We used the following model to describe the phenotype:

$$y_i = \mu + \beta x_i + \varepsilon_i \qquad (1)$$

where $y_i$ is the phenotype for the $i$th individual, $\mu$ is the phenotype mean, $\beta$ is the effect size of a SNP, $x_i$ is the allelic dosage for the SNP, taking values 0, 1 or 2, and $\varepsilon_i$ is the noise in the phenotype. We made the following assumptions:
- The marker locus satisfies the Hardy-Weinberg equilibrium (HWE). Hence the genotype frequencies are computed based on $p$, the minor allele frequency (MAF). $x_i$ is dependent on $p$ via a Binomial distribution.
- $y_i$ follows a standard normal distribution, which can be achieved through standardization of a normally distributed phenotype.
- SNP effects are additive. Without loss of generality, we let $\mu = 0$. This can be easily extended when $\mu \neq 0$ by centering the phenotype. Taking the previous assumption into account, the underlying true phenotype is standard normally distributed.
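The power for linear regression can be determined using the non-central F distribution, with non-centrality parameter (NCP) $\lambda = nR^2/(1 - R^2)$, where $n$ refers to the total sample size and $R^2$ is the squared correlation coefficient. $R^2$ is computed as follows (Text S1):

$$R^2 = \frac{2p(1-p)\beta^2}{1 + \sigma_e^2} \qquad (2)$$

where $\sigma_e$ is the SD of the measurement error, expressed in SD units of the true phenotype. Without measurement errors, $\sigma_e = 0$. With measurement errors, $\sigma_e > 0$. As $R^2$ ranges from 0 to 1, we require $2p(1-p)\beta^2 \le 1 + \sigma_e^2$. Since effect sizes in GWAS tend to be very small, this constraint is usually satisfied. Finally, power can be computed as $1 - F(f_{1-\alpha})$, where $F$ is the cumulative distribution function of the non-central F distribution with $n - 2$ and 1 degrees of freedom (denominator and numerator, respectively) and non-centrality parameter $\lambda$, evaluated at $f_{1-\alpha}$, the $100(1-\alpha)$th percentile of the F distribution.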
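The power calculation described above can be sketched in a few lines. The example below is a hedged illustration in Python rather than the authors' R code. It assumes the squared correlation takes the form $2p(1-p)\beta^2/(1+\sigma_e^2)$ and the NCP the form $nR^2/(1-R^2)$, and, to stay within the standard library, it replaces the non-central F distribution with its large-sample limit (a non-central chi-square with 1 degree of freedom), which should be close for GWAS-sized samples. All function names are hypothetical.

```python
from statistics import NormalDist


def r_squared_means(p, beta, sigma_e=0.0):
    """Squared correlation between allelic dosage and the observed phenotype.

    Assumptions: HWE (dosage variance 2p(1-p)), a true phenotype with unit
    variance, and independent measurement error with SD sigma_e.
    """
    return 2.0 * p * (1.0 - p) * beta ** 2 / (1.0 + sigma_e ** 2)


def ncp(n, r2):
    """Non-centrality parameter, assumed here to be n * R^2 / (1 - R^2)."""
    return n * r2 / (1.0 - r2)


def approx_power(n, r2, alpha=0.05):
    """Large-sample approximation to the non-central F power.

    For large n, F(1, n - 2, ncp) behaves like a non-central chi-square with
    1 df, i.e. the square of a Normal(sqrt(ncp), 1) variable.
    """
    z = NormalDist()
    crit = z.inv_cdf(1.0 - alpha / 2.0)
    shift = ncp(n, r2) ** 0.5
    return z.cdf(shift - crit) + z.cdf(-shift - crit)


# Power at the default configuration for increasing measurement error.
for sigma_e in (0.0, 0.3, 0.6, 1.0):
    r2 = r_squared_means(p=0.2, beta=0.06, sigma_e=sigma_e)
    print(f"sigma_e={sigma_e:.1f}  power={approx_power(15000, r2):.3f}")
```

An exact calculation would evaluate the non-central F cumulative distribution function directly, for example with a statistics package that provides it.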
Power to Detect Differences in Variances
Following the framework described by Visscher and Posthuma, the underlying model of trait variance, assuming there are no covariates, is:

$$\frac{(y_{vi} - \mu_v)^2}{\sigma_v^2} = \alpha_v + \beta_v x_i + \varepsilon_{vi}$$

where $y_{vi}$ is the phenotype for the $i$th individual, $\mu_v$ is the phenotype mean, $\sigma_v^2$ is the phenotype variance, $\beta_v$ is the effect of a SNP, and $x_i$ is as defined previously. $\alpha_v$ refers to the intercept of the regression of phenotype variability on genotype distribution and $\varepsilon_{vi}$ is the noise. We added a subscript $v$ to denote that these variables are different from those in the model for comparison of means. In addition to the assumptions made for the previous model, we made the following assumptions:
- The SNP has an effect on the phenotype variance but not on the trait mean.
- The phenotype is standard normally distributed in the absence of heterogeneous variance.
We assume that $\mu_v = 0$ and $\sigma_v^2 = 1$ via standardization of a normally distributed phenotype. Hence, $y_{vi}^2 = \alpha_v + \beta_v x_i + \varepsilon_{vi}$. The model with and without measurement error is summarized in Table 1. Using the same definition of the non-centrality parameter, we compute power with $R^2$ for this model as derived in Text S1 (Equation 3).
Empirical Power Simulations
To verify our findings and assess the power of genetic association testing in the presence of measurement error, we carried out simulation studies under various scenarios. First, we simulated the genotypes based on the Binomial distribution with probability $p$. For the comparison of phenotype means, the phenotypes were simulated using Equation 1, where the phenotypes have different means for different genotypes under the alternative hypothesis. For the test of difference in variances, the phenotypes were simulated under the normal distribution with mean 0 and variances based on Table 1, and the standardized and squared phenotype was used for testing. We performed 10,000 linear regressions for each simulation configuration and computed the empirical power, assuming $\alpha = 0.05$. Configurations of model parameters were chosen to represent realistic settings for future GWAS, where the effect sizes are expected to be very small and large sample sizes are required to detect the effects. Default parameters were $p = 0.2$, $n = 15{,}000$, $\beta = 0.06$ for the comparison of means and $p = 0.2$, $n = 30{,}000$, $\beta_v = 0.06$ for the comparison of variances, and we varied only one parameter at a time. The R software version 2.14.2 was used for the simulations.
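The simulation procedure for the comparison of means can be sketched as follows. This is a minimal illustration in Python, not the authors' R code, and it uses a much smaller configuration than the paper so that it runs quickly. The assumptions are labeled in the comments: the residual variance is chosen so that the true phenotype has unit variance, measurement error is added as independent normal noise, and the normal quantile stands in for the t critical value. The function name and defaults are hypothetical.

```python
import random
from statistics import NormalDist


def simulate_power_means(n, p, beta, sigma_e, reps=400, alpha=0.05, seed=1):
    """Empirical power of the slope test in y ~ x under measurement error."""
    rng = random.Random(seed)
    # Large-n stand-in for the t critical value (assumption).
    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Residual SD chosen so that Var(true phenotype) = 1 (assumption).
    resid_sd = (1.0 - 2.0 * p * (1.0 - p) * beta ** 2) ** 0.5
    rejections = 0
    for _ in range(reps):
        # Allelic dosage: Binomial(2, p) under HWE.
        x = [(rng.random() < p) + (rng.random() < p) for _ in range(n)]
        # Observed phenotype = genetic effect + noise + measurement error.
        y = [beta * xi + rng.gauss(0.0, resid_sd) + rng.gauss(0.0, sigma_e)
             for xi in x]
        mx, my = sum(x) / n, sum(y) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        syy = sum((yi - my) ** 2 for yi in y)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        if sxx == 0.0:
            continue  # monomorphic sample, no test possible
        r2 = sxy ** 2 / (sxx * syy)
        t_abs = (r2 * (n - 2) / (1.0 - r2)) ** 0.5  # |t| for the slope
        if t_abs > crit:
            rejections += 1
    return rejections / reps


print(simulate_power_means(n=1000, p=0.2, beta=0.2, sigma_e=0.5, reps=300))
```

With more replicates and the paper's default configuration, the empirical proportion of rejections can be compared against the analytic power for the same parameters.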
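Cost Coefficients of Phenotype Measurement Error
We defined the cost of phenotype measurement error, $C$, as the percentage increase in sample size required to maintain a constant analytical power for an increase in measurement error. Following the framework of Edwards et al., we set $\lambda_0 = \lambda_e$, where $\lambda_0$ is the non-centrality parameter when there is no measurement error and $\lambda_e$ is the non-centrality parameter when there is measurement error. For comparison of phenotype means, we used Equation 2 with $\sigma_e = 0$ and $\sigma_e > 0$, at sample sizes $n_0$ and $n_e$ respectively, where $\sigma_e$ is the SD of the measurement error in SD units of the true phenotype, to obtain the following expression for the cost of phenotype measurement error:

$$C = \frac{n_e}{n_0} - 1 = \frac{\sigma_e^2}{1 - 2p(1-p)\beta^2}$$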
Similarly, for comparison of phenotype variances, we used Equation 3 and by letting for , the following expression was obtained:
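As a worked illustration of the cost for comparison of means, the sketch below equates the two non-centrality parameters under two stated assumptions: the NCP has the form $nR^2/(1-R^2)$ and $R^2 = 2p(1-p)\beta^2/(1+\sigma_e^2)$. Solving for the sample size ratio then gives $C = \sigma_e^2 / (1 - 2p(1-p)\beta^2)$. The function name is hypothetical and the snippet is in Python rather than the authors' R code. The variance case is not covered here.

```python
def cost_means(p, beta, sigma_e):
    """Fractional increase in sample size needed to keep the NCP constant.

    Derived by setting n0 * R0^2 / (1 - R0^2) = ne * Re^2 / (1 - Re^2) with
    R0^2 = a and Re^2 = a / (1 + sigma_e^2), where a = 2p(1-p)beta^2
    (assumed forms, see the lead-in).
    """
    a = 2.0 * p * (1.0 - p) * beta ** 2
    return sigma_e ** 2 / (1.0 - a)


# Cost as a percentage for measurement errors between 0.1 and 1.0 SD.
for tenth in range(1, 11):
    sigma_e = tenth / 10.0
    print(f"sigma_e={sigma_e:.1f}  cost={100.0 * cost_means(0.2, 0.06, sigma_e):.1f}%")
```

For small effect sizes the denominator is close to 1, so the cost is close to the measurement error variance itself.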
The Singapore Malay Eye Study (SiMES) and Singapore Chinese Eye Study (SCES) are population-based cross-sectional epidemiological studies on eye diseases for residents of Singapore. Details of the study design and methodology have been published elsewhere. In brief, a total of 4,168 Malay and 4,605 Chinese residents in the southwestern part of Singapore, aged 40 to 80 years old, were identified through age-stratified random sampling and were invited to participate in the study, of whom 3,280 Malays (response rate, 78.7%) and 3,353 Chinese (response rate, 72.8%) underwent a detailed ocular examination. Ethics approval was obtained from the Singapore Eye Research Institute Institutional Review Board and all participants provided written informed consent in adherence to the Declaration of Helsinki.
In the SiMES cohort, nuclear cataract was assessed using two methods: 1) the Lens Opacities Classification System III (LOCS III) under slit lamp, and 2) the Wisconsin Cataract Grading System (Wisconsin System) based on lens photographs. For LOCS III (decimal grade 0.1 to 6.9), participants went through slit lamp bio-microscopy where nuclear cataract was graded by multiple study ophthalmologists through comparison with standard photographs. For the Wisconsin System (decimal grade 0.1 to 5.0), lens photographs were taken using a digital slit-lamp camera (model DC-1 with FD-21 flash attachment; Topcon, Tokyo, Japan) and grading was performed through comparison with standard photographs, at the University of Sydney by a single experienced grader, with adjudication by a senior ophthalmologist. A decimal grade was used if the severity of cataract was judged to be midway between two standard photographs. Higher accuracy and consistency are achieved with lens photographs graded by a single person. Hence, we assume that the Wisconsin System is the preferred grading system, and deviation of the LOCS III grading from the Wisconsin System is regarded as measurement error.
In the Chinese cohort, blood pressure was measured according to a protocol used in the Multi-Ethnic Study of Atherosclerosis. Blood pressure was measured twice, at an interval of 5 minutes. A third measurement was performed if blood pressure differed by more than 10 mmHg systolic or 5 mmHg diastolic. Blood pressure was taken as the mean of the two closest readings, which was assumed to be the “true” blood pressure value. The last measured blood pressure reading of an individual was assumed to contain measurement error for systolic and diastolic blood pressure (SBPe and DBPe) and was used for association testing in comparison with the “true” values (SBP and DBP).
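The blood pressure protocol above can be illustrated with a short sketch. The Python functions below are hypothetical helpers, not part of the study's software: one decides whether a third reading is needed, the other derives the "true" value (mean of the two closest readings) and the error-prone value (the last reading). Systolic and diastolic readings are assumed to be processed separately.

```python
from itertools import combinations


def needs_third_reading(sbp1, sbp2, dbp1, dbp2):
    """Third reading if the first two differ by >10 mmHg SBP or >5 mmHg DBP."""
    return abs(sbp1 - sbp2) > 10 or abs(dbp1 - dbp2) > 5


def reference_and_single_bp(readings):
    """Return (mean of the two closest readings, last reading).

    `readings` holds two or three readings of one pressure type, in the
    order they were taken.
    """
    if len(readings) < 2:
        raise ValueError("at least two readings are required")
    a, b = min(combinations(readings, 2), key=lambda pair: abs(pair[0] - pair[1]))
    return (a + b) / 2.0, readings[-1]


# Hypothetical systolic readings for one participant.
sbp, sbp_e = reference_and_single_bp([120, 135, 124])
print(sbp, sbp_e)
```

In this hypothetical case the first and third readings are closest, so their mean serves as the reference value while the third reading alone plays the role of SBPe.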
Genotyping and Data Quality Control
Genotyping of 3,072 and 1,952 samples in SiMES and SCES, respectively, was performed using Illumina Human610-Quad BeadChips (Illumina Inc.). A total of 620,901 SNPs were genotyped in each cohort. An additional 635 samples in SCES were genotyped using Illumina Human OmniExpress BeadChips with a total of 729,698 SNPs. Detailed quality control procedures for samples and SNPs were described elsewhere. In brief, samples were excluded based on the following conditions: (1) sample call-rates of less than 95%; (2) excessive heterozygosity; (3) cryptic relatedness; (4) gender discrepancies; and (5) discordant ethnic memberships. We excluded SNPs with (1) high missingness (>5%); (2) gross departure from HWE ($p < 10^{-6}$); and (3) MAF <1%. Detailed quality control procedures for SCES samples genotyped on OmniExpress chips are provided in the supplementary materials (Text S2). After quality control, the following samples and SNPs were available for analysis: 2,542 samples and 557,824 SNPs in SiMES, 1,889 samples and 538,408 SNPs in SCES on Illumina Human610-Quad BeadChips, and 615 samples and 633,783 SNPs in SCES on Illumina Human OmniExpress BeadChips.
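The SNP-level exclusion criteria listed above amount to three threshold checks. The sketch below expresses them in Python for illustration only. The `SnpStats` record and its field names are hypothetical, and in practice these filters would be applied by the genotype analysis software rather than by hand-written code.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    """Per-SNP summary statistics (hypothetical record layout)."""
    snp_id: str
    missing_rate: float  # fraction of samples with a missing genotype
    hwe_p: float         # p value of the HWE test
    maf: float           # minor allele frequency


def passes_snp_qc(snp, max_missing=0.05, min_hwe_p=1e-6, min_maf=0.01):
    """Keep a SNP unless missingness > 5%, HWE p < 1e-6, or MAF < 1%."""
    return (snp.missing_rate <= max_missing
            and snp.hwe_p >= min_hwe_p
            and snp.maf >= min_maf)


# Hypothetical SNPs: one passing, one rare, one departing from HWE.
snps = [
    SnpStats("snpA", 0.01, 0.40, 0.20),
    SnpStats("snpB", 0.02, 0.30, 0.005),
    SnpStats("snpC", 0.00, 1e-9, 0.30),
]
kept = [s.snp_id for s in snps if passes_snp_qc(s)]
print(kept)
```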
Real Data Analysis
For genome-wide analysis of nuclear cataract in SiMES, we used the nuclear cataract value from the worse eye, where a larger value indicates higher severity. Each phenotype was standardized by subtracting the mean and dividing by the SD of the phenotype. Association testing was performed on the standardized nuclear cataract phenotype for comparison of means and on the squared-standardized nuclear cataract phenotype for comparison of variances. For genetic replication analysis, we analyzed 9 variants which showed significant associations with BP in East Asians. We followed the analysis protocol used by Ehret et al. for phenotypes DBP, DBPe, SBP and SBPe in each cohort. In brief, linear regression analysis was performed assuming an additive model, adjusted for age, age-squared and body mass index (BMI), with medication-corrected BP as the dependent variable. To account for batch effects of data from separate chips in SCES, meta-analysis was performed using an inverse-variance fixed effects model, and a Bonferroni-adjusted cut-off of p value = 0.0055 (0.05/9 tests) was used to control Type I error at 5%.
The PLINK software (version 2.0) was used for association testing on nuclear cataract and blood pressure phenotypes. We assumed an additive genetic model where individual genotypes were coded according to the number of variant alleles present. A trend test within a linear regression model was used to test the associations between phenotypes and SNPs.
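Two of the computations described in this section are easy to illustrate: the standardized (and squared-standardized) phenotype, and the inverse-variance fixed effects meta-analysis across genotyping chips with a Bonferroni threshold. The Python sketch below uses hypothetical function names and made-up effect estimates. It is not the pipeline used in the study, which relied on PLINK.

```python
from statistics import NormalDist, fmean, stdev


def standardize(values):
    """Subtract the mean and divide by the sample SD."""
    mu, sd = fmean(values), stdev(values)
    return [(v - mu) / sd for v in values]


def squared_standardized(values):
    """Phenotype used for the comparison of variances."""
    return [z * z for z in standardize(values)]


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted pooled estimate, its SE and two-sided p."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = (1.0 / sum(weights)) ** 0.5
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(pooled / pooled_se)))
    return pooled, pooled_se, p_value


BONFERRONI_CUTOFF = 0.05 / 9  # nine variants tested

# Hypothetical per-chip estimates for one variant (not study results).
beta, se, p_value = fixed_effect_meta([0.10, 0.20], [0.10, 0.10])
print(beta, se, p_value, p_value < BONFERRONI_CUTOFF)
```

The fixed effects model weights each batch by the inverse of its squared standard error, so the larger batch dominates the pooled estimate when the standard errors differ.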
Power to Detect Differences in Means and Variances
Figure 1 shows the impact of effect size, sample size, and minor allele frequency on analytical power for comparison of phenotypic means and variances. For comparison of phenotypic means, there was a substantial decrease in power when measurement error was larger than 0.6 SD of the true phenotype. Decreasing the effect size to 0.04 (a reduction of 0.02 SD per additional copy of the risk allele) had the most impact on power, dropping it by 20% even without measurement error. For comparison of phenotypic variances, the impact of measurement error on power was more pronounced. In most of the simulated configurations, there was a substantial decrease in power when measurement error was larger than 0.4 SD. We also noted that an effect size of 0.06 with 0.7 SD of measurement error achieved equivalent power (78%) to an effect size of 0.04 with no measurement error.
Figure 1. Measurement error is displayed in terms of the number of SD of the true phenotype (without errors). The top panel represents comparison of means, and three configurations were considered with the rest of the parameters following the default configuration: $p = 0.2$, $n = 15{,}000$, $\beta = 0.06$. $\beta$ is interpreted as the change in the standardized phenotype for every increase in one effect allele. The bottom panel represents comparison of variances, and three configurations were considered with the rest of the parameters following the default configuration: $p = 0.2$, $n = 30{,}000$, $\beta_v = 0.06$. $\beta_v$ is interpreted as the change in the standardized and squared phenotype for every increase in one effect allele.
To verify our findings, we compared the analytical power with the simulated power. The mean (SD) of absolute difference between the analytical power and simulation power for comparison of means and variances was 0.00169 (0.00195) and 0.00418 (0.00398) respectively. The maximum absolute difference for comparison of means and variances was 0.00941 and 0.0197 respectively.
For small effect sizes, $C$ could be approximately equal to $\sigma_e^2$, the variance of the measurement error. Hence the percentage increase in sample size ranged from 1% to 100% for measurement errors between 0.1 and 1.0 SD. For the analysis of heterogeneity of variances, the cost was almost three times higher as compared to the analysis of heterogeneity of means when the measurement error was equal to 1 SD of the phenotype (Table 2).
Replication and Genome-wide Association Testing Results
A total of 2,349 samples from SiMES with both genotype and phenotype data of Wisconsin System and LOCS III grading were included for genome-wide testing. The measurements of nuclear cataract in SiMES varied substantially for some individuals (Figure 2), especially for the standardized and squared phenotype, which had SDs of 1.52 and 1.80 for the Wisconsin System and LOCS III, respectively. The Pearson correlation between standardized phenotypes for the two grading systems was 0.71, while the correlation between the standardized and squared phenotypes was 0.56. The average measurement error was 0.0112, which corresponded to about 0.1 SD of the standardized Wisconsin System phenotype. Table 3 displays the top SNPs ($p < 10^{-5}$) from both grading scales in the GWAS of nuclear cataract in a comparison of phenotypic means. None of the SNPs overlapped.
Figure 2. (A) Standardized phenotype for comparison of means, (B) Bland-Altman plot of difference in standardized phenotype (Wisconsin System – LOCS III) against the average of the two, (C) Standardized and squared phenotype for comparison of variances, and (D) Bland-Altman plot of difference in standardized and squared phenotype (Wisconsin System – LOCS III) against the average of the two.
For genetic replication analysis, a total of 2,490 SCES samples with BP phenotype, age, gender, BMI information and genotype data were included. The Pearson correlation between DBP and DBPe was high ($r = 0.92$) and the correlation between SBP and SBPe was also high ($r = 0.93$). The average measurement error, defined as the mean absolute difference between the standardized values of the two measurements for systolic and diastolic blood pressure, was 0.251 and 0.252 respectively, which corresponded to about 0.25 SD of SBP and DBP (Figure 3). Table 4 shows the association results for the 9 variants previously found to influence blood pressure in East Asians. Variants replicated in DBP or SBP were also replicated in their error counterparts (rs633185 and rs17249754).
Figure 3. (A) Standardized phenotype for DBP, (B) Bland-Altman plot of difference in standardized phenotype (DBP – DBPe) against the average of the two, (C) Standardized phenotype for SBP, and (D) Bland-Altman plot of difference in standardized phenotype (SBP – SBPe) against the average of the two.
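The average measurement error and the quantities behind a Bland-Altman plot can be computed as in the sketch below. It is a Python illustration with hypothetical function names and made-up readings. One assumption is labeled in the code: both series are placed on the scale of the reference measurement (its mean and SD), which is one possible reading of "standardized values of the two measurements".

```python
from statistics import fmean, stdev


def mean_abs_error_sd_units(reference, observed):
    """Mean absolute difference after standardizing both series.

    Assumption: both series are standardized with the mean and SD of the
    reference measurement, so the result is in SD units of the reference.
    """
    mu, sd = fmean(reference), stdev(reference)
    return fmean(abs((r - mu) / sd - (o - mu) / sd)
                 for r, o in zip(reference, observed))


def bland_altman(reference, observed):
    """Return (pair means, differences, bias, lower and upper 95% limits)."""
    diffs = [r - o for r, o in zip(reference, observed)]
    means = [(r + o) / 2.0 for r, o in zip(reference, observed)]
    bias = fmean(diffs)
    spread = 1.96 * stdev(diffs)
    return means, diffs, bias, (bias - spread, bias + spread)


# Hypothetical averaged and single readings (mmHg), not study data.
sbp = [100.0, 110.0, 120.0, 130.0]
sbp_e = [102.0, 108.0, 123.0, 129.0]
mae = mean_abs_error_sd_units(sbp, sbp_e)
means, diffs, bias, limits = bland_altman(sbp, sbp_e)
print(mae, bias, limits)
```

A bias near zero with differences scattered evenly across the range of pair means is the pattern expected when the discrepancy between measurements is random rather than systematic.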
We derived power calculations that take measurement error into account, which could be used for study design purposes. Using simulations, we verified our calculations and concluded that researchers may perform adequate power and sample size calculations for GWAS in the presence of phenotype measurement error. Recently, Yang et al. discovered variants related to phenotypic variability of BMI in a GWAS setting. Analyzing phenotypic variability could uncover the presence of statistical interactions associated with the genetic variant that have not been accounted for. Various methods have been proposed for such analysis. Since measurement error affects the variability of the phenotype, it is imperative that its impact on power be studied closely. Hence, we developed the power analysis framework for comparison of both means and variances.
We used real datasets to demonstrate the impact of using different measurements of the same trait for GWAS. In the GWAS of nuclear cataract, our results displayed almost no overlap between the top SNPs associated with the two measurements. This finding was consistent with the results from Barendse, who also compared GWAS from two independent quantitative trait measurements of subcutaneous fat thickness in animals. In our replication study of BP, SNPs which replicated in the averaged BP measurements were also replicated in the single measurements. The minor differences suggest that failure to replicate is largely attributable to differences in the genetic nature of the trait or to false discoveries. Based on the sample size, MAF and effect size range in our study, the power of GWAS of BP with a measurement error of 0.25 SD was almost identical to the power of GWAS of BP without measurement error (Figure 4). In the process of reaching these conclusions, we had assumed that the differences between trait measurements were only due to random errors. The Bland-Altman plots of the measurements in Figures 2 and 3 imply that the differences were more likely to occur at random and not due to systematic differences.
Figure 4. (A) By effect size, the parameter values used were $p = 0.3$, $n = 2{,}490$. (B) By MAF, the parameter values used were $\beta = 0.05$, $n = 2{,}490$.
The impact on statistical power is much smaller in the presence of measurement error (of quantitative traits) than in the presence of misclassification errors (of case-control status) for GWAS. We note that the decrease in power became substantial only when the measurement error exceeded 0.6 and 0.4 SD of the phenotype for comparison of means and variances, respectively. Measurements prone to large errors have now mostly been improved through technological advancements, or by taking multiple measurements and averaging them. While measurement error is not easily quantifiable in practice, we provide a framework to estimate measurement error using repeated measurements (Text S3).
In the National Cooperative Gallstone Study, it was reported that 7% and 17% of the variation in observed triglycerides and cholesterol values, respectively, were attributable to errors. Depending on the settings or instruments used during phenotyping, measurement error in other studies ranged from 0.0035 to 0.63 SD of the phenotype. Knowledge of the impact of measurement error on statistical power can improve the efficiency of the data collection process with the optimal approach.
Our measurement error model has the same power as a classical measurement error model, where the error is in the independent variable instead of the dependent variable. The impact of measurement error under the classical measurement error model has been well studied in the areas of econometrics and statistics, and results based on the linear and multivariate linear regression models could be extended to the GWAS framework. As estimates based on measurement error in the dependent variable are more innocuous than those based on the classical measurement error model, one need not apply bias-correction methods such as regression calibration.
To reduce measurement error, simple methods such as trimming and winsorizing have been used to screen outliers. Data trimming in the GWAS context was applied by Barendse, where bivariate trimming resulted in improved correlation of two independent measurements of the same phenotype. Bollinger and Chandra, however, highlighted that simple outlier screening methods perform well without introducing more bias only in the case where measurement error results in an upward bias in the regression coefficient. Another method by which measurement error can be reduced is threshold-based sampling. Using a Gaussian mixture model, the distribution of the phenotype measurement can be described using three Gaussian mixture components, one for each genotype (AA, AB or BB). Samples with phenotype measurements that fall between two genotype distributions would likely be due to measurement error and would subsequently be excluded from analysis. Although this method results in a reduction of sample size, there is a potential gain in power through decreased variability of the phenotype. Power calculations for threshold traits with two categories (case-control) in association-based studies have been described by Gordon et al. and Purcell et al. We suggest that if the power quantified based on our framework is low, apart from collection of additional samples, the sampling method based on mixture models could be a good choice for consideration.
In this work, we chose to compute power based on the simple linear regression framework and additive allele effects. We recognize that there are other tests available for testing association in GWAS. Linear regression has the advantage of simplicity in implementation across cohorts in large meta-analyses, and is able to incorporate covariates and interactions. Our method can be extended to other types of allelic effects (multiplicative, dominant and recessive) by computing the relevant expected values such as those in Table 1. Our work is restricted by other model assumptions, which include independent random errors and normality of the phenotype. For large sample sizes, linear regression can perform well with data which deviate far from normality.
Our results have important implications in practice. Power and sample size calculations for GWAS that do not account for potential measurement errors may over-estimate the power or, equivalently, under-estimate the sample size required. We recommend that measurement error be incorporated into the computation of sample size and power for GWAS of traits that have low repeatability, or that differ between grading scales and machinery, by a magnitude of more than 0.6 and 0.4 SD of the true phenotype for comparison of means and variances, respectively. A pilot study with multiple measurements is recommended to estimate the measurement error using our proposed method. This is to ensure accurate sample size calculation before GWAS. Finally, we note that the statistical power incorporating measurement errors is straightforward to compute using any software that provides values of the cumulative distribution function of the non-central F distribution, and the R code is available on request from the authors.
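A sample size calculation that accounts for measurement error can be sketched as below for the comparison of means. This is a hedged Python illustration, not the authors' R code. It assumes $R^2 = 2p(1-p)\beta^2/(1+\sigma_e^2)$ and an NCP of $nR^2/(1-R^2)$, and it uses the large-sample normal approximation in place of the non-central F distribution. All function names are hypothetical.

```python
import math
from statistics import NormalDist

_Z = NormalDist()


def approx_power_from_ncp(lam, alpha=0.05):
    """Large-sample power of the 1-df test for a given NCP (approximation)."""
    crit = _Z.inv_cdf(1.0 - alpha / 2.0)
    shift = math.sqrt(lam)
    return _Z.cdf(shift - crit) + _Z.cdf(-shift - crit)


def required_ncp(target_power, alpha=0.05):
    """Smallest NCP reaching the target power, found by bisection."""
    lo, hi = 0.0, 400.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if approx_power_from_ncp(mid, alpha) < target_power:
            lo = mid
        else:
            hi = mid
    return hi


def required_n(p, beta, sigma_e, target_power=0.8, alpha=0.05):
    """Sample size so that n * R^2 / (1 - R^2) reaches the required NCP."""
    r2 = 2.0 * p * (1.0 - p) * beta ** 2 / (1.0 + sigma_e ** 2)
    return math.ceil(required_ncp(target_power, alpha) * (1.0 - r2) / r2)


n_clean = required_n(p=0.2, beta=0.06, sigma_e=0.0)
n_noisy = required_n(p=0.2, beta=0.06, sigma_e=1.0)
print(n_clean, n_noisy, n_noisy / n_clean)
```

Under these assumptions, a measurement error of 1 SD roughly doubles the required sample size for the comparison of means, in line with the one-fold increase reported above.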
Text S1. Derivation of squared correlation coefficient for comparison of phenotypic means and variability.
Text S2. Detailed quality control procedures for SCES samples genotyped on OmniExpress chips.
Text S3. Estimation of measurement error using repeated measurements.
Conceived and designed the experiments: JL XL CYC. Performed the experiments: JL. Analyzed the data: JL XL. Contributed reagents/materials/analysis tools: TYW JJW CCK EST TA YYT. Wrote the paper: JL XL TYW JJW CCK EST YYT CYC.
- 1. Buonaccorsi JP, Laake P, Veierod MB (2011) On the power of the Cochran-Armitage test for trend in the presence of misclassification. Stat Methods Med Res.
- 2. Lindstrom S, Yen YC, Spiegelman D, Kraft P (2009) The impact of gene-environment dependence and misclassification in genetic association studies incorporating gene-environment interactions. Hum Hered 68: 171–181.
- 3. Cheng KF, Lin WJ (2009) The effects of misclassification in studies of gene-environment interactions. Hum Hered 67: 77–87.
- 4. Wojczynski MK, Tiwari HK (2008) Definition of phenotype. Adv Genet 60: 75–105.
- 5. Gordon D, Haynes C, Yang Y, Kramer PL, Finch SJ (2007) Linear trend tests for case-control genetic association that incorporate random phenotype and genotype misclassification error. Genet Epidemiol 31: 853–870.
- 6. Barendse W (2011) The effect of measurement error of phenotypes on genome wide association studies. BMC Genomics 12: 232.
- 7. Eriksson N, Benton GM, Do CB, Kiefer AK, Mountain JL, et al. (2012) Genetic variants associated with breast size also influence breast cancer risk. BMC Med Genet 13.
- 8. Edwards BJ, Haynes C, Levenstien MA, Finch SJ, Gordon D (2005) Power and sample size calculations in the presence of phenotype errors for case/control genetic association studies. BMC Genet 6: 18.
- 9. Gordon D, Finch SJ, Nothnagel M, Ott J (2002) Power and sample size calculations for case-control genetic association tests when errors are present: application to single nucleotide polymorphisms. Hum Hered 54: 22–33.
- 10. Gordon D, Haynes C, Blumenfeld J, Finch SJ (2005) PAWE-3D: visualizing power for association with error in case-control genetic studies of complex traits. Bioinformatics 21: 3935–3937.
- 11. Samuels DC, Burn DJ, Chinnery PF (2009) Detecting new neurodegenerative disease genes: does phenotype accuracy limit the horizon? Trends in Genetics 25: 486–488.
- 12. Kleinbaum DG, Kleinbaum DG (2007) Applied regression analysis and other multivariable methods. Australia; Belmont, CA: Brooks/Cole. xxi, 906 p.
- 13. Visscher PM, Posthuma D (2010) Statistical power to detect genetic Loci affecting environmental sensitivity. Behav Genet 40: 728–733.
- 14. R Development Core Team (2012) R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
- 15. Foong AW, Saw SM, Loo JL, Shen S, Loon SC, et al. (2007) Rationale and methodology for a population-based study of eye diseases in Malay people: The Singapore Malay eye study (SiMES). Ophthalmic Epidemiol 14: 25–35.
- 16. Lavanya R, Jeganathan VS, Zheng Y, Raju P, Cheung N, et al. (2009) Methodology of the Singapore Indian Chinese Cohort (SICC) eye study: quantifying ethnic variations in the epidemiology of eye diseases in Asians. Ophthalmic Epidemiol 16: 325–336.
- 17. Chylack LT Jr, Wolfe JK, Singer DM, Leske MC, Bullimore MA, et al. (1993) The Lens Opacities Classification System III. The Longitudinal Study of Cataract Study Group. Arch Ophthalmol 111: 831–836.
- 18. Klein BE, Klein R, Linton KL, Magli YL, Neider MW (1990) Assessment of cataracts from photographs in the Beaver Dam Eye Study. Ophthalmology 97: 1428–1433.
- 19. Manolio TA, Fishel SC, Beattie C, Torres J, Christopherson R, et al. (1988) Evaluation of the Dinamap continuous blood pressure monitor. Am J Hypertens 1: 161S–167S.
- 20. Fan Q, Zhou X, Khor CC, Cheng CY, Goh LK, et al. (2011) Genome-wide meta-analysis of five Asian cohorts identifies PDGFRA as a susceptibility locus for corneal astigmatism. PLoS Genet 7: e1002402.
- 21. Fan Q, Barathi VA, Cheng CY, Zhou X, Meguro A, et al. (2012) Genetic variants on chromosome 1q41 influence ocular axial length and high myopia. PLoS Genet 8: e1002753.
- 22. Ehret GB, Munroe PB, Rice KM, Bochud M, Johnson AD, et al. (2011) Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk. Nature 478: 103–109.
- 23. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, et al. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81: 559–575.
- 24. Yang J, Loos RJF, Powell JE, Medland SE, Speliotes EK, et al. (2012) FTO genotype is associated with phenotypic variability of body mass index. Nature 490: 267-+.
- 25. Conover WJ, Johnson ME, Johnson MM (1981) A Comparative-Study of Tests for Homogeneity of Variances, with Applications to the Outer Continental-Shelf Bidding Data. Technometrics 23: 351–361.
- 26. van der Sluis S, Verhage M, Posthuma D, Dolan CV (2010) Phenotypic complexity, measurement bias, and poor phenotypic resolution contribute to the missing heritability problem in genetic association studies. PLoS One 5: e13929.
- 27. Lachin JM (2004) The role of measurement reliability in clinical trials. Clin Trials 1: 553–566.
- 28. Li H, Leung CK, Wong L, Cheung CY, Pang CP, et al. (2008) Comparative study of central corneal thickness measurement with slit-lamp optical coherence tomography and visante optical coherence tomography. Ophthalmology 115: 796–801 e792.
- 29. Diem P, Walchli M, Mullis PE, Marti U (2004) Agreement between HbA1c measured by DCA 2000 and by HPLC: effects of fetal hemoglobin concentrations. Arch Med Res 35: 145–149.
- 30. Pannarale G, Bebb G, Clark S, Sullivan A, Foster C, et al. (1993) Bias and variability in blood pressure measurement with ambulatory recorders. Hypertension 22: 591–598.
- 31. Lam AK, Chan R, Pang PC (2001) The repeatability and accuracy of axial length and anterior chamber depth measurements from the IOLMaster. Ophthalmic Physiol Opt 21: 477–483.
- 32. Cochran WG (1968) Errors of Measurement in Statistics. Technometrics 10: 637–666.
- 33. Devine OJ, Smith JM (1998) Estimating sample size for epidemiologic studies: the impact of ignoring exposure measurement uncertainty. Stat Med 17: 1375–1389.
- 34. Fuller WA (1987) Measurement Error Models. New York: John Wiley & Sons, Inc.
- 35. Chen X, Hong H, Tamer E (2005) Measurement error models with auxiliary data. Review of Economic Studies 72: 343–366.
- 36. Hardin JW, Schmiediche H, Carroll RJ (2003) The regression calibration method for fitting generalized linear models with additive measurement error. The Stata Journal 8: 361–372.
- 37. Tukey JW (1962) The future of data analysis. Annals of Mathematical Statistics 33: 1–67.
- 38. Bollinger CR, Chandra A (2005) Iatrogenic specification error: a cautionary tale of cleaning data. Journal of Labor Economics 23: 235–257.
- 39. Lynch M, Walsh B (1998) Genetics and analysis of quantitative traits. Sunderland, Mass.: Sinauer. xvi, 980 p.
- 40. Purcell S, Cherny SS, Sham PC (2003) Genetic Power Calculator: design of linkage and association genetic mapping studies of complex traits. Bioinformatics 19: 149–150.
- 41. Beaumont MA, Rannala B (2004) The Bayesian revolution in genetics. Nat Rev Genet 5: 251–261.
- 42. Balding DJ (2006) A tutorial on statistical methods for population association studies. Nat Rev Genet 7: 781–791.
- 43. Li X, Wong W, Lamoureux EL, Wong TY (2012) Are linear regression techniques appropriate for analysis when the dependent (outcome) variable is not normally distributed? Invest Ophthalmol Vis Sci 53: 3082–3083.