MAL, JO, and DG conceived and designed the experiments. MAL performed the experiments. MAL analyzed the data. MAL contributed reagents/materials/analysis tools. MAL and DG wrote the paper.

The authors have declared that no competing interests exist.

Because current molecular haplotyping methods are expensive and not amenable to automation, many researchers rely on statistical methods to infer haplotype pairs from multilocus genotypes and then treat these inferred haplotype pairs as observations. These procedures are prone to haplotype misclassification. We examine the effect of these misclassification errors on the false-positive rate and power of two association tests: the standard likelihood ratio test (LRT_{std}) and a likelihood ratio test that employs a double-sampling approach to allow for the misclassification inherent in the haplotype inference procedure (LRT_{ae}). We aim to determine the cost–benefit relationship of increasing the proportion of individuals with molecular haplotype measurements in addition to genotypes, in order to raise the power gain of the LRT_{ae} over the LRT_{std}. This analysis should provide a guideline for determining the minimum number of molecular haplotypes required for a desired power. Our simulations under the null hypothesis of equal haplotype frequencies in cases and controls indicate that (1) for each statistic, permutation methods maintain the correct type I error, and (2) specific multilocus genotypes that are misclassified as the incorrect haplotype pair are consistently misclassified throughout each entire dataset. Our simulations under the alternative hypothesis showed a significant power gain for the LRT_{ae} over the LRT_{std} for a subset of the parameter settings. Permutation methods should be used exclusively to determine significance for each statistic. For fixed cost, the power gain of the LRT_{ae} over the LRT_{std} varied depending on the relative costs of genotyping, molecular haplotyping, and phenotyping. The LRT_{ae} showed the greatest benefit over the LRT_{std} when the cost of phenotyping was very high relative to the cost of genotyping. This situation is likely to occur in a replication study as opposed to a whole-genome association study.

Localizing genes for complex genetic diseases presents a major challenge. Recent technological advances, such as genotyping arrays containing hundreds of thousands of genomic “landmarks” and databases cataloging these “landmarks” and the levels of correlation between them, have aided these endeavors. To use these resources most effectively, many researchers employ a gene-mapping technique called haplotype-based association, which examines the variation at multiple genomic sites jointly for a role in, or an association with, the disease state. Although methods that determine haplotype pairs directly by biological assays are currently available, they are rarely used because they are expensive and difficult to automate. Statistical methods provide an inexpensive, relatively accurate means to determine haplotype pairs; however, they can produce erroneous results. In this article, the authors compare a standard statistical method for performing a haplotype-based association test with a method that accounts for the misclassification of haplotype pairs as part of the test. Under a number of feasible scenarios, the new test outperformed the standard test.

With the advent of the HAPMAP project [

Methods for explicit determination of phased haplotypes are available [

Several statistical methods are available to perform tests of haplotype-based case-control association. One method calculates the likelihood of the data in terms of the estimated haplotype frequencies. An alternative method uses a contingency table containing the case-control counts for each inferred haplotype. The counts in the contingency table can be determined either by inferring phased haplotypes for each individual or by multiplying each haplotype frequency estimate by the total number of haplotypes in the study. Many researchers find the contingency-table approach appealing because it uses the same format as classic genotypic and allelic case-control studies and explicitly accounts for each phased haplotype. As a result, many researchers employ this method in practice [
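The two ways of filling a row of the contingency table described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the haplotype labels "AB" and "ab" are hypothetical, and diploid individuals (2N haplotypes for N individuals) are assumed.

```python
from collections import Counter


def counts_from_called_pairs(called_pairs):
    """Haplotype counts obtained by splitting each individual's called pair (h1, h2)."""
    counts = Counter()
    for h1, h2 in called_pairs:
        counts[h1] += 1
        counts[h2] += 1
    return counts


def counts_from_frequencies(freq_estimates, n_individuals):
    """Haplotype counts obtained as frequency estimate x total haplotypes (2N for diploids)."""
    total_haplotypes = 2 * n_individuals
    return {h: f * total_haplotypes for h, f in freq_estimates.items()}


# Hypothetical two-SNP haplotypes "AB" and "ab" for two case individuals.
case_row = counts_from_called_pairs([("AB", "ab"), ("AB", "AB")])
expected_row = counts_from_frequencies({"AB": 0.75, "ab": 0.25}, n_individuals=2)
```

The frequency-based approach can yield non-integer counts, which is one reason the two approaches need not produce identical tables.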

However, misclassifications can lower a study's power and/or affect the false-positive rate. Calling haplotype pairs from multilocus genotypes whose phase is ambiguous is similar to dichotomizing a continuous measure: in both cases, graded information (a posterior probability over the possible pairs, or a continuous value) is collapsed into a single discrete category. Royston et al. document a loss in power when dichotomizing continuous predictor variables in a regression analysis [

Although there have been several studies aimed at evaluating the accuracy of haplotype inference and haplotype frequency estimation procedures [^{2} distribution and permutation methods to evaluate the significance of the test statistics we employ; and (3) compare the power of our test statistic which accounts for haplotype misclassification with the power of the standard likelihood ratio test statistic when the costs are fixed.

In order to detect an association between a haplotype pair and disease status, we employed two statistical tests on 2 × … contingency tables: the standard likelihood ratio test (LRT_{std}) and a likelihood ratio test that employs a double-sampling approach to allow for the misclassification inherent in the haplotype inference procedure (LRT_{ae}). The LRT_{std} is a likelihood ratio statistic that treats the called haplotype pairs as observations; as a result, the likelihood is the multinomial distribution in which the called haplotype pairs are the categories […]. The LRT_{ae} statistic is a likelihood ratio statistic that employs a double-sampling procedure to account for the misclassification present in haplotype inference. On all the individuals in the study, there is a fallible measure […]. The LRT_{ae} procedure estimates the misclassification rates present in the fallible data and incorporates this information into the likelihood calculation […]. Details of the LRT_{std} and LRT_{ae} statistics, including notation and computation, are provided in
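As a rough illustration of the LRT_{std} idea, the sketch below computes the likelihood ratio (G) statistic comparing the multinomial distribution of called haplotype pairs between cases and controls, with one column per called pair. It assumes the standard test-of-homogeneity formulation; the authors' exact notation and computation are in their supplementary material, and the function names here are hypothetical. Asymptotically the statistic would be referred to a χ^{2} distribution with (number of non-empty categories − 1) degrees of freedom.

```python
import math


def lrt_std(case_counts, control_counts):
    """Likelihood ratio (G) statistic for two multinomials over the same categories.

    case_counts[j] and control_counts[j] are the numbers of cases and controls
    whose haplotypes were called as pair j.
    """
    if len(case_counts) != len(control_counts):
        raise ValueError("count vectors must cover the same categories")
    n_case, n_ctrl = sum(case_counts), sum(control_counts)
    n = n_case + n_ctrl
    g = 0.0
    for a, b in zip(case_counts, control_counts):
        pooled = (a + b) / n  # MLE of category probability under H0 (one distribution)
        for obs, size in ((a, n_case), (b, n_ctrl)):
            if obs > 0:  # the 0 * log(0) terms are taken as 0
                g += 2.0 * obs * math.log(obs / (size * pooled))
    return g


def df_std(case_counts, control_counts):
    """Degrees of freedom for a 2 x k table: non-empty categories minus one."""
    k = sum(1 for a, b in zip(case_counts, control_counts) if a + b > 0)
    return k - 1
```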

We applied two methods for evaluating the ^{2} distribution to find the ^{2} distribution asymptotically for large sample sizes [
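Because the asymptotic χ^{2} approximation can be poor for sparse haplotype-pair tables, significance can instead be assessed by permutation. The sketch below is a generic label-permutation test on individual-level data: it assumes the called haplotype pairs are held fixed while case-control labels are shuffled (whether the authors re-inferred haplotypes within each permutation is not stated here), and `n_perm` and `seed` are illustrative choices.

```python
import math
import random
from collections import Counter


def g_statistic(pairs, is_case):
    """G statistic comparing called-pair distributions of cases and controls."""
    case = Counter(p for p, c in zip(pairs, is_case) if c)
    ctrl = Counter(p for p, c in zip(pairs, is_case) if not c)
    n_case, n_ctrl = sum(case.values()), sum(ctrl.values())
    n = n_case + n_ctrl
    g = 0.0
    for cat in set(pairs):
        pooled = (case[cat] + ctrl[cat]) / n
        for obs, size in ((case[cat], n_case), (ctrl[cat], n_ctrl)):
            if obs > 0:
                g += 2.0 * obs * math.log(obs / (size * pooled))
    return g


def permutation_p_value(pairs, is_case, n_perm=999, seed=1):
    """Permutation p-value: (1 + #{permuted G >= observed G}) / (n_perm + 1)."""
    rng = random.Random(seed)
    observed = g_statistic(pairs, is_case)
    labels = list(is_case)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # random reassignment of case-control status
        if g_statistic(pairs, labels) >= observed - 1e-12:  # tolerance for float ties
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Adding one to numerator and denominator keeps the p-value strictly positive and makes the test valid for a finite number of permutations.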

To investigate the behavior of these test statistics for a variety of situations, we applied these statistical tests to many simulated datasets. _{std}; however, LRT_{ae} requires additional information in the form of molecular haplotypes for a subset of the individuals in the study. Two alternative procedures for selecting individuals for the double sample (individuals with molecular haplotypes in addition to genotypes) were employed. In one selection scheme, individuals were selected randomly. In the other selection scheme, individuals possessing the most ambiguity in their statistically inferred haplotype pairs were prioritized in selecting the double sample. Specifically, we double-sampled those individuals with the smallest posterior probabilities associated with their inferred haplotype pair up to a posterior probability threshold,
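The two double-sample selection schemes can be sketched as index selections. This is a minimal sketch under stated assumptions: the posterior probabilities are taken as already available for each individual's inferred pair, the threshold rule uses a strict "below the threshold" cutoff, and the function names are hypothetical.

```python
import random


def random_double_sample(n_individuals, alpha, seed=0):
    """Indices of round(alpha * n) individuals drawn uniformly without replacement."""
    k = round(alpha * n_individuals)
    return sorted(random.Random(seed).sample(range(n_individuals), k))


def threshold_double_sample(posteriors, threshold):
    """Indices of individuals whose inferred pair has posterior probability below
    `threshold`, ordered from most to least ambiguous (smallest posterior first)."""
    order = sorted(range(len(posteriors)), key=lambda i: posteriors[i])
    return [i for i in order if posteriors[i] < threshold]
```

Under the threshold scheme the size of the double sample is not fixed in advance; it depends on how many individuals have ambiguous phase.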

(A) shows type I error, and (B) shows power by way of data simulation.

For the simplest non-trivial case, the scenario in which the haplotype under evaluation includes two SNPs, we applied a fractional factorial design [^{k}

Fractional Factorial Design Parameter Settings for the Study of Type I Error Assuming the Haplotype under Investigation Contains Two SNP Markers

To evaluate each test statistic's ability to maintain the correct type I error, we examined the distribution of the
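An empirical false-positive rate is the fraction of null replicates whose p-value falls at or below the nominal level. The sketch below attaches an approximate 95% Wald confidence interval; the interval method is an assumption, not necessarily the one used by the authors. A nominal level below the interval suggests anti-conservative behavior, and one above it suggests conservative behavior.

```python
import math


def rejection_rate(p_values, alpha):
    """Empirical rejection rate at level alpha with an approximate 95% Wald interval."""
    r = len(p_values)
    rate = sum(p <= alpha for p in p_values) / r
    half_width = 1.96 * math.sqrt(rate * (1.0 - rate) / r)
    return rate, max(0.0, rate - half_width), min(1.0, rate + half_width)
```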

We also evaluated the behavior of these statistics under the hypothesis that an unobserved disease locus exists in linkage disequilibrium (LD) with the haplotype under study. _{2})_{2} = R_{1}_{2} = R_{1}^{2}_{1}_{2}_{i},_{i}_{1}_{2}_{1}=_{1}/_{0}and _{2}=_{2}/_{0}, respectively [
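To make the disease-locus side of such a model concrete, the sketch below computes genotype frequencies in cases and controls from a disease allele frequency, a baseline penetrance f_0, and genotype relative risks R_1 = f_1/f_0 and R_2 = f_2/f_0. It assumes Hardy-Weinberg proportions and the common genotype-relative-risk parameterization (dominant: R_2 = R_1; multiplicative: R_2 = R_1^2), which the partly garbled definitions above appear to follow. It does not model the LD between the disease locus and the haplotype under study, and all names are hypothetical.

```python
def genotype_freqs_by_status(p, f0, r1, r2):
    """Frequencies of genotypes with 0, 1, 2 disease alleles in cases and controls."""
    pop = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]  # Hardy-Weinberg proportions
    pen = [f0, r1 * f0, r2 * f0]                    # penetrances f_0, f_1, f_2
    if max(pen) > 1:
        raise ValueError("penetrance exceeds 1")
    prevalence = sum(g * f for g, f in zip(pop, pen))
    cases = [g * f / prevalence for g, f in zip(pop, pen)]               # Bayes' rule
    controls = [g * (1 - f) / (1 - prevalence) for g, f in zip(pop, pen)]
    return cases, controls


def dominant(r):
    return r, r  # R_2 = R_1


def multiplicative(r):
    return r, r * r  # R_2 = R_1 ** 2
```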

Factorial Design Parameter Settings for the Study of Power Assuming the Haplotype under Investigation Contains Two SNP Markers

As with the study of type I error, we inferred the haplotypes for the power simulations with both SNPHAP v 1.3.1 and PHASE v 2.1.1. The proportion of individuals double-sampled, α, for the LRT_{ae} method (random double-sample selection) was set at 0.75. For the threshold double-sample selection,

To compare the power of the two test statistics, we evaluated the power of the statistics under fixed cost conditions. Since the LRT_{ae} requires the additional cost associated with obtaining molecular haplotypes on a subset of the samples, we reduced the number of samples when the LRT_{ae} statistic was applied so that the same total cost would be incurred as for the runs with the LRT_{std}. The reduced sample size for the LRT_{ae} sample was computed using ^{DS}_{ae}; _{std}; _{p}_{g}_{mh}/C_{g})_{ae} sample that have molecular haplotypes determined (double-sampling proportion). We consider the phenotyping costs, _{p}_{p}/C_{g}_{std} method, the corresponding total sample size for the LRT_{ae} method, ^{DS},_{ae} method. If _{p}/C_{g}^{DS},_{std}, ^{DS}_{ae} determined from the expectation of _{ae} and LRT_{std} for the situation of a haplotype comprising two SNPs.
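The fixed-cost comparison can be illustrated by equating total costs. This is a sketch under an assumed cost model: every individual is phenotyped and genotyped, and only the double-sampled fraction α additionally incurs the molecular haplotyping cost C_mh. Equating n_std (C_p + C_g) with n_ae (C_p + C_g + α C_mh) and dividing through by C_g gives the reduced sample size below. The names n_std and n_ae are hypothetical, and the authors' exact equation is not reproduced here.

```python
import math


def reduced_sample_size(n_std, cp_over_cg, cmh_over_cg, alpha):
    """Sample size for the double-sampled design that matches the standard design's cost.

    Costs are expressed relative to the genotyping cost (so C_g = 1).
    Standard design cost:       n_std * (C_p + C_g)
    Double-sampled design cost: n_ae  * (C_p + C_g + alpha * C_mh)
    """
    per_person_std = cp_over_cg + 1.0
    per_person_ae = cp_over_cg + 1.0 + alpha * cmh_over_cg
    return math.floor(n_std * per_person_std / per_person_ae)  # stay within budget
```

Because the phenotyping cost appears in both the numerator and the denominator, a large C_p/C_g makes the sample-size penalty for double sampling small, which is consistent with the finding that the LRT_{ae} benefits most when phenotyping is expensive relative to genotyping.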

Through additional simulations, we investigated the behavior of these statistics when applied to haplotypes comprising larger numbers of SNPs. Because these simulations required additional computational time, we only used SNPHAP v 1.3.1 (see _{ae} statistic. When the random double-sample selection method was used, the double-sample proportion,

(A) shows the LD for 15 SNP markers within the proximal promoter region of human pituitary expressed growth hormone (GH1), and (B) shows the LD for ten SNP markers within the

For all the simulations performed, we recorded the details of the misclassifications that occurred. Specifically, for every replicate, we computed the misclassification rates,
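When true and called haplotype pairs are both known, as in simulation, misclassification rates can be tabulated as P(called = j | true = i). The sketch below also checks the "complete" misclassification pattern described in this work, namely that each true pair is always called as one and the same pair. Pairs are canonicalized by sorting so that ("Ab", "aB") and ("aB", "Ab") count as the same unordered pair; the names and labels are hypothetical.

```python
from collections import Counter, defaultdict


def canon(pair):
    """Unordered haplotype pair as a sorted tuple."""
    return tuple(sorted(pair))


def misclassification_rates(true_pairs, called_pairs):
    """Estimate P(called = j | true = i) for every true haplotype pair i observed."""
    by_true = defaultdict(Counter)
    for t, c in zip(true_pairs, called_pairs):
        by_true[canon(t)][canon(c)] += 1
    return {t: {c: n / sum(cnt.values()) for c, n in cnt.items()}
            for t, cnt in by_true.items()}


def is_complete(rates):
    """True if every true pair is always called as a single pair (correct or not)."""
    return all(len(calls) == 1 for calls in rates.values())
```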

Our results for type I error and power were almost identical whether SNPHAP v 1.3.1 or PHASE v 2.1.1 was used for haplotype inference. Although the graphs and tables we present display the results obtained with SNPHAP v 1.3.1, similar results were found using PHASE v 2.1.1.

The type I error simulations demonstrated that the approach for determining statistical significance is critical for maintaining the correct false-positive rate. Although the KS and Anderson-Darling (AD) test results indicated that the distribution of permutation _{std} and LRT_{ae} (using the random and threshold double-sample selection methods) association tests in which statistical significance was indicated by permutation and asymptotic _{ae} are anti-conservative whereas those for LRT_{std} fluctuate between conservative and anti-conservative values. In contrast, the permutation
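Under the null hypothesis, valid p-values should be approximately Uniform(0, 1). A minimal sketch of the one-sample Kolmogorov-Smirnov statistic against the uniform CDF is shown below; it is a textbook computation, not the authors' code.

```python
def ks_uniform(p_values):
    """One-sample Kolmogorov-Smirnov statistic D against the Uniform(0, 1) CDF."""
    xs = sorted(p_values)
    n = len(xs)
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))  # ECDF above the CDF
    d_minus = max(x - i / n for i, x in enumerate(xs))       # CDF above the ECDF
    return max(d_plus, d_minus)
```

For a large number of replicates, D above roughly 1.36/√n indicates departure from uniformity at the 5% level; `scipy.stats.kstest(p_values, "uniform")` returns the same statistic together with a p-value.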

The ^{2} distribution. The 18 runs correspond to the combinations of parameter settings described in _{ae} was computed with the random and threshold double-sample selection methods. When the threshold double-sample method was used to compute LRT_{ae}, the setting of

Based on the results for the false-positive rates, we conclude that power can only be evaluated using the permutation _{ae} (using the random and threshold double-sample selection methods) to LRT_{std}. _{ae} power − LRT_{std} power) at various significance levels for the two cost ratios _{p}/C_{g}_{p}/C_{g}

Summary Statistics for Power Difference (LRT_{ae} − LRT_{std}) at Various Significance Levels

For the random double-sample selection method, the minimum power difference observed occurred when _{p}/C_{g}_{2}_{ae} power was 0.544 and LRT_{std} power was 0.606. The maximum power difference observed occurred when _{p}/C_{g}_{2}_{ae} power was 0.910 and LRT_{std} power was 0.775.

For the threshold double-sample selection method, the minimum power difference observed occurred when _{p}/C_{g}_{2}_{ae} power was 0.821 and LRT_{std} power was 0.831. The maximum power difference observed occurred when _{p}/C_{g}_{2}_{ae} power was 0.573 and LRT_{std} power was 0.411.
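Power at a given significance level is the fraction of replicates under the alternative whose p-value is at or below that level, and the quantity summarized above is the difference LRT_{ae} power − LRT_{std} power. For example, the extremes reported for random selection correspond to 0.544 − 0.606 = −0.062 and 0.910 − 0.775 = 0.135, and those for threshold selection to 0.821 − 0.831 = −0.010 and 0.573 − 0.411 = 0.162. A minimal sketch, with hypothetical function names:

```python
import statistics


def power(p_values, alpha):
    """Fraction of replicates whose p-value is at or below alpha."""
    return sum(p <= alpha for p in p_values) / len(p_values)


def power_differences(p_ae, p_std, alphas=(0.05, 0.01, 0.001)):
    """LRT_ae power minus LRT_std power at each significance level."""
    return {a: power(p_ae, a) - power(p_std, a) for a in alphas}


def summarize(differences):
    """Minimum, median, and maximum of power differences collected across runs."""
    return min(differences), statistics.median(differences), max(differences)
```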

In the spirit of response surface analysis for factorial design [_{ae} computed with the random double-sample selection method. These parameter settings are a dominant disease model with _{2}_{ae} and LRT_{std} methods at the 0.05, 0.01, and 0.001 significance levels for both cost ratios of _{p}/C_{g}_{p}/C_{g}_{ae} with the random double-sample selection method. _{ae} and the LRT_{std} as a function of _{p}/C_{g}_{p}/C_{g}_{std}, _{ae} provides a power advantage over the LRT_{std} when _{ae} is less powerful than LRT_{std} for these parameter settings. The maximum power loss is 0.58 and occurs when _{ae} method, ^{DS}_{std} method,

Various settings for the cost ratio of molecular haplotyping to genotyping _{2}_{ae} was computed with the random double-sample selection method only. Haplotype pairs were inferred using SNPHAP v 1.3.1.

_{ae} is always at least as powerful as the LRT_{std} when _{p}/C_{g}

False-Positive Rate Estimates for Simulations with Generating Population Haplotype Frequencies Based on the Horan and HAPMAP

In our power study for haplotypes comprising five SNPs, we again used the disease model parameter settings that provided the maximum power difference (LRT_{ae} power − LRT_{std} power) for the two-SNP factorial design (_{ae} computed using random double-sample selection. These parameter settings are a dominant disease model with _{2}_{ae}, the double-sample proportion _{ae}, the setting of

For the Horan dataset, the power estimates for the LRT_{std} and the LRT_{ae} were almost identical at the 0.05, 0.01, and 0.001 significance levels for cost ratios _{p}/C_{g})_{ae} statistic reduces to the LRT_{std}. Therefore, the high degree of similarity in power for these statistics is not surprising.

For the HAPMAP _{std} and LRT_{ae} methods at the 0.05, 0.01, and 0.001 significance levels assuming fixed costs. When _{p}/C_{g}_{ae} provides a substantial power benefit over the LRT_{std} with the power difference ranging from 6% and 7% at a significance level of 0.05, to 14% and 21% at a significance level of 0.001 for random double-sample selection and threshold double-sample selection, respectively. When _{p}/C_{g}_{ae} over the LRT_{std} is still substantial for threshold double-sample selection, but more modest for random double-sample selection. For the three significance levels under investigation, the power difference ranged from 7% to 22%, and 1% to 3.5% for threshold and random double-sample selection, respectively.

Power Estimates for Simulations with Generating Population Haplotype Frequencies Based on the HAPMAP

We found that the median power gain of the LRT_{ae} over the LRT_{std} for the threshold double-sample selection method was consistently greater than that for the random double-sample selection method for the runs associated with the factorial design settings displayed in _{p}/C_{g}^{DS}

In practice, few researchers employ molecular haplotyping techniques in genetic case-control studies. The lack of a high-throughput procedure comparable to current SNP genotyping technologies is arguably the main reason that this methodology is not more widely used. A related reason is the cost, in both time and money, of employing this methodology. Our research suggests that the additional costs involved in molecular haplotyping may be worth the effort, especially if the cost of phenotyping is high relative to the cost of genotyping for a study. Ji et al. found analogous results for the effects of genotype misclassification on genotypic tests of association [_{ae} for testing haplotype association should provide the most utility. Notably, however, the threshold double-sample selection method provided comparable power for both high and low phenotyping-to-genotyping cost ratios. This finding suggests that this selection strategy may provide additional power for an initial genome-wide association study as well as for a replication study.

One potential limitation of the test statistics we selected is the increase in degrees of freedom associated with using haplotype pairs rather than individual haplotypes. In general, more degrees of freedom can lead to a loss of power. That is, methods that fully account for uncertainty in the phase-assignment process [_{ae} because the LRT_{ae} method examines haplotype pairs rather than single haplotypes and therefore has more degrees of freedom. We chose these statistics for the following reasons: (1) The most general misclassification model involves modeling errors in haplotype pairs rather than in individual haplotypes [
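To show how double sampling corrects for misclassification over haplotype-pair categories, here is a hedged sketch of the classic closed-form estimator for a saturated model, in which the joint distribution of (true pair i, called pair j) is unrestricted. The likelihood then factorizes into P(called = j), estimated from all individuals, and P(true = i | called = j), estimated from the double sample alone, giving π̂_i = Σ_j (n_ij / n_·j)(n_·j + m_j)/(n + m), where n_ij counts double-sampled individuals and m_j counts individuals with only the called pair. This illustrates the general double-sampling idea only; it is not the authors' LRT_{ae} implementation, whose parameterization and hypotheses are given in their supplementary text. A likelihood ratio test would compare such maximized log-likelihoods under the null and alternative hypotheses.

```python
import math
from collections import Counter


def double_sample_mle(double_sampled, single_calls):
    """Closed-form MLE for a double-sampled multinomial with a saturated error model.

    double_sampled: list of (true_pair, called_pair) for individuals with both measures.
    single_calls:   list of called_pair for individuals with the fallible measure only.
    Returns (pi, loglik): estimated true-pair frequencies and the maximized log-likelihood.
    """
    n_ij = Counter(double_sampled)
    n_j = Counter(c for _, c in double_sampled)
    m_j = Counter(single_calls)
    missing = set(m_j) - set(n_j)
    if missing:
        raise ValueError(f"calls never seen in the double sample: {missing}")
    total = len(double_sampled) + len(single_calls)
    p_call = {j: (n_j[j] + m_j[j]) / total for j in n_j}  # P(called = j), all individuals
    pi = Counter()
    for (i, j), n in n_ij.items():
        pi[i] += p_call[j] * n / n_j[j]  # P(called = j) * P(true = i | called = j)
    loglik = sum(n * math.log(p_call[j] * n / n_j[j]) for (i, j), n in n_ij.items())
    loglik += sum(m * math.log(p_call[j]) for j, m in m_j.items())
    return dict(pi), loglik
```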

A point for further research is identifying the scenarios that produce differential and non-differential haplotype pair misclassification, respectively, as well as the effects of each kind of misclassification on type I error and power. Under the null hypothesis that haplotype frequency distributions are equal in case and control populations, theoretical and simulation studies (including the present one) suggest that misclassification is non-differential. Under the alternative hypothesis, it is conceivable that haplotype pair misclassification rates differ between case and control populations. Although recent research [

Although molecular haplotyping may currently be perceived as not cost-effective, recent publications suggest that for relatively small regions of the genome, accurate molecular haplotyping is no more expensive than performing fluorescent polymerase chain reactions [_{mh}/C_{g})

In this work, our simulations showed that the misclassification present in calling phased haplotypes from multilocus genotypes using statistical methods is complete. That is, each misclassified haplotype pair is consistently misclassified as the same incorrect haplotype pair throughout the entire dataset. In addition, our simulations under the null hypothesis of no association demonstrate that applying the theoretical χ^{2} distribution to evaluate the significance of test statistics produces conservative and anticonservative _{ae} provides the greatest advantage in terms of power over the LRT_{std} in situations in which more haplotype misclassification errors are present. These situations arise when the haplotype under investigation comprises many SNP markers with low pair-wise intermarker LD.

For fixed costs, the power gain of the LRT_{ae} over the LRT_{std} varied depending on the relative costs of genotyping, molecular haplotyping, and phenotyping. In general, the LRT_{ae} showed the greatest benefit over the LRT_{std} when the cost of phenotyping was very high relative to the cost of genotyping. This situation is likely to occur in a candidate gene replication study as opposed to a genome-wide association study. For intermediate phenotyping to genotyping cost ratios (e.g., _{p}/C_{g}_{ae} may still provide a power advantage if the cost ratio of molecular haplotyping to genotyping is low (_{mh}/C_{g}_{ae} will become applicable to a wider set of circumstances.

The documentation for SNPHAP and PHASE can be found at

The documentation for PAWE can be found at

Data for the estimation of haplotype frequencies from SNP markers within the

LRT_{ae} software is available for free download from


confidence interval

disease allele frequency

Kolmogorov-Smirnov

linkage disequilibrium

LRT_{ae}

likelihood ratio test allowing for error

LRT_{std}

standard likelihood ratio test

single nucleotide polymorphism