Abstract
Recently, many case-control methods have been proposed to test for association between haplotypes and disease, and these methods require the assumption of Hardy-Weinberg equilibrium (HWE) for haplotype frequencies. Haplotype inference from unphased genotypes and haplotype-based HWE tests are therefore crucial prior to fine mapping. The goodness-of-fit test is frequently used to test for HWE at multiple tightly linked loci. However, its degrees of freedom increase dramatically with the number of loci, which may reduce the power of the test. Therefore, in this paper, to improve the power of testing for haplotype-based HWE, we first write out two likelihood functions of the observed data based on Niu's model (NM) and the inbreeding model (IM), respectively, each of which describes a departure from HWE. Then, we use two expectation-maximization algorithms and one expectation-conditional-maximization algorithm to estimate the model parameters under the HWE, IM and NM models, respectively. Finally, we propose two likelihood ratio tests (LRTs) for haplotype-based HWE, one under the NM model and one under the IM model, referred to here as the NM-based LRT and the IM-based LRT. We simulate the HWE, Niu's, inbreeding and population stratification models to assess the validity and compare the performance of these two tests. The simulation results show that both tests control the type I error rate well in testing for haplotype-based HWE. If the NM model is true, the NM-based LRT is more powerful, whereas if the IM model is true, the IM-based LRT has higher power. Under the population stratification model, the IM-based LRT is again more powerful. The IM-based LRT is therefore generally recommended. Application of the proposed methods to a rheumatoid arthritis data set further illustrates their utility for real data analysis.
Citation: Mao W-G, He H-Q, Xu Y, Chen P-Y, Zhou J-Y (2013) Powerful Haplotype-Based Hardy-Weinberg Equilibrium Tests for Tightly Linked Loci. PLoS ONE 8(10): e77399. https://doi.org/10.1371/journal.pone.0077399
Editor: Claire Wade, University of Sydney, Australia
Received: April 24, 2013; Accepted: September 2, 2013; Published: October 22, 2013
Copyright: © 2013 Mao et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported by the National Natural Science Foundation of China (81072386) and grants from School of Public Health and Tropical Medicine of Southern Medical University, China (grant numbers: GW201219 and GW201237). The Genetic Analysis Workshops are supported by the National Institutes of Health grant R01 GM031575. The RA data were gathered with the support of grants from the National Institutes of Health (NO1-AR-2-2263 and RO1-AR-44422), and the National Arthritis Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
In studies of genetic epidemiology, complex diseases are often associated with multiple (interacting) markers [1]–[3]. As such, haplotype-based analysis has gained increasing attention, as it can potentially be more efficient than single-marker analysis [4]–[9]. Therefore, haplotype inference from unphased genotypes is expected to play an important role in disease fine mapping [10]. Many statistical and computational methods are now available for inferring haplotypes from different types of data, such as genotypes of unrelated individuals. One popular approach is the likelihood method, and maximum likelihood estimation via the expectation-maximization (EM) algorithm [11] is frequently employed for haplotype inference. For genotype data of unrelated individuals, an EM-based maximum likelihood method for the estimation of haplotype frequencies was first proposed by Excoffier and Slatkin [12]. For brevity, we refer to it as the EM algorithm in this paper. However, the EM algorithm relies on the assumption that the population under study is in Hardy-Weinberg equilibrium (HWE); otherwise, the estimates of haplotype frequencies may be biased.
Recently, many case-control methods have been proposed to test for association between haplotypes and disease. A likelihood ratio test (LRT) for haplotype-disease association was constructed from the maximized likelihood functions for cases, controls and the pooled data of cases and controls; it requires the assumption of HWE in the pooled sample [3]. Prospective likelihood methods based on logistic regression or generalized linear models were investigated by Schaid et al. [13], Stram et al. [14], Zaykin et al. [15], and others. These methods treat unobserved haplotypes as covariates in a regression model and compute the conditional expectation of the covariates given the genotype observations under the null hypothesis of no association, with an HWE assumption in the pooled sample of cases and controls. Zhao et al. [16] proposed a prospective estimating-equation approach for assessing disease association with haplotypes with adjustment for covariates, which needs the HWE assumption of haplotype frequencies only in the control sample; the pooled sample of cases and controls is not necessarily in HWE. On the other hand, a retrospective likelihood method can be used to detect haplotype-disease association in a case-control study and also requires HWE only in the control population [17]. Therefore, the detection of haplotype-based HWE is crucial prior to fine mapping and positional cloning studies for case-control designs.
The goodness-of-fit test is frequently used to test for HWE at multiple tightly linked loci. However, as the number of loci under study increases, its degrees of freedom increase dramatically, which may reduce the power of the test. As such, in this paper, to investigate more powerful haplotype-based HWE tests, we first recall three models that can cause Hardy-Weinberg disequilibrium (HWD). The first was originally proposed by Niu et al. [6]; it includes one additional parameter and is called Niu's model (NM) in this paper for convenience. The second is the inbreeding model (IM), which incorporates the inbreeding coefficient [18]. The third is a population stratification (PS) model, which can also lead to HWD. Then, we write out two likelihood functions of the observed data based on the NM and IM models, respectively. We develop an expectation-conditional-maximization (ECM) algorithm [19] for the NM model to estimate its parameter and the haplotype frequencies, and suggest an EM algorithm for the IM model (denoted the IEM algorithm here) to estimate the inbreeding coefficient and the haplotype frequencies. HWE holds when the NM parameter equals 1 or the inbreeding coefficient equals 0. We therefore propose two LRTs for haplotype-based HWE, one under the NM model and one under the IM model (the NM-based LRT and the IM-based LRT, respectively). We simulate the HWE, Niu's, inbreeding and population stratification models to assess the validity and compare the performance of these two tests. The simulation results show that both tests control the size well in testing for haplotype-based HWE. If Niu's model is true, the NM-based LRT is more powerful, whereas if the inbreeding model is true, the IM-based LRT has higher power. Under the population stratification model, the IM-based LRT is again more powerful. Therefore, the IM-based LRT is generally recommended. In addition, we obtain the sum of absolute differences (SAD) between the true and estimated haplotype frequencies [20], and compare the performance of the EM, ECM and IEM algorithms in estimating the haplotype frequencies. If the true model is Niu's model, the ECM algorithm gives more accurate estimates of haplotype frequencies than the EM and IEM algorithms. For all the other simulation settings, however, the EM algorithm is not much affected by the departure from HWE, and the EM and IEM algorithms have almost the same SAD, which is smaller than that of the ECM estimates. Application of the proposed methods to the Rheumatoid Arthritis (RA) data set from the North American Rheumatoid Arthritis Consortium (NARAC) further illustrates their utility for real data analysis.
Materials and Methods
Likelihood Function and EM Algorithm under HWE
Consider a sample of $n$ unrelated individuals and $L$ single nucleotide polymorphism (SNP) markers. Assume that the SNPs are tightly linked so that the recombination fraction between any SNP pair is zero. For each SNP, there are two alleles, 1 and 2. Let $H = \{h_1, h_2, \ldots, h_m\}$ be the set of all possible haplotypes at these $L$ loci, where $m = 2^L$. We assume that $p_k$ is the frequency of haplotype $h_k$ ($k = 1, 2, \ldots, m$), so the set of haplotype frequencies can be denoted by $P = \{p_1, p_2, \ldots, p_m\}$. Let $G = \{G_1, G_2, \ldots, G_n\}$ be the set of the observed genotypes of all the $n$ individuals, where $G_i$ is the genotype of the $i$th individual. For the $i$th individual, the number of haplotype combinations compatible with $G_i$ is $c_i$. Therefore, the likelihood function of the sample can be expressed as
$$L(P) = \prod_{i=1}^{n} \sum_{j=1}^{c_i} \Pr(H_{ij}), \qquad (1)$$
where $H_{ij}$ denotes the $j$th haplotype combination compatible with genotype $G_i$ for the $i$th individual.
To make the haplotype frequency estimation easy and feasible, the EM algorithm was employed [11]. Let $Z = \{Z_1, Z_2, \ldots, Z_n\}$ be the true haplotype combinations of the sample, which are actually unobserved, where $Z_i$ is the true haplotype combination of the $i$th individual. Then the log-likelihood function of the complete data is
$$\ell_c(P) = \sum_{i=1}^{n} \sum_{j=1}^{c_i} I(Z_i = H_{ij}) \log \Pr(H_{ij}), \qquad (2)$$
where $I(\cdot)$ is an indicator function, with $I(Z_i = H_{ij}) = 1$ if $Z_i = H_{ij}$ and 0 otherwise. Note that under HWE, the probability $\Pr(h_k, h_l)$ of the unordered haplotype pair $(h_k, h_l)$ is $p_k^2$ if $k = l$ and $2 p_k p_l$ otherwise. Further, Excoffier and Slatkin [12] proposed the following EM algorithm to obtain the maximum likelihood estimates of $p_k$ ($k = 1, 2, \ldots, m$) at iteration $t + 1$,
$$p_k^{(t+1)} = \frac{1}{2n} \sum_{i=1}^{n} \sum_{j=1}^{c_i} \delta_{ijk} \, \frac{\Pr^{(t)}(H_{ij})}{\sum_{j'=1}^{c_i} \Pr^{(t)}(H_{ij'})},$$
where $\delta_{ijk}$ is the number of times that haplotype $h_k$ occurs in the $j$th haplotype combination for the $i$th individual and takes values of 0, 1 or 2, and $\Pr^{(t)}(H_{ij})$ is the value of the probability $\Pr(H_{ij})$ based on the estimated haplotype frequencies $P^{(t)}$ at iteration $t$.
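The EM iteration described above can be sketched in a few lines of Python. This is a minimal illustration, not the HAP-HWE implementation (which is written in R): the function names, the 0/1/2 genotype coding (copies of allele 2 per SNP), the uniform starting values and the tolerance `tol` are all assumptions made for the example.

```python
import math
from itertools import product


def compatible_pairs(genotype):
    """List the unordered haplotype pairs compatible with one unphased genotype.

    `genotype` holds, per SNP, the number of copies of allele 2 (0, 1 or 2).
    Haplotypes are tuples of 0/1, where 1 marks allele 2 (an assumed coding).
    """
    het_sites = [s for s, g in enumerate(genotype) if g == 1]
    base = [1 if g == 2 else 0 for g in genotype]
    pairs = set()
    for phase in product((0, 1), repeat=len(het_sites)):
        h1, h2 = list(base), list(base)
        for s, bit in zip(het_sites, phase):
            h1[s], h2[s] = bit, 1 - bit
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def em_haplotype_frequencies(genotypes, max_iter=1000, tol=1e-8):
    """EM estimates of haplotype frequencies under HWE (illustrative sketch)."""
    pairs_per_ind = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in pairs_per_ind for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform start (an assumption)
    n = len(genotypes)
    prev_loglik = float("-inf")
    for _ in range(max_iter):
        counts = dict.fromkeys(haps, 0.0)
        loglik = 0.0
        for pairs in pairs_per_ind:
            # HWE probability of each compatible unordered pair
            probs = [freq[a] ** 2 if a == b else 2 * freq[a] * freq[b]
                     for a, b in pairs]
            total = sum(probs)
            loglik += math.log(total)
            # E-step: posterior weight of each pair, added to both haplotypes
            for (a, b), pr in zip(pairs, probs):
                counts[a] += pr / total
                counts[b] += pr / total
        # M-step: expected haplotype counts divided by the 2n chromosomes
        freq = {h: counts[h] / (2 * n) for h in haps}
        # Stop when the observed log-likelihood no longer changes
        if abs(loglik - prev_loglik) < tol:
            break
        prev_loglik = loglik
    return freq
```

Enumerating every phase of the heterozygous sites is only practical for the small numbers of tightly linked SNPs considered here, since the number of compatible pairs doubles with each additional heterozygous site.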
Two Forms of HWD
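Note that the underlying assumption of HWE is strong and often does not hold in practice. One may consider the following form of HWD,
$$\Pr(h_k, h_l) = \begin{cases} p_k^2 + \rho\, p_k (1 - p_k), & k = l, \\ 2 (1 - \rho)\, p_k p_l, & k \neq l, \end{cases} \qquad (3)$$
where $p_k$ is the frequency of haplotype $h_k$ and $\rho$ is the inbreeding coefficient, which is generally positive [21]. Note that Equation (3) reduces to HWE when $\rho = 0$. We denote this form of HWD as the “inbreeding model (IM)” for convenience in this paper.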
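As a small worked illustration of the inbreeding model, the sketch below computes the probability of every unordered haplotype pair. It assumes the standard inbreeding parameterization, in which a homozygous pair has probability p_k^2 + rho * p_k * (1 - p_k) and a heterozygous pair has probability 2 * (1 - rho) * p_k * p_l; the function name and the example frequencies are hypothetical.

```python
def diplotype_probs_inbreeding(freqs, rho):
    """Probabilities of unordered haplotype pairs (k, l), k <= l, under inbreeding.

    `freqs` is a list of haplotype frequencies summing to 1 and `rho` is the
    inbreeding coefficient; rho = 0 gives the HWE probabilities.
    """
    m = len(freqs)
    probs = {}
    for k in range(m):
        for l in range(k, m):
            if k == l:
                # Homozygous pair: HWE term plus the excess due to inbreeding
                probs[(k, l)] = freqs[k] ** 2 + rho * freqs[k] * (1 - freqs[k])
            else:
                # Heterozygous pair: HWE term shrunk by the factor (1 - rho)
                probs[(k, l)] = 2 * (1 - rho) * freqs[k] * freqs[l]
    return probs


# Illustrative frequencies only (not taken from the paper's tables)
example = diplotype_probs_inbreeding([0.5, 0.3, 0.2], rho=0.05)
```

The probabilities sum to 1 for any value of rho between 0 and 1, so no normalizing constant is needed under this model.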
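Another form of departure from HWE was originally proposed by Niu et al. [6] as follows. Assume that the probability of the unordered haplotype pair $(h_k, h_l)$ is proportional to $\alpha p_k^2$ if $k = l$ and $2 \beta p_k p_l$ otherwise, with two parameters $\alpha$ and $\beta$. Obviously, the HWE assumption holds if $\alpha = \beta$. Note that the sum of all these terms for all the $m$ haplotypes at the $L$ loci may not be 1. Then, HWD can be defined in the following form:
$$\Pr(h_k, h_l) = \begin{cases} \alpha p_k^2 / C, & k = l, \\ 2 \beta p_k p_l / C, & k \neq l, \end{cases} \qquad (4)$$
where $C = \alpha \sum_{k} p_k^2 + 2 \beta \sum_{k < l} p_k p_l$. Let $\theta = \alpha / \beta$. Then, we assume $\theta \geq 1$ due to the positive inbreeding coefficient $\rho$. We denote this form of HWD as “Niu's model (NM)” for convenience.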
Likelihood Function and Haplotype-Based HWE Test under Niu's Model
Using Equations (2) and (4), the log-likelihood function of the complete data under the Niu's model can be expressed as(5)where
. In fact, there is only one additional parameter
included in Equation (5), compared to the likelihood function under HWE. So, we propose the following expectation-conditional-maximization (ECM) algorithm to estimate the haplotype frequencies and the parameter
. It consists of one expectation step (E-step) and
conditional-maximization steps (CM-steps) at each iteration. In E-step at iteration
, we can get the following
function after taking the conditional expectation of Equation (5), given the observed genotype data
and current estimate
of
,
(6)where
is the conditional probability of the haplotype pair
given
and
, which is 0 if there is no haplotype pair compatible with genotype
.
In CM-steps, we maximize the function in Equation (6) to estimate
. Let
be the estimate of
in the
CM-step among
CM-steps at iteration
. The detailed CM-steps are as follows:
• Give the initial value , where
.
• At iteration , by fixing
in the first CM-step, maximize the
function by taking the first-order derivation with respect to
so as to get the estimate of
, and then
where
,
. So,
.
• Note that there is a constraint condition when we maximize
to estimate the haplotype frequencies
. Thus, from the second CM-step to the
CM-step,
's (
) are estimated step by step and
is then estimated by
. Let
be the set of the haplotype frequency estimates for all the haplotypes but
and
in the
CM-step. Then,
. For example,
in the second CM-step for estimating
. As such, in the
CM-step (
), by maximizing
, it is shown in Text S1 that a cubic equation with respect to
is obtained,
(7)where the coefficients
,
,
and
are, respectively,
and the vector
and the matrix
are respectively
Moreover, the cubic equation above is always solvable, and its solution can be obtained by Shengjin's formulas [22]. Note that the likelihood function converges no matter which initial values are chosen. So, if there are two or three solutions between 0 and 1, then we can choose the solution which is closest to the estimate obtained in the former step. After this step,
• For ,
. Then
.
• Repeat the steps above until the observed log-likelihood function of Equation (1) converges.
Under Niu's model, the observed-data likelihood of Equation (1) is evaluated with the haplotype-pair probabilities of Equation (4) in place of the HWE probabilities. Note that HWE holds when the NM parameter equals 1 and is violated otherwise. Therefore, a likelihood ratio test (LRT) for HWE is naturally constructed based on the estimated haplotype frequencies as follows,
$$\mathrm{LRT}_{\mathrm{NM}} = 2 \left( \log L_1 - \log L_0 \right), \qquad (8)$$
where $L_0$ and $L_1$ are the values of the observed likelihood function under the null hypothesis of HWE and under the HWD alternative, respectively. This LRT statistic asymptotically follows a chi-square distribution with 1 degree of freedom when HWE holds.
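Given the two maximized log-likelihoods, the test statistic and its P value can be computed as sketched below. This is an illustrative helper, not part of HAP-HWE: the function name and arguments are assumptions, and the P value uses the identity that, for a chi-square variable with 1 degree of freedom, the upper tail probability at x equals erfc(sqrt(x / 2)).

```python
import math


def lrt_hwe(loglik_null, loglik_alt):
    """Likelihood ratio statistic and chi-square (1 df) P value.

    `loglik_null` is the maximized observed log-likelihood under HWE and
    `loglik_alt` the maximized observed log-likelihood under the HWD model.
    """
    # Guard against tiny negative values caused by incomplete convergence
    stat = max(0.0, 2.0 * (loglik_alt - loglik_null))
    # Upper tail of chi-square with 1 df, via the complementary error function
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

Using only the standard library keeps the sketch self-contained; a statistics package would give the same tail probability.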
Likelihood Function and Haplotype-Based HWE Test under Inbreeding Model
Borrowing the idea of Zeng and Lin on how to estimate the haplotype frequencies based on case-control data for testing for association [18], here we rewrite the likelihood function for unrelated individuals under study and then propose a haplotype-based HWE test under the inbreeding model. Let be a random variable, which takes values from
possible haplotype combinations compatible with
of the
individual. Suppose that
, and
is a Bernoulli variable with success probability
. Let
and
, where
and
are discrete random variables, and the haplotype before “/” is paternal and haplotype after “/” is maternal. So,
has the same distribution as
, and we treat
,
and
as missing. Then, the log-likelihood function of the complete data under the inbreeding model is
(9)where
.
To estimate the parameters in Equation (9), the EM algorithm is considered. In E-step, the
function is
In M-step, the estimation of at iteration
can be obtained by solving the following equation
So, can be estimated by
where
and
are the estimates of
and
at iteration
, respectively. The haplotype frequencies can be estimated by
where
is a normalizing constant, and
and
can be calculated as follows,
We call this process IEM algorithm for distinguishing it from the previous EM algorithm under HWE.
Note that under the IM model, HWE holds when the inbreeding coefficient equals 0 and does not hold otherwise. Therefore, we propose the following LRT to test for haplotype-based HWE,
$$\mathrm{LRT}_{\mathrm{IM}} = 2 \left( \log L_1 - \log L_0 \right),$$
where $L_0$ and $L_1$ are the values of the observed likelihood function under the null hypothesis of HWE and under the HWD alternative, respectively. This LRT statistic asymptotically follows a chi-square distribution with 1 degree of freedom when HWE holds.
Software Implementation
Based on the above EM, ECM and IEM algorithms, we have written a software package, HAP-HWE, to conduct the proposed haplotype-based HWE tests. It is implemented in R (http://www.r-project.org) and is freely available at http://www.echobelt.org/web/UploadFiles/HAP-HWE.html. For each of the EM, ECM and IEM algorithms, let $M$ denote the number of haplotypes that occur in all the possible haplotype combinations compatible with the observed genotypes in the sample. The initial values of these $M$ haplotype frequencies are all taken as $1/M$ at iteration 0. For the ECM and IEM algorithms, the initial values of the NM parameter and the inbreeding coefficient are taken as 1 and 0.01, respectively. The convergence criterion is that the absolute difference between the values of the log-likelihood function at two consecutive iterations is smaller than a preset tolerance. The default maximum number of iterations is 1000. The estimates at the last iteration are then taken as the maximum likelihood estimates of the haplotype frequencies, the NM parameter and the inbreeding coefficient. Consequently, the values of the NM-based and IM-based LRT statistics and the corresponding P values are obtained.
The input data file is a standard linkage pedigree file containing pedigree relationship, genotype and phenotype information, with one row per individual. The HAP-HWE software uses only the founders in the sample and automatically excludes the nonfounders from the analysis. Further, a haplotype block file is needed, with each row representing a haplotype block; it can be easily exported from other existing software, such as Haploview [23]. HAP-HWE then analyzes the haplotype blocks one by one. For the usage of the HAP-HWE software and other details, refer to Text S2.
Our HAP-HWE software outputs: (i) the convergence processes of the log-likelihood function under the EM, ECM and IEM algorithms, (ii) the haplotypes whose frequency estimates exceed a reporting threshold, together with the associated frequency estimates under the three algorithms, (iii) the estimated value of the NM parameter, the value of the NM-based LRT statistic and the corresponding P value under Niu's model, and (iv) the estimated value of the inbreeding coefficient, the value of the IM-based LRT statistic and the corresponding P value under the inbreeding model. The results are saved in a text file (named “results.txt”) in the working directory. In addition, like other haplotype frequency estimation methods, our methods face running-time and storage problems because of the large number of possible haplotypes. In our software, to reduce storage space, each haplotype is represented by an integer rather than a vector of alleles.
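One way to represent a biallelic haplotype by a single integer is to treat the alleles as binary digits. The sketch below is a hypothetical encoding chosen for illustration; the text does not describe the encoding HAP-HWE actually uses, and the function names are assumptions.

```python
def encode_haplotype(alleles):
    """Pack a sequence of alleles coded 1/2 into one integer.

    The first locus becomes the most significant bit; allele 1 maps to
    bit 0 and allele 2 maps to bit 1.
    """
    code = 0
    for a in alleles:
        code = (code << 1) | (a - 1)
    return code


def decode_haplotype(code, n_loci):
    """Recover the list of 1/2 alleles from the integer code."""
    return [((code >> (n_loci - 1 - s)) & 1) + 1 for s in range(n_loci)]
```

With this scheme, a haplotype over L loci occupies one integer between 0 and 2^L - 1, and the number of loci must be stored alongside the code in order to decode it.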
Results
Simulation Settings
To assess the validity and compare the performance of the two LRTs in testing for haplotype-based HWE, we consider three models with three tightly linked SNPs that can lead to HWD: Niu's model (NM), the inbreeding model (IM) and the population stratification (PS) model. For both the NM and IM models, the true marginal haplotype distribution is given in Table 1. For the NM model, the value of the NM parameter is taken from 1.0 to 1.5 in increments of 0.05. We first calculate the probabilities of all the haplotype combinations from Equation (4), and then randomly choose one haplotype combination for each individual according to these probabilities. For the IM model, the inbreeding coefficient is taken from 0 to 0.1 in increments of 0.01. We first calculate the probabilities of all the haplotype combinations from Equation (3), and then select one haplotype combination at random for each individual. Finally, under either model, we combine the two haplotypes to form the unphased genotype of the individual. To investigate how population admixture affects the performance of the two haplotype-based HWE tests, we consider a PS model with two subpopulations, I and II, whose haplotype distributions are given in Table 2. The proportion of subpopulation I is taken to be 0.6 or 0.8.
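The sampling step described above (draw one haplotype combination per individual, then discard the phase) can be sketched as follows. The function name, the seed argument and the dictionary layout of `pair_probs` are assumptions for illustration; the pair probabilities themselves would come from whichever model is being simulated.

```python
import random


def simulate_unphased_genotypes(pair_probs, n, seed=0):
    """Draw n haplotype pairs and collapse each to an unphased genotype.

    `pair_probs` maps an unordered pair (hap1, hap2) to its probability;
    haplotypes are tuples of alleles coded 1/2.
    """
    rng = random.Random(seed)
    pairs = list(pair_probs)
    weights = [pair_probs[p] for p in pairs]
    drawn = rng.choices(pairs, weights=weights, k=n)
    # Per SNP, count copies of allele 2; this discards the phase information
    return [tuple(int(a == 2) + int(b == 2) for a, b in zip(h1, h2))
            for h1, h2 in drawn]
```

Each simulated genotype is a tuple with one entry per SNP taking the value 0, 1 or 2, which is the information an HWE test on unphased data would actually observe.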
Note that HWE holds under the NM and IM models when the NM parameter equals 1 and the inbreeding coefficient equals 0, respectively. So, we simulate the type I error rates of the proposed HWE tests at these values, and make power comparisons when the NM parameter exceeds 1 or the inbreeding coefficient exceeds 0. The PS model is also used to simulate the powers of both tests. For all the models, we generate samples of unrelated individuals at these three loci, with sample sizes of 500, 1000 and 1500. The number of simulation replicates is fixed at 1000 and the significance level is taken to be 5%.
As an additional finding of this paper, we also compare the accuracy of the EM, ECM and IEM algorithms in haplotype inference. The accuracy of haplotype frequency estimates is assessed by the sum of absolute differences (SAD) between the true and estimated haplotype frequencies, taken over all haplotypes, which was proposed by Fallin and Schork [20]. It ranges from 0 (when the estimation is perfect) to 1.
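A minimal sketch of the SAD computation is given below. It takes the plain sum of absolute differences over all haplotypes; any scaling constant used in the original definition is not reproduced here, and the function name, dictionary layout and example values are assumptions.

```python
def sum_abs_diff(true_freqs, est_freqs):
    """Sum of absolute differences between true and estimated frequencies.

    Both arguments map a haplotype label to its frequency; a haplotype
    missing from one of them is treated as having frequency 0 there.
    """
    haps = set(true_freqs) | set(est_freqs)
    return sum(abs(true_freqs.get(h, 0.0) - est_freqs.get(h, 0.0))
               for h in haps)


# Illustrative values only (not taken from the paper's tables)
sad = sum_abs_diff({"111": 0.6, "222": 0.4},
                   {"111": 0.5, "222": 0.4, "121": 0.1})
```

Treating absent haplotypes as frequency 0 matters in practice, because an estimation algorithm may assign positive frequency to a haplotype that does not exist in the true distribution.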
Simulation Results
Table 3 lists the estimate of the NM parameter, the mean SAD of haplotype frequency estimates, and the simulated size and powers of the two HWE tests for different values of the NM parameter and different sample sizes under Niu's model. The table shows that the mean estimate of the NM parameter over 1000 replicates is close to its true value. When the NM parameter equals 1 (i.e. HWE holds), the type I error rate of the NM-based LRT is close to the nominal 5% level, while the size of the IM-based LRT is less than 0.05. This means that, in testing for haplotype-based HWE, the NM-based LRT controls the size well and the IM-based LRT is conservative under the NM model. The powers of both tests increase as the NM parameter increases from 1.1 to 1.5 with the sample size fixed. However, the NM-based LRT is more powerful than the IM-based LRT. In addition, when the NM parameter equals 1 and the sample size is unchanged, the EM, ECM and IEM algorithms perform similarly in the estimation of haplotype frequencies. As the NM parameter increases, however, the SAD of the ECM algorithm changes little and is much smaller than the SADs of the EM and IEM algorithms. The SADs of the EM and IEM algorithms are very close to each other and become larger as the NM parameter increases. On the other hand, as the sample size increases, the SADs of all three algorithms become smaller and the two proposed LRTs become more powerful.
Table 4 shows the estimate of the inbreeding coefficient, the mean SAD of haplotype frequency estimates, and the simulated size and powers of the two HWE tests for different values of the inbreeding coefficient and different sample sizes under the inbreeding model. The table shows that the mean estimate of the inbreeding coefficient over 1000 replicates is close to its true value. As in Table 3, the NM-based LRT controls the size better than the IM-based LRT under the IM model. However, the IM-based LRT is more powerful than the NM-based LRT in this situation. On the other hand, the EM and IEM algorithms perform the same in the estimation of haplotype frequencies, and the corresponding SADs are stable across the values of the inbreeding coefficient (0 to 0.1). However, the SAD of the ECM estimate gets larger as the inbreeding coefficient increases, and the ECM algorithm performs worse than the EM and IEM algorithms. When the sample size is larger, the SADs are smaller and the two proposed LRTs are more powerful.
Table 5 displays the mean SAD of haplotype frequency estimates and the simulated powers of the two HWE tests based on 1000 simulation replicates under the PS model, with the proportion of subpopulation I taken as 0.6 and 0.8 and the sample size fixed at 500, 1000 and 1500. The table shows that the IM-based LRT is more powerful than the NM-based LRT, irrespective of the proportion of subpopulation I or the sample size. In the estimation of haplotype frequencies, the EM and IEM algorithms have similar SADs, which are smaller than those of the ECM algorithm; this indicates that the EM and IEM algorithms are more robust to population stratification than the ECM algorithm.
Application to NARAC Data Set
We apply our HAP-HWE software to the Rheumatoid Arthritis (RA) data set from the North American Rheumatoid Arthritis Consortium (NARAC) [24], which was made available through the Genetic Analysis Workshop 15 [25]. In the data set, there are 757 pedigrees comprised of 8017 individuals (2481 founders and 5536 nonfounders), which were genotyped at 5407 SNP markers over the 22 autosomes. In each pedigree, there is at least one affected nonfounder with RA.
Note that information on haplotype blocks is needed prior to the HAP-HWE analysis. In this application, we use the existing software Haploview (version 4.2) [23] to define haplotype blocks, with all the arguments being taken as the default values. Then, 181 haplotype blocks are identified, 150 blocks including 2 SNPs, 19 blocks including 3 SNPs, 7 blocks including 4 SNPs, 1 block including 5 SNPs, 2 blocks including 6 SNPs, 1 block including 9 SNPs and 1 block including 13 SNPs.
On the other hand, HAP-HWE uses only the founders and excludes the nonfounders from the analysis. Further, a large proportion of genotypes are missing for individuals in the data set. Therefore, the reduced data set used for the HAP-HWE analysis contains only a fraction of the founders. On average, about 295 pedigrees (range 288 to 296), contributing about 367 unrelated individuals (range 358 to 369), are used for each haplotype block.
Table 6 lists the results of the application to the NARAC data set. The significance level is fixed at 5%. There are 13 haplotype blocks (out of 181) for which at least one of the P values of the two LRTs is less than 5%. However, after Bonferroni correction for multiple testing, only the seventh haplotype block, which includes 6 SNPs (rs347117, rs383902, rs395601, rs387812, rs347115 and rs610877) on chromosome 15, is statistically significant, with the P value of the LRT
being
. Figure 1 gives the Haploview LD display for this haplotype block. On the other hand, Min et al. [26] reported that chromosome 15p34 at rs347117 showed a possible linkage peak to RA by using the nonparametric linkage
score (
), which may support our finding.
The red box denotes that the LOD value between any two loci is larger than or equal to 2.0. The numbers in the red boxes are the corresponding values of and the empty box denotes that
.
Discussion
In this paper, we first wrote out two likelihood functions of the observed data based on the NM model and the IM model. Then, we developed the ECM algorithm for the NM model to estimate its parameter and the haplotype frequencies, and suggested the IEM algorithm for the IM model to estimate the inbreeding coefficient and the haplotype frequencies. HWE holds when the NM parameter equals 1 or the inbreeding coefficient equals 0. We therefore proposed two LRTs for haplotype-based HWE, the NM-based LRT and the IM-based LRT. We simulated the HWE, Niu's, inbreeding and population stratification models to assess the validity and compare the performance of these two tests. The simulation results showed that both tests are valid in testing for haplotype-based HWE. If Niu's model is true, the NM-based LRT is more powerful, whereas if the inbreeding model is true, the IM-based LRT has higher power. Under the population stratification model, the IM-based LRT is again more powerful. Therefore, if the population model is unknown in practice, the IM-based LRT is generally recommended due to its good performance. Furthermore, we compared the performance of the EM, ECM and IEM algorithms in estimating the haplotype frequencies. If the true model is Niu's model, the ECM algorithm gives more accurate estimates of haplotype frequencies than the EM and IEM algorithms. For all the other simulation settings, however, the EM algorithm is not much affected by the departure from HWE, and the EM and IEM algorithms have almost the same SAD, which is smaller than that of the ECM estimates. We also demonstrated the practical utility of the proposed methods by applying them to the Rheumatoid Arthritis (RA) data set from the North American Rheumatoid Arthritis Consortium (NARAC). In addition, many abbreviations and notations are used in this paper, so we list them in two tables (Tables S1 and S2) in the Supporting Information for easy reference.
Supporting Information
Text S1.
Conditional-maximization steps of ECM algorithm.
https://doi.org/10.1371/journal.pone.0077399.s003
(PDF)
Author Contributions
Conceived and designed the experiments: WGM JYZ. Performed the experiments: WGM HQH JYZ. Analyzed the data: WGM HQH YX PYC JYZ. Contributed reagents/materials/analysis tools: WGM HQH JYZ. Wrote the paper: WGM JYZ. Designed the software used in analysis: WGM JYZ. Revised the manuscript: YX PYC.
References
- 1. Gibbs RA, Belmont JW, Hardenbol P, Willis TD, Yu F, et al. (2003) The international HapMap project. Nature 426: 789–796.
- 2. Gibbs RA, Belmont JW, Boudreau A, Leal SM, Hardenbol P, et al. (2005) A haplotype map of the human genome. Nature 437: 1299–1320.
- 3. Zheng G, Yang Y, Zhu X, Elston RC (2012) Analysis of Genetic Association Studies. New York: Springer.
- 4. Dawson E, Abecasis GR, Bumpstead S, Chen Y, Hunt S, et al. (2002) A first-generation linkage disequilibrium map of human chromosome 22. Nature 418: 544–548.
- 5. Huang BE, Amos CI, Lin DY (2007) Detecting haplotype effects in genomewide association studies. Genet Epidemiol 31: 803–812.
- 6. Niu T, Qin ZS, Xu X, Liu JS (2002) Bayesian haplotype inference for multiple linked single-nucleotide polymorphisms. Am J Hum Genet 70: 157–169.
- 7. Yu Z, Schaid DJ (2007) Sequential haplotype scan methods for association analysis. Genet Epidemiol 31: 553–564.
- 8. Zhang K, Zhao H (2006) A comparison of several methods for haplotype frequency estimation and haplotype reconstruction for tightly linked markers from general pedigrees. Genet Epidemiol 30: 423–437.
- 9. Zhao H, Zhang S, Merikangas KR, Trixler M, Wildenauer DB, et al. (2000) Transmission/disequilibrium tests using multiple tightly linked markers. Am J Hum Genet 67: 936–946.
- 10. Becker T, Knapp M (2004) Maximum-likelihood estimation of haplotype frequencies in nuclear families. Genet Epidemiol 27: 21–32.
- 11. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society Series B (Methodological) 39: 1–38.
- 12. Excoffier L, Slatkin M (1995) Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. Mol Biol Evol 12: 921–927.
- 13. Schaid DJ, Rowland CM, Tines DE, Jacobson RM, Poland GA (2002) Score tests for association between traits and haplotypes when linkage phase is ambiguous. Am J Hum Genet 70: 425–434.
- 14. Stram DO, Pearce L, Bretsky P, Freedman M, Hirschhorn JN, et al. (2003) Modeling and EM estimation of haplotype-specific relative risks from genotype data for a case-control study of unrelated individuals. Hum Hered 55: 179–190.
- 15. Zaykin DV, Westfall PH, Young SS, Karnoub MA, Wagner MJ, et al. (2002) Testing association of statistically inferred haplotypes with discrete and continuous traits in samples of unrelated individuals. Hum Hered 53: 79–91.
- 16. Zhao LP, Li SS, Khalid N (2003) A method for the assessment of disease associations with single-nucleotide polymorphism haplotypes and environmental variables in case-control studies. Am J Hum Genet 72: 1231–1250.
- 17. Epstein MP, Satten GA (2003) Inference on haplotype effects in case-control studies using unphased genotype data. Am J Hum Genet 73: 1316–1329.
- 18. Zeng D, Lin DY (2005) Estimating haplotype-disease associations with pooled genotype data. Genet Epidemiol 28: 70–82.
- 19. Meng XL, Rubin DB (1993) Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika 80: 267–278.
- 20. Fallin D, Schork NJ (2000) Accuracy of haplotype frequency estimation for biallelic loci, via the expectation-maximization algorithm for unphased diploid genotype data. Am J Hum Genet 67: 947–959.
- 21. Kuk AYC, Zhang H, Yang Y (2009) Computationally feasible estimation of haplotype frequencies from pooled DNA with and without Hardy-Weinberg equilibrium. Bioinformatics 25: 379–386.
- 22. Fan S (1989) A new extracting formula and a new distinguishing means on the one variable cubic equation (in Chinese). Natural Science Journal of Hainan Teachers College (in China) 2: 91–98.
- 23. Barrett JC, Fry B, Maller J, Daly MJ (2005) Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21: 263–265.
- 24. Jawaheer D, Seldin MF, Amos CI, Chen WV, Shigeta R, et al. (2003) Screening the genome for rheumatoid arthritis susceptibility genes: a replication study and combined analysis of 512 multicase families. Arthritis Rheum 48: 906–916.
- 25. Amos CI, Chen WV, Remmers E, Siminovitch K, Seldin MF, et al. (2007) Data for Genetic Analysis Workshop (GAW) 15 Problem 2, genetic causes of rheumatoid arthritis and associated traits. BMC Proc (Suppl 1): S3.
- 26. Min JY, Min KB, Sung J, Cho SI (2010) Linkage and association studies of joint morbidity from rheumatoid arthritis. J Rheumatol 37: 291–295.