Text S1
Let A and B be alleles at two bi-allelic polymorphisms (e.g., SNPs) located close enough to be in linkage disequilibrium (LD). The standard LD coefficient $D_{AB}$ is transformed to the LD correlation coefficient $r$ by the well-known formula
$$r = \frac{D_{AB}}{\sqrt{p_A(1-p_A)\,p_B(1-p_B)}},$$
where $p_A$ and $p_B$ are the allele frequencies of A and B [42]. Similarly, Weir [42,43] described the composite disequilibrium coefficient $\Delta_{AB}$, which, he explained, is transformed in the same way to an analogous correlation coefficient $r^*$ by the formula
$$r^* = \frac{\Delta_{AB}}{\sqrt{\left[p_A(1-p_A) + D_A\right]\left[p_B(1-p_B) + D_B\right]}},$$
where $D_A$ and $D_B$ are the Hardy-Weinberg disequilibrium coefficients at the two loci. Whether the quantities in the formula refer to the entire population or to a particular sample, Zaykin [44] and Weir [43] each noted that $r^*$ is the same as the (unsquared) correlation between genotypes at the two loci, $r_G$, with each genotype coded as its count of A or B alleles; thus $r^* = r_G$.
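The two correlation coefficients above can be checked numerically. The following is a minimal Python/NumPy sketch, not the authors' code: it assumes a hypothetical helper `simulate_hwe_sample` that draws haplotypes with a chosen $D_{AB}$ and pairs them at random, computes $r$ from the phased haplotypes and $r^*$ from the unphased genotypes using simple moment estimates (divisor $N$, no small-sample corrections), and compares $r^*$ with the ordinary correlation between allele-count genotypes. The function names and parameter values (allele frequencies 0.3 and 0.4, $D_{AB} = 0.05$, N = 5000) are illustrative assumptions.

```python
import numpy as np

def simulate_hwe_sample(n_subjects, p_a, p_b, d_ab, rng):
    """Hypothetical helper: draw 2*n_subjects haplotypes with LD coefficient d_ab,
    then pair them at random (random mating, so both loci are in HWE)."""
    freqs = [p_a * p_b + d_ab, p_a * (1 - p_b) - d_ab,              # AB, Ab
             (1 - p_a) * p_b - d_ab, (1 - p_a) * (1 - p_b) + d_ab]  # aB, ab
    haps = rng.choice(4, size=2 * n_subjects, p=freqs)
    hap_a = (haps <= 1).astype(int)      # 1 if the haplotype carries allele A
    hap_b = (haps % 2 == 0).astype(int)  # 1 if the haplotype carries allele B
    x_a = hap_a[0::2] + hap_a[1::2]      # genotype = count of A alleles (0/1/2)
    x_b = hap_b[0::2] + hap_b[1::2]      # genotype = count of B alleles (0/1/2)
    return hap_a, hap_b, x_a, x_b

def ld_r(hap_a, hap_b):
    """r = D_AB / sqrt(pA(1-pA) pB(1-pB)) from phased haplotypes."""
    p_a, p_b = hap_a.mean(), hap_b.mean()
    d_ab = np.mean(hap_a * hap_b) - p_a * p_b  # D_AB = P_AB - pA pB
    return d_ab / np.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))

def composite_r(x_a, x_b):
    """r* = Delta_AB / sqrt((pA(1-pA)+D_A)(pB(1-pB)+D_B)) from unphased genotypes."""
    p_a, p_b = x_a.mean() / 2, x_b.mean() / 2
    d_a = np.mean(x_a == 2) - p_a**2  # HW disequilibrium, D_A = P_AA - pA^2
    d_b = np.mean(x_b == 2) - p_b**2  # HW disequilibrium, D_B = P_BB - pB^2
    # Composite coefficient via moments: Delta_AB = mean(x_A * x_B) / 2 - 2 pA pB
    delta_ab = np.mean(x_a * x_b) / 2 - 2 * p_a * p_b
    return delta_ab / np.sqrt((p_a * (1 - p_a) + d_a) * (p_b * (1 - p_b) + d_b))

rng = np.random.default_rng(2024)
hap_a, hap_b, x_a, x_b = simulate_hwe_sample(5000, 0.3, 0.4, 0.05, rng)
r = ld_r(hap_a, hap_b)
r_star = composite_r(x_a, x_b)
r_g = np.corrcoef(x_a, x_b)[0, 1]  # ordinary correlation between genotypes
print(f"r = {r:.4f}, r* = {r_star:.4f}, r_G = {r_g:.4f}")
```

Because the variance of an allele count is $2[p_A(1-p_A) + D_A]$ and the covariance of two allele counts is $2\Delta_{AB}$, `r_star` and `r_g` agree to floating-point precision in any sample, whereas `r` differs from them only by sampling error.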
For a population in Hardy-Weinberg equilibrium (HWE), the population values of the four disequilibrium coefficients satisfy $\Delta_{AB} = D_{AB}$ and $D_A = D_B = 0$. Substituting these values into the formula for $r^*$ recovers the formula for $r$, so, as Weir [43] also noted, the population values of $r$ and $r^*$ become identical, implying that
$$r = r^* = r_G.$$
Weir [43] also stated that the sample statistics $\hat{r}$ and $\hat{r}^*$ are approximately normally distributed after Fisher's z-transformation, $z = \tfrac{1}{2}\ln\frac{1+r}{1-r}$, with equal means under HWE and sampling variances of $1/(2N-3)$ and $1/(N-3)$, respectively, where $N$ is the number of subjects in the sample ($\hat{r}$ is based on $2N$ haplotypes and $\hat{r}^*$ on $N$ genotypes). We therefore conclude that $\hat{r}_G \approx \hat{r}$ for samples from populations in HWE that are large enough (N > 100) for the Fisher z-transformed statistics to be accurately approximated by a normal distribution; the approximation is very good for samples as large as those in our GWAS (N > 1000).
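A rough simulation can illustrate the sampling behaviour described above. This Python/NumPy sketch is an illustration under stated assumptions rather than Weir's derivation: the hypothetical helpers `simulate_replicates` and `fisher_z_summary` draw many samples of N subjects under random mating, compute $\hat{r}$ from haplotypes and $\hat{r}_G$ from genotypes, apply Fisher's z (`np.arctanh`), and report the mean of each statistic and its empirical variance divided by the nominal $1/(2N-3)$ or $1/(N-3)$. Because Fisher's variance formula is derived for bivariate normal data, the variance check is run with no LD ($D_{AB} = 0$), while the comparison of means uses $D_{AB} = 0.05$; the allele frequencies, N = 500, and 2000 replicates are illustrative choices.

```python
import numpy as np

def simulate_replicates(n_reps, n_subjects, p_a, p_b, d_ab, rng):
    """Hypothetical helper: n_reps samples, each of n_subjects built from
    2*n_subjects haplotypes with LD coefficient d_ab paired at random (HWE)."""
    freqs = [p_a * p_b + d_ab, p_a * (1 - p_b) - d_ab,
             (1 - p_a) * p_b - d_ab, (1 - p_a) * (1 - p_b) + d_ab]
    haps = rng.choice(4, size=(n_reps, 2 * n_subjects), p=freqs)  # 0=AB 1=Ab 2=aB 3=ab
    hap_a = (haps <= 1).astype(float)      # haplotype carries A
    hap_b = (haps % 2 == 0).astype(float)  # haplotype carries B
    # Consecutive haplotypes form one subject's genotype (allele counts 0/1/2).
    x_a = hap_a[:, 0::2] + hap_a[:, 1::2]
    x_b = hap_b[:, 0::2] + hap_b[:, 1::2]
    return hap_a, hap_b, x_a, x_b

def rowwise_corr(x, y):
    """Pearson correlation computed separately for each row (replicate)."""
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    return (xc * yc).sum(axis=1) / np.sqrt((xc**2).sum(axis=1) * (yc**2).sum(axis=1))

def fisher_z_summary(n_reps, n_subjects, p_a, p_b, d_ab, seed=0):
    """Hypothetical helper: mean and scaled variance of Fisher's z for r and r_G."""
    rng = np.random.default_rng(seed)
    hap_a, hap_b, x_a, x_b = simulate_replicates(n_reps, n_subjects, p_a, p_b, d_ab, rng)
    z_hap = np.arctanh(rowwise_corr(hap_a, hap_b))  # z of sample LD correlation r
    z_geno = np.arctanh(rowwise_corr(x_a, x_b))     # z of sample genotype correlation r_G
    return {
        "mean_z_r": z_hap.mean(),
        "mean_z_rG": z_geno.mean(),
        # Empirical variance divided by the nominal Fisher variance (near 1 if it fits)
        "var_ratio_r": z_hap.var() * (2 * n_subjects - 3),
        "var_ratio_rG": z_geno.var() * (n_subjects - 3),
    }

null = fisher_z_summary(2000, 500, 0.3, 0.4, 0.0, seed=1)  # no LD: variance check
ld = fisher_z_summary(2000, 500, 0.3, 0.4, 0.05, seed=2)   # LD present: compare means
print(null)
print(ld)
```

Variance ratios near 1 in the no-LD run, and nearly equal mean z values in the LD run, are consistent with the statements above; with LD present, the variance of z for discrete genotype data may depart somewhat from the nominal values, which is why the sketch does not test it.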
