A Generalized Model to Estimate the Statistical Power in Mitochondrial Disease Studies Involving 2×k Tables

Background Mitochondrial DNA (mtDNA) variation (i.e. haplogroups) has been analyzed in regards to a number of multifactorial diseases. The statistical power of a case-control study determines the a priori probability to reject the null hypothesis of homogeneity between cases and controls. Methods/Principal Findings We critically review previous approaches to the estimation of the statistical power based on the restricted scenario where the number of cases equals the number of controls, and propose a methodology that broadens procedures to more general situations. We developed statistical procedures that consider different disease scenarios, variable sample sizes in cases and controls, and variable number of haplogroups and effect sizes. The results indicate that the statistical power of a particular study can improve substantially by increasing the number of controls with respect to cases. In the opposite direction, the power decreases substantially when testing a growing number of haplogroups. We developed mitPower (http://bioinformatics.cesga.es/mitpower/), a web-based interface that implements the new statistical procedures and allows for the computation of the a priori statistical power in variable scenarios of case-control study designs, or e.g. the number of controls needed to reach fixed effect sizes. Conclusions/Significance The present study provides with statistical procedures for the computation of statistical power in common as well as complex case-control study designs involving 2×k tables, with special application (but not exclusive) to mtDNA studies. In order to reach a wide range of researchers, we also provide a friendly web-based tool – mitPower – that can be used in both retrospective and prospective case-control disease studies.


Introduction
The mitochondrion produces most of the ATP in the cell, an energy source on which almost all physicochemical processes depend. Each cell contains dozens or hundreds of mtDNA genomes that are inherited as a single haplotypic block from the mother to the offspring. Germ-line mutations accumulate on top of existing haplotypes, and these haplotypes aggregate in human populations according to their demographic histories. Due to the particularities of the mtDNA molecule (i.e. matrilineal inheritance and lack of recombination [1]), it is straightforward to reconstruct phylogenetic relationships between human haplotypes [2][3][4]. Phylogenetically related haplotypes in the population are commonly grouped into clusters or haplogroups [5]. Thus, haplogroups represent branches of the mtDNA phylogeny, and the set of diagnostic variants defining these clades are popularly known as the sequence motif [6,7]. Screening for these variants in a given mtDNA molecule can provide sufficient information to allocate a particular mtDNA genome into a given haplogroup [8][9][10].
In the last few years, a huge number of studies have been conducted addressing the presumable association of mtDNA haplogroups with different complex diseases, including cancer [11,12], Alzheimer [13,14], Parkinson [15][16][17], schizophrenia [18][19][20][21], infectious diseases [22,23], diabetes [24], LHON [25], etc. Most of these disease studies are population-based, that means, the mtDNA variability is compared between cohorts of cases and representative healthy control (case-control studies), where the statistically significant over-representation of a given variant in cases regarding controls might point to a biological association of this variant with the disease.
Statistical procedures are important in order to understand the presumable relationship between mtDNA haplogroups and diseases. Estimating a priori statistical power is fundamental in case-control association studies given that this is the way to evaluate to what extent a positive finding is likely to be not at random. However, case-control association studies targeting the mtDNA variation [22] rarely compute power mainly due to the lack of the statistical procedures that are necessarily different to those employed using autosomal DNA markers. To the best of our knowledge, only Samuels et al. [26] investigated the issue of statistical power in regards to cases-control mtDNA studies involving 26k. These authors used a simulation-based permutation test (Monte-Carlo) in order to estimate power calculations for prospective case-control studies. According to these authors, very large cohorts are needed to reliably detect and association between mtDNA haplogroups and complex diseases. This study however only deals with the restricted scenario where the number of cases equals the number of controls. The particular biological application of Sánchez et al. [27] on a mtDNA case has to do specifically with 263 tables, comparing a RFLP polymorphism (binary) and the three genotypes derived from a biallelic albumin marker.
Several software packages and statistical procedures were designed for the calculation of statistical power and sample size. Most of the procedures developed to date can only deal with 262 tables (the great majority) or 263 tables [27] (Table S1). Thus, most of the software packages have been designed for the estimation of power or/and sample size in the most common scenario involving allele frequencies deriving from autosomal binary markers (SNPs), that is, involving allele or genotype association tests. Only two software packages, namely G*Power 3 [28] and Pass 12 [29], are able to treat tables r6k; however, these two packages only deal with scenarios where the number of cases equals number of controls. Finally, osDesign [30] is based on logistic regression, and although it can deal with r6k tables it does not allow estimating samples sizes.
In the present study we consider more general case-control disease scenarios involving any number of cases and controls and 26k tables. For instance, it is a common situation that only a limited number of patients can be recruited in a particular study; however, an increase in the number of controls could contribute to reach a reasonable statistical power. The model elaborated in the present study is based on simulations (Monte Carlo method) as a way to estimate the statistical power in case-control studies where there is interest in investigating the presumable relationship between a certain disease and a number of mtDNA haplogroups (or haplotypes or mtDNA SNPs [mtSNPs]). We consider the frequency of the risky allele or haplogroup in controls (p 0 ) and in cases (p 1 ), and the difference between these two parameters is proportionally distributed to the frequencies of the remaining allele or haplogroup categories in cases. In addition, a web-tool named mitPower has been also developed to implement all the statistical procedures developed in the present study.

Data simulation
We first build 26N H (in general, 26k) tables (10,000 simulations), where N H denotes the number of haplogroups considered (but could also be any number of haplotypes or mtSNPs). Two multinomial samples are used to build the contingency tables, taking as frequencies the estimated frequencies, and as size, the number of controls and cases of our study. In the simulated tables the row variable represents the status of case or control, while the columns represent the allele variables or haplogroups into which individuals are sorted. For the sake of simplicity, N H was set up to 11 (unless otherwise stated) but the method and mitPower has been designed to accept any number of haplogroups. As done in Samuels et al. [26], the following 11 haplogroup frequencies were considered as example: H (41%), I (2%), J (11%), K (8%), M (1%), T (13%), U (15%), V (13%), W (2%), X (2%) and a residual haplogroup (2%).
The power values obtained using MitPower have been validated with other tools (Table S1) in comparable scenarios that consider 262 tables and equal numbers of cases and controls. MitPower was additionally validated for 263 tables with the procedure shown by Sánchez et al. [27].

Statistical analysis
First, the computation of the probabilities of 262 contingency tables is the best option to test the homogeneity of control and case sample populations; however, the computational requirements increase with the dimension of the contingency tables. A way to overcome this problem is to implement a Fisher's exact test that estimates the probability of a contingency table using a Monte Carlo simulation approach. On the other hand, the Chi-squared statistic is computationally feasible for 26k tables being k any entire positive number. Both tests yield very similar results ( Figure 1 and Figure S1) and both are implemented in Mit-Power. From here onwards, we have used the Chi-square statistic, which compares the values obtained in our contingency tables against the values expected under the null hypothesis of homogeneity.
First, we generate a number of tables 26N H under a given hypothesis. A random number between 0 and 1 is generated using the R function runif, and this number is used as the seed for simulations. Power value estimators are obtained as the percentage of simulated tables with P-value below a fixed significance level.
In order to obtain the P-value for each simulated table, the distribution of our statistic has to be known. This distribution is approached here using two alternative procedures: the asymptotic and the permutation approach. The former approach is based on the asymptotic distribution of the Chi-square statistics. Note that some authors argued [31] that an increase of the false-positive rate occurs when the Cochran's rule is not verified; so the asymptotic approximation should be considered acceptable when the Cochran's rule is verified [31]; that means that a contingency table cannot contain expected values below one, and that no more than 20% of the expected value can be below five. The permutation method aims to overcome this problem. First, a large number of permuted tables of our initial data (contingency tables) are generated, with the only restriction that total sums by rows and columns have to remain constant. For each of these permutations, the Chi-square statistic is computed, and the Pvalue is obtained as the proportion of permutations with a Chisquare statistics higher than the statistics in the original data set [32,33]. There are several ways to obtained permuted tables [32,34,35], and we chosen the method provided by the function chisq.test [36].
Theoretically, the asymptotic and the permutation approaches should have similar values as the sample size increases [37]. Some experiments have been done in this direction ( Table 1, and see text below) in order to corroborate this expectation. Along the simulation experiments carried out in the present manuscript, the permutation method was preferred given that it generally performs better than the asymptotic one (see below).
All the computations were carried out in R (http://www.rproject.org/), and using the functions chisq.test, fisher.test, and pchisq of package stats.
mitPower: a web interface to estimate statistical power in 26k tables mitPower is a web-based tool (http://bioinformatics.cesga.es/ mitpower/) that allows estimating the statistical power in casecontrol association disease studies. Several other utilities are available in mitPower such as the estimation of: (i) the a posteriori statistical power, (ii) the sample size needed in order to reach a given statistical power, and (iii) the minimum deviation from the null hypothesis (of no association) detectable under a given statistical power (expressed as OR and haplogroup frequency in cases).
The software mitPower allows using two calibration methods: the asymptotic and the permutation procedure. The permutation procedure can be computationally demanding (see below) so the asymptotic procedure might be more convenient for complex scenarios.  The mitPower web interface is written in PHP, allowing users to enter their inputs through an HTML form. All the mitPower analyses are executed using R scripts (see above), which are called from the interface through Rscript and run at the web server. Their output is ultimately formatted for web display again by the PHP interface. Results links are kept for 24 hours in the server.
The underlying R scripts in mitPower run on a web server hosted by the Supercomputing Center of Galicia (CESGA; http:// www.cesga.es) located in Santiago de Compostela (Galicia, Spain).

Results and Discussion
Two procedures were followed to calibrate the distribution of our statistic: the asymptotic and the permutation method. In order to test which of the two approaches performs better, a simulation experiment has been performed considering different sample sizes, number of simulations, and permutations. The experiments indicate that the permutation method performs better than the asymptotic one given that power estimates approach closer to the significant value under the null hypothesis for low sample sizes. However, both approaches yield good estimates when considering large sample sizes ( Table 1). This is in agreement with theoretical expectations given that for large samples, their statistical distribution should be equivalent, as the permutation distribution should converge to the tabulated distribution [37].
The results indicate that (i) the permutation method tends to fit better to the significance level than the asymptotic approach when the null hypothesis is true (specially for low sample sizes), and (ii) computational requirements using permutation can be an issue when considering a large number of iterations (large sample sizes); in such situations, the asymptotic calibration method might be more convenient ( Table 1).
As done in Samuels et al., we would assume that haplogroup mtDNA frequencies in controls are known (note that there exist hundred of human population studies carried out to a local, regional or continental scale where these frequencies are available, at least for the most common haplogroups). We then simulated increases in the frequency of a risky haplogroup in cases, with the differences distributed proportionally between the remaining haplogroups (therefore, assuming there is no a priori assumption of an association with any of the remaining haplogroups considered) [26].
In agreement with Samuels et al. [26], we observed that power strongly depends on sample sizes, haplogroup population frequencies, and the deviation from the null hypothesis when using equal numbers of cases and controls (see solid lines in Figure 1). We next evaluate the situation where the number of cases differs from the number of controls. As shown in Figure 1, statistical power strongly depends on the case:control ratio when the other parameters are fixed, but this dependence is not as simple. As expected, power can increase very substantially as more controls exist relative to the number of cases. For instance, the statistical power to detect an association of haplogroup J increases from 60% to 80% when doubling the number of controls respect to cases in the example provided in Figure 1.
Samuels et al. [1] introduced the N scaled parameter for the estimation of the power. This parameter considers the difference between haplogroup frequency for equal numbers of controls and cases: , being p 0 the frequency of the risky haplogroup in controls, p 1 the frequency of the risky haplogroup in cases, and N the number of cases and controls (the total sample size is 2N).
We further consider the more general situation where the number of cases (N ca ) can differ from the number of controls (N co ), . N scaled and N sc measures the squared standardized difference between frequencies in cases and in controls for the risky haplogroup. For 262 tables and a sample size large enough, the N scaled parameter follows a chi-square distribution with one degree of freedom due to the asymptotic normality of the standardized difference between frequencies [38].
As shown in Figure 2, there is a clear relationship between the parameter N sc and the power values. These values follow the theoretical curve obtained for 262 tables and equal numbers of cases and controls using the arcsin transformation [39]: Where p 0 in the frequency in controls, p 1 is the frequency in controls, N is the sample size for each arm and Z b and Z 1{ a = 2 are normal quantile for b and 1{ a = 2 .Simulations also indicate that the statistical power decreases as more haplotypes are tested ( Note that the value 0.37 seems to have been obtained empirically by Samuels et al, (no specific formulae or indications were given in this regard). In the analysis shown in Figure 4 we aimed to reproduce their findings. The simulations corroborate the fact that 0.37 is the value that allows to better fit the data to the theoretical curve for values of statistical power above 50%. Below 50% an exponent of 0.5 would perform better although it can assume that values of statistical power below 50% might be not relevant in association studies. Therefore, we observed that N H raised to the power of 0.37 allows fitting the statistical power to those scenarios where the number of cases differs from the number of controls (Figure 3).
Finally, N sc allows to relate all the parameters involved in the computation of the statistical power: Samuels et al. (see A1 in their Appendix A) propose to use N scaled to estimate directly the statistical power and to determine the minimum number of controls and disease cases (N Cmin ) required for a specific level of power (their formula (2)). However, this formula applies when number of cases equals the number of controls. The simulation method aims to overcome the limitation of this formula allowing for different sample sizes in cases and controls.
We then adjusted the parameter N Cmin for the more general scenario involving unequal numbers of cases and controls. Two  In the first situation, if N denotes the number of cases, a the control-case ratio, and hN the number of controls, the minimum number of cases needed to reach a power b with a significance level a (N ab ) can be estimated from (5) as follows: where N ab sc denotes the N sc value providing a desired power b and a significance level a, while [?] denotes the integer part function. The number of controls can be estimated as hN ab . Note that a value of h~1would reproduce the particular scenario considered by Samuels et al. [26].
In the second situation, the minimum number of controls (N ab co min ) or the minimum number of cases (N ab ca min ) given a significance level a and a powerb, can also be estimated when the number of cases or the number of controls is fixed, respectively: where N ab sc is the value providing the power desired a for a significance level b, and [?] denotes the integer part function. This allows estimating the number of controls (number of cases) needed to reach a required power given a number of cases (number of controls). Note that statistical power is limited by the restrictions in equations (7) and (8); this is the reason of why power becomes stationary when the number of controls increases in regards to the number of cases (Figure 1).
It is also worthwhile to estimate the minimum deviation of the null hypothesis that can be detected for a power value b and a significance level a (D ab min ), assuming we know haplogroup frequencies in controls and considering a given number of cases and controls. We can express p 1~po zD ab min p 0 being D ab min w0 the deviation of null hypothesis. Note that it must verify 0vp 1 v1. If we calculate from (1) This difference can be expressed in terms of odds ratio. Thus Where OR min denotes the minimum OR that can be detected for a power value b and a significance level a.
The applications above require knowledge of the N ab sc value given a significance level a and a power b. This parameter can be obtained by way of simulations and nonparametric regression ( Figure 3 [right] shows the scenario wherea~0:05). Note that nonparametric regression seems to perform well for power values above 60%. The package sm [40] implements local linear estimation, window-selector cross validation and Gaussian kernel, that allows to obtain N sc values for different significance levels a and power values. These values can be used in equations (6)- (10) in order to estimate the desired parameters (as done in Table 2).
The same simulation methods proposed to compute a priori statistical power can be applied for the estimation of the a posteriori power. Note however, that we treat a posteriori power in a different context as interpreted by others [41,42]. Our procedure involves generating new data (tables) using the sample proportions and sample sizes obtained from a particular study. Therefore, the null hypothesis is tested by simulating new contingency tables. The procedures are analogs to the ones used to compute the a priori power.
Computation of the statistical power is essential to anticipate if the positive findings obtained in case-control disease studies are reliable. The present study has been motivated by the fact that the only available procedures to date to compute statistical power in mtDNA association studies [26] only allows to deal with scenarios involving 262 tables (or 263 tables), or if 26k tables only study designs considering equal numbers of cases and controls (which does not represent the most common scenario in association studies). In the present study, we also provide with a web interface that implements the procedures developed in the present study (mitPower).

Conclusions
During the last decade, a large number of mtDNA case-control studies have been published in the literature, most of them pointing to a number of haplogroups presumably associated with a complex disease. The validity of many of these conclusions might be questionable, given that most of them are underpowered. Most of these studies did not estimate the a priori statistical power because statistical tools were not available at the time.
The procedures developed in the present study allow the computation of statistical power in common as well as complex casecontrol study designs involving 26k. The results indicate that underpowered studies could reach reasonable power by increasing the number of controls and reducing the number of hypothesis testing (i.e. haplogroups). In order to reach a wide range of researchers, we provide a friendly web-based tool (mitPower) that implements all the statistical procedures developed in the present study; this software can be used in both retrospective and prospective case-control disease studies. Note that the term retrospective is considered here as done before: ''the prospective power that can be obtained ignoring the fact that data have been gathered and a hypothesis has been tested. In essence, it computes the prospective power of the test as if: (a) the study and analyses had not yet been conducted and (b) the sample effect size is the hypothesized population effect size'' [43]. Further developments of mitPower could involve the implementation of multiple test corrections for the computation of the statistical power (in the sense as it was suggested before in 262 tables [44]) and in two-stage case-control designs [45]. Also challenging would be to explore the phylogenetic relationship existing between different haplogroups (the phylogenetic dependence) or mtSNPs and how this dependence could influence the estimation of power. Finally, other statistical/computational approaches could find their place in 26k tables, such as the use of Markov Chain Monte Carlo methods (MCMC), already explored for 26262 tables [46]. Figure S1 Representation of power values for three haplogroups (H, J. and I) as a function of the number of cases and using the Fisher's exact test instead of the Chisquare test (Figure 1)(significance level of a = 0.05). Colors indicate different deviations from the null hypothesis; thus, black represents a frequency in cases 100% higher than in controls, red represents an increment of 50%, and green an increment of 25% (with the difference distributed proportionally between the remaining non-risky haplogroups). The different lines indicate different case-control odds. The continuous line denotes an odd control-case of 1:1, the dotted line of 2:1, and the pointed line of 3:1. Frequencies in controls for each haplogroup are indicated above each plot. (TIF)