Effect Sizes for 2×2 Contingency Tables

Sample size calculations are an important part of research to balance the use of resources and to avoid undue harm to participants. Effect sizes are an integral part of these calculations, and meaningful values are often unknown to the researcher. General recommendations for effect sizes have been proposed for several commonly used statistical procedures. For the analysis of 2×2 tables, recommendations have been given for the correlation coefficient $\phi$ for binary data; however, it is well known that $\phi$ suffers from poor statistical properties. The odds ratio is not problematic, although recommendations based on objective reasoning do not exist. This paper proposes odds ratio recommendations that are anchored to $\phi$ for fixed marginal probabilities. It will further be demonstrated that the marginal assumptions can be relaxed, resulting in more general results.


Introduction
Sample size calculations are an integral part of scientifically useful and ethical research [1]. A study which is too small may not answer the research question, wasting resources and potentially putting participants at risk for no purpose [2]. Studies which are too large can also waste resources and needlessly expose participants to the potential harms of research, as well as delay results and their translation into practice. The computation of sample size a priori usually depends upon predetermined values for power and level of significance, an estimate of the expected variability in the sample and an effect size of practical or clinical importance. By convention, power is usually at least 80% and the level of significance no more than 5%. When a practically important effect size is unknown, there are several recommendations in the literature to guide the researcher. In his seminal paper, Cohen [3] gives operationally defined small, medium and large effect sizes for various common significance tests. The use of effect size recommendations should not replace differences of clinical or practical importance [4] and may not be appropriate for all disciplines. In basic science research, for example, large effect sizes by Cohen's criteria are common and, therefore, require small sample sizes. On the other hand, clinical and epidemiological research often deals with small effect sizes and often requires large, population-based studies. While there are some approaches to estimating a minimum important effect [5], there are instances where this information is simply not known. Thus, effect size recommendations assist with the balance between overly small and overly large sample sizes.
When the researcher is interested in 2×2 contingency tables, a common measure of effect size is $\phi$ which, in this instance, is equivalent to Pearson's correlation coefficient [6]. Cohen [3] recommends $\phi = 0.1$, $0.3$ and $0.5$ for small, medium and large effect sizes respectively, which are identical to his recommendations for the correlation coefficient. Although Cohen [3] denotes this statistic as $w$, much of the literature uses $\phi$ [6][7][8][9][10] and the remainder of this manuscript follows this convention. To support his recommended effect sizes for correlation coefficients, Cohen [11] chose equivalent values for the difference in two means through the connection with point biserial correlation. Additionally, $\phi$ is applicable to logistic regression since it can be converted to an odds ratio (OR) when the row (or column) marginal probabilities of the 2×2 table are fixed. For example, when the marginal probabilities are uniform (i.e., 0.5 for row and column probabilities), Cohen's recommended effect sizes are equivalent to odds ratios of 1.49, 3.45 and 9.0. It will be demonstrated that the connection between the odds ratio and $\phi$ is largely dependent on the marginal probabilities and these OR values should not be used in general.
A problem arises when using the effect size $\phi$ for 2×2 tables, as the full range of correlation coefficients is only possible under very restrictive circumstances and is not justified in general [12]. On the other hand, odds ratios are valid effect size measures that are not constrained by the marginal probabilities. Ferguson [10] recommends small, medium, and large odds ratio effect sizes of 2.0, 3.0 and 4.0, but urges caution in their use as they are not "anchored" to Pearson's correlation coefficient. Although many have pointed out problems with $\phi$ as an association measure and advocate the use of odds ratios as an alternative, effect size recommendations for odds ratios do not exist in general.
It is common in randomised controlled trials and case-control studies to fix one of the marginal probabilities in the 2×2 table as it directly relates to the ratio of participant allocation. For instance, a marginal probability of 0.5 corresponds to a 1:1 case-control ratio while a 2:1 ratio is a marginal probability of 0.67 (or equivalently 0.33 for 1:2).
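The correspondence between an allocation ratio and a marginal probability is simple arithmetic, and can be sketched as follows. This is an illustration only; the function name `allocation_to_margin` is hypothetical and does not come from the paper or its supplementary SAS macro.

```python
from fractions import Fraction

def allocation_to_margin(k: int, m: int) -> Fraction:
    """Marginal probability of the first group under a k:m allocation."""
    if k <= 0 or m <= 0:
        raise ValueError("allocation counts must be positive")
    return Fraction(k, k + m)

# Allocation ratios mentioned in the text: 1:1, 2:1 and 1:2.
for k, m in [(1, 1), (2, 1), (1, 2)]:
    margin = allocation_to_margin(k, m)
    print(f"{k}:{m} allocation -> marginal probability {float(margin):.2f}")
```

Exact fractions are used so that 2:1 is stored as 2/3 and only rounded to 0.67 for display.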
The aims of this paper are to demonstrate: (1) the equivalence of effect size measures for 2×2 contingency tables, in particular the relationship between $\phi$ and the odds ratio; (2) that recommended odds ratio effect sizes can be derived from Cohen's work using the maximum value of $\phi$ as a guideline for fixed marginal probabilities; (3) the shortcomings of $\phi$ and the strength of the odds ratio as an effect size measure; and (4) that conservative odds ratio effect size recommendations can be derived without relying on fixed margins. We provide an example that investigates the association between helmet wearing by bicyclists and overtaking distance by automobiles.

2×2 Contingency tables
The two-way classification or contingency table is a common method for summarising the relationship between two binary variables, say $X$ and $Y$. Table 1 gives the joint probability distribution of $X$ and $Y$ when their individual outcomes are from the set $\{0, 1\}$.
In this formulation, $p_{ij} = P(X = i, Y = j)$, for $i = 0, 1$ and $j = 0, 1$, is the joint probability of $X$ and $Y$, $p_{i+} = P(X = i)$ is the marginal probability of $X$, and $p_{+j} = P(Y = j)$ is the marginal probability of $Y$. Under an assumption of independence between $X$ and $Y$, the product of the marginal probabilities equals the cell probabilities, i.e., $p_{ij} = p_{i+} p_{+j}$. Alternatively, the 2×2 table could be represented by the frequency of observations so that $n_{ij} = n \times p_{ij}$ where $n = \sum_{i,j} n_{ij}$. Similarly, the marginal frequencies are $n_{i+} = n_{i0} + n_{i1}$ and $n_{+j} = n_{0j} + n_{1j}$. Note that $p_{ij}$ is assumed to be the population proportion as the focus of this paper is the use of effect sizes as a planning tool and not statistical inference per se. In a case-control study, for example, $X$ may indicate the presence or absence of disease while $Y$ is an indication of exposure. Thus, $p_{11} = P(X = 1, Y = 1)$ would represent the joint probability of being diseased and exposed.

Effect size $\phi$ and Equivalences for 2×2 Tables
There are many association measures applicable to 2×2 tables which, with the exception of the odds ratio and relative risk, are equivalent or similar to $\phi$. The equivalence of some of these association measures is outlined below.
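For the random sample $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$, Pearson's correlation coefficient is

$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$

where $\bar{X}$ and $\bar{Y}$ are the sample means of the $X_i$ and $Y_i$ respectively. Although used primarily as a measure of linear association, Pearson's correlation coefficient can be applied to binary variables and is often given the notation $\phi$. For the 2×2 table case, we get

$$\sum_{i=1}^{n} X_i Y_i = n_{11}, \qquad \sum_{i=1}^{n} X_i = \sum_{i=1}^{n} X_i^2 = n_{1+}, \qquad \sum_{i=1}^{n} Y_i = \sum_{i=1}^{n} Y_i^2 = n_{+1}.$$

So, Pearson's correlation coefficient for binary random variables $X$ and $Y$ is

$$\phi = \frac{p_{11} - p_{1+} p_{+1}}{\sqrt{p_{1+} p_{+1} (1 - p_{1+})(1 - p_{+1})}}.$$

Since $p_{11} = p_{1+} p_{+1}$ under the hypothesis of independence, $\phi$ can be interpreted as measuring the departure from independence between $X$ and $Y$. Note that Cramér's $\phi$ is equivalent to this equation for the 2×2 table case [11], as is the square root of Goodman and Kruskal's $\tau$ [13].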
For the analysis of contingency tables in general (not just the 2×2 table case), the effect size formula for $K$ total cells is

$$w = \sqrt{\sum_{k=1}^{K} \frac{(P_{1k} - P_{0k})^2}{P_{0k}}}$$

where $P_{0k}$ and $P_{1k}$ are cell probabilities under the null and alternative hypotheses respectively. Note that $w$ is related to the usual chi-square statistic $\chi^2$ by $w = \sqrt{\chi^2 / n}$ and is sometimes called the contingency coefficient. Using this formula, Cohen [3] recommends $w = 0.1$, $0.3$ and $0.5$ for small, medium and large effect sizes. Noting that $P_{1k}$ is the probability of each cell ($p_{ij}$) and $P_{0k}$ is the cell probability under an independence assumption (so that $P_{0k} = p_{i+} p_{+j}$), we can then write the effect size formula for the 2×2 table as follows:

$$w = \operatorname{sgn}(p_{11} - p_{1+} p_{+1}) \sqrt{\sum_{i=0}^{1} \sum_{j=0}^{1} \frac{(p_{ij} - p_{i+} p_{+j})^2}{p_{i+} p_{+j}}}.$$

Simple arithmetic demonstrates the equivalence of $w$ with $\phi$. The sgn function is used to give the appropriate sign since the chi-square statistic is inherently non-directional.
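The equivalence of the signed chi-square effect size and the correlation coefficient $\phi$ for a 2×2 table can be checked numerically. The sketch below is not taken from the paper: the function names and the example cell probabilities are hypothetical, and it assumes $\phi = (p_{11} - p_{1+}p_{+1}) / \sqrt{p_{1+}p_{+1}(1-p_{1+})(1-p_{+1})}$ with independence as the null hypothesis for Cohen's statistic.

```python
import math

def margins(p11, p10, p01, p00):
    """Return (p1+, p+1), i.e. P(X = 1) and P(Y = 1)."""
    return p11 + p10, p11 + p01

def phi(p11, p10, p01, p00):
    """Pearson correlation for two binary variables."""
    p1_, p_1 = margins(p11, p10, p01, p00)
    return (p11 - p1_ * p_1) / math.sqrt(p1_ * p_1 * (1 - p1_) * (1 - p_1))

def signed_cohen_w(p11, p10, p01, p00):
    """Cohen's chi-square effect size, signed by the direction of association."""
    p1_, p_1 = margins(p11, p10, p01, p00)
    cells = {(1, 1): p11, (1, 0): p10, (0, 1): p01, (0, 0): p00}
    total = 0.0
    for (i, j), p in cells.items():
        # Cell probability expected under independence of X and Y.
        expected = (p1_ if i else 1 - p1_) * (p_1 if j else 1 - p_1)
        total += (p - expected) ** 2 / expected
    sign = math.copysign(1.0, p11 - p1_ * p_1)
    return sign * math.sqrt(total)

# Hypothetical joint probabilities (p11, p10, p01, p00), not data from the paper.
positive = (0.30, 0.10, 0.20, 0.40)
negative = (0.10, 0.30, 0.40, 0.20)
for table in (positive, negative):
    print(f"phi = {phi(*table):.6f}, signed w = {signed_cohen_w(*table):.6f}")
```

Both tables share the same margins, so the two printed pairs differ only in sign.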

The relationship of $\phi$ to the odds ratio
The odds ratio for the association between $X$ and $Y$ is $\mathrm{OR} = p_{11} p_{00} / (p_{10} p_{01})$. When the marginal probabilities are held constant and the cell probability $p_{11}$ is known, the remaining cell probabilities can be written as

$$p_{10} = p_{1+} - p_{11}, \qquad p_{01} = p_{+1} - p_{11}, \qquad p_{00} = 1 - p_{1+} - p_{+1} + p_{11}.$$

Therefore, when the marginal probabilities are fixed, the odds ratio can be computed directly from $p_{11}$, and can then be expressed as

$$\mathrm{OR} = \frac{p_{11}(1 - p_{1+} - p_{+1} + p_{11})}{(p_{1+} - p_{11})(p_{+1} - p_{11})}.$$

It is clear from the above formula that the odds ratio will be greater than one (or less than one) precisely when the joint probability $p_{11}$ is greater (or less) than expected under an assumption of independence, i.e., $p_{11} > p_{1+} p_{+1}$ (or $p_{11} < p_{1+} p_{+1}$). Additionally, the formula for $\phi$ can be rearranged to solve for $p_{11}$, i.e.,

$$p_{11} = p_{1+} p_{+1} + \phi \sqrt{p_{1+} p_{+1} (1 - p_{1+})(1 - p_{+1})}.$$

Although mathematically unattractive, it is clear the odds ratio can then be computed from $\phi$, $p_{1+}$ and $p_{+1}$. Note that when $\phi = 0$ (i.e., no correlation), we get $p_{11} = p_{1+} p_{+1}$ (i.e., $X$ and $Y$ are independent) and the odds ratio is $\mathrm{OR} = 1$. When $\phi \neq 0$, the term $\phi \sqrt{p_{1+} p_{+1} (1 - p_{1+})(1 - p_{+1})}$ is then a measure of the departure from independence.
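The conversion from $\phi$ and fixed margins to an odds ratio can be sketched in a few lines. This is a minimal illustration under the assumption that $p_{11} = p_{1+}p_{+1} + \phi\sqrt{p_{1+}p_{+1}(1-p_{1+})(1-p_{+1})}$ and that the remaining cells follow from the margins; the function names are hypothetical. With uniform margins it reproduces the odds ratios 1.49, 3.45 and 9.0 quoted in the Introduction.

```python
import math

def p11_from_phi(phi, p1_, p_1):
    """Joint probability P(X=1, Y=1) implied by phi and the two margins."""
    return p1_ * p_1 + phi * math.sqrt(p1_ * p_1 * (1 - p1_) * (1 - p_1))

def odds_ratio_from_phi(phi, p1_, p_1):
    """Odds ratio implied by phi when the margins p1+ and p+1 are fixed."""
    p11 = p11_from_phi(phi, p1_, p_1)
    p10 = p1_ - p11
    p01 = p_1 - p11
    p00 = 1 - p1_ - p_1 + p11
    if min(p11, p10, p01, p00) <= 0:
        raise ValueError("phi is not attainable (or OR is undefined) for these margins")
    return p11 * p00 / (p10 * p01)

# Cohen's small, medium and large values with uniform (0.5, 0.5) margins.
for phi in (0.1, 0.3, 0.5):
    print(phi, round(odds_ratio_from_phi(phi, 0.5, 0.5), 2))
```

The explicit check on the cell probabilities makes the dependence on the margins visible: a value of $\phi$ that is valid for one pair of margins can be unattainable for another.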

Maximum $\phi$ and Modified Effect Sizes
When the marginal probabilities are fixed constants, $\phi$ is an increasing linear function of $p_{11}$. Further, $p_{11}$ is bounded by

$$\max(0, p_{+1} - p_{0+}) \le p_{11} \le \min(p_{1+}, p_{+1}).$$

These bounds are due to all cell probabilities being nonnegative and the relationship of $p_{11}$ with the other cell probabilities given above. As a result, $\phi$ is bounded as well and attains its maximum when $p_{11} = \min(p_{1+}, p_{+1})$. Using the upper bound of the above inequality, it can be shown that

$$\phi_{\max} = \sqrt{\frac{p_{1+}(1 - p_{+1})}{p_{+1}(1 - p_{1+})}}$$

where $p_{1+} \le p_{+1}$ to ensure $\phi_{\max} \le 1$. It is clear from the formula for $\phi_{\max}$ that the full range of correlation coefficients, i.e., $-1 \le \phi \le 1$, is attainable only when the marginal probabilities are equal, i.e., $p_{1+} = p_{+1}$ or $p_{1+} = 1 - p_{+1}$. This has an intuitive appeal as perfect correlation for two binary variables is only possible when two cell probabilities are zero. For example, when all observations are in either the (0,1) or (1,0) cells, $\hat{\phi} = -1$. However, it would appear highly unlikely that both marginal probabilities will be equal in practice. For example, in a 1:1 case-control study with mortality as the primary outcome, half of all patients would need to die for perfect correlation to be possible. On the other hand, if 10% of all patients die, the maximum correlation possible is $\phi_{\max} = 1/3$, which is near a medium recommended effect size. So, in this situation, all estimates of $\phi$, computed from observed proportions, are bounded by $-1 < -1/3 \le \hat{\phi} \le 1/3 < 1$. Importantly, odds ratios are not bounded, with possible values of $[1, \infty)$ as $\phi$ varies on the interval $[0, \phi_{\max})$. In fact, as $\phi$ approaches $\phi_{\max}$, the OR increases without bound. Figure 1 demonstrates this relationship. This indicates that $\phi$ has serious limitations as a measure of association and that these limitations are not applicable to the odds ratio.
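The bound on $\phi$ and the unbounded growth of the odds ratio can be illustrated numerically. The sketch below assumes $\phi_{\max} = \sqrt{p_{1+}(1-p_{+1}) / (p_{+1}(1-p_{1+}))}$ with the smaller margin taken as $p_{1+}$; the function names are hypothetical, and the margins are those of the 1:1 case-control example in which 10% of all patients die.

```python
import math

def phi_max(p1_, p_1):
    """Largest attainable phi for fixed margins P(X=1) = p1_ and P(Y=1) = p_1."""
    lo, hi = sorted((p1_, p_1))  # the formula takes the smaller margin first
    return math.sqrt(lo * (1 - hi) / (hi * (1 - lo)))

def odds_ratio(phi, p1_, p_1):
    """Odds ratio implied by phi and the two fixed margins."""
    p11 = p1_ * p_1 + phi * math.sqrt(p1_ * p_1 * (1 - p1_) * (1 - p_1))
    p10, p01 = p1_ - p11, p_1 - p11
    p00 = 1 - p1_ - p_1 + p11
    return p11 * p00 / (p10 * p01)

# 1:1 case-control study (margin 0.5) in which 10% of all patients die (margin 0.1).
bound = phi_max(0.1, 0.5)
print(f"phi_max = {bound:.4f}")

# The odds ratio grows without bound as phi approaches phi_max.
for fraction in (0.5, 0.9, 0.99, 0.999):
    print(fraction, round(odds_ratio(fraction * bound, 0.1, 0.5), 1))
```

Sorting the margins is a convenience that relies on the symmetry of $\phi$ in $p_{1+}$ and $p_{+1}$; it is a design choice of this sketch rather than something stated in the text.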

Effect Sizes Relative to $\phi_{\max}$
In many practical instances, the marginal probabilities are not equal, making the full range of values for $\phi$ impossible and potentially making Cohen's recommended effect sizes unusable for 2×2 tables. Although not equivalent to perfect correlation, $\phi_{\max}$ can be interpreted as the maximum possible correlation given the marginal probabilities. In fact, $\phi/\phi_{\max}$ has been proposed as an association measure, interpreted as the proportion of observed correlation relative to the maximum attainable with fixed marginal probabilities [7], although the researcher is cautioned when the marginal probabilities diverge [6]. Note that $\phi$ is not equivalent to Cohen's similarity/agreement measure $\kappa$. However, $\kappa$ suffers from the same boundary problems as $\phi$ and the two are equivalent when scaled to their maximum values, i.e., $\phi/\phi_{\max} = \kappa/\kappa_{\max}$, making the two measures similar [6].
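The agreement between the two scaled measures can be checked numerically for a positively associated table. The sketch below is illustrative only: the function names and the example probabilities are hypothetical, and it assumes that $\kappa_{\max}$ is the value of $\kappa$ at $p_{11} = \min(p_{1+}, p_{+1})$, i.e. the largest agreement attainable with the margins held fixed.

```python
import math

def phi_ratio(p11, p1_, p_1):
    """phi / phi_max for a table with cell p11 and margins p1+, p+1."""
    scale = math.sqrt(p1_ * p_1 * (1 - p1_) * (1 - p_1))
    phi = (p11 - p1_ * p_1) / scale
    phi_max = (min(p1_, p_1) - p1_ * p_1) / scale
    return phi / phi_max

def kappa(p11, p1_, p_1):
    """Cohen's kappa: chance-corrected agreement between X and Y."""
    p00 = 1 - p1_ - p_1 + p11
    observed = p11 + p00
    expected = p1_ * p_1 + (1 - p1_) * (1 - p_1)
    return (observed - expected) / (1 - expected)

def kappa_ratio(p11, p1_, p_1):
    """kappa / kappa_max, with kappa_max evaluated at the largest feasible p11."""
    return kappa(p11, p1_, p_1) / kappa(min(p1_, p_1), p1_, p_1)

# Hypothetical table: margins 0.1 and 0.5 with p11 = 0.08.
print(phi_ratio(0.08, 0.1, 0.5), kappa_ratio(0.08, 0.1, 0.5))
```

Both ratios reduce to $(p_{11} - p_{1+}p_{+1}) / (\min(p_{1+}, p_{+1}) - p_{1+}p_{+1})$, which is why they coincide.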

Recommended effect sizes in terms of the odds ratio
As an alternative to Cohen's recommendations, increments of $\phi_{\max}$ can be related to the odds ratio, say $a\phi_{\max}$, where $a \in (0, 1)$. Note that values of $a = 0.1$, $0.3$ or $0.5$ coincide with Cohen's usual recommendations when $\phi_{\max} = 1$. The relationship between $a\phi_{\max}$ and the odds ratio can be simplified by choosing marginal probabilities for commonly used participant allocations. As an example, Figures 2 and 3 demonstrate the relationship between $p_{1+}$ and odds ratios for $0.1\phi_{\max}$, $0.3\phi_{\max}$ and $0.5\phi_{\max}$ for 1:1 and 1:2 allocations respectively. Note that the minimal odds ratios, and therefore the most conservative when used to compute sample size, occur when $p_{1+}$ tends to 0. Although the odds ratio does not exist when $p_{1+} = 0$, the limit exists and is

$$\mathrm{OR}_{\min} = \lim_{p_{1+} \to 0} \mathrm{OR} = 1 + \frac{a}{(1 - a) p_{+1}}.$$

Additionally, the maximal odds ratio, and therefore the most anticonservative, occurs when the marginal probabilities are equal, as expected. Below is the maximum attainable odds ratio for equal margins $p_{1+} = p_{+1} = p$ for increments $a$ of $\phi_{\max}$,

$$\mathrm{OR}_{\max} = 1 + \frac{a}{(1 - a)^2 p (1 - p)}.$$

It is important to note that when $0 < p_{+1} \le 0.5$, as is often true for case-control studies where cases are harder to identify or enrol than controls, the minimal odds ratio will be smallest for evenly allocated studies, i.e., $p_{+1} = 0.5$. Further, it is generally recommended to use 1:1 allocation as it is the most statistically efficient ratio, i.e., it gives maximum power for a fixed overall sample size. So, odds ratios of 1.22, 1.86 and 3.00 can be used as small, medium and large effect sizes without assumptions regarding marginal probabilities. Sample sizes computed using these odds ratios for 1:1 allocation are given in Table 2 for 80% power and 5% level of significance. A SAS macro that will compute sample sizes from given marginal probabilities for small, medium and large odds ratios has been provided as a supplementary file.
Interestingly, Haddock et al. [12] as a rule of thumb consider odds ratios greater than 3 large effect sizes, although there is no clear justification given. In a situation where an allocation ratio other than 1:1 is used, recommended odds ratios can be computed directly using the above formula. These results are also applicable for other values of $p_{+1} > 0.5$ through its complement $0 < p_{+0} \le 0.5$. This is equivalent to swapping the columns (or rows) and the researcher should be aware the recommended odds ratio effect sizes are now the reciprocals of those above, i.e., 0.82, 0.54 and 0.33 for small, medium and large respectively.
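The recommended odds ratios can be reproduced from first principles by setting $\phi$ to a fraction $a$ of $\phi_{\max}$ and letting $p_{1+}$ become very small. The sketch below is an illustration, not the supplementary SAS macro: the function name is hypothetical, a tiny margin of $10^{-6}$ stands in for the limit $p_{1+} \to 0$, and the closed form $1 + a / ((1 - a)p_{+1})$ is used only as a cross-check.

```python
def odds_ratio_at_fraction(a, p1_, p_1):
    """Odds ratio when phi equals a * phi_max for margins p1+ and p+1.

    phi is linear in p11, so phi = a * phi_max places p11 a fraction a of the
    way from its independence value to its largest feasible value.
    """
    independent = p1_ * p_1
    p11 = independent + a * (min(p1_, p_1) - independent)
    p10, p01 = p1_ - p11, p_1 - p11
    p00 = 1 - p1_ - p_1 + p11
    return p11 * p00 / (p10 * p01)

TINY = 1e-6  # stand-in for the limit p1+ -> 0 (an assumption of this sketch)

for label, a in [("small", 0.1), ("medium", 0.3), ("large", 0.5)]:
    minimal = odds_ratio_at_fraction(a, TINY, 0.5)       # 1:1 allocation
    closed_form = 1 + a / ((1 - a) * 0.5)                # limiting expression
    equal_margins = odds_ratio_at_fraction(a, 0.5, 0.5)  # p1+ = p+1 = 0.5
    print(f"{label}: minimal OR {minimal:.2f} (closed form {closed_form:.2f}), "
          f"reciprocal {1 / minimal:.2f}, equal margins {equal_margins:.2f}")
```

The equal-margin column recovers the odds ratios 1.49, 3.45 and 9.0 that correspond to Cohen's values under uniform margins, which shows how much larger they are than the conservative 1:1 recommendations.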
This approach can also be applied to the relative risk and risk difference. If $X$ is taken as the grouping variable and $Y$ as the outcome, the relative risk is $p_{11}(1 - p_{+1}) / (p_{10} p_{+1})$. Simple substitution of $a\phi_{\max}$ and the marginal probabilities $p_{1+}$ and $p_{+1}$ results in a relative risk identical to $\mathrm{OR}_{\min}$ for $p_{1+} > 0$, i.e.,

$$\mathrm{RR} = 1 + \frac{a}{(1 - a) p_{+1}}.$$

Therefore, recommendations can also be derived for relative risk and are identical to those given for the odds ratio above. This result is expected as the odds ratio converges to the relative risk as the incidence rate approaches 0.
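That the relative risk does not depend on $p_{1+}$ once $\phi$ is fixed at $a\phi_{\max}$ can be confirmed numerically. The sketch below uses the relative risk expression $p_{11}(1 - p_{+1}) / (p_{10}p_{+1})$ from the text and assumes $p_{1+} \le p_{+1}$; the function name and the chosen margins are hypothetical.

```python
def relative_risk_at_fraction(a, p1_, p_1):
    """Relative risk p11 (1 - p+1) / (p10 p+1) when phi = a * phi_max.

    Assumes p1_ <= p_1, so the largest feasible p11 equals p1_.
    """
    if not 0 < p1_ <= p_1 < 1:
        raise ValueError("requires 0 < p1+ <= p+1 < 1")
    independent = p1_ * p_1
    p11 = independent + a * (p1_ - independent)
    p10 = p1_ - p11
    return p11 * (1 - p_1) / (p10 * p_1)

a, p_1 = 0.3, 0.5
closed_form = 1 + a / ((1 - a) * p_1)
for p1_ in (0.01, 0.1, 0.3, 0.5):
    print(p1_, round(relative_risk_at_fraction(a, p1_, p_1), 4), round(closed_form, 4))
```

Every row prints the same relative risk, matching the closed form for the medium increment under 1:1 allocation.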
Instead of comparing the risk between two groups as a ratio, it is sometimes useful to compare their differences [14]. Again taking $X$ as the grouping variable and $Y$ as the outcome, the risk difference can be written as

$$\mathrm{RD} = \frac{p_{11} - p_{1+} p_{+1}}{p_{+1}(1 - p_{+1})}$$

where $p_{1+} \le p_{+1}$ to ensure $\phi_{\max} \le 1$ as above. It is clear from the numerator in this representation that RD is a measure of the departure from independence, i.e., from $p_{11} = p_{1+} p_{+1}$. Simple substitution of $a\phi_{\max}$ into RD yields

$$\mathrm{RD}_a = \frac{a\, p_{1+}}{p_{+1}}$$

where the subscript $a$ is used to distinguish between risk difference formulae. This formula can be simplified somewhat for 1:1 allocations, i.e., $\mathrm{RD}_a(p_{+1} = 0.5) = 2 a p_{1+}$; however, a general result independent of the marginal probabilities is clearly not possible in this instance as $0 \le p_{1+} \le 0.5$ and therefore $0 \le \mathrm{RD}_a(p_{+1} = 0.5) \le a$. Alternatively, the $\mathrm{OR}_{\min}$ formula can be solved for $a$ and compared to previously given odds ratio recommendations. In terms of $\mathrm{OR}_{\min}$ and $p_{+1}$, we get

$$a = \frac{p_{+1}(\mathrm{OR}_{\min} - 1)}{1 + p_{+1}(\mathrm{OR}_{\min} - 1)}.$$

When the allocation ratio is 1:1, this formula simplifies to $a(p_{+1} = 0.5) = (\mathrm{OR}_{\min} - 1)/(\mathrm{OR}_{\min} + 1)$, which has a form identical to Yule's $Q$ [15]. So, Ferguson's [10] odds ratio recommendations of 2.0, 3.0 and 4.0 correspond to proportions of maximum correlation of $a = 0.33$, $0.5$ and $0.6$. This suggests Ferguson's recommendations have the potential to be anticonservative from a sample size viewpoint.
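The proportions of maximum correlation implied by a given odds ratio can be computed by inverting the limiting expression $\mathrm{OR}_{\min} = 1 + a / ((1 - a)p_{+1})$. The following sketch is illustrative only and the function name is hypothetical; it recovers the values 0.33, 0.5 and 0.6 for Ferguson's recommendations under 1:1 allocation.

```python
def fraction_from_or_min(or_min, p_1):
    """Solve OR_min = 1 + a / ((1 - a) * p_1) for the fraction a of phi_max."""
    if or_min < 1 or not 0 < p_1 < 1:
        raise ValueError("requires OR_min >= 1 and 0 < p+1 < 1")
    excess = p_1 * (or_min - 1)
    return excess / (1 + excess)

# Ferguson's small, medium and large odds ratios under 1:1 allocation.
for odds_ratio in (2.0, 3.0, 4.0):
    a = fraction_from_or_min(odds_ratio, 0.5)
    yule_form = (odds_ratio - 1) / (odds_ratio + 1)  # same value when p+1 = 0.5
    print(odds_ratio, round(a, 2), round(yule_form, 2))
```

Since Cohen's increments are 0.1, 0.3 and 0.5, each of these fractions sits above the corresponding increment, which is the sense in which the larger odds ratios lead to smaller computed sample sizes.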

Example
This paper was motivated by a reanalysis of passing distances for motor vehicles overtaking a bicyclist [16]. One of the primary results of this study was a significant association between helmet wearing and shorter overtaking distances, supporting a theory of risk perception for motor vehicle drivers directed towards bicyclists. Prior to collecting data, Walker [16] reported computing a sample size of $n = 2259$ overtaking manoeuvres based on a 2×5 fixed effects factorial ANOVA for a small effect size $f = 0.1$, 5% level of significance and 98% power. The factors for this study were helmet wearing (2 levels) and bicycle position relative to the kerb (5 levels). It has been noted, however, that passing distances are often recommended and sometimes legislated to one metre or more [17]. So, passing manoeuvres of at least a metre are considered safe and less than a metre unsafe, with the implication that large differences in passing distance are unimportant beyond one metre in terms of bicycle safety. When compared with helmet wearing, safe/unsafe passing distances can be analysed using a 2×2 table. Since Walker's study was powered at an unusually high level with subsequent increased probability of a type I error, bootstrap standard errors were estimated for more reasonable values for power of 80%, 85% and 90%. Operationally defined small, medium and large effect sizes were also used since a meaningful difference in overtaking distance is unknown.
The relevant observed data from Walker [16] are given in Table 3. The observed marginal proportions here are $p_{+1} = 0.488$ for helmet wearing and $p_{1+} = 0.047$ for unsafe passing manoeuvres. Using the marginal probabilities, the maximum attainable effect size is $\phi_{\max} \approx 0.227$ and the estimated correlation is $\hat{\phi} = 0.028$. A consequence is that the effect size for the association between helmet wearing and safe passing distance is, at best, much less than a small effect size by Cohen's index. The corresponding small, medium and large odds ratio effect sizes using increments of $a\phi_{\max}$ are 1.24, 1.94 and 3.21 for $a = 0.1$, $0.3$ and $0.5$. Note that these values are not much greater than the minimal recommended odds ratios mentioned in the previous section, further suggesting the association between safe/unsafe passing distance and helmet wearing is, at best, a small effect size. In fact, the unadjusted odds ratio is $\mathrm{OR} = 1.3$ and non-significant by the chi-square test ($p = 0.182$). Conversely, sample sizes for a future study can be computed from the observed probabilities using G*Power for logistic regression with a single binomially distributed predictor for $\alpha = 0.05$ and 80% power [18], resulting in 16237, 1409 and 383 observations for small, medium and large odds ratios. To put these sample size computations into perspective, a future study would need to extend the sampling period by a factor greater than seven to detect a significant association between helmet wearing and safe/unsafe overtaking distance given a small effect size and identical marginal probabilities.
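The effect sizes quoted for this example can be recomputed from the two reported marginal proportions alone. The sketch below is illustrative: the function names are hypothetical, the margins are the rounded values given in the text, and the G*Power sample sizes are not recomputed, only their ratio to the planned sample size.

```python
import math

def phi_max(p1_, p_1):
    """Largest attainable phi for fixed margins (smaller margin taken first)."""
    lo, hi = sorted((p1_, p_1))
    return math.sqrt(lo * (1 - hi) / (hi * (1 - lo)))

def odds_ratio_at_fraction(a, p1_, p_1):
    """Odds ratio when phi equals a * phi_max for margins p1+ and p+1."""
    independent = p1_ * p_1
    p11 = independent + a * (min(p1_, p_1) - independent)
    p10, p01 = p1_ - p11, p_1 - p11
    p00 = 1 - p1_ - p_1 + p11
    return p11 * p00 / (p10 * p01)

p_unsafe = 0.047  # p1+: proportion of unsafe passing manoeuvres
p_helmet = 0.488  # p+1: proportion of manoeuvres with a helmet worn

print(f"phi_max = {phi_max(p_unsafe, p_helmet):.3f}")
for label, a in [("small", 0.1), ("medium", 0.3), ("large", 0.5)]:
    print(label, round(odds_ratio_at_fraction(a, p_unsafe, p_helmet), 2))

# Ratio of the small-effect sample size (16237) to the planned sample size (2259).
print(f"sampling factor = {16237 / 2259:.2f}")
```

The sampling factor of about 7.2 is the arithmetic behind the statement that the sampling period would need to be extended by a factor greater than seven.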

Discussion
We present a demonstration that many contingency table correlation measures are equivalent for the 2×2 case and that their use is limited due to constraints created by fixed marginal probabilities. The odds ratio, which is a function of these measures for fixed marginal probabilities, is not problematic, is regularly used in statistical analyses and has a direct application to logistic regression. Recommended odds ratios have been proposed from Cohen's small, medium and large effect sizes for $\phi$ relative to the maximum attainable correlation $\phi_{\max}$. Further, minimal odds ratios can be computed with only knowledge of participant allocation.
The use of effect size recommendations should be avoided in situations in which clinical or practical differences are known. However, they can help the researcher balance between overly large or overly small sample size calculations when such information is unknown. In these situations, conservative estimates for odds ratio effect sizes can be derived from only the allocation ratio, leading to a general result and, when a 1:1 allocation is chosen for optimal power, odds ratios of 1.22, 1.86 and 3.00 correspond to small, medium and large effect sizes.

Supporting Information
File S1 SAS Macro to compute sample sizes from marginal probabilities for small, medium and large odds ratios. (SAS)