Improved procedures and computer programs for equivalence assessment of correlation coefficients

  • Gwowen Shieh

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    gwshieh@nycu.edu.tw

    Affiliation Department of Management Science, National Yang Ming Chiao Tung University, Hsinchu, Taiwan

Abstract

The correlation coefficient is the most commonly used measure for summarizing the magnitude and direction of linear relationship between two response variables. Considerable literature has been devoted to the inference procedures for significance tests and confidence intervals of correlations. However, the essential problem of evaluating correlation equivalence has not been adequately examined. For the purpose of expanding the usefulness of correlational techniques, this article focuses on the Pearson product-moment correlation coefficient and the Fisher’s z transformation for developing equivalence procedures of correlation coefficients. Equivalence tests are proposed to assess whether a correlation coefficient is within a designated reference range for declaring equivalence decisions. The important aspects of Type I error rate, power calculation, and sample size determination are also considered. Special emphasis is given to clarify the nature and deficiency of the two one-sided tests for detecting a lack of association. The findings demonstrate the inappropriateness of existing methods for equivalence appraisal and validate the suggested techniques as reliable and primary tools in correlation analysis.

Introduction

Practical guidelines and suggestions for selecting, calculating, and interpreting effect size indices in statistical analyses have been frequently advocated in the literature. Comprehensive reviews and general principles concerning effect size measures are available in the recent works of Fritz, Morris, and Richler [1], Grissom and Kim [2], Kelley and Preacher [3], Kline [4], Pek and Flora [5], and the references therein. According to the summary in Ferguson [6], effect size measures fall into four general categories: (1) group difference, (2) strength of association, (3) corrected estimates, and (4) risk estimates. In particular, the Pearson product-moment correlation coefficient, or sample correlation coefficient r, is the most commonly used measure of strength of association in applied research across virtually all disciplines of the social sciences. The popularity of the sample correlation coefficient in the psychological literature has been documented in de Winter, Gosling and Potter [7], Hemphill [8], and Richard, Bond and Stokes-Zoota [9], among others.

Under the normality assumption, the probability density function of the sample correlation coefficient r is extremely complicated as shown in Fisher [10]. Theoretical details and related issues can be found in Chapter 32 of Johnson et al. [11] and Chapter 16 of Stuart and Ord [12]. Exact statistical analyses of the correlation coefficient ρ require complex procedures and involved computation, such as Shieh [13, 14]. To facilitate practical analysis, numerous investigations were devoted to give various expressions, approximations, and computing algorithms for the distribution of the sample correlation coefficient. Notably, the asymptotic normal distributions of the sample correlation coefficient and the Fisher’s [15] z transformation have proven to provide reasonable alternatives with satisfying performance in many cases. The intrinsic properties of the Fisher’s z transformation in terms of conversion accuracy, geometric interpretation, normalization acceleration, and variance stabilization are demonstrated in Bond and Richardson [16], Hotelling [17], Silver and Dunlap [18], and Winterbottom [19].

It is noteworthy that most presentations of correlational techniques deal primarily with conventional tests of significance. Methodologists, however, have strongly advocated replacing or extending the null hypothesis of strict equality to deliver more profound implications in statistical analysis. Specifically, the method of equivalence testing is potentially useful in behavioral and psychological sciences as emphasized in Rogers, Howard, and Vessey [20], Seaman and Serlin [21], Stegner, Bostrom, and Greenfield [22], and Steiger [23]. Meyners [24] presented a discussion of the different types of equivalence tests. Moreover, fundamental principles on the design and analysis of equivalence studies are described in Chow and Liu [25], Hauschke, Steinijans, and Pigeot [26], and Wellek [27].

The two one-sided tests (TOST) procedure of mean equivalence, first described by Schuirmann [28] and Westlake [29], is the most common method in equivalence methodology. Because of the approximate nature, the TOST method possesses conceptual simplicity and computational ease. More importantly, the procedure adequately maintains the Type I error rates and the notion gains general acceptance in practical equivalence problems. Berger and Hsu [30], however, cautioned that the TOST principle may not always preserve the nominal Type I error rates in other circumstances. Within the context of correlation analysis, there are few attempts that study the equivalence testing techniques. The particular case of Goertzen and Cribbie [31] suggested a direct extension of mean equivalence TOST to detect a lack of association. Naturally, the TOST method for assessing the lack of association is presumed to share the same desirable properties of the counterpart TOST for establishing mean equivalence.

It is prudent to note that the lack of association examined in Goertzen and Cribbie [31] concerns what sort of strength of association is so small that it should be described as negligible. It is also constructive and more versatile to evaluate whether a target correlation is close enough to any specific magnitude of substantive interest with respect to the designated equivalence boundaries. The simulation results of Goertzen and Cribbie [31] revealed that the TOST method based on the Fisher’s transformation has a serious disadvantage in achieving the nominal Type I error rates. However, no analytic examination and technical illustration have been provided in the literature to elucidate the causes of the problematic behavior. A thorough investigation is required to clarify the nature of such deficiency and its implications for equivalence testing. Goertzen and Cribbie [31] suggested that the detection of a lack of association requires substantially large sample sizes. Monte Carlo simulation methods may give a potential solution to sample size calculation. It is of practical importance to derive the power function and then combine a numerical search to determine the optimal sample sizes.

In view of the importance of equivalence testing and the limitations of the current TOST method for correlation coefficients, this paper has four major goals. First, a general framework is considered for appraising correlation equivalence with respect to a designated reference range that may not be equidistant around zero or may not even include zero. The lack of association is therefore a special case of the presented unified structure. Second, analytic examination and numerical assessment are conducted to illustrate the relative performance of the proposed equivalence procedures. In the process, detailed appraisals and graphic displays are presented to explicate the inherent deficiencies of the TOST method in detecting a lack of association. Third, explicit power functions and sample size algorithms are derived and examined to reveal the exact functional relation and individual impact of the influential factors. They provide researchers with a better understanding of how the planned sample sizes differ across model configurations. Fourth, it is of practical interest to alleviate the computational demands in equivalence studies. The accompanying SAS/IML and R software algorithms are available for conducting the equivalence tests, power calculations, and sample size determinations.

Methods

Suppose that the paired random variables (Yi, Xi), i = 1,…, N, are independent and identically distributed with a bivariate normal distribution with means μX, μY, variances $\sigma_X^2$, $\sigma_Y^2$, and correlation ρ. Notably, the correlation coefficient ρ represents an essential effect size measure for the strength of the linear relationship between the two variables. The widely used Pearson product-moment correlation coefficient r is a natural estimator for the correlation coefficient ρ. It is noteworthy that the normality assumption of (Yi, Xi), i = 1,…, N, provides a convenient and useful setting. However, exact statistical inferences of the correlation coefficient ρ with the sample counterpart r demand considerable analytic and computational complexity. Large-sample approximations are often considered to provide feasible solutions in practical applications.

Fisher’s z transformation

A highly regarded approach to the analysis of the population correlation coefficient ρ is based on Fisher’s [15] z transformation. Fisher’s statistic $\hat{\zeta} = \ln\{(1 + r)/(1 - r)\}/2$ has an approximately normal distribution

$$\hat{\zeta} \sim N(\zeta, \sigma_\zeta^2), \tag{1}$$

where ζ = ln{(1 + ρ)/(1 –ρ)}/2 and $\sigma_\zeta^2 = 1/(N - 3)$. The large-sample approximations of the sample correlation coefficient r and Fisher’s z transformation provide convenient alternatives to correlation assessments. The conventional concerns of correlation analysis focus on the detection of correlation difference with respect to the hypotheses

$$H_0: \rho = \rho_0 \quad \text{versus} \quad H_1: \rho \neq \rho_0,$$

where ρ0 is a chosen quantity. Accordingly, the null hypothesis is rejected at the significance level α if |Z*| > zα/2, where $Z^* = (\hat{\zeta} - \zeta_0)/\sigma_\zeta$, ζ0 = ln{(1 + ρ0)/(1 –ρ0)}/2, and zα/2 is the upper 100(α/2)-th percentile of the standard normal distribution.
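The conventional test just described can be sketched in a few lines. This is a minimal illustration, assuming the usual large-sample variance 1/(N – 3) for the transformed statistic; the function names `fisher_z` and `fisher_z_test` are hypothetical and are not taken from the supplementary programs.

```python
import math
from statistics import NormalDist


def fisher_z(r):
    """Fisher's z transformation of a correlation (equal to atanh(r))."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def fisher_z_test(r, n, rho0=0.0, alpha=0.05):
    """Two-sided large-sample test of H0: rho = rho0 (illustrative sketch).

    Returns the statistic Z*, the critical value, the reject decision,
    and the two-sided p-value under the normal approximation.
    """
    sigma = 1.0 / math.sqrt(n - 3)  # assumed sd of the transformed statistic
    z_star = (fisher_z(r) - fisher_z(rho0)) / sigma
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # upper alpha/2 percentile
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z_star)))
    return z_star, z_crit, abs(z_star) > z_crit, p_value


# Example: r = 0.30 observed in N = 50 pairs, tested against rho0 = 0.
print(fisher_z_test(0.30, 50))
```

Note that this is a test of difference; a non-significant outcome does not establish equivalence, which is the motivation for the procedures that follow.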

On the other hand, the corresponding large-sample approximation for the distribution of r is

$$r \sim N(\rho, \sigma_r^2), \tag{2}$$

where $\sigma_r^2$ is the asymptotic variance of r, which depends on ρ and N. Fisher’s z transformation is largely recommended because the transformation substantially improves the normality approximation, especially for small sample sizes and extreme sample correlations. Nonetheless, the sample correlation coefficient can still have intrinsic value in specific problems and complex situations such as those in Olkin and Finn [32, 33] and Steiger [34]. Despite the great interest in correlation analysis, few studies explicitly address the problem of how to appraise correlation equivalence. With the asymptotic normality properties of the sample correlation coefficient and Fisher’s z transformation, extended procedures are proposed for equivalence assessment of correlation coefficients.

The extended sample correlation coefficient procedure

The primary focus of this article is on the equivalence test of the correlation coefficient with respect to the null and alternative hypotheses

$$H_0: \rho \le \rho_L \ \text{or} \ \rho_U \le \rho \quad \text{versus} \quad H_1: \rho_L < \rho < \rho_U, \tag{3}$$

where ρL and ρU are two constants such that (ρL, ρU) represents the designated range for declaring equivalence. Related discussions for selecting a specific margin or threshold for equivalence research are available in Piaggio et al. [35], Walker and Nowacki [36], and Wiens [37]. The general theorem for deriving optimal parametric tests for equivalence hypotheses was presented in Wellek [27], Section 3.3. Also, the determination of the rejection region of the optimal procedure follows from the general results in Lehmann and Romano [38], Section 3.4, for tests in families with monotone likelihood ratio. To claim that the population correlation ρ is within the interval (ρL, ρU), a natural rejection region for the null hypothesis is

$$\text{EQUT-}r = \{c_{rL} < r < c_{rU}\}, \tag{4}$$

where the two critical values $c_{rL}$ and $c_{rU}$ are chosen to simultaneously attain the nominal Type I error rate

$$P\{c_{rL} < r < c_{rU} \mid \rho = \rho_L\} = P\{c_{rL} < r < c_{rU} \mid \rho = \rho_U\} = \alpha. \tag{5}$$

Due to the complexity of the exact distribution function of r, the asymptotic normal distribution given in Eq 2 offers a feasible alternative. Thus, the two probabilities $P\{c_{rL} < r < c_{rU} \mid \rho = \rho_L\}$ and $P\{c_{rL} < r < c_{rU} \mid \rho = \rho_U\}$ can be evaluated by the approximate normal distributions $N(\rho_L, \sigma_{rL}^2)$ and $N(\rho_U, \sigma_{rU}^2)$, respectively, where $\sigma_{rL}^2$ and $\sigma_{rU}^2$ are the variances in Eq 2 evaluated at ρ = ρL and ρ = ρU. Note that the two quantities $c_{rL}$ and $c_{rU}$ are functions of the configurations {α, N, ρL, ρU}. They have no explicit analytic expression and require a computer program to calculate the actual values.

The extended Fisher’s z transformation procedure

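In view of the widely used Fisher’s transformation for correlation analysis, an alternative approach to assessing correlation equivalence is testing the null and alternative hypotheses

$$H_0: \zeta \le \zeta_L \ \text{or} \ \zeta_U \le \zeta \quad \text{versus} \quad H_1: \zeta_L < \zeta < \zeta_U, \tag{6}$$

where ζL = ln{(1 + ρL)/(1 –ρL)}/2 and ζU = ln{(1 + ρU)/(1 –ρU)}/2. Accordingly, the interval (ζL, ζU) indicates the designated bounds for declaring equivalence with respect to the transformed parameter ζ. In this case, the rejection region is of the form

$$\text{EQUT-}\hat{\zeta} = \{c_{\zeta L} < \hat{\zeta} < c_{\zeta U}\}, \tag{7}$$

where $\hat{\zeta} = \ln\{(1 + r)/(1 - r)\}/2$ and the two critical values $c_{\zeta L}$ and $c_{\zeta U}$ simultaneously achieve the nominal Type I error rate

$$P\{c_{\zeta L} < \hat{\zeta} < c_{\zeta U} \mid \zeta = \zeta_L\} = P\{c_{\zeta L} < \hat{\zeta} < c_{\zeta U} \mid \zeta = \zeta_U\} = \alpha. \tag{8}$$

Following the accurate approximation of $\hat{\zeta}$ given in Eq 1, the two probabilities in Eq 8 can readily be evaluated by the approximate normal distributions $N(\zeta_L, \sigma_\zeta^2)$ and $N(\zeta_U, \sigma_\zeta^2)$, respectively, with $\sigma_\zeta^2 = 1/(N - 3)$. For ease of application, the rejection region EQUT-$\hat{\zeta}$ is commonly converted into the scale of r by the conversion formula $r = \{\exp(2\hat{\zeta}) - 1\}/\{\exp(2\hat{\zeta}) + 1\}$. Thus, a useful expression of EQUT-$\hat{\zeta}$ is

$$\text{EQUT-}\hat{\zeta} = \{c_{\zeta rL} < r < c_{\zeta rU}\}, \tag{9}$$

where $c_{\zeta rL} = \{\exp(2c_{\zeta L}) - 1\}/\{\exp(2c_{\zeta L}) + 1\}$ and $c_{\zeta rU} = \{\exp(2c_{\zeta U}) - 1\}/\{\exp(2c_{\zeta U}) + 1\}$.

Under the asymptotic theory, Fisher’s transformation has vital implications in normalization acceleration and variance stabilization relative to the sample correlation coefficient. The discrepancy between the two equivalence approaches with the designated rejection regions EQUT-r and EQUT-$\hat{\zeta}$ will be explicated in the subsequent numerical illustrations.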
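The critical values of the Fisher-scale procedure have no closed form, but they are easy to obtain numerically. The sketch below is one possible approach and not the author's supplementary program. It assumes the transformed statistic is normal with variance 1/(N – 3) at both bounds; because the variance is then the same at ζL and ζU, a region centred at the midpoint of (ζL, ζU) has the same rejection probability at both bounds, so only its half-width has to be found, here by bisection. The name `equt_zeta_region` is hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cumulative distribution function


def equt_zeta_region(rho_l, rho_u, n, alpha=0.05, tol=1e-12):
    """Rejection region of the Fisher-scale equivalence test (illustrative sketch).

    Returns the critical interval on the transformed scale and on the r scale.
    """
    zeta_l, zeta_u = math.atanh(rho_l), math.atanh(rho_u)
    sigma = 1.0 / math.sqrt(n - 3)     # assumed sd of the transformed statistic
    mid = 0.5 * (zeta_l + zeta_u)      # centre of the equivalence range
    half = 0.5 * (zeta_u - zeta_l)     # half-width of the equivalence range

    def size(d):
        # Rejection probability of (mid - d, mid + d) when the true
        # parameter sits on either equivalence bound.
        return PHI((d - half) / sigma) - PHI((-d - half) / sigma)

    lo, hi = 0.0, half + 10.0 * sigma  # size(lo) = 0, size(hi) is close to 1
    while hi - lo > tol:
        d = 0.5 * (lo + hi)
        if size(d) < alpha:
            lo = d
        else:
            hi = d
    d = 0.5 * (lo + hi)
    c_l, c_u = mid - d, mid + d
    # tanh is the inverse of Fisher's transformation (conversion to r).
    return (c_l, c_u), (math.tanh(c_l), math.tanh(c_u))


# Example: reference range (0, 0.20) with N = 100.
print(equt_zeta_region(0.0, 0.20, 100))
```

Equivalence is declared when the observed r falls inside the second interval returned by the function.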

Numerical examples

The summary of Hemphill [8] revealed that approximately one third of the correlation coefficients are less than 0.20, one third fall between 0.20 and 0.30, and one third are more than the magnitude 0.30 in the research literature of psychological assessment and treatment. Also, the comprehensive review of Richard et al. [9] showed that the average magnitude of correlation coefficients in psychological literature is 0.21. Accordingly, only the values between 0 and 0.3 are evaluated for the reference bounds ρL and ρU in the numerical illustration. With the significance level α = 0.05, the rejection regions of the two equivalence tests are computed for the reference range (ρL, ρU) = (0, 0.20), (0.05, 0.15), (0.10, 0.30), and (0.15, 0.25) and sample size N = 25, 50, 100, and 500.

A simulation study of 10,000 iterations was also conducted to assess the accuracy of the rejection regions through the differences between the simulated Type I error rate and the nominal alpha level 0.05. The associated results of rejection regions and simulation errors are summarized in Table 1. Although both test procedures are constructed under asymptotic theory, they achieve nearly the specified Type I error rate even for the small sample sizes N = 25 and 50. To visualize the similarities and differences between the two procedures, the rejection regions for (ρL, ρU) = (0, 0.20) and (0.10, 0.30) are also plotted in Figs 1 and 2, respectively. The rejection regions of the two procedures differ noticeably for small sample sizes N < 150 and are nearly identical for larger sample sizes N ≥ 150.
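A Monte Carlo check of this kind can be sketched as follows. The code is a minimal illustration rather than the simulation program used for Table 1: it assumes that a critical interval on the r scale is already available (for instance from Table 1), draws bivariate normal samples with a given ρ, and reports the proportion of sample correlations that fall inside the interval. The function names and the default seed are hypothetical.

```python
import math
import random


def sample_r(n, rho, rng):
    """Pearson r computed from n bivariate normal pairs with correlation rho."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # y = rho * x + sqrt(1 - rho^2) * e gives corr(x, y) = rho for standard normals.
    ys = [rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0) for x in xs]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulated_rejection_rate(region, n, rho, reps=10_000, seed=1):
    """Proportion of simulated r values inside the critical interval `region`.

    With rho set to an equivalence bound this estimates the Type I error
    rate; with rho inside the equivalence range it estimates the power.
    """
    rng = random.Random(seed)
    lo, hi = region
    hits = sum(lo < sample_r(n, rho, rng) < hi for _ in range(reps))
    return hits / reps
```

The simulation error reported in the tables corresponds to the difference between such a simulated rate and the nominal or computed value.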

Fig 1. The rejection regions for (ρL, ρU) = (0, 0.2) and α = 0.05.

https://doi.org/10.1371/journal.pone.0252323.g001

Fig 2. The rejection regions for (ρL, ρU) = (0.1, 0.3) and α = 0.05.

https://doi.org/10.1371/journal.pone.0252323.g002

Table 1. The critical intervals and simulated errors of the suggested correlation equivalence tests for α = 0.05.

https://doi.org/10.1371/journal.pone.0252323.t001

Results

An important scenario in equivalence assessment is the detection of a lack of association, that is, whether the population correlation ρ is practically zero. The asymptotic normal distributions of the sample correlation r and the associated transformation $\hat{\zeta}$ have zero mean when the population correlation ρ = 0. Due to the symmetry of the normal distributions for the two principal statistics, it is sensible to adopt an equidistant reference range about zero in assessing the lack of association. Thus, the problem of probing a lack of association can be viewed as a special setting of the proposed general framework for correlation equivalence detection.

The proposed lack of association tests

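To examine the lack of association, the prescribed hypotheses for equivalence testing are readily modified with ρL = –ρB and ρU = ρB with ρB > 0:

$$H_0: \rho \le -\rho_B \ \text{or} \ \rho_B \le \rho \quad \text{versus} \quad H_1: -\rho_B < \rho < \rho_B, \tag{10}$$

where the designated bound ρB indicates the maximal tolerance magnitude to claim a lack of association. The equivalence procedures based on the two statistics r and $\hat{\zeta}$ can immediately be applied to the current problem for testing a lack of association.

With the symmetric equivalence range (–ρB, ρB) around zero, the subsequent explication shows that the two critical values $c_{rL}$ and $c_{rU}$ of the prescribed equivalence procedure EQUT-r have the simple relation $c_{rL} = -c_{rU}$. Note that the sample correlation coefficient r has the approximate distributions $N(-\rho_B, \sigma_B^2)$ and $N(\rho_B, \sigma_B^2)$ for ρ = –ρB and ρB, respectively, where $\sigma_B^2$ is the asymptotic variance of r at ρ = ±ρB. Hence, the approximate distribution of $-r$ under ρ = –ρB coincides with that of $r$ under ρ = ρB. As described earlier, the actual values $c_{rL}$ and $c_{rU}$ are uniquely determined by the two probabilities $P\{c_{rL} < r < c_{rU} \mid \rho = -\rho_B\} = \alpha$ and $P\{c_{rL} < r < c_{rU} \mid \rho = \rho_B\} = \alpha$. The normal approximation of r implies the former equality is closely related to the latter:

$$P\{c_{rL} < r < c_{rU} \mid \rho = -\rho_B\} = P\{-c_{rU} < -r < -c_{rL} \mid \rho = -\rho_B\} = P\{-c_{rU} < r < -c_{rL} \mid \rho = \rho_B\}.$$

Accordingly, this examination establishes that $c_{rL} = -c_{rU}$ and the rejection region can be simplified as

$$\text{EQUT-}r = \{|r| < c_r\}, \tag{11}$$

where $c_r$ is chosen so that

$$P\{|r| < c_r \mid \rho = \rho_B\} = \alpha \tag{12}$$

under the approximation $r \sim N(\rho_B, \sigma_B^2)$.

Under the notion of Fisher transformation, the lack of association test can alternatively be conducted in terms of the hypotheses

$$H_0: \zeta \le -\zeta_B \ \text{or} \ \zeta_B \le \zeta \quad \text{versus} \quad H_1: -\zeta_B < \zeta < \zeta_B, \tag{13}$$

where ζB = ln{(1 + ρB)/(1 –ρB)}/2. Following arguments similar to the previous case for r, the rejection region for the transformed test statistic $\hat{\zeta}$ is of the form

$$\text{EQUT-}\hat{\zeta} = \{|\hat{\zeta}| < c_\zeta\}, \tag{14}$$

where the quantity $c_\zeta$ satisfies

$$P\{|\hat{\zeta}| < c_\zeta \mid \zeta = \zeta_B\} = \alpha \tag{15}$$

under the approximation $\hat{\zeta} \sim N(\zeta_B, \sigma_\zeta^2)$ with $\sigma_\zeta^2 = 1/(N - 3)$. The rejection region EQUT-$\hat{\zeta}$ can also be transformed into an interval on r as

$$\text{EQUT-}\hat{\zeta} = \{|r| < c_{\zeta r}\}, \tag{16}$$

where $c_{\zeta r} = \{\exp(2c_\zeta) - 1\}/\{\exp(2c_\zeta) + 1\}$.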

Two one-sided tests procedures

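With the popular mean equivalence TOST procedure of Schuirmann [28] and Westlake [29], it is tempting to generalize the appealing principle for correlation evaluation with the sample correlation r and the transformation $\hat{\zeta}$. Using the asymptotic normal distribution of r, a TOST procedure for detecting a lack of association can easily be constructed with the approximate normal distributions of r at ρ = –ρB and ρ = ρB. Specifically, the null hypothesis H0: ρ ≤–ρB or ρB ≤ ρ is rejected at the significance level α if

$$\frac{r + \rho_B}{\sigma_B} > z_\alpha \quad \text{and} \quad \frac{r - \rho_B}{\sigma_B} < -z_\alpha, \tag{17}$$

where $\sigma_B^2$ is the asymptotic variance of r at ρ = ±ρB and zα is the upper 100α-th percentile of the standard normal distribution. For ease of explication, the procedure is termed the TOST-r test and the associated rejection region is expressed as

$$\text{TOST-}r = \{|r| < c_{Tr}\}, \tag{18}$$

where $c_{Tr} = \rho_B - z_\alpha\sigma_B$. Regarding the Type I errors, the TOST-r procedure should attain the nominal alpha level when ρ = ρB or ρ = –ρB. Accordingly, the true Type I error rate of TOST-r is

$$P\{|r| < c_{Tr} \mid \rho = \rho_B\} \approx \alpha - \Phi(z_\alpha - 2\rho_B/\sigma_B), \tag{19}$$

where Φ(·) is the standard normal cumulative distribution function.

Similarly, a TOST procedure can be obtained with Fisher’s transformation for detecting a lack of association, as previously suggested by Goertzen and Cribbie [31]. This procedure is denoted by TOST-$\hat{\zeta}$ and it rejects the null hypothesis H0: ζ ≤–ζB or ζB ≤ ζ at the significance level α if

$$\frac{\hat{\zeta} + \zeta_B}{\sigma_\zeta} > z_\alpha \quad \text{and} \quad \frac{\hat{\zeta} - \zeta_B}{\sigma_\zeta} < -z_\alpha. \tag{20}$$

The resulting rejection region can also be written as

$$\text{TOST-}\hat{\zeta} = \{|\hat{\zeta}| < c_{T\zeta}\}, \tag{21}$$

where $c_{T\zeta} = \zeta_B - z_\alpha\sigma_\zeta$ and $\sigma_\zeta^2 = 1/(N - 3)$. Moreover, the asymptotic distribution of $\hat{\zeta}$ reveals that the true Type I error rate is

$$P\{|\hat{\zeta}| < c_{T\zeta} \mid \zeta = \zeta_B\} \approx \alpha - \Phi(z_\alpha - 2\zeta_B/\sigma_\zeta), \tag{22}$$

where Φ(·) is the standard normal cumulative distribution function.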

Type I errors

The most important property of a test procedure is an acceptable level of Type I errors. Without adequate adherence to the nominal α level, the accompanying power evaluations and statistical assessments are meaningless because they rest on distorted Type I error behavior. It follows from the analytic justifications in Eqs 12 and 15 that the two suggested equivalence procedures EQUT-r and EQUT-$\hat{\zeta}$ have excellent performance in maintaining the nominal Type I error rates. In contrast, the other two TOST counterparts are problematic, as explained next.

Note that the (supremum) Type I error rate of the mean equivalence TOST method is exactly equal to the nominal alpha level as the sample size goes to infinity, even though the true rejection probability is less than the designated alpha level for all possible configurations under the null hypothesis. For the direct generalization of the TOST-r procedure for correlation equivalence, however, the rejection region TOST-r given in Eq 18 is a proper interval only when $\rho_B - z_\alpha\sigma_B > 0$. This condition requires ρB > zασB or, equivalently, a sample size N above a lower bound determined by α and ρB. Detailed numerical inspections at α = 0.05 reveal that TOST-r degenerates to an empty set if N < 1090.6352 when ρB = 0.05, and if N < 278.9924 when ρB = 0.10. To highlight this crucial deficiency, the related minimum sample sizes for a nonempty TOST-r are summarized in Table 2 for α = 0.01 and 0.05, and ρB = 0.05, 0.10, 0.15, 0.20, 0.25, and 0.30.

Table 2. The minimum sample sizes of TOST procedures to have a nonempty critical interval for detecting a lack of association.

https://doi.org/10.1371/journal.pone.0252323.t002

On the other hand, the one-to-one relation between r and $\hat{\zeta}$ implies that the rejection region TOST-$\hat{\zeta}$ shares the same disadvantage as TOST-r. The last quantity in Eq 22 indicates that the Type I error rate of the TOST-$\hat{\zeta}$ procedure usually does not attain the nominal level α. However, like the TOST-r method, the Type I error rate of the TOST-$\hat{\zeta}$ method has the supremum α when the sample size goes to infinity. For finite sample sizes, the rejection region becomes invalid when $\zeta_B < z_\alpha\sigma_\zeta$ or, equivalently, $N < 3 + (z_\alpha/\zeta_B)^2$. Specifically, the rejection region TOST-$\hat{\zeta}$ is empty if N < 1083.4132 when ρB = 0.05, and if N < 271.7488 when ρB = 0.10. The minimum sample sizes for a nonempty rejection region TOST-$\hat{\zeta}$ are also listed in Table 2 for α = 0.01 and 0.05, and ρB = 0.05, 0.10, 0.15, 0.20, 0.25, and 0.30.
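The sample size thresholds quoted for the Fisher-scale TOST procedure can be reproduced directly. The sketch below assumes the variance 1/(N – 3) for the transformed statistic and a rejection region of the form "absolute transformed statistic below ζB minus zα times its standard deviation"; the function names are hypothetical, and the Type I error expression is the normal-approximation form rather than an exact rate.

```python
import math
from statistics import NormalDist


def tost_zeta_min_n(rho_b, alpha=0.05):
    """Sample size above which the Fisher-scale TOST interval is nonempty."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # upper alpha percentile
    return 3.0 + (z_alpha / math.atanh(rho_b)) ** 2


def tost_zeta_type1(rho_b, n, alpha=0.05):
    """Approximate rejection probability at rho = rho_b (0 if the region is empty)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    zeta_b = math.atanh(rho_b)
    sigma = 1.0 / math.sqrt(n - 3)
    cut = zeta_b - z_alpha * sigma  # half-width of the TOST rejection interval
    if cut <= 0.0:
        return 0.0                  # empty rejection region: never rejects
    return alpha - NormalDist().cdf(z_alpha - 2.0 * zeta_b / sigma)


print(round(tost_zeta_min_n(0.05), 4))  # 1083.4132, as quoted in the text
print(round(tost_zeta_min_n(0.10), 4))  # 271.7488, as quoted in the text
print(tost_zeta_type1(0.10, 200))       # 0.0, because N = 200 is below the threshold
```

Below the threshold the procedure cannot reject for any observed correlation, which is why both its Type I error rate and its power are zero there.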

To further demonstrate the fundamental characteristics of the contending equivalence methods, the vital properties of actual Type I error rates are investigated. Specifically, with the significance level α = 0.05, the rejection regions of the four equivalence tests are calculated for the lack of association with the range (–ρB, ρB) = (–0.1, 0.1) and (–0.2, 0.2), and sample size N = 25, 50, 100, and 500. The rejection regions of the four test procedures for (–ρB, ρB) = (–0.1, 0.1) and (–0.2, 0.2) are also plotted in Figs 3 and 4, respectively. Moreover, the adequacy of the Type I error rate was examined through a simulation study of 10,000 iterations and was determined by the deviation between the simulated Type I error rate and the nominal alpha level. The resulting rejection regions and simulation results are listed in Table 3. This numerical evidence suggests that the proposed equivalence procedures EQUT-r and EQUT-$\hat{\zeta}$ perform outstandingly in achieving the nominal significance level. The two TOST procedures generally do not provide proper rejection regions and adequate levels of Type I errors for small sample sizes, and the situation is more severe when a smaller threshold is considered.

Fig 3. The rejection regions for (ρL, ρU) = (-0.1, 0.1) and α = 0.05.

https://doi.org/10.1371/journal.pone.0252323.g003

Fig 4. The rejection regions for (ρL, ρU) = (-0.2, 0.2) and α = 0.05.

https://doi.org/10.1371/journal.pone.0252323.g004

Table 3. The critical intervals and simulated errors of the lack of association tests for α = 0.05.

https://doi.org/10.1371/journal.pone.0252323.t003

The problematic behavior of TOST-$\hat{\zeta}$ was also demonstrated in the numerical examination (Table 2) of Goertzen and Cribbie [31]. Specifically, their simulation results showed that the resulting Type I error rates of TOST-$\hat{\zeta}$ and two related procedures can be zero for small sample sizes and small correlation bounds. The analytic and empirical findings presented here illustrate the undesirable behavior of the TOST-r and TOST-$\hat{\zeta}$ procedures.

Power comparisons

A simulation study further explicates the power behavior of the different equivalence procedures for detecting the lack of association. With the significance level α = 0.05, the simulated powers of the EQUT-r, EQUT-$\hat{\zeta}$, TOST-r, and TOST-$\hat{\zeta}$ tests are computed for 10,000 independent samples. The model configurations of correlation coefficient, reference range, and sample size are chosen as ρ = 0 and 0.05, (–ρB, ρB) = (–0.1, 0.1) and (–0.2, 0.2), and N = 25, 50, 100, 200, 300, 400, and 500, respectively. The simulated powers of the combined twenty-eight settings are summarized in Table 4 for the four equivalence procedures. The results show that the two suggested procedures have more power than the two TOST counterparts. Although the differences between these methods diminish for large sample sizes, their discrepancy can be substantial for small and moderate sample sizes. In particular, due to the extremely conservative behavior or the degeneration of the rejection region of the two TOST methods, the resulting power values are zero for ten cases in Table 4. For example, both the TOST-r and TOST-$\hat{\zeta}$ methods give no power when (–ρB, ρB) = (–0.1, 0.1) for N ≤ 200, or when (–ρB, ρB) = (–0.2, 0.2) for N ≤ 50. In view of these results, the two TOST procedures are not recommended for detecting a lack of association. The rejection regions EQUT-r and EQUT-$\hat{\zeta}$ assure that the proposed equivalence procedures have superior Type I error rate and power performance.
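The pattern in these power results can be anticipated from the normal approximation alone, without simulation. The following sketch compares the approximate power of the two Fisher-scale procedures for a symmetric range. It assumes the variance 1/(N – 3) for the transformed statistic, and the function names are hypothetical; the values it produces are approximations and are not the simulated entries of Table 4.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cumulative distribution function


def equt_cut(zeta_b, sigma, alpha):
    """Bisection for c with P(|stat| < c) = alpha when the mean is zeta_b."""
    lo, hi = 0.0, zeta_b + 10.0 * sigma
    while hi - lo > 1e-12:
        c = 0.5 * (lo + hi)
        if PHI((c - zeta_b) / sigma) - PHI((-c - zeta_b) / sigma) < alpha:
            lo = c
        else:
            hi = c
    return 0.5 * (lo + hi)


def approx_powers(rho, rho_b, n, alpha=0.05):
    """Approximate power of the equivalence test and of TOST on the Fisher scale."""
    zeta, zeta_b = math.atanh(rho), math.atanh(rho_b)
    sigma = 1.0 / math.sqrt(n - 3)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)

    def prob_inside(c):
        # Probability that the statistic lies in (-c, c); zero for an empty region.
        if c <= 0.0:
            return 0.0
        return PHI((c - zeta) / sigma) - PHI((-c - zeta) / sigma)

    equt = prob_inside(equt_cut(zeta_b, sigma, alpha))
    tost = prob_inside(zeta_b - z_alpha * sigma)
    return equt, tost


for n in (100, 200, 500):
    print(n, approx_powers(0.0, 0.1, n))
```

For ρB = 0.1 the TOST entry stays at zero until N exceeds roughly 272, in line with the zero-power cases described above, while the equivalence test retains some power at every sample size.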

Table 4. The simulated powers of the lack of association tests for α = 0.05.

https://doi.org/10.1371/journal.pone.0252323.t004

Discussion

A research study requires adequate statistical power and sufficient sample size to examine vital questions and target effects. The importance and implications of statistical power analysis in equivalence testing are also demonstrated in Wellek [27], Murphy, Myros, and Wolach [39], Shieh [40], and Chow et al. [41], among others. To enhance the usefulness of the suggested equivalence procedures, the related issues of power analysis and sample size determination are considered.

Power and sample size calculations

According to the rejection region EQUT-r defined in Eq 4 of the extended sample correlation procedure, the power function is given by

$$\Psi_r = P\{c_{rL} < r < c_{rU}\}, \tag{23}$$

where $c_{rL}$ and $c_{rU}$ are the critical values of EQUT-r, $r \sim N(\rho, \sigma_r^2)$ as in Eq 2, and ρL < ρ < ρU. Moreover, the rejection region EQUT-$\hat{\zeta}$ defined in Eq 7 of the extended Fisher transformation procedure suggests that the associated power function is of the form

$$\Psi_{\hat{\zeta}} = P\{c_{\zeta L} < \hat{\zeta} < c_{\zeta U}\}, \tag{24}$$

where $c_{\zeta L}$ and $c_{\zeta U}$ are the critical values of EQUT-$\hat{\zeta}$, $\hat{\zeta} \sim N(\zeta, \sigma_\zeta^2)$ as in Eq 1, and ζL < ζ < ζU. Under the asymptotic normality assumptions, the attained power levels of the two equivalence tests can readily be computed with Ψr and $\Psi_{\hat{\zeta}}$ for the specified configurations of equivalence limits (ρL, ρU), population correlation ρ, and significance level α. For advance planning of a research design, the two power formulas can be employed to calculate the sample size N needed to attain the specified power 1 –β for the chosen significance level α, chosen correlation ρ, and equivalence threshold (ρL, ρU).
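A sample size search along these lines can be sketched for the Fisher-scale procedure. The code below is an illustration under stated assumptions and not the author's algorithm: it takes the transformed statistic to be normal with variance 1/(N – 3), finds the critical interval by bisection around the midpoint of the transformed range (the same device works because the variance is equal at both bounds), and then scans N upward until the approximate power reaches the target. The function names and the `max_n` cap are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cumulative distribution function


def equt_zeta_power(rho, rho_l, rho_u, n, alpha=0.05):
    """Approximate power of the Fisher-scale equivalence test at correlation rho."""
    zeta, zeta_l, zeta_u = math.atanh(rho), math.atanh(rho_l), math.atanh(rho_u)
    sigma = 1.0 / math.sqrt(n - 3)
    mid, half = 0.5 * (zeta_l + zeta_u), 0.5 * (zeta_u - zeta_l)
    # Bisection for the half-width d of a region centred at mid whose
    # rejection probability equals alpha at both equivalence bounds.
    lo, hi = 0.0, half + 10.0 * sigma
    while hi - lo > 1e-12:
        d = 0.5 * (lo + hi)
        if PHI((d - half) / sigma) - PHI((-d - half) / sigma) < alpha:
            lo = d
        else:
            hi = d
    d = 0.5 * (lo + hi)
    # Probability of landing in the rejection region at the true parameter.
    return PHI((mid + d - zeta) / sigma) - PHI((mid - d - zeta) / sigma)


def equt_zeta_sample_size(rho, rho_l, rho_u, power=0.80, alpha=0.05, max_n=100_000):
    """Smallest N, found by a linear scan from N = 4, that reaches the target power."""
    for n in range(4, max_n + 1):
        if equt_zeta_power(rho, rho_l, rho_u, n, alpha) >= power:
            return n
    raise ValueError("target power not reached within max_n")


# Example: rho = 0.10 with equivalence range (0, 0.20), power 0.80, alpha 0.05.
n_needed = equt_zeta_sample_size(0.10, 0.0, 0.20)
print(n_needed, equt_zeta_power(0.10, 0.0, 0.20, n_needed))
```

A linear scan is used for transparency; a bracketing search would be faster but relies on the power being monotone in N.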

Simulation study

Because of the approximate nature of the proposed equivalence procedures, a Monte Carlo simulation study was utilized to appraise the similarities and differences between the suggested power and sample size calculations under a wide variety of correlation configurations. The numerical study was conducted in two steps. First, under the specified settings, the minimum sample sizes required to meet the nominal power 0.80 and α = 0.05 were determined by the power formulas Ψr and $\Psi_{\hat{\zeta}}$. The estimated (achieved) powers were recorded for the optimal sample sizes. Second, with the designated sample sizes, simulated powers were computed with a Monte Carlo simulation study of 10,000 independent data sets to evaluate the accuracy of the two approaches. The accuracy of the two power and sample size procedures is determined by the error between the simulated power and the estimated power.

The results of the two procedures EQUT-r and EQUT-$\hat{\zeta}$ are presented in Table 5 for various settings of the population correlation ρ and the equivalence range (ρL, ρU). It can be seen that the optimal sample sizes vary noticeably with the combined characteristics of ρ and (ρL, ρU). Specifically, when the equivalence bounds (ρL, ρU) and other settings are fixed and ρ varies, the sample size increases as the distance min(ρU − ρ, ρ − ρL) decreases. When ρ is held constant, the sample size decreases as the range (ρL, ρU) widens. The computed sample sizes of the EQUT-r procedure are slightly smaller than those of the EQUT-$\hat{\zeta}$ procedure for small ρ < 0.3. The situation is reversed when ρ = 0.4 with (ρL, ρU) = (0.3, 0.5), and when ρ = 0.5 with (ρL, ρU) = (0.4, 0.6). More importantly, the small discrepancy between the simulated power and estimated power reveals that the two techniques are extremely accurate for power and sample size calculations. In short, the extended sample correlation coefficient and Fisher’s z transformation procedures can be recommended as general tools for appraising correlation equivalence.

Table 5. Sample sizes, computed power, and simulated errors of the suggested equivalence tests for nominal power 0.80 and α = 0.05.

https://doi.org/10.1371/journal.pone.0252323.t005

Conclusions

Growing attention in the behavioral and psychological literature concerns how to decide whether an observed effect is small enough to be considered negligible. However, conventional tests of difference are often inappropriately applied to conclude that an effect is absent based on a non-significant result. A widely recommended approach is to conduct an equivalence test to ascertain whether the observed effect size falls inside the selected equivalence boundaries. The TOST procedure of mean equivalence has been extensively applied in pharmacokinetics and various scientific disciplines. It is essential to note that there is little consensus in the literature on which method is most appropriate for equivalence testing. Conceptually, the preference varies with the criteria used to select an optimal procedure. Considerations of more advanced aspects of TOST and alternative procedures for bioequivalence testing are beyond the scope of this article. The interested reader is referred to Meyners [24], Berger and Hsu [30], and the discussion therein for further details.

In view of the prevalent recognition of TOST, Goertzen and Cribbie [31] applied the same principle to the problem of assessing a lack of association. However, their numerical results showed that the TOST correlation procedure does not maintain nominal rejection rates when the sample sizes and correlation bounds are small. Despite the undesirable behavior of the TOST extension for correlation evaluations, no technical examinations and proper alternatives have been described in the literature. The present article aims to contribute to correlation equivalence studies in four respects. First, the asymptotic properties of the Pearson product-moment correlation coefficient and Fisher’s z transformation are extended to construct equivalence procedures for correlation coefficients. Second, the empirical and analytic investigations not only clarify the situations in which the TOST principle does not adequately attain the nominal Type I error rates, but also justify the overall performance of the improved techniques for correlation assessments. Third, to enhance the utility of the suggested procedures, the corresponding power and sample size calculations for designing correlational research are also considered. Fourth, computer algorithms are developed to facilitate the practical use of the proposed equivalence procedures by providing efficient and accurate calculations of rejection regions, statistical powers, and sample sizes for correlation equivalence studies.

Supporting information

S1 File. R programs for performing the equivalence test procedures.

https://doi.org/10.1371/journal.pone.0252323.s001

(DOCX)

S2 File. SAS/IML programs for performing the equivalence test procedures.

https://doi.org/10.1371/journal.pone.0252323.s002

(DOCX)

References

  1. Fritz C. O., Morris P. E., & Richler J. J. (2012). Effect size estimates: Current use, calculations, and interpretation. Journal of Experimental Psychology: General, 141, 2–18. pmid:21823805
  2. Grissom R. J., & Kim J. J. (2012). Effect sizes for research: Univariate and multivariate applications (2nd ed.). New York: Routledge.
  3. Kelley K., & Preacher K. J. (2012). On effect size. Psychological Methods, 17, 137–152. pmid:22545595
  4. Kline R. B. (2013). Beyond significance testing (2nd ed.). Washington, DC: American Psychological Association.
  5. Pek J., & Flora D. B. (2018). Reporting effect sizes in original psychological research: A discussion and tutorial. Psychological Methods, 23, 208–225. pmid:28277690
  6. Ferguson C. J. (2009). An effect size primer: A guide for clinicians and researchers. Professional Psychology: Research and Practice, 40, 532–538.
  7. de Winter J. C., Gosling S. D., & Potter J. (2016). Comparing the Pearson and Spearman correlation coefficients across distributions and sample sizes: A tutorial using simulations and empirical data. Psychological Methods, 21, 273. pmid:27213982
  8. Hemphill J. F. (2003). Interpreting the magnitudes of correlation coefficients. American Psychologist, 58, 78–80. pmid:12674822
  9. Richard F. D., Bond C. F. Jr, & Stokes-Zoota J. J. (2003). One hundred years of social psychology quantitatively described. Review of General Psychology, 7, 331–363.
  10. Fisher R. A. (1915). Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika, 10, 507–521.
  11. Johnson N. L., Kotz S., & Balakrishnan N. (1995). Continuous univariate distributions (2nd ed., Vol. 2). New York: Wiley.
  12. Stuart A., & Ord J. K. (1994). Kendall’s advanced theory of statistics (6th ed., Vol. 1). New York, NY: Halsted Press.
  13. Shieh G. (2006). Exact interval estimation, power calculation and sample size determination in normal correlation analysis. Psychometrika, 71, 529–540.
  14. Shieh G. (2010). Estimation of the simple correlation coefficient. Behavior Research Methods, 42, 906–917. pmid:21139158
  15. Fisher R. A. (1921). On the probable error of a coefficient of correlation deduced from a small sample. Metron, 1, 3–32.
  16. Bond C. F., & Richardson K. (2004). Seeing the Fisher Z-transformation. Psychometrika, 69, 291–303.
  17. Hotelling H. (1953). New light on the correlation coefficient and its transforms. Journal of the Royal Statistical Society. Series B, 15, 193–232.
  18. Silver N. C., & Dunlap W. P. (1987). Averaging correlation coefficients: Should Fisher’s z transformation be used? Journal of Applied Psychology, 72, 146–148.
  19. Winterbottom A. (1979). A note on the derivation of Fisher’s transformation of the correlation coefficient. The American Statistician, 33, 142–143.
  20. Rogers J. L., Howard K. I., & Vessey J. T. (1993). Using significance tests to evaluate equivalence between two experimental groups. Psychological Bulletin, 113, 553–565. pmid:8316613
  21. Seaman M. A., & Serlin R. C. (1998). Equivalence confidence intervals for two-group comparisons of means. Psychological Methods, 3, 403–411.
  22. Stegner B. L., Bostrom A. G., & Greenfield T. K. (1996). Equivalence testing for use in psychosocial and services research: An introduction with examples. Evaluation and Program Planning, 19, 193–198.
  23. Steiger J. H. (2004). Beyond the F test: Effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychological Methods, 9, 164–182. pmid:15137887
  24. Meyners M. (2012). Equivalence tests–A review. Food Quality and Preference, 26, 231–245.
  25. Chow S. C., & Liu J. P. (2008). Design and analysis of bioavailability and bioequivalence studies (3rd ed.). New York, NY: Chapman & Hall/CRC.
  26. Hauschke D., Steinijans V., & Pigeot I. (2007). Bioequivalence studies in drug development: Methods and applications. Chichester: John Wiley & Sons.
  27. Wellek S. (2010). Testing statistical hypotheses of equivalence and noninferiority (2nd ed.). New York, NY: CRC Press.
  28. Schuirmann D. L. (1981). On hypothesis testing to determine if the mean of a normal distribution is contained in a known interval. Biometrics, 37, 617.
  29. Westlake W. J. (1981). Response to T.B.L. Kirkwood: Bioequivalence testing–a need to rethink. Biometrics, 37, 589–594.
  30. Berger R. L., & Hsu J. C. (1996). Bioequivalence trials, intersection-union tests and equivalence confidence sets (with discussion). Statistical Science, 11, 283–319.
  31. Goertzen J. R., & Cribbie R. A. (2010). Detecting a lack of association: An equivalence testing approach. British Journal of Mathematical and Statistical Psychology, 63, 527–537. pmid:20030968
  32. Olkin I., & Finn J. D. (1990). Testing correlated correlations. Psychological Bulletin, 108, 330–333.
  33. Olkin I., & Finn J. D. (1995). Correlations redux. Psychological Bulletin, 118, 155–164.
  34. Steiger J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87, 245–251.
  35. Piaggio G., Elbourne D. R., Altman D. G., Pocock S. J., Evans S. J., & Consort Group. (2006). Reporting of noninferiority and equivalence randomized trials: An extension of the CONSORT statement. Journal of the American Medical Association, 295, 1152–1160. pmid:16522836
  36. Walker E., & Nowacki A. S. (2011). Understanding equivalence and noninferiority testing. Journal of General Internal Medicine, 26, 192–196. pmid:20857339
  37. Wiens B. L. (2002). Choosing an equivalence limit for noninferiority or equivalence studies. Controlled Clinical Trials, 23, 2–14. pmid:11852160
  38. Lehmann E. L., & Romano J. P. (2006). Testing statistical hypotheses. New York, NY: Springer Science & Business Media.
  39. Murphy K. R., Myors B., & Wolach A. (2014). Statistical power analysis: A simple and general model for traditional and modern hypothesis tests (4th ed.). New York, NY: Routledge.
  40. Shieh G. (2016). Exact power and sample size calculations for the two one-sided tests of equivalence. PLoS One, 11, e0162093. pmid:27598468
  41. Chow S. C., Shao J., Wang H., & Lokhnygina Y. (2017). Sample size calculations in clinical research (3rd ed.). Boca Raton, FL: Chapman and Hall/CRC.