
New approaches to meta-analyze differences in skewness, kurtosis, and correlation

  • Pietro Pollo ,

    Roles Conceptualization, Data curation, Formal analysis, Methodology, Project administration, Software, Visualization, Writing – original draft, Writing – review & editing

    pietro_pollo@hotmail.com (PP); snakagaw@ualberta.ca (SN)

    Affiliations Evolution & Ecology Research Centre, School of Biological, Earth & Environmental Sciences, University of New South Wales, Sydney, Australia, School of Environmental and Life Sciences, University of Newcastle, Newcastle, Australia

  • Szymon M. Drobniak,

    Roles Formal analysis, Writing – review & editing

    Affiliations Evolution & Ecology Research Centre, School of Biological, Earth & Environmental Sciences, University of New South Wales, Sydney, Australia, Institute of Environmental Sciences, Faculty of Biology, Jagiellonian University, Kraków, Poland

  • Hamed Haselimashhadi ,

    Roles Writing – review & editing

    ☯ These authors contributed equally and are listed alphabetically.

    Affiliation European Bioinformatics Institute, European Molecular Biology Laboratory, Hinxton, United Kingdom

  • Malgorzata Lagisz ,

    Roles Writing – review & editing

    ☯ These authors contributed equally and are listed alphabetically.

    Affiliations Evolution & Ecology Research Centre, School of Biological, Earth & Environmental Sciences, University of New South Wales, Sydney, Australia, Department of Biological Sciences, University of Alberta, Biological Sciences Building, Edmonton, Canada

  • Ayumi Mizuno ,

    Roles Writing – review & editing

    ☯ These authors contributed equally and are listed alphabetically.

    Affiliation Department of Biological Sciences, University of Alberta, Biological Sciences Building, Edmonton, Canada

  • Laura A. B. Wilson ,

    Roles Writing – review & editing

    ☯ These authors contributed equally and are listed alphabetically.

    Affiliations School of Archaeology and Anthropology, The Australian National University, Acton, Australia, School of Biological, Earth and Environmental Sciences, University of New South Wales, Kensington, Australia, ARC Training Centre for Multiscale 3D Imaging, Modelling and Manufacturing, Research School of Physics, The Australian National University, Acton, Australia

  • Daniel W. A. Noble ,

    Roles Formal analysis, Software, Supervision, Visualization, Writing – review & editing

    ‡ These authors share senior authorship on this work.

    Affiliation Division of Ecology and Evolution, Research School of Biology, The Australian National University, Canberra, Australia

  • Shinichi Nakagawa

    Roles Conceptualization, Formal analysis, Funding acquisition, Methodology, Project administration, Supervision, Writing – original draft, Writing – review & editing

    pietro_pollo@hotmail.com (PP); snakagaw@ualberta.ca (SN)

    ‡ These authors share senior authorship on this work.

    Affiliations Evolution & Ecology Research Centre, School of Biological, Earth & Environmental Sciences, University of New South Wales, Sydney, Australia, Department of Biological Sciences, University of Alberta, Biological Sciences Building, Edmonton, Canada

Abstract

Biological differences between males and females are pervasive. Researchers often focus on sex differences in the mean or, occasionally, in variation, albeit other measures can be useful for biomedical and biological research. For instance, differences in skewness (asymmetry of a distribution), kurtosis (heaviness of a distribution’s tails), and correlation (relationship between two variables) might be crucial to improve medical diagnosis and to understand natural processes. Yet, there are currently no meta-analytic ways to measure differences in these metrics between two groups. We propose three effect size statistics to fill this gap: Δsk, Δku, and ΔZr, which measure differences in skewness, kurtosis, and correlation, respectively. Besides presenting the rationale for the calculation of these effect size statistics, we conducted a simulation to explore their properties and used a large dataset of mice traits to illustrate their potential. For example, in our case study, we found that females show, on average, a greater correlation between fat mass and heart weight than males. Although calculating Δsk, Δku, and ΔZr will require large sample sizes of individual data, technological advancements in data collection create increased opportunities to use these effect size statistics. Importantly, Δsk, Δku, and ΔZr can be used to compare any two groups, allowing a new generation of meta-analyses that explore such differences and potentially leading to new insights in multiple fields of study.

Background

Sex is a biological attribute that can strongly impact organisms’ traits, with differences between males and females being central to questions in the biological sciences [1,2]. In contrast, biomedical research has primarily focused on male subjects [3], posing a danger to female health [4,5]. Aware of these issues, the US National Institutes of Health and other health agencies have demanded using multiple sexes in animal studies when possible [6]. As a consequence, the number of biological and biomedical studies using both female and male animals as research subjects has increased in the last decade [7], leading to the accumulation of data that can be used to synthesize and quantify sex differences across biological domains.

Realizing the accumulation of sex-specific data, many perspective pieces have encouraged researchers to investigate sex differences more carefully [8–10]. Yet, some of these pieces, and most of the biological literature, focus exclusively on mean differences between males and females. A fixation on mean differences has been present for a long time in science because researchers tend to focus on dimorphism in trait averages [11], lack sufficiently powerful data, or have limited statistical tools available (or difficulty using them). Yet, measures such as variance, correlation, skewness, and kurtosis can be critical to understanding sex differences. For example, certain traits in mice may exhibit no disparity in average values between sexes, but substantial differences emerge in terms of variability [12,13]. These differences could be more easily assessed because of an effect size statistic that measures differences in variability between two groups (proposed by [14]), illustrating how novel statistical tools can expand possible research questions and provide new scientific insights, such as identifying sex differences in trait selection or canalization.

Beyond variability, the shape of trait distributions relative to the normal distribution (measured by skewness and kurtosis, i.e., asymmetry of a distribution and heaviness of a distribution’s tails, respectively; Fig 1A and 1B) can also be crucial to understanding ecological and evolutionary processes and patterns [15–19], as well as improving medical diagnostics [20,21]. For instance, skewness can bias heritability estimates because evolutionary biologists assume that phenotypic components (genetic and environmental) are normally distributed [18]. Furthermore, kurtosis can be used to understand community assembly processes [16]. Besides the shape of trait distributions, evolutionary biologists and quantitative geneticists can quantify correlation matrices to understand trait plasticity and evolvability [22–24], which could then be used for group comparisons (as in [25]; Fig 1C). Although location-scale-shape models [26–28] may be used to explore between-group differences (e.g., males and females) in skewness, kurtosis, or within-group correlations, there are no effect size statistics that can easily measure such differences (but see also [29]).

Fig 1. Simulated trait distributions for two groups with different shapes (A: distinct skewness, B: distinct kurtosis), and different correlations between two traits for two groups (C).

The data and code needed to generate this Figure can be found in https://zenodo.org/records/18386956.

https://doi.org/10.1371/journal.pbio.3003653.g001

Here, we propose three new effect size statistics to evaluate between-group differences in skewness (Δsk), kurtosis (Δku), and correlation (ΔZr), key moments of a distribution that are usually unexplored. These effect size statistics will be valuable to explore sex differences but can also be applied in other fields of study and used to compare differences between any two groups of interest. Meta-analyses using these new effect sizes will create multiple avenues for novel biological enquiries. The present moment is particularly conducive for analyses using these new effect sizes because the individual-level data (e.g., individual participant data [30,31]) required for their calculation are increasingly available from new technological advances that allow faster data collection and sharing (e.g., automated phenotyping).

Difference in skewness and kurtosis

The mean and variance represent the first and second moments of a distribution, respectively. However, the third and fourth moments of a distribution (i.e., skewness and kurtosis, respectively) can also be valuable as they characterize the distribution’s shape. More specifically, skewness reflects the distribution’s asymmetry around its mean. While positive skewness indicates an elongated right tail with an excess of high values, negative skewness suggests an elongated left tail with an excess of low values. This asymmetry can influence the interpretation of means and variation, as the mean tends to be larger than the median in positively skewed distributions, while the mean tends to be smaller than the median in negatively skewed distributions. Note that a perfectly normal distribution is symmetric (i.e., skewness = 0), where the mean is equal to the median. Sample skewness (sk) [32] can be expressed as:

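$$sk = \frac{\sqrt{n(n-1)}}{n-2} \cdot \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^3}{\left[\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2\right]^{3/2}} \tag{1}$$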

where xi is a raw data value, x̄ is the sample mean, and n is the sample size. Skewness sampling variance (s2sk) [32] can then be expressed as:

$$s^2_{sk} = \frac{6n(n-1)}{(n-2)(n+1)(n+3)} \tag{2}$$

On the other hand, kurtosis measures tail heaviness: high kurtosis distributions have heavier tails (i.e., proportionally more extreme values than central values), whereas low kurtosis distributions have lighter tails. For comparison, a normal distribution is expected to have kurtosis = 3. Sample excess kurtosis (ku) [32] can be expressed as:

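$$ku = \frac{n(n+1)(n-1)}{(n-2)(n-3)} \cdot \frac{\sum_{i=1}^{n}(x_i-\bar{x})^4}{\left[\sum_{i=1}^{n}(x_i-\bar{x})^2\right]^{2}} - \frac{3(n-1)^2}{(n-2)(n-3)} \tag{3}$$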

with sampling variance (s2ku) [32] as:

$$s^2_{ku} = \frac{24n(n-1)^2}{(n-3)(n-2)(n+3)(n+5)} \tag{4}$$
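As an illustration, the four quantities above can be sketched in a few lines of Python. This sketch assumes the standard small-sample-corrected estimators of skewness and excess kurtosis and their sampling variances under normality; the function names are hypothetical and are not those of the orchaRd package.

```python
import math


def central_moment(x, k):
    """k-th central moment with divisor n."""
    n = len(x)
    mean = sum(x) / n
    return sum((v - mean) ** k for v in x) / n


def sample_skewness(x):
    """Small-sample-corrected skewness (assumed form); needs n >= 3."""
    n = len(x)
    g1 = central_moment(x, 3) / central_moment(x, 2) ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


def skewness_variance(n):
    """Sampling variance of skewness, valid only under normality."""
    return 6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3))


def sample_excess_kurtosis(x):
    """Small-sample-corrected excess kurtosis (assumed form); needs n >= 4."""
    n = len(x)
    g2 = central_moment(x, 4) / central_moment(x, 2) ** 2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)


def kurtosis_variance(n):
    """Sampling variance of excess kurtosis, valid only under normality."""
    return 24 * n * (n - 1) ** 2 / ((n - 3) * (n - 2) * (n + 3) * (n + 5))
```

Note that both variance functions depend on the sample size alone, which is why they cannot adapt to skewed or heavy-tailed data.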

Evaluating skewness and kurtosis provides valuable insights into a variable distribution, which is crucial for interpreting means, assessing variability, and making informed decisions in statistical analyses. Although meta-analyses can use skewness (Eq 1) and kurtosis (Eq 3) to investigate single variables, effect size statistics that compare these metrics between two groups are lacking. Thus, we propose the difference between two groups in skewness (Δsk), expressed as:

$$\Delta sk = sk_1 - sk_2 \tag{5}$$

and its sampling variance (s2Δsk) as:

$$s^2_{\Delta sk} = s^2_{sk_1} + s^2_{sk_2} - 2\rho_{sk}\, s_{sk_1} s_{sk_2} \tag{6}$$

where ρsk represents the sampling correlation in skewness between the two groups (zero if assumed to be independent). Similarly, we propose the difference between two groups in kurtosis (Δku), expressed as:

$$\Delta ku = ku_1 - ku_2 \tag{7}$$

and its sampling variance (s2Δku) as:

$$s^2_{\Delta ku} = s^2_{ku_1} + s^2_{ku_2} - 2\rho_{ku}\, s_{ku_1} s_{ku_2} \tag{8}$$

where ρku represents the sampling correlation in kurtosis between the two groups (zero if assumed to be independent).
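The between-group contrasts can be sketched as a single helper, because Δsk and Δku share the same structure: a difference of two estimates whose variance is the sum of the group variances minus a covariance term. The function names, the group ordering (group 1 minus group 2), and the Wald-type interval are assumptions made for illustration.

```python
import math


def difference_effect(est1, var1, est2, var2, rho=0.0):
    """Difference between two group estimates and its sampling variance.

    rho is the sampling correlation between the two estimates
    (0 when the groups are assumed independent).
    """
    delta = est1 - est2
    variance = var1 + var2 - 2.0 * rho * math.sqrt(var1) * math.sqrt(var2)
    return delta, variance


def wald_interval(delta, variance, z=1.96):
    """Approximate 95% confidence interval (normal approximation assumed)."""
    half_width = z * math.sqrt(variance)
    return delta - half_width, delta + half_width


# Example with made-up skewness estimates and variances for two groups.
delta_sk, var_delta_sk = difference_effect(0.8, 0.04, 0.3, 0.09)
lower, upper = wald_interval(delta_sk, var_delta_sk)
```

A positive sampling correlation shrinks the variance of the difference, so treating dependent groups as independent would be conservative in that case.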

However, we note that Eqs 2 and 4 assume normality for sampling variances. When the underlying distributions are skewed or heavy-tailed, sampling error variances for skewness and kurtosis (Eqs 2 and 4) and, by extension, for their between-group contrasts (Eqs 5–8), can misestimate uncertainty. To assess robustness and to provide distribution-free alternatives, we complemented the analytic formulas with resampling-based estimators computed within each group and summed for the difference (i.e., jackknife [33]; see our simulation study below).
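A minimal sketch of the within-group jackknife follows, assuming the usual delete-one scheme: the statistic is recomputed with each observation left out in turn, and the spread of those leave-one-out values gives the variance. The function names are hypothetical, and `statistic` can be any function of a list (for example, a skewness or kurtosis estimator).

```python
def jackknife(x, statistic):
    """Delete-one jackknife: bias-corrected estimate and variance."""
    x = list(x)
    n = len(x)
    full_estimate = statistic(x)
    # Recompute the statistic with observation i removed.
    leave_one_out = [statistic(x[:i] + x[i + 1:]) for i in range(n)]
    loo_mean = sum(leave_one_out) / n
    bias_corrected = n * full_estimate - (n - 1) * loo_mean
    variance = (n - 1) / n * sum((t - loo_mean) ** 2 for t in leave_one_out)
    return bias_corrected, variance


def jackknife_difference(x1, x2, statistic):
    """Group 1 minus group 2, with group variances summed (independence assumed)."""
    est1, var1 = jackknife(x1, statistic)
    est2, var2 = jackknife(x2, statistic)
    return est1 - est2, var1 + var2
```

For the sample mean, this procedure reproduces the familiar variance s²/n, which is a convenient sanity check before applying it to higher moments.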

Difference in correlation

Numerous meta-analyses estimate the correlation between two variables [34,35]. To do so, researchers use the effect size statistic Zr [36], which can be expressed as:

$$Zr = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right) \tag{9}$$

and its sampling variance (s2Zr) [36] as:

$$s^2_{Zr} = \frac{1}{n-3} \tag{10}$$

where r is Pearson’s correlation coefficient between two variables and n is the sample size.

Although Zr alone remains extremely useful to test correlational hypotheses, researchers from all fields would benefit from being able to compare Zr values between two groups. Although Cohen [37] proposed the difference between two groups in Zr as q, he did not provide an equation to calculate its sampling variance. Consequently, this effect size statistic has not been used despite its potential. We therefore propose the difference between two groups in Zr with a new name (ΔZr), as:

$$\Delta Zr = Zr_1 - Zr_2 \tag{11}$$

and its sampling variance (s2ΔZr) as:

$$s^2_{\Delta Zr} = s^2_{Zr_1} + s^2_{Zr_2} - 2\rho_{Zr}\, s_{Zr_1} s_{Zr_2} \tag{12}$$

where ρZr represents the sampling correlation in Fisher’s Zr between the two groups (zero if assumed to be independent).
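For concreteness, ΔZr and its sampling variance can be computed from two within-group correlations and sample sizes as sketched below. Fisher's transformation is the inverse hyperbolic tangent, and its sampling variance is 1/(n − 3); the function names and the group ordering are illustrative assumptions.

```python
import math


def fisher_z(r):
    """Fisher's Zr: 0.5 * ln((1 + r) / (1 - r)), i.e., atanh(r)."""
    return math.atanh(r)


def fisher_z_variance(n):
    """Sampling variance of Zr; needs n > 3."""
    return 1.0 / (n - 3)


def delta_zr(r1, n1, r2, n2, rho=0.0):
    """Difference in Zr (group 1 minus group 2) and its sampling variance."""
    v1, v2 = fisher_z_variance(n1), fisher_z_variance(n2)
    delta = fisher_z(r1) - fisher_z(r2)
    variance = v1 + v2 - 2.0 * rho * math.sqrt(v1 * v2)
    return delta, variance


# Made-up example: r = 0.5 in 103 individuals versus r = 0.2 in 53 individuals.
effect, effect_variance = delta_zr(0.5, 103, 0.2, 53)
```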

Simulation study

We conducted Monte Carlo simulations to evaluate bias and variance estimation for our new effect sizes Δsk, Δku, and ΔZr. For Δsk and Δku, we simulated independent samples for two groups from Pearson distributions with known moments using the rpearson function from the R package PearsonDS v. 1.3.2 [38]. We conducted two simulations: (1) varying skewness between groups, which involved moderate departures from normality, with group-specific skewness sk ∈ {−1, −0.5, 0, 0.5, 1} and kurtosis fixed at 3; (2) holding skewness constant (sk = 0) while varying group-specific kurtosis, ku ∈ {2.5, 3, 4, 5, 6}. In all cases, we simulated scenarios where: (i) the variance of the two groups was the same (σ²₂ = σ²₁ = 1) or different; (ii) the mean of the two groups was the same (u2 = u1 = 0) or different (u2 = 5, u1 = 0). For simplicity, we assumed equal sample sizes between groups, with sample size n ∈ {10, 20, …, 100, 150, 500}. We created all unique combinations of the above scenarios, resulting in 1,200 independent scenarios (i.e., each of the 100 scenarios at each of the 12 sample sizes). We computed point estimates of Δsk and Δku for each scenario using the formulas for within-group sample skewness with small-sample correction (Eq 1) and excess kurtosis with small-sample correction (Eq 3). To estimate the associated sampling variance for Δsk and Δku, we used the analytical variance estimators (Eqs 2 and 4) and a resampling (jackknife) approach that computes group sampling variances separately and then sums them. Importantly, our simulations assume no correlation between groups.

For ΔZr simulations, we simulated two groups, each containing two variables with a known within-group correlation. We drew bivariate normal data with target within-group correlations r ∈ {−0.8, −0.4, −0.2, 0, 0.2, 0.4, 0.6, 0.8} using the mvrnorm function from the package MASS v. 7.3.61 [39]. Marginals were standard normal and group sizes were n ∈ {10, 20, …, 100, 150, 500}. We created all unique combinations of scenarios, resulting in 768 unique scenarios. We estimated ΔZr by applying Fisher’s Z transformation within each group and taking the difference in Zr between groups (Eqs 9–11). Sampling variance for ΔZr used Eq 10 and a jackknife approach. Again, we assumed no correlation between our groups.
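One replicate of such a ΔZr scenario can be sketched as follows. This is not the authors' R code: it replaces the multivariate normal sampler with an equivalent construction for two standard-normal variables (y = r·x + √(1 − r²)·e), and the function names, seed, and sample sizes are arbitrary choices for illustration.

```python
import math
import random


def simulate_bivariate_normal(n, r, rng):
    """Draw n pairs with standard-normal marginals and target correlation r."""
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        pairs.append((x, r * x + math.sqrt(1.0 - r * r) * e))
    return pairs


def pearson_r(pairs):
    """Pearson's correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)  # fixed seed so the replicate is reproducible
n = 500
group1 = simulate_bivariate_normal(n, 0.6, rng)
group2 = simulate_bivariate_normal(n, 0.2, rng)

# Difference in Fisher's Zr and its analytical variance (independent groups).
delta_zr_hat = math.atanh(pearson_r(group1)) - math.atanh(pearson_r(group2))
delta_zr_var = 1.0 / (n - 3) + 1.0 / (n - 3)
```

Repeating this over many replicates and all combinations of target correlations and sample sizes yields the kind of scenario grid described above.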

Note that our simulations did not explore differences in sample size between groups. However, many groups being compared in meta-analyses have the same or very similar sample size. Additionally, simulations often show relatively small impacts of unbalanced sample sizes [40,41], which is why we originally did not vary sample size between groups in our simulations.

We resampled 2,500 times for each scenario across all simulations. Performance metrics were (a) bias of the point estimator, (b) relative bias of the sampling-variance estimator, (c) coverage (95%), and (d) Monte-Carlo standard errors (MCSEs). See Supporting information for full formulas. We also evaluated the performance of these effects for meta-analysis (see details in Sections 8.4 and 9.4 of the Supporting information).
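The four performance metrics can be sketched as below. The exact formulas are given in the Supporting information; this sketch assumes common textbook definitions (relative bias of the variance estimator measured against the empirical variance of the point estimates, and Wald-type 95% intervals), so it may differ in detail from the authors' implementation. The function name is hypothetical.

```python
import math
import statistics


def performance(estimates, variance_estimates, true_value, z=1.96):
    """Summarize simulation replicates for one scenario (assumed definitions)."""
    reps = len(estimates)
    empirical_var = statistics.variance(estimates)  # spread across replicates
    bias = statistics.mean(estimates) - true_value
    relative_bias_variance = (
        statistics.mean(variance_estimates) - empirical_var
    ) / empirical_var
    # A replicate "covers" when the true value lies inside its interval.
    covered = sum(
        1
        for est, var in zip(estimates, variance_estimates)
        if abs(est - true_value) <= z * math.sqrt(var)
    )
    return {
        "bias": bias,
        "relative_bias_variance": relative_bias_variance,
        "coverage": covered / reps,
        "mcse_bias": math.sqrt(empirical_var / reps),
    }
```

Under these definitions, a relative bias of zero means the variance estimator matches the true replicate-to-replicate variability on average, and coverage near 0.95 indicates well-calibrated intervals.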

Simulation results

In all cases, we found the Monte Carlo standard errors (MCSEs) to be low for all our performance metrics (range of MCSEs for Δsk: 0 to 0.01; Δku: 0 to 0.624; ΔZr: 0 to 0.004). Point estimators for Δsk and ΔZr exhibited small-sample bias with fewer than 20–30 samples, whereas Δku showed this bias at n < 50–60, indicating that effect sizes involving kurtosis are more challenging to estimate (S1 and S2 Figs). Differences in the mean and variance between groups did not differentially affect bias (S3 Fig). Regardless, small-sample biases were moderate, and there was rarely a consistent over- or under-estimation in point estimates across the scenarios evaluated (S1 Fig). Bias-corrected jackknife estimates reduced the small-sample bias relative to analytical bias-corrected moment estimators (mean square bias, jackknife and analytical, respectively, for Δsk: 1.109, 3.375; Δku: 477.71, 891.659; ΔZr: 0.029, 0.214).

In contrast to point estimators, the effectiveness of sampling variance estimators for Δsk, Δku, and ΔZr varied. Analytical sampling variance formulas for Δsk and Δku were consistently biased (S4 Fig). Jackknife resampling when combined with analytical point estimates (Fig 2) performed the best. Under these conditions, estimators performed well when n > 50. In contrast, the performance of sampling variance estimators for ΔZr was best when using the analytical formulas for both the point estimator and its associated sampling variance (Fig 2).

Fig 2. Bias in Δsk, Δku, and ΔZr effect estimates (A, D, G), relative bias in their sampling variance using jackknife-based approximation (B, E, H), and coverage of effect estimates (C, F, I) across simulations where samples ranged in group sample sizes between n ∈ {10, 20, …, 100, 150, 500}.

A total of 100 simulated scenarios were assessed for Δsk and Δku whereas 64 simulated scenarios were assessed for ΔZr. We ran 2,500 simulations for each scenario. For simplicity, we only present results from our recommended point estimators and sampling variance estimators using jackknife. See supplementary material for full simulation results. The data and code needed to generate this Figure can be found in https://zenodo.org/records/18386956.

https://doi.org/10.1371/journal.pbio.3003653.g002

Coverage was close to nominal (95%) for Δsk and ΔZr across sample sizes (Fig 2C and 2I). Coverage for Δku, however, was poor across many simulated scenarios (Fig 2F), and increased sample size did not improve it. Poor coverage was the result of skewed sampling distributions from jackknife approaches (S5 and S6 Figs). At small sample sizes, Δku was estimated poorly when true Δku was high, leading to non-skewed distributions with good coverage. In contrast, large sample sizes improved point estimation of Δku when differences existed, but the sampling distribution became highly skewed, leading to poor coverage (S5 and S6 Figs). These problems stem from the fact that the standard error formula for kurtosis assumes normality (see [42]).

Considering these simulation results, we suggest pairing the formula-based point estimators for skewness (Eq 1) and kurtosis (Eq 3) with jackknife standard errors for Δsk and Δku. For ΔZr, the standard analytic variance is recommended (Eqs 9–12). This choice balances efficiency under normality with robustness to realistic deviations from it and aligns with our broader guidance to avoid very small group sizes for these statistics. Given the challenges in estimating Δku, and the poor properties of its sampling variance [42], we recommend weighted meta-analytic models using sample size instead of sampling variance (see Supporting information and [41]).

Worked examples: Sex differences in mice

To illustrate the application of our proposed effect size statistics, we used data compiled by the International Mouse Phenotyping Consortium (IMPC, version 18.0; [43]; http://www.mousephenotype.org/). We examined differences between male and female mice in two pairs of traits from distinct functional domains: morphology (fat mass and heart weight) and physiology (glucose and total cholesterol). We selected these traits because they are widely understood traits, even by non-specialists, and had a large sample size (more than 10,000 individuals measured). More specifically, we assessed differences between the sexes in mean (using the natural logarithm of the response ratio [44], hereby lnRR), variability (using the natural logarithm of the variance ratio [14], hereby lnVR), skewness (using Δsk), and kurtosis (using Δku) for each trait, as well as in the difference in correlation for each trait pair (using ΔZr). The IMPC dataset contains data from multiple phenotyping centers and mice strains, so we selected the ones with the most data points for our analyses here, computing the aforementioned effect size statistics separately for each one of them.

We performed a meta-analysis for each effect size statistic to obtain a mean effect size for each trait (or pair of traits, in the case of ΔZr), using “effect size ID,” “phenotyping center,” and “mice strain” as random factors in meta-analytical models (due to substantial heterogeneity, Table 1). In the case of Δku, we fitted a weighted meta-analytic model using sample size instead of sampling variance (see previous sections and [41]). In all these analyses, positive effect sizes denoted a greater estimate (mean, variability, skewness, kurtosis, or correlation) for males than females. We conducted all statistical analyses in the software R 4.5.1 [45]. We used the functions moment_effects and cor_diff, which have been incorporated into the package orchaRd v. 2.1.3 [46], to compute Δsk, Δku, and ΔZr. We fitted meta-analytical models using the rma.mv function from the package metafor v. 4.8-0 [47]. All methodological details and additional information can be found in our tutorial, at https://pietropollo.github.io/new_effect_size_statistics/.

Table 1. Heterogeneity estimates (I2) for each meta-analytical model fitted in our study.

https://doi.org/10.1371/journal.pbio.3003653.t001

We found that males, on average, had greater fat mass and heart weight than females regardless of phenotyping center and mice strain (Fig 3A, 3B, 3F, and 3G). The variability among individuals regarding these traits was also greater for males than for females, except for fat mass from one specific phenotyping center and mice strain (Fig 3C). By contrast, females had a similar skewness in fat mass and heart weight compared with males (Fig 3D and 3I). However, Δsk values for fat mass and heart weight varied across phenotyping centers and mice strains, with negative and positive values present (Fig 3D and 3I). Sex differences in kurtosis for fat mass and heart weight followed a very similar pattern to the one described for skewness: Δku values overlapping zero with some variation across individual effect sizes (Fig 3E and 3J). Moreover, the correlation between fat mass and heart weight was, on average, greater for females than males (Fig 4A and 4B). However, this difference in correlation was absent for some phenotyping centers and mice strains (Fig 4A and 4B).

Fig 3. Examples of morphological sex differences in mice (fat mass, A–E; heart weight, F–J) for various phenotype centers (each with a different color in panels B–E and G–J) and mice strains (each with a different shape in panels B–E and G–J), with the bottom estimate in panels B–E and G–J (turquoise diamond) representing the mean effect size.

A and F show distributions of these traits (scaled by subtracting the mean from each value and then dividing the result by the standard deviation) for males (black with dashed borders) and females (white with solid borders), with the sample size of females and males shown as Nf and Nm, respectively. Panels B–E and G–J show effect sizes (lnRR: natural logarithm of the response ratio; VR: variance ratio; Δsk: difference in skewness; Δku: difference in kurtosis), with their respective point estimate and 95% confidence interval stamped. The data and code needed to generate this Figure can be found in https://zenodo.org/records/18386956.

https://doi.org/10.1371/journal.pbio.3003653.g003

Fig 4. Relationship between fat mass and heart weight (A, B) and glucose and total cholesterol (C, D) in mice.

Panels A and C show these relationships (with variables scaled by subtracting the mean from each value and then dividing the result by the standard deviation) separately for males (dashed line) and females (solid line), each subpanel representing a different phenotyping center and/or mice strain, with the sample size of females and males shown as Nf and Nm, respectively. Panels B and D then show differences in correlation (ΔZr) between males and females (point estimate and 95% confidence interval stamped), where each color represents a distinct phenotype center and each shape represents a distinct mice strain, with the bottom estimate in each panel (turquoise diamond) representing the mean effect size. Note that panels A and C contain individual data points, which may appear as background shading in cases with large sample sizes. The data and code needed to generate this Figure can be found in https://zenodo.org/records/18386956.

https://doi.org/10.1371/journal.pbio.3003653.g004

We also found that male and female mice were, on average, similar in terms of blood glucose levels (Fig 5A and 5B), although males had higher total cholesterol than females (Fig 5F and 5G). We observed the same pattern regarding the variability of these traits: on average, the sexes were similarly variable in glucose (Fig 5C), but the variability of total cholesterol was greater in males than in females (Fig 5H). Contrasting with morphological traits, sex differences in skewness and kurtosis were mostly absent (Fig 5D, 5E, 5I, and 5J). Lastly, males and females showed a similar relationship between glucose and total cholesterol, albeit this relationship was stronger for males than for females in some instances (Fig 4C and 4D).

Fig 5. Examples of physiological sex differences in mice (glucose, A–E; total cholesterol, F–J) for various phenotype centers (each with a different color in panels B–E and G–J) and mice strains (each with a different shape in panels B–E and G–J), with the bottom estimate in panels B–E and G–J (turquoise diamond) representing the mean effect size.

A and F show distributions of these traits (scaled by subtracting the mean from each value and then dividing the result by the standard deviation) for males (black with dashed borders) and females (white with solid borders), with the sample size of females and males shown as Nf and Nm, respectively. Panels B–E and G–J show effect sizes (lnRR: natural logarithm of the response ratio; VR: variance ratio; Δsk: difference in skewness; Δku: difference in kurtosis), with their respective point estimate and 95% confidence interval stamped. The data and code needed to generate this Figure can be found in https://zenodo.org/records/18386956.

https://doi.org/10.1371/journal.pbio.3003653.g005

Our findings that females have, on average, lower (Fig 3B and 3G), less variable (Fig 3C and 3H), but similar skewness (Fig 3D and 3I) and extreme values (kurtosis; Fig 3E and 3J) of fat mass and heart weight compared with males may contribute to sex-related differences in the development of diseases associated with these traits and their biomarkers (e.g., QTc interval length [48]). Moreover, a stronger relationship between fat mass and heart weight in females than in males (Fig 4B) may represent a greater risk of cardiohypertrophy arising from obesity in the former compared with the latter [49]. Meanwhile, absent or less pronounced sex differences in glucose and total cholesterol (Fig 4) may suggest that other sources of variation contribute to sex differences in the symptomology of diseases associated with these measurements (e.g., [50–52]). Characterizing sex differences in biological traits, as we have done here, can provide new perspectives on evolutionary, ecological, and medical patterns, possibly improving healthcare and environmental interventions.

Limitations

Despite the enormous potential of the effect size statistics we proposed here, they are not free of limitations. For instance, skewness and kurtosis (and therefore the difference in these estimates between two groups; i.e., Δsk and Δku, respectively) are more likely to become extreme with small sample sizes and with variables with few unique values, either because the variable is discrete or because it is naturally constant (e.g., number of vertebrae in mice). We thus recommend that researchers only compute Δsk and Δku for continuous variables with a minimum sample size of 50 for each group (as shown in our simulations). Importantly, we found that Δku variance estimates can be biased in many situations, highlighting that exploring Δku should be a priority for future work. Because of this issue, meta-analyzing Δku requires sample size-based weights instead of the standard sampling variance (see supplementary material and [41]). Lastly, although Δsk, Δku, and ΔZr can be calculated, respectively, from reported skewness, kurtosis, or within-group correlations for different samples, empirical studies rarely report these estimates. Therefore, calculating these effect sizes will probably require raw data, which, fortunately, are now becoming more readily available.

Future opportunities

The effect size statistics proposed in the present study can be useful across the life sciences, social sciences, and medicine. This is because skewness and kurtosis, and consequently differences between any two or more groups in these estimates (i.e., Δsk and Δku), may help researchers to understand epidemiological trends [53], genetic patterns relevant to medical diagnosis [20,21], disruptive selection on quantitative traits [54], body size patterns across individuals [55] and species [56], reproductive patterns [57], regime shifts in ecosystems [58], heritability [18], community assembly processes [16], and possibly many other topics. Meanwhile, comparisons regarding correlations have been used to explore memory processing during sleep [59], physiological patterns in patients with certain medical conditions [60], and selection patterns [22–24], to name a few. Because ΔZr can be used in virtually any comparison between two groups of correlational data, the opportunities for its use are endless. Most importantly, Δsk, Δku, and ΔZr are unitless measures, so they can be meta-analyzed to uncover patterns between two groups (e.g., males and females). Moreover, the growing availability of raw data and big data approaches, facilitated by technological advances, makes these effect size statistics particularly valuable for modern research.

Supporting information

S1 File. An HTML file containing all steps to reproduce simulations and meta-analyses presented in our study.

https://doi.org/10.1371/journal.pbio.3003653.s001

(HTML)

S1 Fig. Bias in Δsk, Δku, and ΔZr effect estimates across simulations where samples ranged in group sample sizes between n ∈ {10, 20, …, 100, 150, 500}.

A total of 100 simulated scenarios were assessed for Δsk and Δku whereas 64 simulated scenarios were assessed for ΔZr. We ran 2,500 simulations for each scenario. The data and code needed to generate this Figure can be found in https://zenodo.org/records/18386956.

https://doi.org/10.1371/journal.pbio.3003653.s002

(TIF)

S2 Fig. Bias of analytical point estimators in relation to the absolute difference in skewness and kurtosis between groups.

(A) Skewness and (B) kurtosis. The color of each point corresponds to the sample size, and each point is a single simulated scenario. The dotted line is the zero-bias line. The data and code needed to generate this Figure can be found in https://zenodo.org/records/18386956.

https://doi.org/10.1371/journal.pbio.3003653.s003

(TIF)

S3 Fig. Bias in Δsk and Δku across simulated scenarios was unrelated to differences in group means or variances.

We ran 2,500 simulations for each scenario. The data and code needed to generate this Figure can be found in https://zenodo.org/records/18386956.

https://doi.org/10.1371/journal.pbio.3003653.s004

(TIF)

S4 Fig. Relative bias in Δsk, Δku, and ΔZr effect estimates across simulations with group sample sizes n ∈ {10, 20, …, 100, 150, 500}.

A total of 100 simulated scenarios were assessed for Δsk and Δku whereas 64 simulated scenarios were assessed for ΔZr. Note that relative bias was calculated using different combinations of point estimates and sampling variance estimates, as indicated in the panel titles, which show the calculation. Notation is as follows: sk and ku are the skewness and kurtosis calculated using the original formulas; sk_sv and ku_sv are the sampling variance estimates using the original formulas; jack_skew_sv and jack_ku_sv are the sampling variance estimates for skewness and kurtosis using the jackknife; and jack_skew_bc and jack_ku_bc are the bias-corrected point estimates from the jackknife. We ran 2,500 simulations for each scenario.

https://doi.org/10.1371/journal.pbio.3003653.s005

(TIF)

S5 Fig. Coverage of 95% confidence intervals for Δsk, Δku, and ΔZr effect estimates across simulations with group sample sizes n ∈ {10, 20, …, 100, 150, 500}.

A total of 100 simulated scenarios were assessed for Δsk and Δku whereas 64 simulated scenarios were assessed for ΔZr. We ran 2,500 simulations for each scenario. The data and code needed to generate this Figure can be found in https://zenodo.org/records/18386956.

https://doi.org/10.1371/journal.pbio.3003653.s006

(TIF)

S6 Fig. Example sampling distributions of three different scenarios (Δku = 0, 1, or 2.5) for n = 10 and n = 500 samples for each group.

We ran 2,500 simulations for each scenario. The data and code needed to generate this Figure can be found in https://zenodo.org/records/18386956.

https://doi.org/10.1371/journal.pbio.3003653.s007

(TIF)

Acknowledgments

We thank Yefeng Yang for his contribution in the early stage of this study.

Declaration of AI use: The authors declare that they occasionally used GPT-4-turbo (OpenAI) to improve the clarity and readability of this work. After using these tools, the authors reviewed and edited the content as needed and took full responsibility for the content of the publication.

References

  1. Maklakov AA, Lummaa V. Evolution of sex differences in lifespan and aging: causes and constraints. Bioessays. 2013;35(8):717–24. pmid:23733656
  2. Harrison LM, Noble DWA, Jennions MD. A meta-analysis of sex differences in animal personality: no evidence for the greater male variability hypothesis. Biol Rev Camb Philos Soc. 2022;97(2):679–707. pmid:34908228
  3. Zucker I, Beery AK. Males still dominate animal studies. Nature. 2010;465(7299):690. pmid:20535186
  4. Karp NA, Mason J, Beaudet AL, Benjamini Y, Bower L, Braun RE, et al. Prevalence of sexual dimorphism in mammalian phenotypic traits. Nat Commun. 2017;8:15475. pmid:28650954
  5. Zucker I, Prendergast BJ, Beery AK. Pervasive neglect of sex differences in biomedical research. Cold Spring Harb Perspect Biol. 2022;14(4):a039156. pmid:34649925
  6. Clayton JA, Collins FS. Policy: NIH to balance sex in cell and animal studies. Nature. 2014;509(7500):282–3. pmid:24834516
  7. Woitowich NC, Beery A, Woodruff T. A 10-year follow-up study of sex inclusion in the biological sciences. Elife. 2020;9:e56344. pmid:32513386
  8. Tannenbaum C, Ellis RP, Eyssel F, Zou J, Schiebinger L. Sex and gender analysis improves science and engineering. Nature. 2019;575(7781):137–46. pmid:31695204
  9. Phillips B, Haschler TN, Karp NA. Statistical simulations show that scientists need not increase overall sample size by default when including both sexes in in vivo studies. PLoS Biol. 2023;21(6):e3002129. pmid:37289836
  10. Drobniak SM, Lagisz M, Yang Y, Nakagawa S. Realism and robustness require increased sample size when studying both sexes. PLoS Biol. 2024;22(4):e3002456. pmid:38603525
  11. Fairbairn DJ, Blanckenhorn WU, Székely T. Sex, size and gender roles. Oxford: Oxford University Press; 2007.
  12. Zajitschek SR, Zajitschek F, Bonduriansky R, Brooks RC, Cornwell W, Falster DS, et al. Sexual dimorphism in trait variability and its eco-evolutionary and statistical implications. Elife. 2020;9:e63170. pmid:33198888
  13. Wilson LAB, Zajitschek SRK, Lagisz M, Mason J, Haselimashhadi H, Nakagawa S. Sex differences in allometry for phenotypic traits in mice indicate that females are not scaled males. Nat Commun. 2022;13(1):7502. pmid:36509767
  14. Nakagawa S, Poulin R, Mengersen K, Reinhold K, Engqvist L, Lagisz M, et al. Meta‐analysis of variation: ecological and evolutionary applications and beyond. Methods Ecol Evol. 2014;6(2):143–52.
  15. McGuigan K, Van Homrigh A, Blows MW. Genetic analysis of female preference functions as function-valued traits. Am Nat. 2008;172(2):194–202. pmid:18597623
  16. Cornwell WK, Ackerly DD. Community assembly and shifts in plant trait distributions across an environmental gradient in coastal California. Ecol Monogr. 2009;79(1):109–26.
  17. Reid JM, Arcese P, Nietlisbach P, Wolak ME, Muff S, Dickel L, et al. Immigration counter-acts local micro-evolution of a major fitness component: migration-selection balance in free-living song sparrows. Evol Lett. 2021;5(1):48–60. pmid:33552535
  18. Pick JL, Lemon HE, Thomson CE, Hadfield JD. Decomposing phenotypic skew and its effects on the predicted response to strong selection. Nat Ecol Evol. 2022;6(6):774–85. pmid:35422480
  19. Stemkovski M, Dickson RG, Griffin SR, Inouye BD, Inouye DW, Pardee GL, et al. Skewness in bee and flower phenological distributions. Ecology. 2023;104(1):e3890. pmid:36208124
  20. Church BV, Williams HT, Mar JC. Investigating skewness to understand gene expression heterogeneity in large patient cohorts. BMC Bioinformatics. 2019;20(Suppl 24):668. pmid:31861976
  21. Kulminski AM, Philipp I, Loika Y, He L, Culminskaya I. Haplotype architecture of the Alzheimer’s risk in the APOE region via co-skewness. Alzheimers Dement (Amst). 2020;12(1):e12129. pmid:33204816
  22. Rausher MD. The measurement of selection on quantitative traits: biases due to environmental covariances between traits and fitness. Evolution. 1992;46(3):616–26. pmid:28568666
  23. Blows MW. Complexity for complexity’s sake? J Evol Biol. 2006;20(1):39–44.
  24. Hansen TF, Houle D. Measuring and comparing evolvability and constraint in multivariate characters. J Evol Biol. 2008;21(5):1201–19. pmid:18662244
  25. Noble DWA, Radersma R, Uller T. Plastic responses to novel environments are biased towards phenotype dimensions with high additive genetic variation. Proc Natl Acad Sci U S A. 2019;116(27):13452–61. pmid:31217289
  26. Rigby RA, Stasinopoulos DM. Generalized additive models for location, scale and shape. J R Stat Soc Ser C Appl Stat. 2005;54(3):507–54.
  27. Stasinopoulos DM, Rigby RA. Generalized additive models for location scale and shape (GAMLSS) in R. J Stat Soft. 2007;23(7).
  28. Umlauf N, Klein N, Zeileis A. BAMLSS: Bayesian additive models for location, scale, and shape (and beyond). J Comput Graph Stat. 2018;27(3):612–27.
  29. Malgady RG. How skewed are psychological data? A standardized index of effect size. J Gen Psychol. 2007;134(3):355–9. pmid:17824403
  30. Riley RD, Tierney JF, Stewart LA. Individual participant data meta-analysis: a handbook for healthcare research. Hoboken, NJ: John Wiley & Sons; 2021.
  31. Tierney JF, Stewart LA, Clarke M. Individual participant data. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, et al., editors. Cochrane handbook for systematic reviews of interventions. Hoboken, NJ: John Wiley & Sons; 2024. p. 643–58.
  32. Joanes DN, Gill CA. Comparing measures of sample skewness and kurtosis. J R Stat Soc D. 1998;47(1):183–9.
  33. Efron B. The jackknife, the bootstrap and other resampling plans. Society for Industrial and Applied Mathematics; 1982.
  34. Pollo P, Lagisz M, Macedo-Rego RC, Mizuno A, Yang Y, Nakagawa S. Synthesis of nature’s extravaganza: an augmented meta-meta-analysis on (putative) sexual signals. Ecol Lett. 2025;28(9):e70215. pmid:40955568
  35. Machado G, Macedo-Rego RC. Benefits and costs of female and male care in amphibians: a meta-analytical approach. Proc Biol Sci. 2023;290(2010):20231759. pmid:37935362
  36. Hedges LV, Olkin I. Statistical methods for meta-analysis. Elsevier; 1985.
  37. Cohen J. Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum Associates; 1988.
  38. Becker M, Klößner S. PearsonDS: Pearson distribution system. 2025. Available from: https://cran.r-project.org/package=PearsonDS
  39. Venables WN, Ripley BD. Modern applied statistics with S. Springer New York; 2002.
  40. Lajeunesse MJ. Bias and correction for the log response ratio in ecological meta-analysis. Ecology. 2015;96(8):2056–63. pmid:26405731
  41. Nakagawa S, Noble DWA, Lagisz M, Spake R, Viechtbauer W, Senior AM. A robust and readily implementable method for the meta-analysis of response ratios with and without missing standard deviations. Ecol Lett. 2023;26(2):232–44. pmid:36573275
  42. Wright DB, Herrington JA. Problematic standard errors and confidence intervals for skewness and kurtosis. Behav Res Methods. 2011;43(1):8–17. pmid:21298573
  43. Dickinson ME, Flenniken AM, Ji X, Teboul L, Wong MD, White JK, et al. High-throughput discovery of novel developmental phenotypes. Nature. 2016;537(7621):508–14. pmid:27626380
  44. Hedges LV, Gurevitch J, Curtis PS. The meta-analysis of response ratios in experimental ecology. Ecology. 1999;80(4):1150–6.
  45. R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2025. Available from: https://www.r-project.org/
  46. Nakagawa S, Lagisz M, O’Dea RE, Pottier P, Rutkowska J, Senior AM, et al. orchaRd 2.0: an R package for visualising meta‐analyses with orchard plots. Methods Ecol Evol. 2023;14(8):2003–10.
  47. Viechtbauer W. Conducting meta-analyses in R with the metafor package. J Stat Soft. 2010;36(3).
  48. Yazdanpanah MH, Bahramali E, Naghizadeh MM, Farjam M, Mobasheri M, Dadvand S. Different body parts’ fat mass and corrected QT interval on the electrocardiogram: the Fasa PERSIAN Cohort Study. BMC Cardiovasc Disord. 2021;21(1):277. pmid:34090333
  49. Cuspidi C, Rescaldani M, Sala C, Grassi G. Left-ventricular hypertrophy and obesity: a systematic review and meta-analysis of echocardiographic studies. J Hypertens. 2014;32(1):16–25. pmid:24309485
  50. Regitz‐Zagrosek V. Sex and gender differences in health. EMBO Reports. 2012;13(7):596–603.
  51. Regitz-Zagrosek V, Gebhard C. Gender medicine: effects of sex and gender on cardiovascular disease manifestation and outcomes. Nat Rev Cardiol. 2023;20(4):236–47. pmid:36316574
  52. Kautzky-Willer A, Leutner M, Harreiter J. Sex differences in type 2 diabetes. Diabetologia. 2023;66(6):986–1002. pmid:36897358
  53. Guharay S. A data-driven approach to study temporal characteristics of COVID-19 infection and death Time Series for twelve countries across six continents. BMC Med Res Methodol. 2025;25(1):1. pmid:39754044
  54. Débarre F, Yeaman S, Guillaume F. Evolution of quantitative traits under a migration-selection balance: when does skew matter? Am Nat. 2015;186 Suppl 1:S37-47. pmid:26656215
  55. Poulin R, Morand S. Parasite body size distributions: interpreting patterns of skewness. Int J Parasitol. 1997;27(8):959–64. pmid:9292313
  56. Kozłowski J, Gawelczyk AT. Why are species’ body size distributions usually skewed to the right? Funct Ecol. 2002;16(4):419–32.
  57. Olivier LA, Higginson AD. Tests of reproductive skew theory: a review and prospectus. Evol Ecol. 2023;37(6):871–92.
  58. Guttal V, Jayaprakash C. Changing skewness: an early warning signal of regime shifts in ecosystems. Ecol Lett. 2008;11(5):450–60. pmid:18279354
  59. Verma K, Pandey K, Kashyap N. Relation between sleep spindles and semantically induced false memory. Sleep Breath. 2024;29(1):26. pmid:39612034
  60. Fedulovs A, Janevica J, Kruzmane L, Sokolovska J. Glucose control and variability assessed by continuous glucose monitoring in patients with type 1 diabetes and diabetic kidney disease. Biomed Rep. 2024;22(2):23. pmid:39720301