Reader Comments

Authors' Reply

Posted by marco_dg on 07 Jan 2012 at 14:40 GMT

First of all, we wish to thank Janet Hyde and Richard Lippa for taking the time to comment on our paper. Here we briefly respond to Hyde’s critical remarks.

A. We are not advocating a one-dimensional model of personality. It is a mistake to conflate the dimensionality of the personality model with the fact that D is a single number. The weakness of this criticism can be appreciated by applying it to the measurement of physical distances: if one were to say that the distance between the Earth and the Sun is ~150 million km, would this amount to advocating a one-dimensional model of the universe?

B. Hyde wrote, “A point that is not mentioned in the Del Giudice article is that this dimension is the first discriminant function. Aside from the fact that the linear combination introduces bias by maximizing differences, the resulting dimension here is uninterpretable. What does it mean to say that there are large gender differences on this undefined dimension in 15-dimensional space created from latent variables? The authors call it global personality, but what does that mean?” This paragraph contains a number of inaccuracies. First of all, we did mention that D “represents the standardized difference between two groups along the discriminant axis” (p. 3). Second, we never talked about “global personality” (an admittedly vague concept), but rather about “global sex differences” (operationalized with D). Third, neither the D statistic nor the discriminant axis is uninterpretable. The discriminant function can be interpreted as a blend of different traits, but that is not the main point of using D. The most informative way to interpret D is by converting it to an overlap coefficient, as we did in the paper. The same logic (i.e., supporting the interpretation of Cohen’s d with overlap coefficients) was followed by Hyde in her 2005 paper [1]; it is unclear to us why distribution overlap should be a valid criterion for univariate comparisons, but not for multivariate ones. Fourth, it is true that estimates of D are affected by sampling error (which may bias the results as more variables are included), but we were careful to compute and report confidence intervals for all the D values in the paper. Unsurprisingly, given the large sample size, they turned out to be quite narrow (e.g., 2.66-2.76).
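
As a rough illustration of the step from D to an overlap coefficient, the following Python sketch computes D from a vector of standardized mean differences and a correlation matrix, then converts it to the overlapping coefficient 2Φ(-D/2). It assumes multivariate normality and equal covariance matrices in the two groups. The function names and input values are hypothetical and are not taken from the paper, and other definitions of overlap exist (for example, Cohen's 1 - U1), so this should not be read as the exact coefficient reported there.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] without mutating the inputs.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def mahalanobis_d(d, r):
    """D = sqrt(d' R^-1 d) for standardized differences d and correlations R."""
    x = solve_linear(r, d)  # x = R^-1 d
    return math.sqrt(sum(di * xi for di, xi in zip(d, x)))


def overlap(d_value):
    """Overlapping coefficient 2 * Phi(-|D| / 2) under normality."""
    return math.erfc(abs(d_value) / (2.0 * math.sqrt(2.0)))


# Hypothetical inputs: two traits, each with d = 1.0, correlated 0.5.
big_d = mahalanobis_d([1.0, 1.0], [[1.0, 0.5], [0.5, 1.0]])
print(round(big_d, 3), round(overlap(big_d), 3))
```

The same `overlap` function applies to a univariate Cohen's d, which is why the overlap interpretation carries over unchanged from the univariate to the multivariate case.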

C. Hyde’s assertion that “an assumption of multivariate normality is crucial to Mahalanobis D if it is to be accurate” is potentially misleading. On the one hand, we agree that multivariate normality must be assumed in order to accurately convert D to an overlap coefficient. In our analysis using parceled data, the absolute values of the largest skewness and kurtosis statistics were 1.009 and 1.238 respectively, indicating no substantial departure from normality. On the other hand, one may ask whether the univariate estimates and correlations used as input to calculate D have been generated by appropriate methods. In our analysis, explained in detail in a previous paper [2], we used robust maximum likelihood in order to estimate the correlation matrix and the vector of univariate standardized mean differences required to estimate D. Given the low values of skewness and kurtosis and the large sample size, our parameter estimates using robust maximum likelihood can be regarded as highly accurate [3,4].
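
For readers who want to run a similar check on their own data, here is a minimal Python sketch of moment-based skewness and kurtosis. The reply does not state which estimator was used or whether the kurtosis figure is excess kurtosis, so the choices below (population moments, with 3 subtracted from the kurtosis) are assumptions, and the function name is hypothetical.

```python
def skewness_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) from population central moments."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("values have zero variance")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    # A normal distribution has kurtosis 3, so excess kurtosis is 0.
    excess_kurt = m4 / m2 ** 2 - 3.0
    return skew, excess_kurt
```

Both statistics are zero for a normal distribution, so values near zero suggest no marked departure from univariate normality.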

D. The validity of self-reports is, of course, a long-standing issue in personality research. We acknowledge that self-reports have limitations, and in the paper we call for further research employing multiple assessment methods. In the paper, we also cite research on sex differences in aggression [5] showing that effect sizes on a sex-typed trait can be similar regardless of the assessment method; but the real problem is, does a method exist that is immune to the effects of stereotyping? We worry that the same criticism could easily be raised against other-reports, as well as against behavioral observations (unless one is willing to assume that stereotypes do not influence behavior). Unfortunately, this will make it impossible to satisfy the critics. Finally, Hyde wrote: “the two personality factors that show the largest univariate gender differences in the Del Giudice study are Sensitivity (d = -2.29) and Warmth (d = -.89). It is no accident that warmth and sensitivity are highly female stereotyped traits.” We agree, but of course this is a chicken-and-egg problem; do these traits show large differences because they are stereotyped, or are they stereotyped because they reliably differ between the sexes? Cross-cultural studies may help resolve this issue, and in the paper we noted that cross-cultural evidence supports the existence of robust sex differences on a personality dimension that closely matches the Sensitivity factor [6]. Whatever the ultimate reason for the large sex differences in sensitivity and warmth, the fact that these traits are regarded as stereotypically feminine does not, by itself, invalidate our findings.

E. Hyde cites two studies in which small sex differences were found [7,8], in support of her argument that sex differences in self-reports are inflated by stereotypical responding. One study is a meta-analysis of children aged 3 months to 13 years [7]; the other is a study of children aged 4 to 10 years [8]. Clearly, some sex differences can be expected to increase with age, especially after puberty. Findings of smaller effect sizes in young children do not invalidate findings of larger effect sizes in adults. Furthermore, neither of these studies attempted to estimate effect sizes on latent variables, and the amount of error contained in the measures is unknown. If unreliable or otherwise “noisy” measures are employed, the apparent size of sex differences can be substantially attenuated. This is why, in our paper, we stressed that methods for correcting measurement error should be employed whenever possible.
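
The attenuation effect can be illustrated with the classical psychometric relation between an observed and a true standardized difference, d_observed = d_true × sqrt(reliability). The Python sketch below uses that textbook correction rather than the latent-variable approach mentioned above; the function names and numbers are hypothetical.

```python
import math


def attenuated_d(true_d, reliability):
    """Observed d expected when the measure has the given reliability."""
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    return true_d * math.sqrt(reliability)


def disattenuated_d(observed_d, reliability):
    """Classical correction of an observed d for unreliability."""
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    return observed_d / math.sqrt(reliability)


# Hypothetical example: a true difference of 0.5 measured with reliability 0.64.
print(attenuated_d(0.5, 0.64))
```

Under this relation, the lower the reliability, the more the observed difference understates the true one.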

F. Finally, Hyde returns to her own review of meta-analyses of sex differences [1]. While we acknowledge the very valuable contribution that meta-analyses have made to our understanding of sex differences, we must note that meta-analytic studies often suffer from the same methodological limitations we highlighted in our paper. First, many meta-analyses are not psychometric, that is, they do not correct for artifacts such as measurement error or restriction of range. For this reason, they often provide systematic underestimates of effect sizes. Second, meta-analytic results are dependent on the level of analysis. For example, Hyde and Linn [9] report an overall female advantage on verbal ability of 0.11. However, there is evidence that large sex differences may exist in the specific components of verbal ability. For example, Lynn, Irwing, and Cammock [10] report an overall male advantage on general knowledge of 0.51. In contrast, women show advantages on reading (-0.18 to -0.30) and writing (-0.53 to -0.61) [11]. Thus, sex differences at the component level may be averaged out in the broader construct of verbal ability, much as we have argued for personality. Finally, most meta-analyses are conducted on observed scores, and the resulting estimates of effect sizes are therefore distorted by various sources of error.
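
The averaging-out argument can be made concrete with a small two-component Python sketch. It compares the standardized difference on a unit-weighted composite of two standardized components with the multivariate D on the same components. The values are hypothetical (opposite-signed differences of 0.5 with a within-group correlation of 0.3) and are not estimates from any of the cited studies.

```python
import math


def composite_d(d1, d2, r):
    """d on the sum of two standardized components with within-group correlation r."""
    # Mean difference of the sum is d1 + d2; its within-group variance is 2 + 2r.
    return (d1 + d2) / math.sqrt(2.0 + 2.0 * r)


def mahalanobis_d_two(d1, d2, r):
    """Mahalanobis D for two components: sqrt(d' R^-1 d) written out for a 2x2 R."""
    return math.sqrt((d1 ** 2 - 2.0 * r * d1 * d2 + d2 ** 2) / (1.0 - r ** 2))


# Hypothetical components favoring opposite groups.
print(composite_d(0.5, -0.5, 0.3))
print(round(mahalanobis_d_two(0.5, -0.5, 0.3), 3))
```

In this constructed case the composite shows no difference at all, while D remains clearly above zero, because opposite-signed component differences cancel in the sum but not in the multivariate distance.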

These points aside, we hope that readers will not miss the major point of our paper, which is eloquently discussed by Lippa in his comment. Our argument is not about how many large or small sex differences there are; rather, it is about the methodology for measuring them as accurately as possible. It is also worth stressing that we do not wish to replace univariate effect sizes with their multivariate counterparts; they answer different kinds of questions, and indeed, in our analysis we reported and discussed sex differences on individual factors before turning to multivariate effects. Exactly what the most appropriate methods to measure sex differences are is an important topic of scientific debate. The main goal of our article was to add to this debate. We encourage other researchers to replicate our study with different samples and measures, and of course we look forward to future refinements and extensions of our methodological proposal.


1. Hyde JS (2005) The gender similarities hypothesis. Amer Psychologist 60: 581-592.
2. Booth T, Irwing P (2011) Sex differences in the 16PF5, test of measurement invariance and mean differences in the US standardisation sample. Pers Indiv Diff 50: 553-558.
3. Boomsma A, Hoogland JJ (2001) The robustness of LISREL modeling revisited. In: Cudeck R, Du Toit S, Sörbom D, editors. Structural equation modeling: Present and future. Lincolnwood, IL: Scientific Software International.
4. Browne MW (1987) Robustness of statistical inference in factor analysis and related models. Biometrika 74: 375-384.
5. Archer J (2009) Does sexual selection explain human sex differences in aggression? Behav Brain Sci 32: 249-266.
6. Costa PTJ, Terracciano A, McCrae RR (2001) Gender differences in personality traits across cultures: Robust and surprising findings. J Pers Soc Psychol 81: 322-331.
7. Else-Quest NM, Hyde JS, Goldsmith HH, Van Hulle CA (2006) Gender differences in temperament: A meta-analysis. Psychol Bull 132: 33-72.
8. McCarthy AM, Kleiber C, Hanrahan K, Zimmerman MB, Westhus N, Allen S (2010) Factors explaining children’s responses to intravenous needle insertions. Nurs Res 59: 407-416.
9. Hyde JS, Linn MC (1988) Gender differences in verbal ability: A meta-analysis. Psychol Bull 104: 53-69.
10. Lynn R, Irwing P, Cammock T (2001) Sex differences in general knowledge. Intelligence 30: 27-39.
11. Hedges LV, Nowell A (1995) Sex differences in mental test scores, variability, and numbers of high-scoring individuals. Science 269: 41-45.

No competing interests declared.