Estimating group differences on latent variables is clearly preferable to relying on observed scores, but this methodology depends on the assumption of measurement invariance, i.e., the assumption that the construct being measured is actually the same in both groups [56]–[58]. Booth and Irwing [54] found that between-sex invariance was violated for the five global scales of the 16PF (analogous to the Big Five), but satisfied for the 15 primary factors of personality. There is evidence that the same may apply to FFM inventories [59]. Measurement invariance is thus another reason to measure sex differences at the level of narrow traits, instead of focusing on broad traits like the Big Five.
http://plosone.org/article/info:doi/10.1371/journal.pone.0029265#article1.body1.sec1.sec4.sec2.p3
Again, it seems that if you build your models on the assumption that the best measure is the one that shows the largest group difference, it is no surprise when those models turn up the most variance between groups.
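
To make the measurement-invariance assumption in the quoted passage concrete, here is a minimal simulation sketch. It is not the authors' method (the paper works with latent-variable models fitted to real inventory data), and every number in it is an assumption chosen for illustration: the loadings, the noise level, the sample size, and the 0.6 intercept shift on one item. The helper names `simulate_group` and `cohens_d` are hypothetical. Both groups are drawn from the same latent distribution, but one item behaves differently in group B, so the observed sum scores differ even though the underlying trait does not.

```python
import random
import statistics


def simulate_group(n, intercepts, loadings, rng, latent_mean=0.0):
    """Return (latent_scores, observed_sum_scores) for one group.

    Each item follows a simple linear factor model (illustrative assumption):
        item = intercept + loading * latent + noise
    """
    latent, observed = [], []
    for _ in range(n):
        eta = rng.gauss(latent_mean, 1.0)  # latent trait score
        items = [tau + lam * eta + rng.gauss(0.0, 0.5)
                 for tau, lam in zip(intercepts, loadings)]
        latent.append(eta)
        observed.append(sum(items))  # observed scale score = sum of items
    return latent, observed


def cohens_d(a, b):
    """Standardized mean difference (a minus b) using a pooled SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5


rng = random.Random(0)       # fixed seed so the run is repeatable
loadings = [0.8, 0.7, 0.6]   # assumed equal across groups

# Group A: all item intercepts are zero.
latent_a, observed_a = simulate_group(5000, [0.0, 0.0, 0.0], loadings, rng)
# Group B: same latent distribution, but the third item's intercept is
# shifted by 0.6, which is a violation of intercept (scalar) invariance.
latent_b, observed_b = simulate_group(5000, [0.0, 0.0, 0.6], loadings, rng)

d_latent = cohens_d(latent_b, latent_a)
d_observed = cohens_d(observed_b, observed_a)

print(f"latent d (B - A):   {d_latent:.3f}")
print(f"observed d (B - A): {d_observed:.3f}")
```

Under these assumptions the latent difference stays near zero while the observed-score difference does not, because the shifted item adds a constant to group B's scale score that has nothing to do with the trait. That is the sense in which a group comparison is only interpretable when the measure works the same way in both groups.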