Psychometric Evaluation of Chinese-Language 44-Item and 10-Item Big Five Personality Inventories, Including Correlations with Chronotype, Mindfulness and Mind Wandering

The 44-item and 10-item Big Five Inventory (BFI) personality scales are widely used, but there is a lack of psychometric data for Chinese versions. Eight surveys (total N = 2,496, aged 18–82) assessed a Chinese-language BFI-44 and/or an independently translated Chinese-language BFI-10. Most BFI-44 items loaded strongly or predominantly on the expected dimension, and values of Cronbach's alpha ranged .698-.807. Test-retest coefficients ranged .694-.770 (BFI-44) and .515-.873 (BFI-10). The BFI-44 and BFI-10 showed good convergent and discriminant correlations, and expected associations with gender (females higher for agreeableness and neuroticism) and age (older age associated with more conscientiousness and agreeableness, and also less neuroticism and openness). Additionally, predicted correlations were found with chronotype (morningness positive with conscientiousness), mindfulness (negative with neuroticism, positive with conscientiousness), and mind wandering/daydreaming frequency (negative with conscientiousness, positive with neuroticism). Exploratory analysis found that the Self-discipline facet of conscientiousness positively correlated with morningness and mindfulness, and negatively correlated with mind wandering/daydreaming frequency. Furthermore, Self-discipline was found to be a mediator in the relationships between chronotype and mindfulness, and chronotype and mind wandering/daydreaming frequency. Overall, the results support the utility of the BFI-44 and BFI-10 for Chinese-language big five personality research.


Introduction
Over the last few decades many questionnaire-based and lexical studies have led to international personality research being dominated by the Big Five model of personality, involving
The current psychometric evaluation included assessment of the construct validity of the BFI-44 and BFI-10 scales, through a nomological network of predicted external correlates [16]. The establishment of cross-cultural comparability of personality measures can also be facilitated when ". . . functional equivalence can be demonstrated by showing the trait scales relate to external variables in similar ways." [11, p.175]. The current research assessed associations with age, gender, chronotype (morningness-eveningness preference), mindfulness (". . .the state of being attentive to and aware of what is taking place in the present" [25, p.822]), and mind wandering/daydreaming (distraction of attention away from the external environment, and any ongoing task, towards internally focused mentation [26,27]).
There have been some mixed findings regarding age and gender differences in big five personality dimensions, which may be partly related to factors such as study design, measures (questionnaires) used, cohort effects, etc. [28]. However, for age differences, previous longitudinal and cross-sectional research has typically found that (between early adulthood and middle age) agreeableness and conscientiousness are positively associated with increasing age, while neuroticism and openness are negatively associated, and extraversion is relatively stable [6,13,28,29,30]. For gender differences, the most consistently reported findings have been that females score higher for agreeableness and neuroticism [28]. For example, with samples from 55 nations (N = 17,637), Schmitt et al. [31] found that females scored significantly higher than males: in 49/55 nations for neuroticism (men did not score significantly higher in any); in 34/55 nations for agreeableness (males significantly higher in 1/55); in 25/55 nations for extraversion (males significantly higher in 2/55); in 23/55 nations for conscientiousness (males significantly higher in 2/55); and in 4/55 nations for openness (males significantly higher in 8/55).
Chronotype (morningness-eveningness preference) was also included as an external correlate in the nomological network, as previous research indicates that a correlation with conscientiousness can be expected. Tsaousis' [32] meta-analysis of 31 studies found that conscientiousness is the strongest personality correlate of chronotype (positive with morningness; r = .29); agreeableness ranked second (r = .13), with weak correlations for openness (-.09), neuroticism (-.07), and extraversion (-.06). Additionally, Hogben et al. [33] found that conscientiousness is the strongest predictor of chronotype after controlling for variables including age, gender and sleep disorders. Previous research also indicates the correlations that can be expected between personality and mindfulness. Giluk's [34] meta-analysis found that mindfulness has a strong negative correlation with neuroticism (r = -.45), moderate positive correlations with conscientiousness (.32) and agreeableness (.22), and weaker positive correlations with openness (.15) and extraversion (.12).
Although there has been little research on personality correlates of mind wandering or daydreaming since the development of the big five personality model, these variables were also included in the nomological network, because some predictions are suggested by the patterns of inter-correlation with chronotype and mindfulness. Mind wandering/daydreaming frequency negatively correlates with mindfulness [35,36,37], while morningness positively correlates with mindfulness [38], but negatively correlates with mind wandering/daydreaming frequency [37,39]. Consistently, Jackson and Balota [40] found that mind wandering frequency is negatively correlated with conscientiousness (while, as noted above, conscientiousness positively correlates with morningness). Also, morningness is associated with positive affect [41], as is mindfulness [34,38], while mind wandering/daydreaming frequency is associated with negative affect [39,42,43,44], suggesting a possible positive correlation with neuroticism.
Finally, as the BFI-44 questionnaire allows for assessment of two facets for each big five dimension [5], a third aim of the current research was to undertake an exploratory study of how these facets are associated with chronotype, mindfulness, mind wandering, and daydreaming frequency.
each survey was varied, so that the order of presentation was counter-balanced to some degree. For example, for a survey involving the Daydreaming Frequency (DF) scale, 44-item Big Five Inventory (BFI-44), and the reduced Morningness-Eveningness Questionnaire (rMEQ), some participants completed the scales in the order DF/BFI-44/rMEQ, while others completed them in the order rMEQ/BFI-44/DF. Sample details are shown in Table 1. Participants were mostly full-time or part-time students (of various subjects) from several Beijing universities/institutes, who completed the surveys during class breaks. A few community residents were also sampled. Participation was voluntary and unpaid. Participants either completed the questionnaires anonymously, or gave written informed consent. Ethical approval was obtained from the Internal Review Board of the Institute of Psychology, Chinese Academy of Sciences.
For Study 1a the BFI-44, BFI-10 and rMEQ were completed. The BFI-10 was added to the end of the BFI-44, producing a 54-item scale. A sub-group (n = 91) completed a retest 5-6 weeks after the first survey. Study 1b involved the DF scale, BFI-44, and rMEQ; Study 1c involved the MW scale, BFI-44, and rMEQ; Study 1d involved the MLO scale, BFI-44, and rMEQ.
For Study 2a each participant completed one of six separate surveys, each involving the BFI-10 and rMEQ, plus different combinations of questionnaires in each survey: for DF, n = 363; MW, n = 198; MLO, n = 193; CES, n = 265. These were part of a larger series of surveys, with a total of N = 1852, reported in Carciofo et al. [37]. Study 2b involved the sample reported in Carciofo et al. [39, Study 1]. The BFI-10, DF scale and rMEQ were completed. A sub-group (n = 79) provided test-retest data for the BFI-10, approximately 5 weeks after the first survey. Study 2c involved a sub-sample of that reported in Carciofo et al. [39, Study 2]. The BFI-10, MW scale and rMEQ were completed. A sub-group (n = 91) provided test-retest data for the BFI-10, approximately 5 weeks after the first survey. Study 2d involved a sub-sample of that reported in Carciofo et al. [39, Study 3]. The BFI-10, DF scale, MW scale, and rMEQ were completed (the BFI-10 and MW scale were administered seven days after the DF scale and rMEQ). A sub-group (n = 174) also completed the MLO scale 4-6 weeks after completing the BFI-10.

Statistical analysis
There has been discussion about the optimal approach to the analysis of scale structure in personality research (e.g., [12,28,52,53]). Confirmatory factor analysis (CFA) has become widely used (e.g., [15]), but various approaches to exploratory factor analysis (EFA) also continue to be employed (see, e.g., [18,24]), and it has been shown that EFA and multi-trait, multi-method analysis may support the five-factor model, while CFA (with its more stringent assumptions) may not [28,52]. Also, although there are issues with the use of principal components analysis (PCA) in conjunction with varimax rotation [54,55], the big five structure has been replicable using this approach, although large sample sizes may be required [12]. Although neither CFA nor EFA may be appropriate/effective for short scales such as the BFI-10 [18], both approaches have been used as part of psychometric evaluation (e.g., [10,19,22]). Following other studies (e.g., [9,12,14,22]) we undertook PCA with varimax rotation for the BFI-44 and the BFI-10. We also note the results of CFA for the BFI-44, to allow for comparisons with other CFA studies, including that of Leung et al. [15].
Descriptive statistics include the mean, standard deviation and Cronbach's coefficient alpha (measure of internal consistency) for each BFI-44 and BFI-10 dimension, in each study. Coefficient alpha is influenced by the number of items in a scale [16,56], so the BFI-10, with only two items to cover each personality dimension, is likely to have relatively poor values of alpha. Although alpha coefficients may be difficult to interpret for such short scales [17], we include them for comparison with other studies. Test-retest was also undertaken (in studies 1a, 2b, and 2c), which may be more appropriate than coefficient alpha for assessing the reliability of short scales [57]. Correlations were assessed with Pearson product-moment coefficients. Absolute mean average coefficients were calculated using Fisher's r to Z transformation. Linear regression was used to assess the associations of age and gender with each big five domain. Reported p-values are for two-tailed tests.
Table 2). Leung et al. [15] also found that many of their poor-loading items were reverse-scored, and their CFA led them to exclude 15 items, leaving 29 items which showed acceptable/good fit to the five-factor model.
For Study 2b (N = 211) five components were extracted; e1 (.920) and e6 (.790) loaded on C1; a2 (.805) and a7 (.663) loaded on C2; n4 (.811) and n9 (.766) loaded on C3; o5 (.901) and o10 (.564) loaded on C5.
For Study 2d (N = 268) five components were extracted, with all items loading on the relevant big five domain, with minimal cross-loadings (Table 3).
In summary, although the expected BFI-10 structure was clearly shown in one study, it was not consistently revealed across samples, or in the combined data. The importance of this is considered further in the Discussion.

Convergent and discriminant validity
Inter-correlations between big five personality dimensions are shown in Table 4 for the BFI-44, BFI-10, and also the BFI-10X (the alternative version of the BFI-10, extracted from the relevant items of the BFI-44). Extraversion, agreeableness, conscientiousness, and openness all had positive inter-correlations, and all were negatively correlated with neuroticism. Inter-correlations between the BFI-44 dimensions (Table 4, top left) had a mean of .329 (all < .5), the largest being negative correlations between neuroticism and extraversion, agreeableness and conscientiousness, and a positive correlation between extraversion and openness. The same general pattern was found for the BFI-10X (Table 4, centre), for which all inter-correlations were < .4 (mean = .184), and the BFI-10 (Table 4, lower right), for which all inter-correlations were < .3 (mean = .100).
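The Statistical analysis section states that mean coefficients were calculated using Fisher's r to Z transformation. A minimal sketch of that averaging step is given below. It assumes that absolute values are taken before the transformation and that the mean is unweighted; the function name `fisher_mean_abs_r` is hypothetical and is not taken from the original analysis.

```python
import math

def fisher_mean_abs_r(rs):
    """Mean absolute correlation via Fisher's r-to-Z transformation.

    Assumption: each coefficient is made absolute, transformed with
    z = atanh(|r|), averaged without weights, and back-transformed
    with tanh. The source does not specify weighting.
    """
    zs = [math.atanh(abs(r)) for r in rs]
    return math.tanh(sum(zs) / len(zs))

# Illustrative input only (not values from Table 4).
example = [0.10, -0.25, 0.40]
print(round(fisher_mean_abs_r(example), 3))
```

Averaging in the Z metric gives slightly more weight to larger coefficients than a simple arithmetic mean of r, so the two approaches can differ when coefficients vary widely.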
Test-retest coefficients for each BFI-10 item, and the alpha coefficients for only those participants who completed both test (T1) and retest (T2) sessions, are given in Table 7. Test-retest correlations for each BFI-10 item showed generally consistent coefficients across the three studies, although the values for agreeableness (and also openness and neuroticism) were somewhat lower in Study 1a. The alpha values for T1 and T2 sessions were generally similar within each study, suggesting that the variation between studies was related to sample differences. However, there were some larger variations between T1 and T2 values, in particular for agreeableness in Study 2c (and to a lesser extent in Study 1a), which may suggest situational influences on item responses.
Table 8, for both the BFI-44 and the BFI-10, there were trends (although not always reaching significance), for more extraversion, agreeableness, neuroticism and openness in females, and for older age to be associated with more agreeableness and conscientiousness, but less neuroticism and openness.

Nomological network
Across all of the studies, the BFI-44 and BFI-10 showed similar, generally consistent correlations with chronotype (morningness-eveningness preference), as assessed with the rMEQ (Table 9). In particular, morningness (higher rMEQ scores) showed a consistent positive correlation with conscientiousness (rs ranging .127 to .331; significant in all studies except Study 1c). Morningness also showed significant positive correlations with agreeableness in three studies, while correlations with extraversion, neuroticism and openness were variable though generally weak. Overall, these results are consistent with the findings of Tsaousis' [32] meta-analysis, which showed conscientiousness to be the strongest personality correlate of chronotype, with agreeableness ranking second.
Mindfulness (the MLO scale) had significant negative correlations with neuroticism in all studies, and significant positive correlations with extraversion and conscientiousness in 2/3 studies (Table 10). Also, BFI-44 Study 1d showed significant positive correlations with agreeableness and openness, which did not reach significance in the BFI-10 studies. Overall, the observed coefficients show reasonably close correspondence with those from Giluk's [34] meta-analysis: all correlations were positive, except the (stronger) negative correlations with neuroticism.
The Attention-related Cognitive Errors Scale (CES) showed significant negative correlations with extraversion and conscientiousness, and a significant positive correlation with neuroticism (Table 10). The CES correlates strongly with the Cognitive Failures Questionnaire [60,61] which also positively correlates with neuroticism [62].
Both mind wandering (the MW scale) and daydreaming frequency (the DF scale) had significant negative correlations with conscientiousness in all studies (Table 10; cf. [40]). Also, they both had significant positive correlations with neuroticism (except DF in Study 2a). Coefficients with the BFI-44 were generally stronger than with the BFI-10. Furthermore, in BFI-44 studies 1b and 1c, DF and MW had significant negative correlations with extraversion, which were not significant with the BFI-10 (except for DF in Study 2a). Three studies (one with the BFI-10) also showed significant negative correlations with agreeableness. However, DF was significantly positively correlated with openness in two of the BFI-10 studies; MW did not show any significant correlations with openness.

Exploratory study of associations with Big Five facets
Further, exploratory correlational analysis (Table 11) was done with the big five facets (two for each domain) that can be assessed with 35 of the BFI-44 items (for details, see [5]). Across the four BFI-44 studies, chronotype only showed consistent significant correlations (all rs > .1) with Self-discipline (example items: "Perseveres until the task is finished" / "Is easily distracted", reverse-scored); more Self-discipline was associated with more morningness. Self-discipline and the other conscientiousness facet of Order (items: "Tends to be disorganised", reverse-scored / "Can be somewhat careless", reverse-scored) were also consistently correlated with mind wandering, daydreaming frequency and mindfulness, as were several other facets, in particular the neuroticism facets of anxiety and depression. Previous research [39] found that sleep quality and positive affect were mediators in the associations between chronotype (rMEQ; predictor) and mind wandering (MW), and chronotype (rMEQ; predictor) and daydreaming frequency (DF). As Self-discipline was found to consistently correlate with rMEQ, MW, DF, and MLO, it was tested whether Self-discipline might also be a mediator in these associations (with age and gender as control variables).
Following the procedures suggested by Preacher and Hayes [63,64]: path a represents the effect of the predictor (independent variable/IV) on the mediator; path b represents the mediator's direct effect on the criterion (dependent variable/DV), while controlling for the IV; path c represents the total effect of the IV on the DV, when the proposed mediator is excluded (this does not need to be significant for the existence of a mediation effect); path c' represents the IV's direct effect on the DV (controlling for the mediator); and, path ab represents the indirect effect of the IV on the DV, through the mediator. This indirect effect was tested with a non-parametric bootstrapping procedure [63,64], in which 5000 re-samples from the data established 95% confidence intervals, with the exclusion of zero indicating a significant mediation effect.
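The path definitions above can be illustrated with a minimal sketch using ordinary least squares and case-resampling bootstrap. This is not the Preacher and Hayes macro itself: the sketch omits the age and gender covariates used in the reported models, uses a simple percentile interval (an assumption, since the interval type is not stated here), and runs on synthetic data. All function names (`mediation_paths`, `bootstrap_indirect_ci`, `make_synthetic`) are hypothetical.

```python
import random

def _mean(v):
    return sum(v) / len(v)

def _cross(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = _mean(u), _mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def mediation_paths(x, m, y):
    """OLS estimates for a single-mediator model without covariates."""
    sxx, smm, sxm = _cross(x, x), _cross(m, m), _cross(x, m)
    sxy, smy = _cross(x, y), _cross(m, y)
    a = sxm / sxx                            # path a: IV -> mediator
    c = sxy / sxx                            # path c: total effect of IV on DV
    det = sxx * smm - sxm ** 2
    c_prime = (smm * sxy - sxm * smy) / det  # path c': direct effect of IV
    b = (sxx * smy - sxm * sxy) / det        # path b: mediator -> DV, IV held constant
    return {"a": a, "b": b, "c": c, "c_prime": c_prime, "ab": a * b}

def bootstrap_indirect_ci(x, m, y, n_boot=5000, seed=0):
    """Percentile 95% interval for the indirect effect ab."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        xs = [x[i] for i in idx]
        ms = [m[i] for i in idx]
        ys = [y[i] for i in idx]
        estimates.append(mediation_paths(xs, ms, ys)["ab"])
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]

def make_synthetic(n=200, seed=42):
    """Synthetic data with a built-in negative indirect effect (illustration only)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]            # stand-in for the predictor
    m = [0.6 * xi + rng.gauss(0, 1) for xi in x]       # stand-in for the mediator
    y = [-0.5 * mi + rng.gauss(0, 1) for mi in m]      # stand-in for the criterion
    return x, m, y
```

In this linear setting the total effect decomposes exactly as c = c' + ab, and an interval for ab that excludes zero is read as evidence of mediation, as described above. The default of 5000 resamples mirrors the number reported in the text; a smaller `n_boot` can be passed for quick checks.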
As shown in Table 12, Self-discipline was a significant mediator in the associations between chronotype and: daydreaming frequency (Model 1), mind wandering (Model 2), and mindfulness (Model 3). Chronotype retained a significant direct effect for mindfulness, and a marginally significant direct effect for daydreaming frequency, suggesting partial mediation by Self-discipline in these models. Chronotype did not have significant total or direct effects for mind wandering.
Table 12. Self-discipline as a mediator between chronotype and daydreaming frequency, mind wandering, and mindfulness.
[12,13] for the NEO-PI-R, and [14] for the IPIP), the reasons and implications need consideration. It might be that translations of some items could be revised. In addition, many of the poorer-loading items were reverse-scored, which might produce some culturally specific effects in responding [15,28]. Also, there could be some cultural differences in the meaning, interpretation or relevance of some items. In this regard, it is noteworthy that three of the openness items (30, 41, and 44) loaded strongly on a sixth component. Throughout the development of the big five model there has been particular disagreement regarding the definition of the openness/intellect domain, and cross-cultural studies, including studies in China, have shown less consistency in producing a clear openness/intellect factor [3,11,65]. Use of a translated BFI-44 constitutes an imposed etic, in which a personality structure from another culture is imposed, rather than seeking to discover an indigenous structure [9,11]. However, differences found in cross-cultural research may be due to cultural differences, translation-related differences in the instrument, and/or sample differences. Further cross-cultural research involving bi-lingual participants would help to illuminate translation issues and culture differences [9].
Cross-language generality may also be underestimated due to mistranslations that remain undetected in research using monolingual samples [3]. Although additional factors (beyond the big five), indicated in some cross-cultural studies, have not always been consistently replicated [2], some evidence suggests that including a sixth factor (of Honesty/Propriety) may improve validity for some criterion measures and have more cross-cultural replicability [8]. In Chinese research, Cheung et al. [65] supported a six-factor model, while Zhou et al.'s [66] lexical study suggested a seven-factor emic structure. Further research is needed on these proposed personality structures, and before firm conclusions are reached about personality structure within a group, repeated testing on large samples may be prudent. Methods of exploratory and confirmatory factor analysis are more effective with large samples [55]; Costello and Osborne [54] found that even with a subject/item ratio of 20:1, 30% of solutions in simulated studies failed to produce a known population factor structure. Therefore, further testing of the current, and/or any revised versions of the BFI-44, should be undertaken, with large, comparable samples (including a more balanced gender ratio and specific age range coverage), if a stable, replicable structure is to be established. For example, the 29-item BFI (in traditional Chinese characters) developed by Leung et al. [15] from a sample of 480 smokers and ex-smokers (mean age = 40.6; 83.8% male) was not strongly supported in the current research involving part-time and full-time students (N = 798; mean age = 27.11; 38.6% male). Only one index of model fit (RMSEA) was within an acceptable range for the 29-item BFI, but this was also the case for the full BFI-44.
Some previous studies have reported clear scale structures for the BFI-10 and Ten-Item Personality Inventory (TIPI; e.g., [10,22]). Likewise, in Study 2d of the current research, PCA of the BFI-10 clearly showed the expected big five structure, with each item loading strongly on its corresponding component. However, for the other BFI-10 studies (and the combined data), only the extraversion and openness items clearly loaded on the same component in each study, while the items for the other big five dimensions showed less consistency. Variation in demographic characteristics between samples might have had some influence (Study 2d had the most homogeneous sample in terms of age, all being 18-21), which could be investigated in further research. However, while it might be that some of the BFI-10 item translations could be improved, the significance of the scale structure of the BFI-10, when compared with other evidence for its validity, is not clear; as Gosling [57] states: ". . . Criteria like alpha and clean factor structures are only meaningful to the extent they reflect improved validity. In cases like the TIPI (using a few items to measure broad domains), they don't." In summary, PCA of the BFI-44 and BFI-10 indicated the big five structure, although not as clearly or consistently as would be preferred. This might be improved by revisions to some items, but further testing with large samples is needed for a stable, replicable structure to be established.

Convergent and discriminant validity
Inter-correlations between BFI-44 dimensions were generally low, the largest being between .4 and .5, with a mean of .329. These values are somewhat higher than, though comparable to, the mean of around .2, and high of around .3, reported in other studies [2,9,10]. All inter-correlations were < .3 for the BFI-10, with a mean of .100, which is very close to the mean of .11 reported by Rammstedt and John [10]. All inter-correlations were < .4, with a mean of .184 for the BFI-10X (extracted from the BFI-44). Extraversion, agreeableness, conscientiousness, and openness were all positively inter-correlated, and each negatively correlated with neuroticism (cf. [9,10,15]). The strongest inter-correlations were a positive correlation between extraversion and openness, and negative correlations between neuroticism and extraversion, agreeableness and conscientiousness (cf. [9,19]).
There were clear convergent and discriminant correlations between the BFI-44 and BFI-10, and also between the BFI-10 and BFI-10X, where, in all cases, each dimension on each scale correlated most strongly with the corresponding dimension on the other scale. Nearly all convergent correlations were > .7, and discriminant correlations < .4 (most being < .3). As the BFI-44 and BFI-10 in the current studies were independent translations, the 10-item scale extracted from the BFI-44 (the BFI-10X) is also available as an alternative version, which allows for use of equivalent forms. However, the Chinese BFI-44 and BFI-10 were both translated from the same original English-language scale, so further research should include comparisons with other measures of the big five personality dimensions, such as the NEO-PI-R, NEO-FFI, or the IPIP.

Mean rankings, age and gender
The mean rankings of the big five dimensions were very consistent across the four BFI-44 studies, with highest to lowest generally being agreeableness (A), openness (O), conscientiousness (C), extraversion (E), and neuroticism (N). Highest to lowest rankings of A/O, C, E, N were reported by Benet-Martinez and John [9] for both English and Spanish versions of the BFI-44, and by Zheng et al. [14] for Chinese-language IPIP scales with their sample of heterosexual and homosexual Chinese, while Leung et al. [15], with their reduced, 29-item BFI, found highest to lowest of C, A, E, O, N, in their sample of Chinese smokers and ex-smokers. The mean rankings for the BFI-10 were generally consistent across studies, and similar to the rankings for the BFI-44, with highest to lowest generally being O, A, C, E, N. Credé et al. [20] reported highest to lowest of A, O, C, E, N, for the BFI-10, in a student sample, and C, A, O, N, E, in a sample of workers.
Regarding age differences, the results of the current research are consistent with previous findings (e.g., [6,28,29,30]): age was positively associated with agreeableness and conscientiousness, and negatively associated with neuroticism and openness. Likewise, in a Chinese sample of mostly psychiatric patients, Yang et al. [13], using the NEO-PI-R, found that age positively correlated with conscientiousness and agreeableness, and negatively correlated with openness and neuroticism.
Regarding gender differences, the current research with Chinese participants found that females generally scored higher for agreeableness, neuroticism, extraversion, and openness, as has been found in many other countries [31]. Likewise, in previous Chinese research, Yang et al. [13] found that females scored higher for agreeableness, and Zhang and Huang [67], using the NEO-FFI with university students, found that females scored higher for openness and agreeableness. With their 29-item BFI, Leung et al. [15] found (in 368 male smokers/ex-smokers compared with 71 females) significantly higher scores in females for neuroticism, but significantly lower scores for openness.

Internal consistency
The Chinese version of the BFI-44 showed acceptable/good values of internal consistency (coefficient alpha) in all studies. Values ranged .698-.807; agreeableness had the lowest ranking value in 3/4 studies. These values are generally comparable to those reported in other studies of the BFI-44. Alphas for the English BFI-44 usually range .75-.90, and agreeableness often ranks lower than the other domains [2,3]; for example, Benet-Martinez and John [9] found that agreeableness had the lowest rank in both English and Spanish versions of the BFI-44 (< .7 for the Spanish version). The current alpha values are also comparable to those reported for other Chinese scales, including Leung et al.'s [15] 29-item BFI with alphas ranging .69 (agreeableness) to .81 (neuroticism); also, Zhang and Huang [67] found alpha values ranging .56 (openness) to .75 (neuroticism), for a Chinese translation of the NEO-FFI.
The Chinese version of the BFI-10 had alpha values ranging .593-.752 for extraversion; .037-.466 for agreeableness; .251-.462 for conscientiousness; .331-.628 for neuroticism; and .364-.525 for openness. The highest values for each domain were all reasonable for two-item scales (all > .45), and comparable to alpha values reported in other studies using the BFI-10, or Ten-Item Personality Inventory (TIPI), many of which have also reported the highest value for extraversion, and lowest for agreeableness, as generally found in the current studies. For example, Thalmayer et al. [8] reported BFI-10 alphas ranging .43 (agreeableness) to .72 (extraversion); Credé et al. [20] reported BFI-10 alphas ranging .37 (agreeableness) to .65 (extraversion); and Gosling et al. [16] reported TIPI alphas ranging .40 (agreeableness) to .73 (emotional stability).
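To put two-item alpha values in context, the dependence of alpha on scale length can be sketched with the standardized-alpha (Spearman-Brown) relation. This is an illustrative approximation only: it assumes equal item variances and a common mean inter-item correlation, which the raw Cronbach's alpha reported here does not require, and the function names and the nine-item comparison length are hypothetical choices for illustration.

```python
def standardized_alpha(k, r_bar):
    """Standardized alpha for k items with mean inter-item correlation r_bar."""
    return k * r_bar / (1 + (k - 1) * r_bar)

def implied_mean_r(k, alpha):
    """Mean inter-item correlation implied by a standardized alpha for k items."""
    return alpha / (k - alpha * (k - 1))

# A two-item alpha of .466 (the highest agreeableness value reported above)
# implies a mean inter-item correlation of roughly .30 under these assumptions.
r_bar = implied_mean_r(2, 0.466)

# With the same mean inter-item correlation, a hypothetical nine-item scale
# would have a standardized alpha of roughly .80.
alpha_nine = standardized_alpha(9, r_bar)
print(round(r_bar, 3), round(alpha_nine, 3))
```

Under these assumptions, a modest alpha on a two-item scale need not indicate weaker item inter-relations than those underlying an acceptable alpha on a longer scale.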
However, the lower values of alpha in the current studies were < .3, with a low of .037 for agreeableness in Study 1a, which might indicate poor quality data, perhaps related to careless responding in that sample [68], at least for that subscale (the other domains had alphas > .4 in Study 1a). Some lower values of alpha have also been reported for the TIPI; for example, Woods and Hampson [17] reported .25 for openness, and Renau et al. [23] reported .21 for the agreeableness subscale of their Spanish TIPI. Low values have also been found for longer scales in some samples. For example, Oshio et al. [24] reported an alpha of .14 for the Values facet of the Openness dimension of the Japanese NEO-PI-R.
There has been some discussion about the importance of values of Cronbach's alpha, particularly in short scales such as the BFI-10 and TIPI (e.g., [8,16,57]). There is a trade-off between internal consistency and scale length (and so, administration time), although high internal consistency might indicate redundancy and narrowness in the scale [56]. Furthermore, McCrae et al. ([68]; see also [69]) found that retest reliability (for the NEO facet scales) significantly predicted validity criteria (including heritability and cross-observer ratings), but internal consistency (alpha) did not. Item heterogeneity (i.e., how many aspects of a trait are covered by the items in a scale) diminishes alpha, but not test-retest reliability: ". . . specific variance in an item is not shared by other items in the scale, so it detracts from alpha. However, in retest designs, the same items, with the same specific variance, are readministered, and they may elicit the same response. . . . Item-specific variance differentiates retest from internal consistency reliability, and thus may account for the superiority of the former in predicting scale validity" [69, p.2-3].
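The argument about item-specific variance can be illustrated with a small simulation. The sketch below is a toy model under stated assumptions, not a reanalysis of any data reported here: each of two items is generated as a shared trait score, plus a stable item-specific component, plus occasion-specific random error, with arbitrarily chosen variances. The function names are hypothetical.

```python
import random

def variance(v):
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / (len(v) - 1)

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def cronbach_alpha(items):
    """Cronbach's alpha; items is a list of per-item score lists."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(it) for it in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

def simulate(n=2000, seed=1):
    """Return (alpha at T1, test-retest r of scale totals) for a toy two-item scale."""
    rng = random.Random(seed)
    t1 = [[], []]
    t2 = [[], []]
    for _ in range(n):
        trait = rng.gauss(0, 1)                 # variance shared by both items
        for j in range(2):
            specific = rng.gauss(0, 1)          # stable item-specific variance
            t1[j].append(trait + specific + rng.gauss(0, 0.5))  # occasion error at T1
            t2[j].append(trait + specific + rng.gauss(0, 0.5))  # fresh error at T2
    total1 = [a + b for a, b in zip(*t1)]
    total2 = [a + b for a, b in zip(*t2)]
    return cronbach_alpha(t1), pearson(total1, total2)

alpha, retest = simulate()
print(round(alpha, 2), round(retest, 2))
```

In this toy model the item-specific component lowers alpha because it is not shared between items, but it contributes to the retest correlation because it recurs when the same items are readministered, so the retest coefficient exceeds alpha.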
For the BFI-10, generally good 5-6 week test-retest coefficients were shown in studies 1a, 2b, and 2c, with ranges of: .803-.873 (extraversion); .515-.811 (agreeableness); .645-.706 (conscientiousness); .610-.837 (neuroticism); and .698-.839 (openness). These coefficients are comparable to those found in other studies. For example, Rammstedt and John [10] reported BFI-10 retest coefficients ranging .65 (openness) to .79 (extraversion) for the English language BFI-10 (8-week interval), and a range of .66 (agreeableness) to .87 (extraversion) for the German version (6-week interval). For the TIPI, Gosling et al. [16] reported 6-week retest coefficients ranging .62 (openness) to .77 (extraversion). It is notable that extraversion was the strongest dimension in these studies. A meta-analysis of test-retest reliability for the big five personality domains [70] found that extraversion showed the highest reliability while agreeableness had the lowest. In the current research, extraversion had the highest retest coefficients in all studies (BFI-44 and BFI-10), while agreeableness was lowest for the BFI-44, and the BFI-10 in Study 1a (conscientiousness was lowest in BFI-10 studies 2b and 2c). Agreeableness ratings may be more susceptible to situational influences [70], which may have had a bearing on, for example, the relatively low test-retest reliability for agreeableness in Study 1a. Given the relatively weaker psychometric properties of the BFI-10 agreeableness subscale, an optional extra agreeableness item can be included (see S1 Appendix; cf. [10]). In view of the above-cited arguments and evidence for test-retest reliability being more strongly associated with validity than is coefficient alpha [8,68,69], the current findings of generally good test-retest reliability bode well for the Chinese BFI-44 and BFI-10 scales used in the current studies. As McCrae [69, p.12] states ". . . When retest reliability is high, random error is low, and it is hardly surprising that validity is higher."

Nomological network
As discussed above, both the BFI-44 and the BFI-10 showed associations with age and gender that have been found in other Western and Asian studies. Other evidence for external/criterion-related validity was shown in the positive correlation between conscientiousness and morningness which was consistently found (significantly in all studies except BFI-44 Study 1c), replicating the finding that conscientiousness is the strongest personality correlate of chronotype [32,33]. Also, although Hsu et al. [71], for example, found a positive association between morningness and extraversion, this was found in only one of the current studies, while a meta-analysis found a small negative correlation (r = -.06; [32]). As Hsu et al. [71] suggest, such inconsistencies may possibly reflect cultural differences, but they may also be at least partly due to the personality scale and theoretical approach used, in addition to variation in samples [32]. For example, Hsu et al. [71], with a sample of nearly 3000 Taiwanese undergraduates, used a Chinese version of the Maudsley Personality Inventory (MPI), rather than one of the big five scales (they did not assess the big five domains of conscientiousness, agreeableness or openness).
Personality correlates of mindfulness (the MLO scale) showed correspondence with the findings from Giluk's [34] meta-analysis: mindfulness was negatively correlated with neuroticism in all studies, while positive correlations were found with all of the other personality dimensions. There was some evidence of attenuation of coefficients with the BFI-10, as significant positive correlations between mindfulness and extraversion, agreeableness, conscientiousness and openness were found with the BFI-44 in Study 1d, but these correlations were generally weaker and not consistently significant in the BFI-10 studies. Such attenuation of correlations can occur when using shorter measures of a construct, which make less fine-grained distinctions between respondents [17,20]. However, some evidence for predictive validity (the 'acid test' for short scales [8,56]) was shown in BFI-10 Study 2d, in which conscientiousness and neuroticism were significant predictors of mindfulness.
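One standard way of formalising this kind of attenuation is the classical (Spearman) relationship between an observed correlation and the reliabilities of the two measures. The sketch below rests on that assumption; it is not an analysis from the current studies, the reliability values are invented for illustration, and it captures only the reliability-based account rather than the loss of fine-grained distinctions as such.

```python
from math import sqrt


def attenuated_r(true_r, rel_x, rel_y):
    """Expected observed correlation under classical test theory,
    given the error-free correlation and the two scale reliabilities."""
    return true_r * sqrt(rel_x * rel_y)


# Hypothetical values: the same underlying association (r = .40) measured
# with a more reliable longer scale and a less reliable two-item subscale.
longer_scale_r = attenuated_r(0.40, 0.80, 0.80)
shorter_scale_r = attenuated_r(0.40, 0.50, 0.80)

print(f"longer scale: r = {longer_scale_r:.3f}; shorter scale: r = {shorter_scale_r:.3f}")
```

Under these assumed reliabilities the expected observed correlation falls from .32 to about .25, so an association that reaches significance with a longer scale may fail to do so with a shorter one in a sample of the same size.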
Regarding mind wandering (the MW scale) and daydreaming frequency (the DF scale), the observed negative correlations between conscientiousness and both MW and DF, in all studies, are consistent with the findings of Jackson and Balota [40]. Also, MW and DF were significantly positively correlated with neuroticism (including the facets of anxiety and depression) in nearly all studies, consistent with the widely reported association between mind wandering/daydreaming frequency and negative affect (e.g., [39,42,43,44]). More inconsistent correlations were found with extraversion (significantly negative in three studies), and agreeableness (significantly negative in three studies). Some of this inconsistency may again be due to attenuation with the BFI-10, but it is noteworthy that positive correlations between DF and openness were only found in two BFI-10 studies. Such evidence from the BFI-10 needs to be replicated with longer personality scales, to establish whether the associations may be with the general domain or only particular facets [69]. However, an association with openness has previously been reported. Zhiyan and Singer [72] investigated big five correlates of three 'styles of daydreaming' identified from second-order factor analysis of responses to the Imaginal Processes Inventory [49,73]. They found that the 'positive-constructive' style (frequent, vivid, and generally enjoyable daydreams) is most strongly associated with openness; the 'guilty-dysphoric' style (mixed heroic/achievement and failure/guilt daydreams) is most strongly associated with neuroticism; and the 'poor attentional-control' style (more mind wandering, more experience of boredom, and less sense of control) is most strongly associated with less conscientiousness (and more neuroticism).
The current research did not assess these higher-order 'styles of daydreaming' so direct comparisons cannot be made, but the results have some correspondence, on the assumption that MW is more closely related to the 'poor attentional-control' style, while DF may be more related to 'positive-constructive' and 'guilty-dysphoric' daydreaming. Further research could seek to replicate Zhiyan and Singer's [72] findings, and perhaps also investigate big five facet-level associations with the content of mind wandering/daydreaming episodes, including those more associated with positive affect [74,75].
Overall, a consistent pattern of personality correlates was found for chronotype, mindfulness and the frequency of mind wandering/daydreaming, substantially in accord with previous findings. In addition, the exploratory study of associations with facets of the big five dimensions found that the 'self-discipline' facet of conscientiousness was a common correlate. Furthermore, self-discipline was found to mediate the relationships between chronotype (predictor) and mind wandering, daydreaming frequency, and mindfulness. Such findings show the value of more fine-grained analysis, and suggest that further research at the facet level may be insightful, possibly indicating mechanisms underlying associations between aspects of personality and cognition [72,76].
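The logic of a simple mediation model of this kind can be sketched with the product-of-coefficients decomposition. This is a minimal illustration under stated assumptions, not a description of the procedure used in the current studies: the scores are invented, the variable names are hypothetical, and no significance test of the indirect effect (e.g., bootstrapping) is included.

```python
from statistics import mean


def centered_cross_product(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def simple_mediation(x, m, y):
    """Ordinary least squares paths for the model X -> M -> Y.

    a: slope of M regressed on X
    b: slope of M when Y is regressed on X and M together
    direct: slope of X in that same two-predictor regression (c')
    total: slope of Y regressed on X alone (c)
    """
    sxx = centered_cross_product(x, x)
    smm = centered_cross_product(m, m)
    sxm = centered_cross_product(x, m)
    sxy = centered_cross_product(x, y)
    smy = centered_cross_product(m, y)
    det = sxx * smm - sxm ** 2  # zero only if X and M are perfectly collinear
    a = sxm / sxx
    b = (smy * sxx - sxy * sxm) / det
    direct = (sxy * smm - smy * sxm) / det
    total = sxy / sxx
    return {"a": a, "b": b, "indirect": a * b, "direct": direct, "total": total}


# Hypothetical scores for six respondents.
morningness = [1, 2, 3, 4, 5, 6]        # predictor (X)
self_discipline = [2, 1, 4, 3, 6, 5]    # proposed mediator (M)
mind_wandering = [6, 7, 3, 5, 1, 2]     # outcome (Y)

paths = simple_mediation(morningness, self_discipline, mind_wandering)
print({name: round(value, 3) for name, value in paths.items()})
```

In this toy example the total effect of the predictor on the outcome equals the direct effect plus the indirect effect (a multiplied by b), and almost all of the negative total effect is carried by the indirect path through the mediator.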

Limitations and future research
The current research involved a series of eight surveys which allowed for testing of the replicability/consistency of the findings for the BFI-44 and the independently translated BFI-10. However, direct comparisons between the BFI-44 and BFI-10 were limited to only Study 1a, which involved the combination of these scales into one 54-item scale. Although other studies have also analysed sub-groups of items from longer scales (e.g., [10] in the development of the BFI-10, and [14] for 100-item and 50-item IPIP scales), responses to items may vary to some extent depending on whether they are part of a shorter or longer scale [77]. Also, longer scales may induce more boredom, fatigue, frustration, and/or mistrust (e.g., from respondents' perceptions of redundant items), and may produce careless responding to similarly worded items [16,20,78]. This may have been so in Study 1a, as the BFI-10 is similarly worded to the corresponding items of the BFI-44, and this may have influenced responses to some items more than others. Although the results for dimension means, alphas, test-retest coefficients, etc., in Study 1a were mostly comparable to those found in the other current studies, the BFI-10 agreeableness subscale had a very low value of alpha.
So, future research comparing these BFI-44 and BFI-10 scales should present them separately, in different test sessions (or separated by other scales if within the same test session).
Also, more extensive testing of convergent/discriminant validity should be done with other big five measures. Other evidence for validity, such as from peer ratings, and more extensive testing of predictive validity (including other kinds of non-questionnaire measurement, such as GPA; cf. [8]), would also be useful, as would further testing of reliability (e.g., test-retest over different intervals). Also, both scales may potentially benefit from revisions to some item translations, perhaps especially for reverse-scored items [15]. Finally, although the eight surveys in the current research involved a relatively large total sample (N = 2,496), the majority of participants were students aged 18-30 (the demographics varied between studies). The establishment of reliable, valid normative data, and of stable, replicable scale structures, may require research with much larger samples, with different sub-groups appropriately represented. This would also allow for comprehensive investigation of any interactions between age and gender for personality changes over the lifespan (cf. [28,30]).

Conclusion
The current series of studies has contributed to Chinese-language personality research by detailing the psychometric properties of Chinese translations of the BFI-44 and BFI-10. Overall, there was good convergent and discriminant validity, good test-retest reliability, and expected associations with age, gender and other criterion variables. While further research may reveal important, replicable cultural differences in personality structure, much international research, including in China (e.g., [13,14,15,67]), continues within the big five framework, which can facilitate the comparison of findings from different countries/cultures [19]. The BFI scales offer appropriate measures, freely available for non-commercial use. While short scales like the BFI-10 save time, and may partly offset their poorer psychometric properties by reducing participant frustration or boredom, and the careless responding that this may produce [16,78], these advantages must be weighed against limitations, including the reduced content coverage, and attenuation of external associations [18,20]. Consequently, we would echo the view of Gosling et al. [16] and Rammstedt and John ([10]; see also [17,20,78]), that the use of very short personality measures is not to be encouraged, but that these scales may be useful when there is limited time available for the research, or personality is not the main focus.