Measurement properties of the German version of the Physical Activity Enjoyment Scale for adults

The Physical Activity Enjoyment Scale (PACES) is a measurement instrument commonly used in monitoring and intervention research to assess how much people enjoy being physically active, as enjoyment has been related to physical activity adherence. However, while the measurement properties of PACES are well researched for the English language, research on the German version is lacking, especially for adults. Thus, the purpose of this work was to examine reliability, factorial validity, criterion-related validity, and measurement invariance across sex, age groups, and time of the PACES for German-speaking adults. Data was obtained from the Motorik-Modul-Study (MoMo), in which 863 adults (53.5% female; mean age = 20.9 years) were examined. To investigate measurement invariance across age groups, data from 2,274 adolescents (50.5% female; mean age = 14.4 years) was obtained additionally. The study provided a nationwide representative sample for Germany. Results showed high internal consistency of PACES in adults (Cronbach’s α = .94). Confirmatory factor analyses confirmed the invariance of the measure across age groups, time, and sex. Criterion-related validity was supported, as the global factor correlated significantly with overall physical activity, physical activity in sports clubs, and leisure-time physical activity. The analyses of factorial structure indicated a method effect for positively and negatively worded items. Correlated uniqueness, latent method factor, and hybrid models were applied to analyze the method effect, and results indicated that the method effect of positively worded items was predictive of physical activity independently of the global factor. Overall, it can be concluded that PACES is a reliable, valid, and invariant measure of physical activity enjoyment for use in German-speaking adults. Further studies are warranted to examine the factorial structure of the PACES and the consequences of the method effect.

Introduction
... due to differences in the understanding of the measurement instrument [33]. Vandenberg and Lance suggested five steps to test measurement invariance, including equivalence of structure, factor loadings, measurement intercepts, structural covariances, and item errors (uniquenesses) [32], which should be applied to measurement instruments of PA enjoyment.
A common measurement instrument to measure PA enjoyment in children and adolescents is the Physical Activity Enjoyment Scale (PACES), originally designed by Kendzierski and DeCarlo [34] with 18 items, each representing a scale between two bipolar adjectives (e.g. pleasant-unpleasant, enjoy-hate) on a 7-point scale. Motl and colleagues [35] revised the questionnaire by removing two items because of a lack of correspondence between the items and the construct and because of item redundancy. The remaining 16 items were rephrased to improve comprehension of the questionnaire [35]. Measurement properties have been tested several times among children and adolescents for the English language [35][36][37], but only once for the German language [38].
Furthermore, measurement properties for PACES in adults are less well explored than in the younger age group, with heterogeneous results. The first PACES questionnaire was tested for reliability and validity with undergraduate students between 18 and 24 years, confirming the unidimensional structure [34]. Using Rasch models, the unidimensional structure of PACES was confirmed in a sample of adults aged 25 to 75 years, while item content did not fit all respondents [39]. In contrast, construct validity with 18 items could not be confirmed in a sample of older adults; instead, an adapted scale with 8 items showed a good model fit together with measurement invariance across exercise groups and longitudinally [40]. However, some studies suggest that construct validity of PACES might be impacted by the valence of the items.
Two studies found that positively and negatively worded items shared unique variance, which could not be accounted for by a global factor [35,38]. This issue calls the dimensional structure of PACES into question. Intuitively, the basic assumption was that positively and negatively worded items are interchangeable and assess the same construct [34]. However, the results of confirmatory factor analyses consistently suggested a poor fit of a single-factor model [35,36]. Interestingly, a two-factor model improved the model fit, although the overall fit was still not satisfactory [36,38]. Motl and Dishman [35] assumed that this misspecification was due to a method effect of positively and negatively worded items and proposed the strategy of correlated uniqueness (CU). This strategy, which allows uniqueness terms to correlate with each other, provided an almost perfect model fit in several studies [35,38,41]. However, the CU strategy has been criticized for commingling unsystematic influences and method effects [42]. As an alternative, Bagozzi [43] proposed the latent method factor (LMF) strategy, which captures variance shared by items with the same method (i.e. positively or negatively worded items) by assigning a latent factor to each method effect. This strategy allows for direct estimation of both the global construct (i.e. enjoyment) and the method effects (of positively and negatively worded items). Furthermore, it is possible to separate the error variance from the method variance, which was shown for Rosenberg's Self-Esteem Scale to be related to depression and life satisfaction [44]. However, the LMF strategy led in several cases to improper solutions, such as negative variances and correlation estimates greater than 1.0 [41,45]. In terms of PA enjoyment, the question arises whether the method effects provide additional predictive power for PA or other outcomes. In PA research, this issue has not yet been examined.
In some studies, it has been supposed that the method effect is related either to positively or to negatively worded items [35], leaving open the question of whether a hybrid model of CU and LMF might be a solution. Furthermore, while some studies in other areas suggest that method effects differ across groups, such as sex and age groups [45,46], this has not been considered in the PACES studies to date [35,36,38,40].
In addition to the heterogeneous study results regarding the measurement properties of PACES in adults, the measurement properties of the German version of PACES have never been tested in adults. This version has only been validated for youth so far [38], based on the adapted 16-item scale of Motl and Dishman [35]. This is critical, given that research on PA enjoyment is prominent within the USA but lacking in European adults. Thus, this study aims to investigate a) reliability, b) construct validity, c) measurement invariance, and d) criterion-related validity of PACES for German adults.

Participants
Participants were part of the national, representative Motorik-Modul-Study (MoMo) on PA and physical fitness in children and adolescents from Germany, and its umbrella study, the German Health Interview and Examination Survey for Children and Adolescents (KiGGS) [47,48]. Study participants were selected based on a multi-stage sampling approach with two evaluation levels [49]. First, a systematic sample of 167 primary sampling units was selected from an inventory of German communities stratified according to the classification system that measures the level of urbanization and geographic distribution. Second, an age-stratified sample of randomly selected children and adolescents was drawn based on the official registers of local residents. Data for this study was obtained from Wave 1 (W1: 2009-2012) and Wave 2 (W2: 2014-2017). For Wave 1, 12,368 participants were part of the KiGGS measurement and 5,106 participants of the MoMo measurement. At Wave 2, 6,233 participants were randomly assigned from KiGGS to MoMo. As MoMo is a longitudinal cohort study, adolescent baseline participants transitioned into adulthood in Waves 1 and 2, resulting in the adult sample from which data was obtained for this study [50].
To be eligible for the analyses of reliability, construct and criterion validity, and measurement invariance across age and sex, participants must have been part of Wave 1 and at least 18 years old at that time. To examine whether results of PACES in adolescence are comparable to those in adulthood via measurement invariance testing, an adolescent sample of the MoMo study was included cross-sectionally and longitudinally. To be eligible for the test of measurement invariance across age groups, participants must have been part of Wave 1 and either in the adolescent age range (11-17 years) or adults (≥ 18 years). To be eligible for the test of measurement invariance in the transition from adolescence to adulthood, participants must have taken part in both measurement occasions, Wave 1 and Wave 2. In addition, participants had to be 11-17 years old at the first measurement occasion (W1) and 18 years or older at the second measurement occasion (W2).

Study design and procedures
For this study, a nationwide representative sample of children and adolescents from Germany was planned. Thus, a stratified, multi-stage sample with three evaluation levels was drawn [49]. First, a systematic sample of 167 primary sampling units was selected from an inventory of German communities stratified according to the BIK classification system that measures the level of urbanization and geographic distribution. Second, an age-stratified sample of randomly selected children and adolescents was drawn from the official registers of local residents for the KiGGS sample [49]. On the third level, MoMo participants were selected from the KiGGS sample [51].
The study was conducted according to the Declaration of Helsinki. Ethical approval was obtained from the Charité Universitätsmedizin Berlin ethics committee and the Federal Office for the Protection of Data. For both the MoMo and the KiGGS study, participants gave their written consent to participate and were informed in detail about the study and data management by the Robert Koch Institute. Parents gave their written consent for minors, and the presence of a legal guardian was mandatory for participants aged below 15 years.

Measures
Enjoyment. Enjoyment was measured using PACES. The adapted version was developed by Motl and colleagues [35] and consists of 16 items (9 positively and 7 negatively worded items) with responses on a 5-point Likert scale (1 = "I disagree a lot"; 5 = "I agree a lot"). All items were related to feelings concerning physical activity enjoyment, suggesting face validity of the questionnaire. The items of the PACES seem to possess content validity. Item examples are "When I'm physically active, I enjoy it" (positively worded) and "When I'm physically active, it's not fun at all" (negatively worded). For the overall scale, negatively worded items are recoded to fit with the positively worded scale. Then, the mean of the item scores is calculated [36,52]. The version of Motl and Dishman [35] was translated into German, as reported elsewhere [38]. Briefly, PACES was translated using forward and backward translation by qualified staff members (native speakers). Any wording differences were resolved by the translators. Comprehensibility of the items was assessed by 7th-grade students. The German version of PACES was then tested with around 700 adolescents in two studies [38]. Results showed that factorial validity and measurement invariance were inconsistent, but sufficient test-retest reliability (ICC = .76), internal consistency (α = .89), and criterion validity (r = .42 with PA diary and r = .16 with accelerometry data) were obtained [38].
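The scoring rule described above can be sketched as follows. This is a minimal illustration rather than the authors' scoring syntax: the function name `score_paces` and the set of negatively worded item numbers passed to it are hypothetical (the text only names Items 7 and 12 as negatively worded), and the 1-5 response range is taken from the description of the scale.

```python
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 5  # 5-point Likert format described in the text


def score_paces(responses, negative_items):
    """Return the mean PACES score after recoding negatively worded items.

    responses: dict mapping item number -> raw response (1..5)
    negative_items: item numbers to reverse-code (hypothetical here; use the
        key of the questionnaire version that is actually administered)
    """
    recoded = []
    for item, value in responses.items():
        if not SCALE_MIN <= value <= SCALE_MAX:
            raise ValueError(f"item {item}: response {value} outside 1..5")
        if item in negative_items:
            value = SCALE_MIN + SCALE_MAX - value  # 1<->5, 2<->4, 3 stays 3
        recoded.append(value)
    return mean(recoded)


# Toy example with four items; items 7 and 12 are treated as negatively worded.
example = {1: 5, 2: 4, 7: 1, 12: 2}
print(score_paces(example, negative_items={7, 12}))  # 4.5
```

After recoding, higher values indicate more enjoyment on every item, so the mean can be read on the original 1-5 metric.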
Physical activity. Participants completed the MoMo Physical Activity Questionnaire [MoMo-PAQ; 53]. It consists of 28 items and measures frequency, intensity, time, and type of PA in four domains: PA at school, PA at organized sports clubs, PA outside of organized sports clubs, and daily PA. For participants who were no longer in school, PA at the workplace was assessed instead. Based on these four domains [38], an index was calculated considering moderate to vigorous PA (MVPA). The outcome measure was MVPA minutes per week. Reliability and validity of this questionnaire were shown to be comparable to other international PA questionnaires [38].

Statistical analysis
To investigate whether the two-factorial structure is also appropriate for the German version of PACES, we first conducted an exploratory factor analysis (EFA). Following that, confirmatory factor analyses (CFA) with full-information maximum likelihood estimation were performed in AMOS 25 [54]. Through this method, less biased estimates are obtained compared to classical missing data procedures, including list-/pairwise deletion or mean imputation [55]. Across all three datasets, missing data ranged between 0.96% and 0.98% for the PACES items.
Preliminary analyses revealed that negatively worded Item 7 (PA makes me depressed) and Item 12 (PA frustrates me) had extremely low means and standard deviations causing high skewness and kurtosis. The multivariate normality value and its critical ratio were 130.1 and 79.3, respectively, indicating nonnormality in the sample [56]. Therefore, we used the bootstrap method [57] to find approximate standard errors.
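The idea behind bootstrap standard errors can be illustrated with a small sketch. It is not the AMOS procedure used in the study, which bootstraps the parameters of the structural equation model; here the statistic is simply the mean of one item, and the item responses, the function name `bootstrap_se`, the number of resamples, and the seed are all hypothetical.

```python
import random
from statistics import mean, stdev


def bootstrap_se(data, statistic=mean, n_resamples=2000, seed=0):
    """Approximate the standard error of `statistic` by resampling with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    estimates = []
    for _ in range(n_resamples):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    # The spread of the resampled estimates approximates the standard error.
    return stdev(estimates)


# Hypothetical, strongly skewed responses to a negatively worded item (mostly 1s).
skewed_item = [1] * 85 + [2] * 10 + [3] * 3 + [5] * 2
print(round(bootstrap_se(skewed_item), 3))
```

Because the standard error is read off the empirical distribution of resampled estimates, no normality assumption is needed, which is why the approach suits skewed items.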
Reliability. Internal consistency was estimated using Cronbach's α in SPSS 26 and by composite reliability. To calculate Cronbach's α for the overall scale, negatively worded items were recoded to fit with the positively worded items. Cronbach's α coefficient underestimates the reliability of the composite score due to the assumption of uncorrelated uniqueness among indicators, especially for multidimensional scales [58]. Therefore, composite reliability was additionally estimated based on the formula of Raykov [59]. All coefficients are presented overall, by gender, and by age group.
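Both coefficients can be sketched in a few lines. Cronbach's α follows its standard definition. The composite reliability function uses a commonly cited form for congeneric measures with the factor variance fixed at 1, extended by covariances between uniqueness terms; that it matches the exact formula applied from [59] is an assumption, and all function names and numbers are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: list of k lists, one per item, holding the (recoded) scores of the same respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # sum score per respondent
    item_var_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


def composite_reliability(loadings, uniquenesses, uniqueness_covariances=()):
    """Reliability of a unit-weighted sum score from CFA estimates (factor variance = 1).

    uniqueness_covariances: covariances between uniqueness terms, each pair
    listed once; leave empty when no uniquenesses are allowed to correlate.
    """
    true_var = sum(loadings) ** 2
    error_var = sum(uniquenesses) + 2 * sum(uniqueness_covariances)
    return true_var / (true_var + error_var)


# Hypothetical data: three items answered by five respondents.
toy_items = [[1, 2, 3, 4, 5], [2, 2, 3, 5, 5], [1, 3, 3, 4, 4]]
print(round(cronbach_alpha(toy_items), 3))

# Hypothetical standardized loadings and uniquenesses for the same three items.
print(round(composite_reliability([0.7, 0.7, 0.7], [0.51, 0.51, 0.51]), 3))
```

In this form, positive covariances between uniqueness terms enter the error variance, so ignoring them would make the composite look more reliable than the model implies.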
Factorial validity. In order to examine the factorial validity of PACES, nine models were specified, following the approach of Marsh and colleagues [41] (Figs 1-3) and tested across sex and age groups. Model 1 suggests a global enjoyment factor without correlated error terms, thus not considering method effects, which is consistent with the version predominantly used in applied research. In Model 2, two latent factors are established, defined by negatively and positively worded items and without an overarching enjoyment factor (Fig 1). Models 3-5 (Fig 2) apply the CU framework by correlating the uniqueness terms. Model 3 posits one enjoyment factor with method effects (correlated uniqueness) among the negatively worded items.

PLOS ONE
The German version of Physical Activity Enjoyment Scale for adults
The same procedure is applied in Model 4 for the positively worded items. In Model 5, method effects are tested for negatively and positively worded items at the same time. For Models 6-8 (Fig 3), the LMF strategy was applied. In Model 6, both positive and negative LMFs are included; in Model 7, only a negative LMF is included; and in Model 8, only a positive LMF is included. Model 9 was a hybrid model of CU and LMF, in which the positive items are based on CU and the negative items on LMF, as preliminary analyses suggested that negatively worded items might predict PA independently of the global factor. The best-fitting CU model, the best-fitting LMF model, and the hybrid model were chosen to test factorial validity, measurement invariance, and composite reliability.
Several fit indices were used to show the appropriateness of each model. The overall model fit was assessed using the χ²-statistic, with a non-significant p-value indicating a good model fit [60]. However, the test depends on the study's sample size [61]. Thus, in large samples even minor differences between the model-implied and the observed covariance matrix are detected, so that model misspecifications are overestimated, leading to the rejection of the null hypothesis [58]. The Comparative Fit Index (CFI) is used to show the relative fit improvement by comparing the suggested model with the baseline model [62]. CFI values around .90 indicate an acceptable fit, and values around .95 are considered a good fit [61,62].
The Root-Mean-Square Error of Approximation (RMSEA) describes the error of approximation in the population, thus indicating the model's closeness of fit. RMSEA values of ≤ .06 indicate a close and acceptable model fit. To show a good model fit, zero should be contained in the 90% confidence interval (CI) around the RMSEA point estimate [61,63].
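For orientation, the two indices can be computed from χ² values as sketched below, using their standard textbook definitions. The input values are hypothetical and not taken from the study, and software packages differ in detail (for example, some divide by N instead of N - 1 in the RMSEA), so results need not match AMOS output exactly.

```python
from math import sqrt


def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative Fit Index from model and baseline (independence) chi-square values."""
    d_model = max(chi2_model - df_model, 0.0)  # noncentrality of the tested model
    d_baseline = max(chi2_baseline - df_baseline, d_model)
    if d_baseline == 0:
        return 1.0
    return 1.0 - d_model / d_baseline


def rmsea(chi2_model, df_model, n):
    """RMSEA point estimate for a sample of size n."""
    return sqrt(max(chi2_model - df_model, 0.0) / (df_model * (n - 1)))


# Hypothetical values for a 16-item single-factor model in a sample of 863 adults.
print(round(cfi(300.0, 104, 8000.0, 120), 3))  # 0.975
print(round(rmsea(300.0, 104, 863), 3))        # 0.047
```

Both indices are built on the excess of χ² over its degrees of freedom, which is why they are less sensitive to sample size than the χ² test itself.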
The successive, nested models were tested by χ²-difference tests. In addition to model parsimony and absolute fit, Akaike's information criterion (AIC) was used to determine the best-fitting model for comparisons between models that were not nested. Lower AIC values indicate a better model fit.
Measurement invariance. To investigate measurement invariance across age groups and gender, five nested models (Model A to Model E) were tested and compared using multiple group analysis [32,64]. Each successive model consisted of the previous model restrictions and additional constraints [65]. The following components were consecutively tested: equivalence of structure (Model A), equivalence of factor loadings (Model B), equivalence of measurement intercepts (Model C), invariance of structural covariances (Model D), and invariance of item uniquenesses and correlations between uniquenesses (Model E) across time, sex, and age groups [32]. The models were tested for differences using the χ²-statistic (Δχ²). However, as the χ²-difference tests depend on sample size, differences in CFI (ΔCFI) were tested additionally [66]. The null hypothesis of invariance should be accepted if the χ²-difference test is not significant or ΔCFI ≤ 0.01 [66]. The significance level was set at 1%.
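The decision rule for one pair of nested invariance models can be written down compactly. This is a sketch under stated assumptions: the function name and the example values are hypothetical, the p-value of the χ²-difference test is taken as given (for example, from the SEM software), and ΔCFI is read as the drop in CFI from the less constrained to the more constrained model.

```python
def invariance_retained(p_chi2_diff, cfi_less_constrained, cfi_more_constrained,
                        alpha=0.01, max_cfi_drop=0.01):
    """Apply the decision rule from the text to one pair of nested models.

    Invariance is retained if the chi-square difference test is not
    significant at `alpha` OR the CFI drops by no more than `max_cfi_drop`.
    """
    # Rounding guards against floating-point noise exactly at the cutoff.
    cfi_drop = round(cfi_less_constrained - cfi_more_constrained, 6)
    return p_chi2_diff >= alpha or cfi_drop <= max_cfi_drop


# Hypothetical Model B vs. Model C comparison: significant test, small CFI drop.
print(invariance_retained(p_chi2_diff=0.0004,
                          cfi_less_constrained=0.962,
                          cfi_more_constrained=0.957))  # True
```

With large samples the first condition rarely holds, so in practice the ΔCFI criterion decides most comparisons.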
Criterion-related validity. To test the criterion-related validity of PACES, correlations between the PACES factors and MVPA overall, MVPA in sports clubs, and leisure MVPA were calculated in SPSS. For each latent factor, stability coefficients over a period of five years were used to estimate their systematic variance.

Reliability
Scale means, confidence intervals, standard deviations, Cronbach's α and composite reliabilities for Models 5, 6, and 9 are shown in Table 1. Cronbach's α of the questionnaire was 0.94, indicating very good internal consistency. The composite reliability of the global factor was 0.92-0.93, indicating very good reliability as well.

Exploratory factor analysis (EFA)
An EFA with varimax rotation was performed on the cross-sectional adult sample to explore the factor structure of the 16 items. We applied the following criteria to decide on the number of factors: eigenvalues > 1 and factor loadings > 0.60. These criteria were met by the two-factor solution: the 9 positively worded items loaded on the first factor and the 7 negatively worded items loaded on the second factor, which confirms the structure observed in previous studies [35,38]. The first factor had an eigenvalue of 8.31 and explained 51.96% of the variance; the second factor had an eigenvalue of 1.50 and explained 9.34% of the variance, resulting in 61.31% of the variance explained by the two factors.
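The link between eigenvalues and explained variance can be checked with a short calculation. The sketch assumes the factors were extracted from the correlation matrix of the 16 items, so that the total variance equals 16; the function name is hypothetical. Small deviations from the reported percentages would then presumably stem from the eigenvalues being rounded to two decimals.

```python
N_ITEMS = 16  # number of PACES items entering the EFA


def explained_variance_pct(eigenvalue, n_items=N_ITEMS):
    """Percentage of total variance for one factor of a correlation matrix."""
    return 100.0 * eigenvalue / n_items


eigenvalues = [8.31, 1.50]  # values reported in the text
retained = [ev for ev in eigenvalues if ev > 1.0]  # eigenvalue > 1 criterion

for ev in retained:
    print(ev, round(explained_variance_pct(ev), 2))
print(round(sum(explained_variance_pct(ev) for ev in retained), 2))
```

Under this assumption, the two retained factors together account for roughly 61% of the total variance, in line with the reported total.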

Factorial validity
The results for factorial validity are presented in Table 2. The overall analysis showed that Models 5, 6, and 9 provided the best model fits, with AIC = 336.3 for Model 5, AIC = 508.2 for Model 6, and AIC = 362.6 for Model 9. Thus, further analyses were conducted with Models 5, 6, and 9. Among those three models, the fits of Models 5 and 9 were superior to the fit of Model 6. All three models provided good fits for females and males. For the global factor, all factor loadings in the three models were significantly different from zero (Table 3). In Model 6, all factor loadings for the positive LMF were also significantly different from zero and, unexpectedly, had a negative sign. For the negative LMF in Models 6 and 9, the factor loadings of Item 7 ("It makes me depressed") and Item 12 ("It frustrates me") were dominant. Although the negatively worded items were recoded into a positive direction, the factor loadings surprisingly had a negative sign in Model 6, but a positive sign in Model 9.
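The AIC comparison among the three selected models amounts to a simple ranking, sketched here with the AIC values reported above. The function name `rank_by_aic` is hypothetical; only the three AIC values come from the text.

```python
aic = {"Model 5": 336.3, "Model 6": 508.2, "Model 9": 362.6}  # values reported in the text


def rank_by_aic(aic_values):
    """Return (model, AIC, difference to the best model), sorted from best to worst."""
    best = min(aic_values.values())
    ranked = sorted(aic_values.items(), key=lambda pair: pair[1])
    return [(name, value, round(value - best, 1)) for name, value in ranked]


for name, value, delta in rank_by_aic(aic):
    print(f"{name}: AIC = {value}, difference to best = {delta}")
```

The ranking places Model 5 first, Model 9 second, and Model 6 last, which matches the ordering described in the text.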

Measurement invariance
Measurement invariance was tested across sex, age groups, and time for Models 5, 6, and 9. Results for Model 5 are reported in Table 4. The χ²-difference test was significant for the difference between Model B and Model C and between Model D and Model E for sex, age groups, and across time. However, the CFI did not decrease by more than 0.01 for any of these comparisons. These results suggest that Model 5 can be regarded as invariant for sex, age groups, and across time. Similar results were obtained for Model 6 and are presented in Table 5. The χ²-difference tests were significant for the differences between Model B and Model C and between Model D and Model E for sex, age groups, and across time. Additionally, the difference between Model C and Model D was significant for the comparison between sex groups. However, again, the CFI did not decrease by more than 0.01 for any of these comparisons. Therefore, these results suggest that Model 6 can be regarded as invariant for sex, age groups, and across time.
The results for Model 9 are presented in Table 6. The χ²-difference tests indicated that the differences between Model B and Model C as well as between Model D and Model E were significant for all three comparisons across age groups, sex, and time. However, once more, ΔCFI did not exceed .01 for any of these comparisons. Therefore, Model 9 can also be regarded as invariant across sex, age groups, and time.

Criterion-related validity
The results of criterion-related validity are presented in Table 7. For Model 5, the global factor was significantly (all p < .01) correlated with overall PA (r = .43), PA in sports clubs (r = .34) as well as PA in leisure time (r = .26). For Model 6, the global factor significantly correlated with overall PA (r = .33), PA in sports clubs (r = .27) and PA in leisure time (r = .20) (all p < .01). The latent factor for positively worded items significantly, but negatively correlated with all three indicators of PA. Finally, the latent factor for negatively worded items significantly positively correlated with overall PA and PA in sports clubs, but not with leisure time PA. For Model 9, the global factor significantly correlated with overall PA (r = .39), PA in sports clubs (r = .31) and PA in leisure time (r = .23) (all p < .01). The latent factor for negatively worded items showed correlations with overall PA (r = -.20) and PA in sports clubs (r = -.18), but not with PA in leisure time. The amount of explained variance was highest for Model 6, which explained 23% in overall PA, 15% in PA in sports clubs and 9% in leisure time PA.
The stabilities for a period of five years were significant for all latent factors in all models. The stabilities of the global factor ranged between .56 and .62. The stability of the LMF for positively worded items was .31 in Model 6 and the stability of the LMF for negatively worded items was .32 in Model 6 and .80 in Model 9, suggesting that LMFs also contain parts of the variance, which are stable over time.

Discussion
This study aimed to investigate reliability, construct validity, measurement invariance, and criterion validity of the German version of the PACES.

Psychometric properties of PACES
The results of this study suggest that the German version of PACES is a very reliable measure of PA enjoyment in adults. The coefficients of Cronbach's α and the composite reliabilities are well above 0.90. Comparable results were found in a sample of US adults [35] as well as German adolescents [38].
Concerning the invariance of PACES, significant deviations were found for equivalence of measurement intercepts (Model C) and invariance of item uniquenesses and correlations between uniquenesses (Model E) according to the χ²-difference tests. These deviations were found for all tested models across age groups, sex, and time. However, due to the hypersensitivity of the χ²-difference test, the criterion of ΔCFI ≤ 0.01 [66] was applied. By this criterion, there were no substantial deviations from the assumed invariance. Therefore, PACES can be regarded as invariant across age groups, sex, and time.

Criterion-related validity
The results of the criterion-related validity support the assumption that PACES is a predictor of PA in adults. The global factor was related to overall PA, PA in sports clubs and leisure time PA. Additionally, the positive LMF significantly correlated with all three indicators of PA and the negative LMF significantly correlated with overall PA and PA in sports clubs. These results mean that both positive and negative LMFs are independent predictors of PA. Comparing the amount of explained variance of Model 6 with that of Models 5 and 9, we conclude that the positive LMF explains an additional 4% of the variance in overall PA, 4% in PA in sports clubs and 2% in leisure time PA. However, the comparison between Model 9 and Model 5 does not suggest that the negative LMF substantially contributes to the explanation of the variance in PA. Thus, only the positive LMF has incremental predictive power for PA. Interestingly, the correlation between the positive LMF and PA in Model 6 was negative, suggesting that the positive LMF explains some negative aspects of PA. This might be due to suppression effects that cause the shift of the sign of the factor loadings in the LMFs. The global factor represents general enjoyment in PA, whereas the positive and negative LMFs represent the remaining systematic variance. The sign of the loadings can change according to the configuration of the model and its relationship with the dependent variable. Furthermore, it is possible that the LMFs represent some negative aspects of PA enjoyment such as tediousness and frustration, which might not simply be the opposite of enjoyment but rather suggest that enjoyment of PA and some negative emotions exist simultaneously. One possible way to explore the meaning of the LMFs would be to employ, alongside PACES, further measurement instruments that assess negative affective states during PA and to examine their interplay with PA.
In general, the results of this study suggest that both LMFs contain systematic parts of the variance, which can predict PA independently of the global factor. This idea is supported by the fact that both LMFs have significant stability coefficients over a period of five years. Therefore, the method effects of positively and negatively worded items on PA are systematic and need to be considered when predicting PA.

Factorial structure
Regarding the factorial structure of the PACES, the results suggest that a method effect of negatively and positively worded items exists. The CU and the LMF strategy have been proposed to deal with method effects [43]. Several variations of these strategies were shown to have better model fits than the pure single-factor model and the two-factor model. Among the CU models, Model 5, which allowed for correlated uniqueness terms among positively and negatively worded items, showed the best fit. Among the LMF models, Model 6, which postulates both a positive and a negative LMF, was shown to be superior over models with only one LMF. These results suggest that the method effect separately influences both positively and negatively worded items. Furthermore, a hybrid model with CU for positively worded items and LMF for negatively worded items provided a substantially better model fit than Model 6 with two LMFs, but a slightly worse model fit than Model 5 with CU for both positively and negatively worded items.
Regarding model fit, one should prefer Model 5 with CU for positively and negatively worded items. Regarding the criterion-related validity, however, one should take into consideration that both positive and negative LMFs were significantly related to the indicators of PA and had a moderate stability over a period of five years. According to Lance and colleagues [42], the LMF strategy should be preferred due to the advantages derived from the parameterization of method effects in the LMF. In this study, the hybrid model did not prove to be an alternative to CU and LMF. CU provided a considerably better model fit, and the model with two LMFs explained significantly more variance in PA than the hybrid model. According to the results of our study, the model with a positive and a negative LMF should be preferred over the CU and the hybrid model.

Implications for practice and science
The results of this study suggest that the German version of PACES is a reliable, valid and invariant measure of PA enjoyment in adults. It can be used in large population studies to examine and compare the levels of PA enjoyment across different population groups. The questionnaire is suitable for testing differences between specific population groups (e.g. age groups, sex). Furthermore, due to its invariance across time, the PACES can be applied to examine intrapersonal changes in PA enjoyment. Therefore, the PACES can be used to examine how interventions impact PA enjoyment. However, further research is required to deepen our knowledge of the factorial structure of the questionnaire and to examine the predictive properties of the scale for PA behavior. In particular, the role of the positive and negative LMFs needs to be clarified in further studies.

Strengths and limitations
This study has several strengths. First, the sample of the MoMo-Study is large and representative of German adolescents and young adults. Second, it provides longitudinal data, allowing the examination of invariance across time. Third, sophisticated statistical analyses using structural equation modelling were conducted to examine the psychometric properties of the measurement instrument. However, this study also has some limitations. It does not contain data to estimate test-retest reliabilities of the PACES. Furthermore, a study including other related affective constructs (e.g. tediousness, stress or frustration) should be conducted to examine the relative meaning of the LMFs.

Conclusion
The German version of the PACES is a reliable and valid measurement instrument of PA enjoyment in adults. Furthermore, the measure is invariant across age groups, sex, and time. Therefore, the PACES can be used in population studies to compare the PA enjoyment of different population groups and to examine the need for interventions, as well as in intervention studies to examine the effectiveness of PA interventions. The results of this study suggest that the PACES is associated with method effects for positively and negatively worded items, which need to be further examined in future studies.