Symptom-Checklist-K-9: Norm values and factorial structure in a representative German sample

Background
The SCL-K-9 is the latest short version of the multidimensional Symptom-Checklist 90-R. Its psychometric properties have not yet been sufficiently clarified, because the nine items have never been administered on their own in a representative sample. Therefore, psychometric properties, model fit, and norm values were analyzed.

Methods
A sample of N = 2,507 participants aged 14 to 92 years (n = 1,379 women, n = 1,128 men; mean age 48.79 years, SD = 17.91) was drawn from the general population by random-route sampling. Model fit was tested with confirmatory factor analyses (CFA) using full information maximum likelihood (FIML). Reliability estimates and effect sizes are reported.

Results
The items' discriminative power ranged from .49 to .65, and Cronbach's alpha was .87, indicating good reliability of the SCL-K-9. Norm values as well as gender- and age-specific results are presented. The CFA with all nine items loading on one latent factor showed good fit. There was evidence of invariance across age and gender groups.

Summary
The short screening version SCL-K-9 of the Symptom-Checklist 90-R showed good reliability and good model fit, and specific norm values could be determined. Further studies should evaluate the usefulness of these norms in clinical samples.



Introduction
Within the last decade, the SCL-K-9 has been employed in numerous studies as a research tool for measuring psychological distress and its sensitivity to change, for example after gestalt therapy in major depression [25] and after a body image intervention in eating disorders [26]. Furthermore, it has been applied as an indicator of psychological distress in lung transplant patients [27], fire victims [28], anxiety patients [29], and trauma and addiction patients [30].
Since the multidimensionality and the high scale intercorrelations of the SCL-90-R were criticized, a more practical one-dimensional short version with nine items was created to measure the general factor 'psychological distress'. However, this unidimensionality has not yet been evaluated, nor has its measurement invariance been tested. Therefore, the aim of the present study is to examine the dimensionality of the SCL-K-9 in a large German-speaking sample. In addition, the invariance of the one-factor structure across gender and age groups will be examined. To allow the interpretation of results from future studies, the norm values of this representative German sample will be reported.

Sample
In 2003, the USUMA (Unabhängiger Service für Umfragen, Methoden und Analysen) Berlin Polling Institute selected households and participants by random-route sampling [31]. The interviews were conducted at the participants' homes. Reasons for nonparticipation and the corresponding figures are shown in Fig 1. Sixty-two percent of all contacted individuals completed the questionnaire. The final sample comprised N = 2,507 native German speakers who had completed the German SCL-K-9 and the German Patient Health Questionnaire (PHQ-D) (cf. Table 2). Based on data from the Federal Statistical Office, the final sample was confirmed to be representative of the German residential population in 2003 with regard to age, gender, and region. All participants volunteered and received a data protection declaration in accordance with the Declaration of Helsinki. Verbal and written informed consent was obtained from all participants. The study was approved in accordance with the ethical guidelines of the "German Professional Institutions for Social Research" [31] and by the ethics committee of the University of Leipzig (050-13-11032013).

Instruments
The nine- and 27-item versions of the Symptom Checklist (SCL-K-9 and SCL-27 [6,13,17,21]) measure psychological distress. The SCL-27 assesses global distress and six subscales of specific symptoms, including depressive, dysthymic, vegetative, agoraphobic, and socio-phobic symptoms, using four to six items per subscale on a five-point scale ranging from 0 (Not at all) to 4 (Extremely). Internal consistency was α ≥ .70 for the subscales and α = .93 for the Global Severity Index (GSI). The SCL-K-9, on the other hand, is a screener for global symptom severity and does not differentiate between individual types of symptoms. Its internal consistency in a previous study was α = .84 [32]. The Patient Health Questionnaire-D (PHQ-D) was used as an established measure of psychological distress [33]. It allows the assessment of the severity of symptoms of depression (α = .88) and somatization (α = .79). On a scale from 0 (Not at all) to 3 (Almost every day), participants indicate to what extent a number of symptoms occurred during the preceding two weeks.

Statistical procedure
The internal consistency of the SCL-K-9 is reported as Cronbach's α coefficient. Item selectivity (discriminatory power) was determined as the correlation of each item with the sum of all other items. Item difficulty coefficients were calculated as the sum of the obtained item values divided by the sum of the maximum achievable item values, multiplied by 100. The Shapiro-Wilk test was used to test for univariate non-normality at the item level. Gender differences were tested at the item level using Welch's t-test for independent samples. To quantify the gender differences, we estimated the effect size g (ES; Hedges & Olkin, 1985). In accordance with Cohen's convention (1988), ES > 0.2 is regarded as a small, ES > 0.5 as a medium, and ES > 0.8 as a large effect.
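The item statistics described above can be sketched in plain Python. This is a minimal illustration, not the authors' R code: it assumes complete-case data stored as one list of item scores per respondent and a 0-4 response scale for the difficulty index (the `max_score` parameter is an assumption). Function names such as `corrected_item_total` are hypothetical.

```python
import math
from statistics import mean, pvariance


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    columns = list(zip(*rows))  # transpose: one tuple per item
    sum_item_var = sum(pvariance(col) for col in columns)
    total_var = pvariance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum_item_var / total_var)


def corrected_item_total(rows):
    """Selectivity: correlation of each item with the sum of all *other* items."""
    out = []
    for j in range(len(rows[0])):
        item = [r[j] for r in rows]
        rest = [sum(r) - r[j] for r in rows]
        out.append(pearson_r(item, rest))
    return out


def difficulty_index(rows, max_score=4):
    """Obtained item sum / maximum achievable item sum * 100 (0-4 scale assumed)."""
    n = len(rows)
    return [100 * sum(r[j] for r in rows) / (n * max_score) for j in range(len(rows[0]))]


# Hypothetical responses of five people to three items (not study data).
demo = [[0, 1, 0], [1, 1, 2], [2, 3, 2], [4, 3, 4], [0, 0, 1]]
print(round(cronbach_alpha(demo), 3))
print([round(r, 2) for r in corrected_item_total(demo)])
print(difficulty_index(demo))
```

Population variances are used for both the items and the total; because alpha depends only on their ratio, sample variances would give the same result.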
For the confirmatory factor analysis (CFA), full information maximum likelihood (FIML) estimation [34,35] was used to incorporate the responses of participants with partially missing data. The norm values were based on participants with complete data (n = 2,486). CFA was conducted to test the one-factor solution of the SCL-K-9. Given the violation of the multivariate normality assumption, Yuan and Bentler's [36] scaled χ² and robust standard errors (maximum likelihood robust; MLR) [37] were used. In contrast to the asymptotically distribution-free (ADF) method, MLR can also be used with moderately sized samples without restrictions [38].
To evaluate the goodness of fit of the relevant models, several criteria were considered. The root mean square error of approximation (RMSEA) with its 90% confidence interval assesses absolute model fit, whereas the Comparative Fit Index (CFI) and the Tucker-Lewis Index (TLI) measure fit relative to the "null" model. RMSEA values < .050 represent a "close fit", values between .050 and .080 a "reasonably close fit", and values > .100 an "unacceptable model" [39]. For CFI and TLI, Hu and Bentler [40] suggested values > .950 for good model fit. The standardized root mean square residual (SRMR) generally indicates good fit at values below .080 [40].
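For orientation, the textbook (uncorrected maximum likelihood) formulas for RMSEA, CFI, and TLI, together with the cut-offs quoted above, can be written as follows. This is a sketch only: the study reports Yuan-Bentler scaled (MLR) indices computed by lavaan, whose scaling corrections are not reproduced here, and this RMSEA uses N − 1 in the denominator, whereas some programs use N. The chi-square values in the usage lines are hypothetical.

```python
import math


def rmsea(chi2, df, n):
    """Point estimate of RMSEA from a model chi-square (uncorrected ML formula)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative Fit Index of model m relative to the baseline (null) model b."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b


def tli(chi2_m, df_m, chi2_b, df_b):
    """Tucker-Lewis Index (non-normed fit index)."""
    ratio_b = chi2_b / df_b
    return (ratio_b - chi2_m / df_m) / (ratio_b - 1.0)


def rate_rmsea(value):
    """Verbal label for RMSEA using the cut-offs quoted in the text."""
    if value < 0.050:
        return "close fit"
    if value <= 0.080:
        return "reasonably close fit"
    if value > 0.100:
        return "unacceptable"
    return "not classified by the quoted cut-offs"  # .080 < RMSEA <= .100


def meets_hu_bentler(cfi_value, tli_value, srmr_value):
    """CFI and TLI > .950 and SRMR < .080, as cited in the text."""
    return cfi_value > 0.950 and tli_value > 0.950 and srmr_value < 0.080


# Hypothetical chi-square values for a one-factor model and its null model.
print(round(rmsea(120.0, 27, 2486), 3), rate_rmsea(rmsea(120.0, 27, 2486)))
print(round(cfi(120.0, 27, 8000.0, 36), 3), round(tli(120.0, 27, 8000.0, 36), 3))
```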
Furthermore, measurement invariance was tested using multi-group factor analyses across gender (group 1 = men; group 2 = women) and age (group 1: < 25 years; group 2: 25 to 34 years; group 3: 35 to 44 years; group 4: 45 to 54 years; group 5: 55 to 64 years; group 6: 65 to 74 years; group 7: ≥ 75 years). Measurement invariance tests followed the sequential strategy discussed by Meredith and Teresi [41]. First, a configural invariance model was tested, i.e., the same pattern of item-factor assignments was imposed on all subgroups. Configural invariance refers to the equivalence of the factorial structure: it holds if the analyzed constructs show the same dimensionality and the observed variables are related to the same latent constructs in all groups. Configural invariance is necessary but not sufficient for an unbiased comparison of measurements between groups. Second, the weak invariance model was tested by constraining the estimated factor loadings to be equal across groups. Empirical support for weak invariance permits the comparison of structural relationships between latent constructs across groups (e.g., correlation coefficients, structural [path] coefficients). Third, the strong invariance model was tested by constraining both intercepts and loadings to be equal across groups. This level of invariance permits comparisons of latent construct means between groups. Finally, the strict invariance model was tested by constraining loadings, intercepts, and item residual variances to be equal across groups. Different residual variances across groups have two possible consequences. First, they may lead to different reliabilities of indices in those groups. Second, they may affect decisions in screening processes that depend on the level of a construct, resulting in different error rates (e.g., sensitivity, specificity) for different groups [42] (see Fig 2 for further details).

As noted by Chen [43], the commonly used chi-square difference test of nested models is almost always significant in large samples and is highly sensitive to departures from multivariate normality. Therefore, we used differences in scaled CFI (ΔCFI) and scaled RMSEA (ΔRMSEA) to compare the successive stages of measurement invariance. As recommended by Chen [43], a decrease of .010 or more in scaled CFI, supplemented by an increase of .015 or more in scaled RMSEA, was regarded as indicative of non-invariance. In addition, the absolute fit of each model was examined using the aforementioned cut-off values. If one or more model parameters were found to be non-invariant across groups (partial measurement invariance), we followed the recommendation by Byrne et al. [44] to conduct further invariance tests only when at least two invariant parameters per invariance test were found (e.g., at least two equivalent factor loadings in metric invariance tests). The data analysis was carried out in R using the packages lavaan and semTools [45,46].
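The decision rule for comparing successive invariance models can be expressed as a small function. It is one possible reading of Chen's recommendation, assuming that a CFI drop of .010 or more is the primary signal and an RMSEA increase of .015 or more a supplementary one, either of which flags the more constrained model; differences are rounded to three decimals, as in the reported values. The usage lines feed in the age-group fit values reported in the Results; `ModelFit` and `compare_nested` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ModelFit:
    label: str
    cfi: float    # scaled CFI
    rmsea: float  # scaled RMSEA


def compare_nested(less_constrained, more_constrained, cfi_cut=0.010, rmsea_cut=0.015):
    """Flag non-invariance when adding constraints worsens fit beyond the cut-offs."""
    d_cfi = round(less_constrained.cfi - more_constrained.cfi, 3)        # > 0: CFI dropped
    d_rmsea = round(more_constrained.rmsea - less_constrained.rmsea, 3)  # > 0: RMSEA rose
    return {
        "step": f"{less_constrained.label} -> {more_constrained.label}",
        "dCFI": d_cfi,
        "dRMSEA": d_rmsea,
        "noninvariant": d_cfi >= cfi_cut or d_rmsea >= rmsea_cut,
    }


# Age-group fit values as reported in the Results (Table 4).
age = {
    "M0": ModelFit("Model 0 (configural)", 0.945, 0.058),
    "M1": ModelFit("Model 1 (weak)", 0.942, 0.054),
    "M2": ModelFit("Model 2 (strong)", 0.909, 0.061),
    "M2b": ModelFit("Model 2b (partial strong)", 0.934, 0.054),
}
for a, b in [("M0", "M1"), ("M1", "M2"), ("M1", "M2b")]:
    print(compare_nested(age[a], age[b]))
```

With these inputs, only the step from Model 1 to the fully constrained Model 2 is flagged, which matches the decision to free two intercepts and continue with Model 2b.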

Descriptive item analysis
There were missing data for 21 participants (n = 9 men and n = 12 women). Therefore, a final sample of N = 2,486 participants was used. As shown in Table 3, item selectivity values ranged from .49 to .65 and were all above the critical value of .30. The Shapiro-Wilk test indicated significant univariate non-normality for all items (all W > .51, all p < .001), which was also reflected in skewness and kurtosis. Most items were significantly right-skewed and more peaked than a normal distribution. Cronbach's alpha was .87, indicating good reliability of the SCL-K-9.
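The skewness and kurtosis findings can be illustrated with moment-based estimates. This sketch does not reimplement the Shapiro-Wilk test (available, e.g., as `shapiro.test` in R or `scipy.stats.shapiro` in Python); the item responses below are hypothetical and merely mimic a floor-heavy distribution of the kind described.

```python
from statistics import mean


def skew_excess_kurtosis(x):
    """Moment-based skewness g1 and excess kurtosis g2 (normal distribution: 0, 0)."""
    n = len(x)
    m = mean(x)
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Hypothetical 0-4 responses to one item: most people answer "Not at all" (0).
item = [0] * 60 + [1] * 20 + [2] * 10 + [3] * 6 + [4] * 4
g1, g2 = skew_excess_kurtosis(item)
print(round(g1, 2), round(g2, 2))  # both positive: right-skewed and more peaked than normal
```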

Effects of gender and age
In total, 1,367 women and 1,119 men responded to all items of the SCL-K-9. In general, men (M = 3.28; SD = 4.53) reported lower SCL-K-9 scores than women (M = 3.91; SD = 4.95), t(2468.15) = 3.29, p = .001, ES = 0.13. Men younger than 25 years showed the lowest SCL-K-9 scores, and men's scores rose continuously with age. Women showed a different pattern across age groups: young women (up to 24 years of age) and women older than 65 years reported the highest SCL-K-9 scores, with lower scores in between. The lowest value was found in the age group 45 to 54 years.

The results of the measurement invariance analysis regarding age and gender are shown in Table 4. Regarding gender, the baseline model (Model 0; configural invariance), which estimated all model parameters freely across groups, resulted in excellent model fit (scaled CFI = .948; scaled RMSEA = .054). Weak invariance was examined by comparing Model 0 with Model 1 (see Table 4), which constrained all factor loadings to be invariant across groups. ΔCFI and ΔRMSEA were below the cut-offs recommended by Chen, and the model fit was excellent to good (scaled CFI = .947; scaled RMSEA = .051). Strong invariance was examined by comparing Model 1 with Model 2 (see Table 4), which additionally constrained all item intercepts to be invariant across groups. Both ΔCFI and ΔRMSEA (= .000) were below the cut-offs, and the model fit was excellent to good (scaled CFI = .941; scaled RMSEA = .051). Therefore, strong invariance can be assumed. Strict invariance was examined by comparing Model 2 with Model 3, which additionally constrained all item residual variances to be invariant across groups. ΔCFI and ΔRMSEA were below the cut-offs recommended by Chen, and the model fit was excellent to good (scaled CFI = .940; scaled RMSEA = .048).
Thus, strict invariance can be assumed for gender. Regarding age, the baseline model (Model 0) resulted in excellent model fit (scaled CFI = .945; scaled RMSEA = .058). Weak invariance was examined by comparing Model 0 with Model 1 (scaled CFI = .942; scaled RMSEA = .054); ΔCFI and ΔRMSEA were below the cut-offs recommended by Chen. Strong invariance was examined by comparing Model 1 with Model 2, which resulted in a considerable worsening of model fit (ΔCFI = .033) and an unacceptable model fit (scaled CFI = .909; scaled RMSEA = .061). Subsequently, the intercepts of two items were freed across groups (SCL3 "Feeling that you worry too much" and SCL7 "Feeling of heaviness in your arms and legs"). The resulting Model 2b showed an acceptable difference in fit compared with Model 1 (ΔCFI = .008; ΔRMSEA = .000) and good to excellent model fit (scaled CFI = .934; scaled RMSEA = .054). Thus, partial strong invariance can be assumed. Strict invariance was examined by comparing Model 2b with Model 3, which again resulted in a considerable worsening of model fit (ΔCFI = .024; ΔRMSEA = .003). After the residual variances of two items were freed across groups (SCL4 "Emotional vulnerability" and SCL7 "Feeling of heaviness in your arms and legs"), the resulting Model 3b showed an acceptable difference in fit compared with Model 2b (ΔCFI = .008; ΔRMSEA = .001). Furthermore, the model fit was good to excellent (CFI = .926; RMSEA = .053). Thus, partial strict invariance can be assumed for the SCL-K-9 regarding age.
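The gender comparison can be recomputed approximately from the reported summary statistics. The sketch assumes Welch's t-test (consistent with the non-integer degrees of freedom) and Hedges' g with a pooled SD and the usual approximate small-sample correction. Because the published means and SDs are rounded, the results (t ≈ 3.31, g ≈ 0.13) are close to, but not identical with, the reported t(2468.15) = 3.29 and ES = 0.13.

```python
import math


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Hedges' g for group 2 minus group 1: pooled-SD d times the correction J."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m2 - m1) / pooled_sd
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # common approximation of the correction factor
    return j * d


def welch_t(m1, sd1, n1, m2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    t = (m2 - m1) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Reported SCL-K-9 descriptives: men (group 1) vs. women (group 2).
men = (3.28, 4.53, 1119)
women = (3.91, 4.95, 1367)
t, df = welch_t(*men, *women)
g = hedges_g(*men, *women)
print(f"t({df:.1f}) = {t:.2f}, g = {g:.2f}")
```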

Validation of the SCL-K-9 version: SCL-27 and PHQ
To analyze whether the SCL-K-9 is an acceptable and efficient tool for assessing mental health, correlations between the global severity index (GSI) of the SCL-K-9 and other questionnaires were calculated. The GSI of the SCL-K-9 and that of the SCL-27 correlated at r = .86, a very high correlation. The correlation coefficients between the GSI-K-9 and the scales of the Patient Health Questionnaire were .54 for stress, .60 for somatic symptoms, and .71 for depression, i.e., moderate to high.

Discussion
The SCL-90 [2] is the questionnaire most frequently used internationally to assess psychological distress, especially in clinical practice [47,48,49,50], but it is very extensive and time-consuming. Therefore, short versions were developed for use in large representative studies. One of these is the SCL-K-9. The psychometric properties of this version were analyzed in the present study. Internal consistency measured with Cronbach's alpha was .87, indicating good reliability of the SCL-K-9. In general, a low alpha value may be due to a low number of items, poor interrelatedness between items, or a heterogeneous construct. Conversely, a very high alpha value may suggest that some items are redundant, i.e., they refer to the same content but are phrased differently (item wording). Since a maximum alpha value of .90 has been recommended in this context [51], the value determined here may be judged as positive. Internal consistency is a necessary but insufficient condition for unidimensionality in a set of items. Testing the hypothesized one-factor model using MLR-CFA resulted in acceptable to good model fit. Hence, a unidimensional interpretation of the SCL-K-9 total score is supported, and a sum score can be calculated. Furthermore, evidence of strict invariance by sex and of partial strict invariance by age was found. Therefore, unbiased comparisons of means, correlation coefficients, and path coefficients within SEM in multivariable studies are possible, independent of sex and age. Likewise, screening without distortion across sex and age groups is possible, which is of explicit practical relevance.
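The dependence of alpha on test length and item interrelatedness can be made concrete with the standardized alpha (Spearman-Brown) formula, α = k·r̄ / (1 + (k − 1)·r̄), where k is the number of items and r̄ the average inter-item correlation. This is a simplification that assumes equal item variances; the r̄ = .30 values below are hypothetical, and the inverse function only gives the average correlation implied by α = .87 with nine items under that assumption.

```python
def standardized_alpha(k, mean_r):
    """Standardized alpha (Spearman-Brown form) for k items with mean inter-item r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def implied_mean_r(alpha, k):
    """Average inter-item correlation implied by a given alpha for k items."""
    return alpha / (k - alpha * (k - 1))


# Hypothetical average inter-item correlation of .30 at different test lengths.
for k in (5, 9, 27):
    print(k, round(standardized_alpha(k, 0.30), 3))

# Average correlation implied by alpha = .87 for nine items (equal-variance assumption).
print(round(implied_mean_r(0.87, 9), 3))
```

Holding r̄ fixed, adding items alone pushes alpha upward, which is why a very high alpha in a long scale can signal redundancy rather than better measurement.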
Given the largely equivalent covariance structure parameters across gender and age, it can be concluded that the SCL-K-9 is a robust instrument with respect to gender and age in multivariable studies.
Even though it has fewer items, the SCL-K-9 shows an internal consistency similar to that of other short versions. For example, Cronbach's alpha for the HSCL-25 ranged from .84 to .87. However, the ultra-short SCL-5 showed lower internal consistency, with a Cronbach's alpha of .80. Therefore, the internal consistency of a scale may also depend on an adequate sampling procedure.
The correlation between the GSI-9 and the GSI-90 was r = .93, a very high correlation. To our knowledge, no other study has reported the associations between the short version (SCL-K-9) and the full long version (SCL-90-R). The correlation between the GSI-9 and the GSI-27 (r = .86) was high, and the correlations between the GSI-9 and the PHQ scales (stress = .54, somatic symptoms = .60, depression = .71) were moderate to high. In a Norwegian sample, the correlations of the SCL-5 with the SCL-25 ranged from .91 to .97, and the correlations with the Mental Health Inventory (MHI-5) ranged from -.76 to -.78 [9]. These correlations were slightly higher than those of the SCL-K-9 in the present study. The Norwegian sample is also a very large representative sample; however, it was drawn over the course of 15 years, so changes over time might have been measured as well. In addition, the SCL-K-9 was administered on its own, whereas the SCL-5 scores were extracted from data collected with the longer SCL-25. Thus, the associations between a stand-alone SCL-5 and a longer version can only be assumed. Therefore, in comparison to the longer version, these results speak in favor of using the shorter and more efficient SCL-K-9 for assessing mental health.
High interpretation objectivity requires that the findings obtained with an instrument are interpreted in the same way by different diagnosticians. It is therefore important that all interpreters have comparable knowledge of what a questionnaire measures and of how individual or group values are to be interpreted quantitatively. The interpretation of a scale can be subjective if the questionnaire documentation provides no clear interpretation instructions or reference (norm) values. Without such information, it can only be said that person or group A has value B on scale C. To interpret, e.g., a value Q as high or low, comparison (norm) values from a representative sample are necessary. Therefore, the norm values of the present representative sample are included in this study (see Tables 5 and 6).
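Tables 5 and 6 are not reproduced here, so the following sketch only assumes a common way of expressing norms: a percentile rank in the reference sample (ties counted as half) and a T-score with mean 50 and SD 10. Because the item distributions were right-skewed, a normalized T-score derived from the percentile rank via the inverse normal distribution is shown as an alternative to the linear T-score. The norm sample and the 0-36 sum-score range in the usage lines are assumptions for illustration, not study data.

```python
from bisect import bisect_left, bisect_right
from statistics import NormalDist, mean, stdev


def percentile_rank(norm_scores, raw):
    """Percent of the norm sample scoring below raw, counting ties as half."""
    s = sorted(norm_scores)
    below = bisect_left(s, raw)
    ties = bisect_right(s, raw) - below
    return 100.0 * (below + 0.5 * ties) / len(s)


def linear_t(norm_scores, raw):
    """Linear T-score: 50 + 10 * z, with z based on the norm-sample mean and SD."""
    return 50 + 10 * (raw - mean(norm_scores)) / stdev(norm_scores)


def normalized_t(norm_scores, raw):
    """Area-transformed T-score from the percentile rank (clipped to avoid infinities)."""
    pr = min(max(percentile_rank(norm_scores, raw), 0.1), 99.9)
    return 50 + 10 * NormalDist().inv_cdf(pr / 100)


# Hypothetical SCL-K-9 sum scores (range 0-36) of a small reference group.
norm = [0, 0, 0, 1, 1, 2, 3, 4, 6, 9, 12, 18]
raw = 9
print(percentile_rank(norm, raw), round(linear_t(norm, raw), 1), round(normalized_t(norm, raw), 1))
```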
The strengths of the present study are its large representative sample and its statistical approach. However, the SCL-K-9 is only a screening instrument, and additional assessments would be necessary for more profound conclusions. The SCL-K-9 enables time-saving screening of mental symptoms in psychotherapy. After screening, intervention programs can be targeted more precisely at the population in need, thus helping to avoid the chronification of disorders and their costly treatment. However, the SCL-K-9 is not suitable for comprehensive individual diagnostics, as its results merely offer an overview of the current psychological state. Therefore, high scores call for detailed further examination.