Screening for depressive symptoms in adolescents at school: New validity evidences on the short form of the Reynolds Depression Scale

The main purpose of the present study was to assess depressive symptomatology and to gather new validity evidence for the Reynolds Adolescent Depression Scale-Short Form (RADS-SF) in a representative sample of youths. The sample consisted of 2914 adolescents with a mean age of 15.85 years (SD = 1.68). We calculated the descriptive statistics and internal consistency of the RADS-SF scores. Confirmatory factor analyses (CFAs) at the item level and successive multigroup CFAs to test measurement invariance were also conducted. Latent mean differences across gender and educational level groups were estimated, and finally, we studied the sources of validity evidence with other external variables. The internal consistency of the RADS-SF Total score, estimated with Ordinal alpha, was .89. Results from the CFAs showed that the one-dimensional model displayed appropriate goodness-of-fit indices, with a CFI value over .95 and an RMSEA value under .08. In addition, the results support strong measurement invariance of the RADS-SF scores across gender and age. When latent means were compared, statistically significant differences were found by gender and age. Females scored 0.347 units higher than males on the Depression latent variable, whereas older adolescents scored 0.111 units higher than the younger group. In addition, the RADS-SF score was associated with the RADS scores. The results suggest that the RADS-SF could be used as an efficient screening test to assess self-reported depressive symptoms in adolescents from the general population.


Introduction
Adolescence is a particularly important developmental stage for socio-emotional development, but it is also marked by the emergence of mental health problems such as stress and anxiety, psychotic spectrum disorders, or depressive disorders, among others [1,2]. Specifically, major [...] lack of studies addressing the reliability and new sources of validity evidence of the instrument around the world in order to be considered a useful tool for early detection of depressive symptoms (clinical and subclinical) in children and adolescents.
Given the lack of previous research, the reliability of the scores and the internal structure of the RADS-SF remain unclear. In addition, although the RADS-SF is being used in adolescent populations, only a few studies have analyzed the measurement equivalence of the instrument across gender and age. If measurement invariance (MI) does not hold, inferences and interpretations drawn from the data may be erroneous or unfounded [20,21]. Also, new data about the manifestation of depressive symptoms in adolescents with this measure are still needed, and new sources of validity evidence with other measures are valuable.
Within this framework, the main objective of the current study was to assess self-reported depressive symptoms and to analyze the psychometric properties of the RADS-SF scores in a large sample of non-clinical adolescents. The following specific objectives derive from this main goal: a) to estimate the reliability of the RADS-SF scores by means of Ordinal alpha; b) to study the internal structure of the RADS-SF using CFAs; c) to study the measurement equivalence of the unidimensional model of the RADS-SF across gender and age; d) to compare latent mean scores across gender and age; and e) to analyze the sources of validity evidence with the RADS. In line with previous literature, we hypothesized that the one-factor model of the RADS-SF would have adequate goodness-of-fit indices. We further hypothesized that the RADS-SF would be equivalent across gender and age. Finally, we expected that the reliability estimates of the RADS-SF scores would be adequate across groups.

Method
Participants
Stratified random cluster sampling was conducted at the classroom level, from an approximate population of 37,000 students in the Principality of Asturias and La Rioja, two regions located in northern Spain. The students belonged to different public and concerted (state-subsidized private) Educational Centers of Compulsory Secondary Education and Vocational Training, as well as to different socio-economic levels. The strata were created as a function of the geographical zone and the educational stage. Partial data from this study have been published elsewhere [22]. There were 3052 students in the initial sample, although some participants were excluded due to high scores on the Infrequency scale (more than three points) (n = 48), being older than 19 years of age (n = 20), not completing demographic data (e.g., gender and age) (n = 18), or not completing all the administered self-reports (n = 52). Thus, the final sample was composed of 2914 students, of whom 1287 (44.3%) were male, drawn from 41 centers and 95 classrooms. The mean age was 15.90 years (SD = 1.44), with an age range between 13 and 19 years. The distribution by age was: 13-year-olds (n = 65), 14-year-olds (n = 425), 15-year-olds (n = 775), 16-year-olds (n = 712), 17-year-olds (n = 499), 18-year-olds (n = 285), and 19-year-olds (n = 153).
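The sample flow reported above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the counts are taken from the text, while the variable names and dictionary labels are hypothetical and used only for illustration.

```python
# Counts as reported in the Participants section; labels are hypothetical.
INITIAL_SAMPLE = 3052
EXCLUSIONS = {
    "infrequency_score_above_three": 48,
    "older_than_19": 20,
    "missing_demographic_data": 18,
    "incomplete_self_reports": 52,
}
AGE_COUNTS = {13: 65, 14: 425, 15: 775, 16: 712, 17: 499, 18: 285, 19: 153}


def final_sample_size(initial: int, exclusions: dict) -> int:
    """Subtract every exclusion count from the initial sample size."""
    return initial - sum(exclusions.values())


final_n = final_sample_size(INITIAL_SAMPLE, EXCLUSIONS)
age_total = sum(AGE_COUNTS.values())

# Both figures should agree with the reported final sample of 2914.
print(final_n, age_total)  # prints: 2914 2914
```

The check confirms that the four exclusion counts and the age distribution are each consistent with the reported final sample size.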

Instruments
The Reynolds Adolescent Depression Scale (RADS) [13] is used to assess the severity of depressive symptomatology in adolescents from 11 to 20 years of age. It is composed of 30 items in a Likert response format with 4 options (1 = almost never, 2 = hardly ever, 3 = sometimes, 4 = most of the time). The total scores range from 30 to 120, with the cut-off score for determining the severity of the depressive symptomatology set at 77 points [13]. Reynolds [12] proposed four scales: Anhedonia/Negative Affect, Somatic Complaints, Negative Self-Evaluation and Dysphoric Mood. The RADS has been extensively employed across a variety of topics, samples and nationalities, showing adequate psychometric properties [23-27]. The Spanish version was used for the present study [28-30].
The Oviedo Infrequency Scale (INF-OV) [31] was administered to the participants to detect those who responded in a random, pseudorandom or dishonest manner. The INF-OV is a self-report instrument composed of 12 items in a 5-point Likert-scale format (1 = completely disagree; 5 = completely agree), developed following guidelines for test construction [32]. It includes items such as: "The distance between Madrid and New York is higher than the distance between Madrid and Barcelona". Students with more than three incorrect responses on the INF-OV scale were eliminated from the sample. For this study, a total of 48 participants were excluded based on their responses to the INF-OV scale.
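The exclusion rule described above (more than three incorrect responses on the INF-OV) can be sketched as a simple filter. This is an illustrative Python sketch only: how a Likert response is scored as "incorrect" is not specified here, so the function takes the count of incorrect responses as given, and all names and example data are hypothetical.

```python
MAX_INCORRECT = 3  # participants with more than three incorrect responses are excluded
N_ITEMS = 12       # number of INF-OV items reported in the text


def is_excluded(incorrect_responses: int, max_incorrect: int = MAX_INCORRECT) -> bool:
    """Return True when the number of incorrect INF-OV responses exceeds the limit."""
    if not 0 <= incorrect_responses <= N_ITEMS:
        raise ValueError("incorrect_responses must be between 0 and 12")
    return incorrect_responses > max_incorrect


def retained_ids(incorrect_by_participant: dict) -> list:
    """Keep only participants who pass the infrequency screen."""
    return [pid for pid, n in incorrect_by_participant.items() if not is_excluded(n)]


# Hypothetical example data: participant id -> number of incorrect responses.
example = {"p01": 0, "p02": 3, "p03": 4, "p04": 7}
print(retained_ids(example))  # prints: ['p01', 'p02']
```

Note that a participant with exactly three incorrect responses is retained, since the rule excludes only those with more than three.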

Procedure
Contact with the principals of the Compulsory Secondary Education and Vocational Training centers was made by letter or telephone. The questionnaires were administered collectively, in groups of 15-25 participants, in a classroom during school hours. Students were reminded that their participation was voluntary and that their responses would be kept confidential. Written informed consent to participate in the study was obtained from the adolescents. For participants under 18, parents were asked to provide written informed consent allowing their children to participate in the study. Participants did not receive any type of incentive for their participation. The administration took place under the supervision of the researchers. The study was approved by the Research and Ethics Committee at the University of La Rioja and the University of Oviedo.

Data analyses
First, we calculated the descriptive statistics and internal consistency of the RADS-SF scores. The Ordinal alpha coefficient for Likert data was calculated as a measure of the reliability of the scores [33]. Ordinal alpha is conceptually equivalent to Cronbach's alpha and is more appropriate for dichotomous and ordinal data. The critical difference between the two is that Ordinal alpha is based on the polychoric correlation matrix rather than the Pearson correlation matrix.
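To make the relation between the two coefficients concrete, the sketch below computes alpha from a correlation matrix in Python. It is an illustration under stated assumptions, not the procedure used in the study: the matrix is assumed to have been estimated beforehand (a polychoric matrix yields Ordinal alpha, a Pearson matrix yields the standardized Cronbach's alpha), and the function name and example matrix are hypothetical.

```python
def alpha_from_correlation_matrix(r: list) -> float:
    """Alpha from a k x k correlation matrix: k/(k-1) * (1 - trace(R)/sum(R)).

    Passing a polychoric matrix gives Ordinal alpha; passing a Pearson
    matrix gives standardized Cronbach's alpha. Estimating the polychoric
    correlations themselves is outside the scope of this sketch.
    """
    k = len(r)
    if k < 2 or any(len(row) != k for row in r):
        raise ValueError("r must be a square matrix with at least two items")
    trace = sum(r[i][i] for i in range(k))
    total = sum(sum(row) for row in r)
    return (k / (k - 1)) * (1 - trace / total)


# Hypothetical 3-item matrix with all inter-item correlations equal to .50.
example = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
print(alpha_from_correlation_matrix(example))  # prints: 0.75
```

Because the formula is the same in both cases, any difference between Ordinal alpha and Cronbach's alpha comes entirely from the choice of correlation matrix.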
Second, we conducted several CFAs at the item level: 1) in the first step, we tested the hypothesis that the RADS-SF is one-dimensional; 2) in addition, as proposed by Szabo et al. [15], we tested the one-dimensional model including error covariances between items 19-20, as both are intended to measure the same symptom, and between items 3-6 and 3-7, as they all belong to the dysphoric mood subdimension. Due to the ordinal nature of the data, we used the Weighted Least Squares Mean and Variance adjusted (WLSMV) estimator and the polychoric correlation matrix. The following goodness-of-fit indices were used: Chi-square (χ²), Comparative Fit Index (CFI), Tucker-Lewis Index (TLI), Root Mean Square Error of Approximation (RMSEA) (and its 90% confidence interval), and Weighted Root Mean Square Residual (WRMR). CFI values greater than .95 are preferred and values close to .90 are considered acceptable, while RMSEA values should be under .08 for a reasonable fit and under .05 for a good fit [34,35]. For the WRMR, values below 1.0 have been suggested as indicative of adequate model fit [36]. Within the multilevel CFA framework, and due to the hierarchical structure of the data (participants nested in classrooms), we also studied the intraclass correlation coefficients (ICCs) of the observed variables (items of the RADS-SF) at the classroom level. The ICC assesses the proportion of variance in the observed variable that is attributable to cluster membership. ICC values range from 0.0 to 1.0.
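The cut-offs for the fit indices described above can be written as a small set of decision rules. The Python sketch below is illustrative only: the function names and labels are hypothetical, and "close to .90" is interpreted here as "at least .90", which is an assumption rather than a rule stated in the cited guidelines.

```python
def classify_cfi(cfi: float) -> str:
    """CFI > .95 is preferred; values of at least .90 are treated as acceptable (assumption)."""
    if cfi > 0.95:
        return "preferred"
    if cfi >= 0.90:
        return "acceptable"
    return "below the acceptable range"


def classify_rmsea(rmsea: float) -> str:
    """RMSEA < .05 indicates good fit; RMSEA < .08 indicates reasonable fit."""
    if rmsea < 0.05:
        return "good"
    if rmsea < 0.08:
        return "reasonable"
    return "above the reasonable range"


def wrmr_adequate(wrmr: float) -> bool:
    """WRMR values below 1.0 have been suggested as indicating adequate fit."""
    return wrmr < 1.0


# Hypothetical fit values, used only to show how the rules are applied.
print(classify_cfi(0.96), classify_rmsea(0.07), wrmr_adequate(0.9))
# prints: preferred reasonable True
```

These rules are heuristics; the fit of a model is usually judged from several indices taken together rather than from any single threshold.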
Third, in order to test MI, successive multigroup CFAs were conducted [37]. A hierarchical set of steps is followed when MI is tested, typically starting with the determination of a well-fitting multigroup baseline model and continuing with the establishment of successive equivalence constraints on the model parameters across groups. The models analysed can be seen as nested models to which constraints are progressively added. Using Delta parameterization in Mplus, two steps need to be considered when testing measurement invariance: the configural and the strong invariance models [38].
Due to the limitations of the Δχ² regarding its sensitivity to sample size, Cheung and Rensvold [39] proposed a more practical criterion, the change in CFI (ΔCFI), to determine whether nested models are practically equivalent. In this study, when ΔCFI is greater than .01 between two nested models, the more constrained model is rejected, since the additional constraints have produced a practically worse fit. However, if the change in CFI is less than or equal to .01, all specified equality constraints are considered tenable, and therefore it is possible to continue with the next step in the analysis of MI.
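The ΔCFI decision rule just described can be sketched in a few lines of Python. This is an illustration with hypothetical names and CFI values; the drop in CFI is rounded to three decimals (the precision at which fit indices are usually reported) so that a drop of exactly .01 is not rejected because of floating-point error.

```python
def constraints_tenable(cfi_less_constrained: float,
                        cfi_more_constrained: float,
                        threshold: float = 0.01) -> bool:
    """Apply the delta-CFI rule to two nested models.

    The more constrained model is retained when CFI drops by no more than
    the threshold (.01 in this study) relative to the less constrained one.
    """
    delta_cfi = round(cfi_less_constrained - cfi_more_constrained, 3)
    return delta_cfi <= threshold


# Hypothetical CFI values for a configural and a strong invariance model.
print(constraints_tenable(0.970, 0.965))  # prints: True
print(constraints_tenable(0.970, 0.950))  # prints: False
```

In the first call the drop is .005, so the equality constraints are considered tenable; in the second the drop is .020, so the more constrained model would be rejected.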
Fourth, latent mean differences across gender and educational level groups were estimated, fixing the latent mean values to zero in the male group and in the high school students group. For comparisons among groups in the latent means, statistical significance was based on the z statistic. The group in which the latent mean was fixed to zero was considered the reference group. Finally, we studied the sources of validity evidence with other external variables. We analysed the association between the RADS-SF and the RADS by means of Pearson correlations. SPSS 15.0 [40], Factor 9.2 [41], and Mplus 7.0 [38] were used for data analyses.
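Because the reference group's latent mean is fixed to zero, the estimated latent mean of the comparison group is itself the group difference, and the z statistic is that estimate divided by its standard error. The Python sketch below illustrates the test under a normal approximation; the function name is hypothetical, and the standard error in the example is an invented value, since standard errors are not reported in this section.

```python
import math


def z_test(estimate: float, standard_error: float) -> tuple:
    """Return the z statistic and two-sided p-value for an estimate.

    With the reference group's latent mean fixed to zero, `estimate` is the
    latent mean difference between the comparison and the reference group.
    """
    if standard_error <= 0:
        raise ValueError("standard_error must be positive")
    z = estimate / standard_error
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# The estimate comes from the text; the standard error of 0.05 is hypothetical.
z, p = z_test(0.347, 0.05)
print(f"z = {z:.2f}, significant at .01: {p < 0.01}")
```

The same function applies to the age comparison by passing the corresponding estimate and its standard error.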

Results
Descriptive statistics and internal consistency of the scores
Table 1 shows the means and standard deviations for the RADS-SF items by gender and age and for the total sample. The Ordinal alpha for the Total Score was .89, indicating good reliability of the scores. According to gender, Ordinal alpha [...]

Internal structure of the RADS-SF: Confirmatory factor analysis
The goodness-of-fit indices for the RADS-SF models estimated by means of CFAs are presented in Table 2. As can be seen, the one-dimensional model showed a good fit to the data, with CFI values over .95 and RMSEA values under .08. We also tested a model including error correlations between items 19-20, 3-6 and 3-7, as suggested by Szabo et al. [15]. This resulted in an improvement of the goodness-of-fit indices. Nevertheless, due to the inherent problems in the use of correlated errors [42], and the fact that the one-dimensional model displayed a good fit, we decided not to retain the correlated errors. The standardized factor loadings for the one-dimensional model, by gender and age and for the final sample, are shown in Table 3. All the factor loadings for the RADS-SF were significant, ranging, in the total sample, between .35 (Self-Deprecation, Low Self-Worth) and .82 (Anger, Irritability). With regard to the multilevel CFA study, all the items showed ICC values lower than .10, indicating that the hierarchical nature of the data did not have a significant effect on the dimensional structure of the RADS-SF. Thus, the amount of variance attributable to cluster membership was lower than 10%, suggesting that a multilevel analysis is not required.
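To illustrate what an ICC below .10 means, the Python sketch below estimates the ICC for one item with a one-way ANOVA estimator and applies the .10 threshold. This is an assumption-laden illustration, not the computation performed in Mplus: it treats item scores as continuous, requires clusters of equal size, truncates negative estimates at zero, and uses hypothetical names and data.

```python
def icc_oneway(clusters: list) -> float:
    """One-way ANOVA ICC for balanced clusters (e.g., classrooms).

    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), where k is the cluster size.
    Negative estimates are truncated at 0 so the result stays in [0, 1].
    """
    n = len(clusters)
    k = len(clusters[0])
    if n < 2 or k < 2 or any(len(c) != k for c in clusters):
        raise ValueError("need at least two clusters of equal size >= 2")
    means = [sum(c) / k for c in clusters]
    grand_mean = sum(means) / n
    msb = k * sum((m - grand_mean) ** 2 for m in means) / (n - 1)
    msw = sum((x - m) ** 2 for c, m in zip(clusters, means) for x in c) / (n * (k - 1))
    denominator = msb + (k - 1) * msw
    if denominator == 0:
        return 0.0  # no variance at all, so nothing is attributable to clusters
    return max(0.0, (msb - msw) / denominator)


def multilevel_needed(icc: float, threshold: float = 0.10) -> bool:
    """Flag an item whose cluster-level variance share reaches the threshold."""
    return icc >= threshold


# Hypothetical item scores for three classrooms of four students each.
classrooms = [[1, 2, 3, 4], [2, 1, 4, 3], [1, 3, 2, 4]]
icc = icc_oneway(classrooms)
print(icc, multilevel_needed(icc))
```

In this example the three classroom means are identical, so none of the variance is attributable to classroom membership and the item would not call for multilevel modelling.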

Measurement invariance of the RADS-SF scores across gender and age
Prior to the analysis of MI across gender and age, we tested whether the one-dimensional model showed a reasonably good fit to the data in each group separately. First, the MI of the RADS-SF across gender was tested. The configural invariance model, in which no equality constraints were imposed, showed an adequate fit to the data. Next, a strong invariance model was tested, with the item thresholds and factor loadings constrained to be equal across groups. The results showed configural and strong MI by gender. Next, the MI of the RADS-SF across age was tested. To examine MI by age, the sample was divided into two subgroups (13-15 year-olds and 16-19 year-olds), corresponding to early adolescence and to middle and late adolescence, respectively [43]. The ΔCFI between the constrained and the unconstrained model was under .01, indicating that the model of strong invariance was supported. Hence, the results support strong MI by gender and age.

Latent mean scores across gender and age
For comparisons among groups in the latent means, statistical significance was based on the z statistic. The group in which the latent mean was fixed to zero was considered the reference group. With regard to gender, statistically significant differences were found. Specifically, females scored 0.347 units higher than males in Depression (0.347; p < .01). The comparison of latent means also revealed statistically significant differences according to age: on average, younger adolescents scored 0.111 units lower than older adolescents in Depression (0.111; p < .05).

Sources of validity evidence with RADS scores
In order to test for convergent validity, the Pearson correlation between the total score of the RADS and the total score of the RADS-SF was calculated. Results showed a correlation of .91, indicating a high correlation between the two scales.
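For reference, the Pearson correlation between two sets of total scores can be computed as in the Python sketch below. The function name and the example scores are hypothetical and are not data from the study.

```python
import math


def pearson_r(x: list, y: list) -> float:
    """Pearson product-moment correlation between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has no variance")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical total scores for five participants (not study data).
rads_total = [45, 60, 72, 38, 81]
rads_sf_total = [14, 20, 25, 13, 29]
print(round(pearson_r(rads_total, rads_sf_total), 2))
```

Since the short form is composed of items drawn from the full scale, part of any such correlation reflects shared items, which is worth keeping in mind when interpreting it as convergent validity evidence.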

Discussion
The main purpose of the present study was to examine the psychometric properties of the RADS-SF [44] in a large sample of adolescents. We thus estimated the reliability of the scores, examined the internal structure and MI across gender and age of the RADS-SF scores, and gathered different sources of validity evidence with other measures. The results show that the RADS-SF scores have adequate psychometric properties in this sample and, therefore, that it is a useful instrument that could be used to evaluate self-reported depressive symptoms in non-clinical populations of adolescents. The reliability of the scores, estimated with Ordinal alpha, was adequate, with a value of .89. To the best of our knowledge, no previous studies have estimated the reliability of the RADS-SF scores using Ordinal alpha. These results are consistent with previous studies, which indicated that the RADS-SF scores had adequate levels of internal consistency estimated with Cronbach's alpha. Future studies should analyze the reliability of the RADS-SF scores through Ordinal alpha due to the ordinal nature of the variables.
Results of the CFAs showed that the one-dimensional model yielded adequate goodness-of-fit indices. Similar to the results found in this study, previous research suggested that the RADS-SF has a unidimensional structure [15,16,45]. In addition, following the findings of Szabo et al. [15], the inclusion of different correlated errors was tested. The inclusion of the correlated errors produced an increase in some goodness-of-fit indices. However, we declined to use them, due to the inherent problems in the use of correlated errors [42] and the fact that the model only contained ten items, which makes the use of correlated errors even more controversial. It is worth noting that we used the WLSMV estimator for our analysis due to the categorical nature of the data. A four-point Likert response format, as is the case for the RADS-SF, should be considered categorical. Future studies could analyze the internal structure of the instrument treating the data as categorical.
Also, the MI of the RADS-SF across gender and age (early adolescents versus middle and late adolescents) was tested. The results supported the hypothesis of strong MI by gender and age in our sample. Thus, the RADS-SF showed factorial equivalence by gender and age. These results are somewhat similar to those of other studies that have found total factorial equivalence of the RADS-SF scores across different variables (e.g., gender) [15,45]. For instance, Szabo et al. [15] found metric MI in a large sample of New Zealand adolescents across different variables, including gender and age. The other study analyzing the structural equivalence of the instrument only examined configural invariance across gender and age, so the results found in the present study contribute relevant information.
In this regard, more studies analyzing the MI of the RADS-SF with data treated as ordinal are needed. It should be stressed that if MI does not hold, the validity of such scores should be questioned. As such, it is critical for MI conclusions to be based on statistically sound results. Comparisons between different groups only make sense if it can be guaranteed that participants interpret and understand the latent construct in a similar manner. Hence, from a psychometric point of view, the study of MI is a prerequisite for performing any group comparisons [21,37]. Therefore, once MI has been established, a difference found in the latent mean scores can be attributed with greater confidence to a true difference in the latent variable rather than to a measurement artifact.
The comparison of the latent means across gender and age yielded statistically significant differences. Females obtained higher scores than males. As a function of age, adolescents between 16 and 19 years obtained higher scores than the younger group. Consistent with the previous literature, the expression of depressive symptomatology varies as a function of age and gender [12,25,30,46]. In general terms, depressive symptoms have an earlier onset in females than in males [47] and their prevalence increases with age, being more frequent in adolescents than in children [48]. Using the raw scores of the RADS subscales or the RADS total score, we found that female adolescents obtained higher scores than males in the depressive dimensions, except in Anhedonia, where males obtained higher scores, and that the older adolescents also obtained higher scores than the younger adolescents.
Results from the analysis of the sources of convergent validity evidence yielded a significant, high association, over .90, between the RADS-SF scores and the RADS. These results are consistent with previous studies analyzing the criterion validity of the RADS-SF. For instance, Milfont et al. [16] reported a correlation over .90 between the RADS-SF and the RADS, similar to the results found in the present work.
The results found in this study should be interpreted in light of some possible limitations. First, the sample was composed exclusively of adolescents. It is well known that adolescence is a developmental period in which a great variety of neuromaturational, social, emotional, and self-identity changes occur [49], and these may influence the phenomenological expression of this construct. Second, information was gathered solely from self-reports; it would have been interesting to complement this information with a clinical interview or with an informant report completed by the participants' parents. Third, evidence for convergent and discriminant validity was not further explored. In this regard, it might have been useful to include other related and unrelated measures of the severity of depressive symptoms. Fourth, item response theory (IRT) modeling could have been used to provide additional psychometric evidence, for instance to check for differences between groups, such as males and females, on target items. Despite the noted limitations, and the areas that would benefit from future research, the present study adds information about different sources of validity evidence and about the reliability of the questionnaire, and supports the usefulness of the RADS-SF as a measure of depressive symptoms in non-clinical populations of adolescents.
Future studies could replicate these findings in other samples. Moreover, future research on the MI across cultures in the self-reported version would enable the comparison of results between different countries, regions or cultures with the RADS-SF version.