Mental health among children and adolescents: Construct validity, reliability, and parent-adolescent agreement on the ‘Strengths and Difficulties Questionnaire’ in Chile

The Strengths and Difficulties Questionnaire (SDQ) is a screening tool used to measure psychological functioning among children and adolescents. It has been extensively used worldwide, but its psychometric properties, such as internal structure and reliability, seem to vary across countries. This is the first study exploring the construct validity and reliability of the Spanish version of the SDQ among early adolescents (self-reported) and their parents in Latin America. A total of 1,284 early adolescents (9–15 years) and their parents answered the SDQ. We also collected demographic variables. A confirmatory factor analysis was conducted to assess the latent structure of the SDQ. We also used multitrait-multimethod analysis to separate the true variance on the constructs from variance resulting from measurement methods (self-report vs. parent report), and evaluated the agreement between adolescents and their parents. We found that the original five-factor model was a good solution and the resulting sub-scales had good internal consistency. We also found that the self-reported and parental versions of the SDQ provide different but complementary information, which together gives a better picture of the emotional, social, and conduct problems of adolescents. These findings add evidence for the construct validity and reliability of the Spanish self-reported and parental SDQ versions in a Chilean sample.

Introduction
[...] been evaluated in Chile for the parent-answered version for children between 4 and 11 years old. Therefore, to the best of our knowledge, no studies have explored the construct validity, reliability, and degree of agreement between the SDQ self-report and the parental report for Chilean adolescents. Additionally, we know of no SDQ studies for adolescents from Spanish-speaking, Latin-American countries.
Studies have confirmed the five theoretical dimensions in the adolescent self-reported version and in the parental version of the questionnaire, using exploratory and confirmatory factor analyses [37,38]. However, other studies have failed to replicate the originally postulated five-factor solution [39][40][41][42]. Some have instead proposed a three-factor solution [41], combining the conduct and hyperactivity/attention problems as an 'externalizing' dimension and the emotional and peer problems as an 'internalizing' dimension, while keeping the prosocial sub-scale as a separate factor [43]. Furthermore, despite evidence that the five-factor model fits well across gender and ethnic groups for young children [44], a study gathering information from five European countries found that the number of factors could be country-dependent in the case of adolescents [40], and a Norwegian study found that factor loadings were different between pre-adolescents and adolescents [45]. Regarding reliability, some studies have shown adequate internal consistency [46][47][48][49], while others have reported low values for some sub-scales, especially for conduct and peer problems [50].
For adolescents in Spanish-speaking countries in Latin America, we have found no studies exploring the construct validity of the SDQ by means of confirmatory factor analysis (CFA), using parental and self-reported data. As mentioned earlier, we found only two studies using the SDQ in Chile, one exploring the psychometric properties of the parental reports on children between the ages of 4 and 11 [51], and one presenting the results of comparing the scores between early-adolescent Aymara [an indigenous South American nation] and non-Aymara students, using the self-reported, parental, and teacher versions [52].
Therefore, we see a knowledge gap concerning the performance of the SDQ in Spanish-speaking, Latin-American countries, specifically, its construct validity and reliability for adolescent populations. The aims of the present study are: i) to evaluate competing models of the latent structure of the SDQ, using confirmatory factor analysis; ii) to explore the reliability of the resulting sub-scales having the best fit; iii) to compare the degree of agreement between adolescent self-reports and parental reports, and their respective explanatory power; iv) to provide normative data for the SDQ adolescent and parental versions.

Participants
For the purposes of this study, we used SDQ data from two separate studies in similar, school-aged populations. The first study (Study 1) is being conducted in a vulnerable urban population in San Felipe, a small city north of Santiago. This is an ongoing longitudinal study exploring the factors associated with the development of health-promoting behaviours in early adolescents. As part of the baseline assessment, we administered the SDQ to the students (10 to 15 years old) and their parents. Some preliminary results of the cross-sectional analysis have been published [53]. The second study (Study 2) gathered data from students 9 to 15 years old, aiming to test the validity of the SDQ and of the Chilean-adapted version of the Olweus Bully/Victim Questionnaire Revised. This preliminary validation study is part of a larger, ongoing research project called 'The KiVa antibullying program in primary schools in Chile, with and without the digital game component, a randomized controlled trial' [54]. We decided to present here analytical results from both studies because Study 1 gathered information from low-income families, while Study 2 gathered information from low-, middle-, and high-income families, allowing the latter set of results to be more representative of the adolescent Chilean population as a whole.

Procedure and ethics
In Study 1, we invited all urban, municipal, state-funded primary schools in San Felipe (n = 10) to participate after obtaining authorization from school board authorities. All ten schools agreed to participate. We informed the parents or main caregivers about the study and asked them to sign an informed consent form to allow their children to participate. A total of 1,035 parents were contacted, of whom 682 consented and answered the parental questionnaire. On the day of the assessment, 560 students assented to participate and answered the questionnaire (10 did not assent, and 112 were absent that day). A total of 488 parent-child dyads provided complete data. In Study 2, we invited five schools to participate. A total of 1,945 parents or main caregivers were contacted, of whom 1,068 consented and answered the parent version of the SDQ questionnaire. On the day of the assessment, 913 students assented to participate and answered the questionnaire (50 did not assent, and 105 were absent that day). A total of 796 parent-child dyads provided complete data. See Fig 1: Flow chart. In both studies, the parental SDQ questionnaire was answered by the main caregiver, who in most cases was the mother. Other main caregivers were the father or another significant family member, such as a grandmother.
Both studies were approved by the Ethics Committee of the Universidad de los Andes. Written informed consent was obtained from parents and written informed assent from adolescents. The study posed no risks.

Measures
Socio-demographic variables. Sex (0 = male; 1 = female), age (years) and socio-economic status (0 = low-income families; 1 = middle-income families; 2 = high-income families) were collected. Socio-economic status was based on the criteria of the 2009 National System for the Measurement of Education Quality, which gathers information from parents or main caregivers about their household income; these data were obtained from the Ministry of Education.
Strengths and Difficulties Questionnaire (SDQ) [55][56][57]. This questionnaire has 25 items, divided into five subscales: emotional symptoms, conduct problems, hyperactivity-inattention problems, peer problems, and pro-social behaviour. These five subscales can be organized into two major sub-scales: strengths (pro-social behaviour) and difficulties (the other four subscales). Each item uses a three-point ordinal format to be answered with one of the following: 0 = not true; 1 = somewhat true; and 2 = certainly true. Five of the items are negatively worded in the original (i.e. is obedient, thinks before acting, has good attention, has good friends, is generally liked). Therefore, for compatibility in combining subscales into major subscales, their scores were reversed. The score for each subscale was then calculated by summing its five items (range: 0-10). All scores for the difficulties subscales were added up to a total difficulties score (range: 0-40). The scores on the pro-social subscale were analysed independently (range: 0-10). The SDQ has been translated into more than 50 languages [10]. We used the authorized Spanish version and the scoring algorithms proposed by its author (for more information see: sdqinfo.com).
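The scoring rules described above can be sketched in a few lines of Python. This is only an illustration, not the author's official scoring algorithm: the item keys (`emo1`, `obedient`, and so on) are hypothetical placeholders, and the sketch assumes that each subscale score is the sum of its five item scores (which is what yields the 0-10 range) after reversing the five items listed in the text.

```python
from typing import Dict, List

# Hypothetical item keys grouped by subscale. The official item wording and
# order are defined by the SDQ author's scoring algorithm, not by this sketch.
SUBSCALES: Dict[str, List[str]] = {
    "emotional": ["emo1", "emo2", "emo3", "emo4", "emo5"],
    "conduct": ["con1", "obedient", "con3", "con4", "con5"],
    "hyperactivity": ["hyp1", "hyp2", "hyp3", "thinks_before_acting", "good_attention"],
    "peer": ["peer1", "good_friend", "generally_liked", "peer4", "peer5"],
    "prosocial": ["pro1", "pro2", "pro3", "pro4", "pro5"],
}

# The five items whose scores are reversed before summing.
REVERSED = {"obedient", "thinks_before_acting", "good_attention",
            "good_friend", "generally_liked"}

DIFFICULTIES = ("emotional", "conduct", "hyperactivity", "peer")


def score_sdq(responses: Dict[str, int]) -> Dict[str, int]:
    """Return subscale scores (0-10) and the total difficulties score (0-40).

    `responses` maps every item key to 0, 1 or 2. A missing item raises
    KeyError, mirroring the complete-case rule used in the analyses.
    """
    scores: Dict[str, int] = {}
    for subscale, items in SUBSCALES.items():
        total = 0
        for item in items:
            value = responses[item]
            if value not in (0, 1, 2):
                raise ValueError(f"item {item!r} must be 0, 1 or 2, got {value!r}")
            # Reverse-keyed items: 0 -> 2, 1 -> 1, 2 -> 0.
            total += (2 - value) if item in REVERSED else value
        scores[subscale] = total
    # The pro-social subscale is kept separate from the difficulties total.
    scores["total_difficulties"] = sum(scores[name] for name in DIFFICULTIES)
    return scores
```

Keeping the reversed items in a single set makes the reversal rule easy to audit against the questionnaire, and raising on missing or out-of-range answers avoids silently scoring incomplete forms.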

Data analysis
For the purposes of this article, only responders with valid answers on all 25 items were included in the analyses.
Firstly, we summarized the socio-demographic variables and basic psychometric characteristics of the items using descriptive statistics, including means, standard deviations (SD), and, when necessary, frequencies and percentages. We used a structural equation modelling approach to CFA to assess the structure of the proposed five-factor and three-factor models for the SDQ, in self-reports as well as in parental reports. Multivariate Mardia's coefficients [58] and polychoric matrices were calculated to evaluate the distribution of the items. We ensured the adequacy of the matrices by assessment of the determinant, by the KMO index, and by Bartlett's test [59]. We also calculated the internal consistency of each factor by using McDonald's Omega (ω), which can be interpreted as the square of the correlation between the scale score and the latent variable common to all the indicators [60]. The Omega index assumes a congeneric model, which means that factor loadings are allowed to vary, and it also takes into account the item-specific measurement error. Thus, it provides a more realistic estimate of true reliability than classical Cronbach's Alpha values, and both indices can be interpreted using the same cut-off points.
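As an illustration of how Omega relates to the factor loadings, the following minimal sketch computes ω for a single factor from standardized loadings, using the common formula ω = (Σλ)² / ((Σλ)² + Σ(1 − λ²)). It assumes a one-factor congeneric model with uncorrelated errors; the function name and the example loadings in the test are hypothetical, and the software used in the study may compute ω differently.

```python
from typing import Sequence


def mcdonald_omega(loadings: Sequence[float]) -> float:
    """McDonald's Omega for one factor, from standardized loadings.

    Assumes a congeneric one-factor model with uncorrelated errors, so each
    item's error variance is 1 - loading**2.
    """
    if len(loadings) == 0:
        raise ValueError("at least one loading is required")
    if any(abs(value) > 1.0 for value in loadings):
        raise ValueError("standardized loadings must lie in [-1, 1]")
    # Variance of the sum score that is due to the common factor.
    common = sum(loadings) ** 2
    # Variance of the sum score that is due to item-specific error.
    unique = sum(1.0 - value ** 2 for value in loadings)
    return common / (common + unique)
```

Because the loadings are allowed to differ across items, a single weak item (such as one with a loading near 0.25) lowers ω less abruptly than it would lower an index that assumes equal loadings.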
We used the unweighted least squares (ULS) method for factor extraction, in view of its robustness [61]. Specifically, the ULS method does not provide inferential estimations based on the χ² distribution (and therefore does not provide p-values), but it does not require any distributional assumption; it is robust and usually converges because of its computational efficiency; it tends to provide less biased estimates of the true parameter values than other procedures; and it shows good performance when working with polychoric matrices [62][63][64][65]. From a general perspective, we used the fit indices that the ULS method reports, such as the goodness-of-fit index (GFI), the adjusted goodness-of-fit index (AGFI), the normed-fit index (NFI), and the root-mean-square of the standardized residuals (RSMR). GFI and AGFI refer to the explained variance of the model, and values >0.90 are considered acceptable [66]. The NFI measures the proportional reduction in the adjustment function when going from the null to the proposed model and is considered acceptable when >0.90 [67]. The RSMR is the standardized difference between the observed and the predicted covariance, indicating a good fit for values <0.08 [68]. From an analytical perspective, standardized factor loadings and the explained variance were considered.
We also used a CFA approach to multitrait-multimethod (MTMM) analysis [69]. This approach permits separation of the true variance on the constructs from variance resulting from measurement methods (self-report vs. parental report). The logic is that self-report and parental measures of the same construct should be highly correlated, but that measures of different constructs should have low correlations. We calculated squared factor loadings to estimate the explained variance in the sub-scales resulting from the underlying trait and the reporting method. The unexplained variance was termed uniqueness.
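The variance decomposition used in the MTMM analysis can be illustrated with a short sketch. It assumes standardized loadings and that the trait and method factors on which a sub-scale loads are uncorrelated with each other, so that the trait share, the method share, and the uniqueness add up to one. The function name and the example values in the test are hypothetical and are not taken from the study's tables.

```python
from typing import Dict


def decompose_variance(trait_loading: float, method_loading: float) -> Dict[str, float]:
    """Split a standardized indicator's variance into trait, method and uniqueness.

    Assumes the trait and method factors are mutually uncorrelated, so the
    explained variance is the sum of the squared standardized loadings.
    """
    trait = trait_loading ** 2
    method = method_loading ** 2
    uniqueness = 1.0 - trait - method
    if uniqueness < 0.0:
        raise ValueError("squared loadings exceed 1; check that loadings are standardized")
    return {"trait": trait, "method": method, "uniqueness": uniqueness}
```

A sub-scale with a large trait share and a small method share is one where the informant matters little; a large method share indicates that much of the score reflects who is reporting.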
We performed the same analyses according to age (≤11 vs >11) and socioeconomic status (Low vs Middle/High income) to assess potential differences.
We calculated the 25th, 50th, 75th, and 90th percentile scores for each generated sub-scale, for the total sample and for each sex, for both the adolescent and the parental SDQ versions. We also present the normative data for age groups ≤11 and >11 years old.
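The percentile-based normative scores could be obtained along the following lines. This is a sketch under stated assumptions: the text does not say which percentile definition was used, so linear interpolation between order statistics is assumed here, and the function names and the group labels in the usage are hypothetical.

```python
import math
from typing import Dict, Iterable, Sequence


def percentile(sorted_values: Sequence[float], p: float) -> float:
    """Percentile with linear interpolation between order statistics (assumed rule)."""
    if not sorted_values:
        raise ValueError("no scores supplied")
    rank = (len(sorted_values) - 1) * p / 100.0
    lower = math.floor(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def normative_percentiles(
    scores_by_group: Dict[str, Iterable[float]],
    cut_points: Sequence[int] = (25, 50, 75, 90),
) -> Dict[str, Dict[int, float]]:
    """Return the requested percentiles of a sub-scale score for each group."""
    table: Dict[str, Dict[int, float]] = {}
    for group, scores in scores_by_group.items():
        ordered = sorted(scores)
        table[group] = {p: percentile(ordered, p) for p in cut_points}
    return table
```

Calling `normative_percentiles({"boys": boys_scores, "girls": girls_scores, "total": all_scores})` for each sub-scale and each informant version would produce one row per percentile, in the layout used by the normative tables.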

Results
The materials used to produce the following results will be available upon request, including a detailed list of documents and all the data files needed for replication, as well as the steps, in their specific sequence, that interested researchers should follow [70]. The authors will post these materials on the group's website and/or send them upon request [71].

Psychometrics and construct validity of the SDQ
The polychoric matrix of the SDQ items using self-report data had a determinant of 0 [...]
Table 2 shows that the CFA fit indices for the SDQ were within acceptable values only for the five-factor model, which was the model originally proposed on theoretical grounds. For this model, both the self-report and the parental-report indices were adequate, and they were better than those of the three-factor model.
Table 3 shows the descriptive statistics and McDonald's Omega values for the SDQ items and factors, and Fig 2 shows the weights and correlations between factors for the CFA, for both self-report and parental-report scores. As we can see, the mean and variability of the 'steal' item are low in self-report scores (mean = 0.20, SD = 0.50), and especially so in parental-report scores (mean = 0.04, SD = 0.26). The factor loadings are adequate, although the 'good friend' item is a low outlier in the case of self-report scores (w = 0.25). In terms of reliability, parental reports are more consistent than self-reports. McDonald's Omega values for self-reports range from 0.65 ('peer problems') to 0.77 ('hyperactivity') and, in the case of parental reports, from 0.76 ('peer problems') to 0.85 ('conduct problems' and 'hyperactivity').
In self-report scores, the correlations among the five constructs are strongest between 'emotional symptoms' and 'peer problems' (r = 0.78), while in parental reports, the strongest correlations are between 'conduct problems' and 'hyperactivity' (r = 0.76). The lowest correlations in self-reports are between 'pro-social behaviour' and 'emotional symptoms' (r = -0.06), while

Trait and method components in the MTMM approach
The CFA approach to MTMM with two method factors and five trait factors fits the data very well (Table 2). Table 4 shows the trait and method variance components. As we can see, the trait variance components suggest that self-reports tend to be more discriminating on 'pro-social behaviour' (where they are practically unaffected by method) and 'conduct problems'. On the other hand, parental reports seem to be particularly discriminating on ratings for 'peer problems' (where they are subject to low method effects) and 'emotional symptoms'. The largest uniqueness value in self-reports is in 'emotional symptoms', while in parental reports it is in 'pro-social behaviour'. Finally, we found moderately low correlations between the methods (r = 0.38). We have produced additional results for CFA and MTMM analyses stratified by age (≤11 vs >11) and socioeconomic status (Low vs Middle/High income). These results are available in S1 File.
Normative data. Regarding the percentiles of the total difficulties scale, the values are similar between girls and boys, with girls scoring one point lower. See Tables 5 and 6 for the total sample, Tables 7 and 8 for participants aged ≤11 years old, and Tables 9 and 10 for participants aged >11 years old. The percentiles in the self-reported SDQ are slightly lower than those in the parental SDQ.

Discussion
This is the first study investigating the structure of the SDQ in a Spanish-speaking country in Latin America among adolescents and their parents. The results of our study support the originally proposed five-factor structure of the SDQ among early and middle adolescents and their parents/caregivers in Chile [13]. It appears to be a more plausible solution than the more recently proposed three-factor model [43]. However, we found high correlations between emotional symptoms and peer problems, and between conduct problems and hyperactivity, which may indicate latent, underlying internalizing and externalizing dimensions. Reliability values were in general adequate for both self-report and parent-report measures across all dimensions, although for the peer problems factor they were only fair in self-reports while being appropriate in parent reports. Exploring the structure of the SDQ stratified by age and socioeconomic status, the best fit was found in the parental report from middle/high-income households, and the worst fit was found in the self-report of students from low-income households. The fit of the MTMM model was good in all strata. The correlation between self-report and parent report was similar among younger and older students, and among students coming from low-income and middle/high-income households. Emotional problems were better explained by parental report among older students, while peer problems were better explained by parental report among students from middle/high-income households.
Some strengths of this study are the sample size, considering the challenge of collecting information from students and parents, and the representation of different socioeconomic backgrounds. Furthermore, normative data are provided, which may help future research to test cut-off points for determining the needs of adolescents with higher scores in difficulties sub-scales. This study, in line with most studies on SDQ, has been conducted in a population-based sample.
The factor loadings seem to be dissimilar between adolescents and parents. The factor loadings from the parents are higher than those from adolescents, which may indicate that adolescents experience their problems less distinctly than do their parents. We also notice that, in the case of adolescents, the lower, but still adequate, factor loadings were found in the negatively worded items. Furthermore, there was one item with a factor loading lower than the recommended cut-off point (0.32) [72], namely 'good friends' (0.25). This methodological effect has been found in the SDQ previously [40,73], and in other instruments [74,75]. In our study, these results may be explained by the cognitive development of people at this age (9-15 years old), who may have had difficulty understanding the answer format or the direction of the intercalated questions. However, the results from parents all exceeded the recommended loading thresholds. Given the good factor structure, we recommend keeping these items for both adolescents and parents. However, in future research in Spanish-speaking countries, it will be important to report psychometric information bearing on whether to reconsider this recommendation, and to explore re-wording of the reverse items and assess the effect on reliability. Additionally, we found from the MTMM approach that the variance associated with the measurement method (self-report vs. parental report) was lower for hyperactivity problems (self-report), peer problems (parental report), and pro-social behaviour (both versions). However, for the remaining dimensions, the method component of the variance was high, suggesting the importance of having multiple informants when assessing psychopathology among adolescents. For example, when we see that the loading of the item 'steals' in the parental report is half that in the self-report, we suspect that parents underestimate this symptom.
This phenomenon is also found for other items such as 'lies' and 'fights', where adolescents may provide a better description of what they are doing than do parents.
This study has several limitations. Firstly, we did not have access to the entire range of ages addressed by the SDQ; therefore, our results are limited to the population studied here. Even though we have collected information from a large group of students and their main caregivers, there were many absent students on the day of the survey, especially among low-income schools. Additionally, we could not access teacher reports to obtain a fuller picture of student behaviours. Several studies have shown the importance of having several informants to investigate students' behaviours [76]. Even though our findings support a five-factor structure for the SDQ, it is possible that this instrument requires the wording of some of the items to be inverted to improve understanding, especially among adolescents, which in turn may increase the reliability of some of the sub-scales. The usefulness of the normative data provided here is transitory; these data must be updated when we have better cut-off scores as a result of studies tapping both community and clinical populations. Therefore, next steps should be to explore diagnostic predictions made with the SDQ in Spanish-speaking countries in Latin America. Finally, the use of simple and short tools such as the SDQ may help to better investigate and understand the evolution of these symptoms during adolescence, and to explore mechanisms explaining the observed sex and cultural influences.

Table 7. Normative data for total scores by sex based on adolescent self-report, age group 9-11.
Percentiles; Emotional symptoms; Conduct problems; Hyperactivity-attentional problems; [...] Boys [...]
25th: 1 1 1 1 1 1 2 2 2 1 1 1 7 7 7 7 7 6
50th: 3 3 3 2 1 2 4 3 4 2 2 2 11 11 12 8 8 [...]