Validity and Reliability of the Strengths and Difficulties Questionnaire in 5–6 Year Olds: Differences by Gender or by Parental Education?

Introduction The Strengths and Difficulties Questionnaire (SDQ) is a relatively short instrument developed to detect psychosocial problems in children aged 3–16 years. It addresses four dimensions that together make up the total difficulties score (emotional problems, conduct problems, hyperactivity/inattention problems, and peer problems) and a fifth dimension, prosocial behaviour. The validity and reliability of the SDQ have not been fully investigated in younger age groups. Therefore, this study assesses the validity and reliability of the parent and teacher versions of the SDQ in children aged 5–6 years in the total sample, and in subgroups according to child gender and parental education level. Methods The SDQ was administered as part of the Dutch regularly provided preventive health check for children aged 5–6 years. Parents provided information on 4750 children and teachers on 4516 children. Results Factor analyses of the parent and teacher SDQ confirmed that the original five scales were present (parent RMSEA = 0.05; teacher RMSEA = 0.07). Interrater correlations between parents and teachers were small (ICCs of 0.21–0.44) but comparable to what is generally found for psychosocial problem assessments in children. These correlations were larger for males than for females. Cronbach’s alphas for the total difficulties score were 0.77 for the parent SDQ and 0.81 for the teacher SDQ. Four of the subscales on the parent SDQ and two of the subscales on the teacher SDQ had an alpha <0.70. Alphas were generally higher for male children and for low parental education level. Discussion The validity and reliability of the total difficulties score of the parent and teacher SDQ are satisfactory in all groups by informant, child gender, and parental education level. Our results support the use of the SDQ in younger age groups. However, some subscales are less reliable and we recommend using only the total difficulties score for screening purposes.


Introduction
Early detection and treatment of emotional and behavioural problems in childhood may lead to considerable benefits regarding child development, wellbeing, and health [1]. To detect these problems, valid and reliable screening instruments are needed.
The Strengths and Difficulties Questionnaire (SDQ) is a relatively short instrument developed to screen for emotional and behavioural problems in children aged 3-16 years [2]. The SDQ is a 25-item questionnaire with three response categories scored from zero to two (not true, somewhat true, and certainly true). Of all 25 items, 15 are negatively phrased and 10 are positively phrased. The questionnaire has five subscales of five items each: emotional problems, conduct problems, hyperactivity/inattention problems, peer problems, and prosocial behaviour. The sum of the first four subscales provides a total difficulties score, with a high score being less favourable. The prosocial scale provides information on protective factors of the child; here a low score is less favourable. The items and scores are shown in the supporting table S1. Versions of the SDQ are available for parents and teachers, and children aged 11-16 years can complete an almost identical version. To facilitate proper screening in preventive health care, a short, easy-to-use, and validated instrument is needed.
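The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the official scoring algorithm: the function name and the subscale keys are hypothetical, and the item scores are assumed to be already oriented in the direction of their subscale (any reverse scoring is assumed to have been done beforehand).

```python
# Hypothetical subscale labels; the text names the five subscales but not these keys.
PROBLEM_SUBSCALES = ("emotional", "conduct", "hyperactivity", "peer")
ALL_SUBSCALES = PROBLEM_SUBSCALES + ("prosocial",)


def score_sdq(items_by_subscale):
    """Return subscale sums and the total difficulties score.

    items_by_subscale maps each subscale label to its five item scores,
    each coded 0 (not true), 1 (somewhat true) or 2 (certainly true).
    """
    scores = {}
    for name in ALL_SUBSCALES:
        items = items_by_subscale[name]
        if len(items) != 5:
            raise ValueError(f"{name}: expected 5 items, got {len(items)}")
        if any(v not in (0, 1, 2) for v in items):
            raise ValueError(f"{name}: item scores must be 0, 1 or 2")
        scores[name] = sum(items)  # each subscale ranges from 0 to 10
    # Total difficulties uses the four problem subscales only (range 0 to 40);
    # the prosocial subscale is reported separately.
    scores["total_difficulties"] = sum(scores[s] for s in PROBLEM_SUBSCALES)
    return scores


# Made-up responses for one child, for illustration only.
example = score_sdq({
    "emotional": [1, 0, 2, 0, 1],
    "conduct": [0, 0, 1, 0, 0],
    "hyperactivity": [2, 2, 1, 1, 2],
    "peer": [0, 1, 0, 0, 0],
    "prosocial": [2, 2, 2, 1, 2],
})
print(example["total_difficulties"], example["prosocial"])
```

Keeping the prosocial sum out of the total mirrors the description above, where a high total difficulties score and a low prosocial score are each the less favourable outcome.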
The SDQ has been applied and evaluated in many countries, and seems to be a suitable instrument to detect emotional and behavioural problems in secondary school aged children [3]. Although the SDQ was developed for children aged 3 years and older, few evaluations have been made in children under 7 years of age [4][5][6][7][8][9][10]. Because different phases of a child's development coincide with age-specific problem behaviour [11], some items in the SDQ might be less applicable or more difficult to interpret in younger children.
Most studies targeted at young children explored the factor structure of either the parent or the teacher version of the SDQ. Five- and three-factor solutions have been reported [7,8,12]. Furthermore, different patterns were found for item loadings by gender [7][8] but not for item loadings by parental education level [8]. Although the factor structure was invariant between groups based on parental education level, the reliability may differ between groups. Some studies reported moderate to strong internal consistency but not for all SDQ subscales [4,7-9,12]. External validity of the parent version has shown good results [5,10]. The interrater correlation between the parent and teacher versions of the SDQ has been investigated only once [12]. Thus, although the few studies that investigated 5-6 year old children elucidated different aspects of the validity and reliability of the SDQ [4-9,12], the overall picture remains fragmented.
In order to use the SDQ as an early detection instrument in children aged 5-6 years, more data are needed on the validity and reliability of the SDQ in this age group. Therefore, the aim of this study is to determine if the SDQ is a reliable and valid instrument for detecting emotional and behavioural problems in children aged 5-6 years. Data for this study were gathered as part of a regular preventive health care check of a large population sample. The Child Behavior Check List (CBCL) and corresponding Teacher Report Form (TRF) were administered in a subsample of participants to enable comparisons between the SDQ and CBCL/TRF. The CBCL and the TRF are widely used and well validated instruments for assessing emotional and behavioural problems and both contain eight syndrome scales: Anxious/Depressed, Withdrawn/Depressed, Somatic Complaints, Thought Problems, Rule-Breaking Behaviour, Aggressive Behaviour, Attention Problems, and Social Problems [13]. The scales are comparable to the SDQ scales emotional problems (CBCL/TRF scales Anxious/Depressed, Withdrawn/Depressed, Somatic Complaints), conduct problems (CBCL/TRF scales Rule-Breaking Behaviour and Aggressive Behaviour), hyperactivity/inattention problems (CBCL/TRF scale Attention Problems), and peer problems (CBCL/TRF scale Social Problems). Although the CBCL/TRF is well validated, it has several disadvantages for use in the preventive health care setting. For example, the questionnaire is long (118 questions), it contains only negatively formulated questions, and it was developed for use in a clinical setting.
To consider the SDQ a reliable and valid instrument in young children, we hypothesize the following:
1. The original five-factor structure of the SDQ can be reproduced in a sample of parents and teachers of 5 to 6 year old children.
2. The degree of agreement between the parent and teacher report in young children is higher than or comparable to what is generally found for psychosocial problem assessments in children, namely a Pearson r of 0.27 [14].
3. The internal consistency of the total difficulties score and the subscales for the parent and teacher SDQ is at least 0.7, as recommended for screening instruments intended for use in groups and individuals [15].
4. The degree of agreement of the SDQ total difficulties score and subscales with the corresponding scales of the CBCL and Teacher Report Form (TRF) is larger than 0.4 [16] and larger than for all other scales (concurrent validity). The degree of agreement of the SDQ total difficulties score and subscales with the opposite scales of the CBCL and TRF is zero or negative (divergent validity).
5. The validity and reliability of the parent and teacher versions of the SDQ are similar in subgroups by child gender and parental education level.

Ethics Statement
Non-identifiable data gathered as part of the usual governmental preventive healthcare program were used. Informed consent was obtained from parents for all questionnaires that were gathered in addition to the usual practice (CBCL and TRF). This study was approved by the Medical Ethics Committee of the Erasmus University Medical Center Rotterdam, the Netherlands. This study was conducted according to the Declaration of Helsinki code of ethics.

Data Collection
In the Rotterdam-Rijnmond area, the SDQ is routinely administered to parents and teachers as part of the preventive health check for children in grade 2 at elementary school (5-6 year olds). This assessment is routinely provided to all children in this age group as part of the Dutch preventive child healthcare program, which offers child immunization programs as well as screening assessments for children aged 0 to 19 years. Screening assessments are offered at 14 stages of a child's development. At each screening, the physical health and psychosocial health of the child are assessed by a specially trained nurse or doctor. A total of 11,987 children were eligible for the preventive health check in the school year 2008-2009. In this study, we only included children of Dutch origin to limit any cross-cultural bias, as ethnic background was correlated with parental education level in the present study. In accordance with the classification system used by Statistics Netherlands, we classified a child as being Dutch when both parents were born in the Netherlands [17]. Parents provided questionnaire information on 4,750 (85%) children and teachers provided information on 4,516 (84%) children. The sample consisted of 2,808 males (51%) and 2,706 females (49%). Mean age was 5.3 (SD 0.52) years. There were no differences in child age by gender (p<0.05). Parental education level was low in 13%, middle in 36% and high in 51% of the parents. There were no differences between child gender or age by parental education level (p<0.05) (Table 1). Non-response in parents was more likely when children had an elevated score on the total difficulties score of the teacher SDQ (p<0.05, eta = 0.09). Non-response in teachers was more likely when parental education was middle to high (p<0.05, eta = 0.03).
Parents and teachers of a subsample of children were invited to fill out the CBCL/TRF in addition to the SDQ. This sample was selected in two ways: one part consisted of a random selection of children and the other part consisted of children with an SDQ score above the 90th percentile cut-off (p90) of 14 on the parent report or 13 on the teacher report of the SDQ. These cut-offs were based on a pilot study among children eligible for the preventive health check for children in grade 2 at elementary school in the Rotterdam-Rijnmond area. In addition to the SDQ, parents of 397 children completed the CBCL and teachers of 517 children completed the TRF. Although there were differences in child age, child gender and total difficulties score of the parent and teacher SDQ between children with and without a CBCL, the effect sizes were small (age η² = 0.005, gender η² = 0.001, and total difficulties score parent η² = 0.014 and teacher η² = 0.008). There were differences between children with and without a TRF for child age, level of parental education and total difficulties score of the parent SDQ, but effect sizes were small (age η² = 0.014, parental education level η² = 0.016 and total difficulties score η² = 0.001).

Measures
The official Dutch version of the SDQ was administered to parents and scored in the standard manner [18]. SDQ items and scores are shown in supporting table S1. A subsample of parents and teachers received the CBCL/TRF [13].
Socio-demographic characteristics included child gender, child age and educational level of the parents. Parental education level was defined as the level of the parent with the highest education. This was used to divide the sample into three educational levels: low (no education, primary education, or pre-vocational education), middle (secondary or vocational education), and high (bachelor's or master's degree).

Statistical Analyses
All analyses were performed with SPSS 19.0 (SPSS Inc. 2010). Differences between parent and teacher mean scores were analyzed with a paired-sample t-test. Differences between mean scores of males and females and of subgroups by parental education level were analyzed in two separate ANOVAs with the Games-Howell post-hoc test, because equal variances and equal group sizes were not present.
Confirmatory factor analysis was carried out to examine the factor structure of the SDQ. We used the software package MPLUS, version 4.2 [19]. Because the measurement level of the SDQ items is ordered-categorical, the weighted least squares estimator with a mean and variance adjusted chi-square statistic (WLSMV) was used [19]. Because children are nested within classes within schools, the teacher data have a multilevel structure and cannot be considered independent; for the teacher report, the COMPLEX procedure in MPLUS was therefore used. Model fit was evaluated with multiple indicators of model fit, namely the Comparative Fit Index (CFI), the Tucker-Lewis Index (TLI), and the root mean square error of approximation (RMSEA). Values of CFI above 0.95 are preferred [20] but should not be lower than 0.90 [21]. Values of RMSEA lower than 0.05 are preferable but values between 0.05 and 0.08 are indicative of fair fit [22].
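As a reading aid, the CFI and RMSEA cut-offs quoted above can be written as a small helper. This is only a sketch: the function name and the labels are hypothetical, the text gives no cut-off for the TLI, and because published fit indices are rounded, values that sit exactly on a boundary (for example an RMSEA reported as 0.05) cannot be classified with certainty.

```python
def describe_fit(cfi, rmsea):
    """Label CFI and RMSEA using the cut-offs quoted in the text."""
    # CFI: above 0.95 is preferred, and it should not be lower than 0.90.
    if cfi > 0.95:
        cfi_label = "preferred"
    elif cfi >= 0.90:
        cfi_label = "acceptable"
    else:
        cfi_label = "below 0.90"

    # RMSEA: below 0.05 is preferable, 0.05 to 0.08 indicates fair fit.
    if rmsea < 0.05:
        rmsea_label = "preferable"
    elif rmsea <= 0.08:
        rmsea_label = "fair"
    else:
        rmsea_label = "above 0.08"
    return cfi_label, rmsea_label


# Illustrative values, not results from this study.
print(describe_fit(0.96, 0.04))
print(describe_fit(0.89, 0.07))
```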
Interrater agreement between parents and teachers was determined with intra-class correlations (ICC) using a two-way random effect model with absolute agreement [23] and with Pearson correlations. An ICC above 0.75 was considered excellent, an ICC from 0.40 to 0.75 moderate to good, and an ICC below 0.40 poor [16]. Differences between correlations of all subgroups were analyzed by means of the Fisher r-to-z transformation [24]. A Pearson r of 0.27 or higher is comparable to what is generally found for psychosocial problem assessments in children [14].
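The two computations named above can be illustrated with a short, dependency-free Python sketch. It is not the SPSS procedure used in the study. Two assumptions are made here: the ICC is computed in its single-measures form (the text does not say whether single or average measures were reported), and the Fisher test is the standard version for correlations from two independent groups. Function names are hypothetical.

```python
import math


def icc_absolute_agreement(ratings):
    """Single-measures ICC, two-way random effects, absolute agreement.

    ratings is a list of rows, one per child, each row holding one score
    per rater (here: [parent_score, teacher_score]).
    """
    n = len(ratings)       # number of children
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: children, raters, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def fisher_r_to_z_test(r1, n1, r2, n2):
    """Compare two correlations from independent groups.

    Returns the z statistic and its two-sided p-value.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)   # Fisher transformation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))    # two-sided normal p-value
    return z, p


# Toy data: the second rater scores every child exactly 2 points higher.
toy = [[1, 3], [2, 4], [3, 5], [4, 6], [5, 7]]
print(icc_absolute_agreement(toy))
print(fisher_r_to_z_test(0.5, 103, 0.3, 103))
```

In the toy data the two raters rank the children identically, yet the absolute-agreement ICC stays well below 1 because of the constant 2-point offset. This is why systematic differences in level between informants, and not only differences in ordering, lower this type of ICC.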
The internal consistency of the different SDQ scales was determined by the Cronbach's alpha coefficient. A Cronbach's alpha of at least 0.7 is recommended for screening instruments intended for use in groups and individuals [15]. Differences between Cronbach's alphas of all subgroups were analyzed by calculating F-statistics [25].
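For reference, Cronbach's alpha for a scale of k items is k/(k-1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. A minimal Python sketch of that formula follows; the function name and the toy data are hypothetical, and it does not reproduce the SPSS output or the F-test for comparing alphas between subgroups.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for one scale.

    item_scores is a list of rows, one per respondent, with one column
    per item of the scale (five columns for an SDQ subscale).
    """
    k = len(item_scores[0])
    columns = list(zip(*item_scores))                # one tuple per item
    item_variances = [variance(col) for col in columns]
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Toy data: four respondents, two items scored 0-2.
toy_items = [[0, 1], [1, 0], [2, 2], [1, 1]]
print(cronbach_alpha(toy_items))
```

Because k appears in the formula, a five-item subscale needs fairly strong inter-item correlations to reach 0.7, whereas the 20-item total difficulties score can reach it with weaker ones.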
Concurrent validity and divergent validity of the parent and teacher SDQ were assessed by calculating the Pearson correlation between the SDQ and CBCL and between the SDQ and TRF. The hypothesis for concurrent validity was that the emotional symptoms scale of the SDQ has higher correlations with the Internalizing, Anxious/depressed, Withdrawn/depressed, and Somatic complaints scales of the CBCL and TRF than with all other scales. Furthermore, higher correlations than with all other scales were hypothesized between the conduct problem scale of the SDQ and the Externalizing, Rule-breaking, and Aggressive scales of the CBCL/TRF, between the hyperactivity scale of the SDQ and the Attention problem scale of the CBCL/TRF, and between the peer problem scale of the SDQ and the Social problem scale of the CBCL/TRF. Finally, a high correlation was hypothesized between the total scale scores of the SDQ and CBCL/TRF. For divergent validity, a negative association between the prosocial scale of the SDQ and all scales of the CBCL/TRF was hypothesized. Furthermore, a low correlation was hypothesized between the emotional symptoms scale of the SDQ and the externalizing subscales of the CBCL and TRF, and between the conduct problem and hyperactivity scales of the SDQ and the internalizing subscales of the CBCL/TRF.
All analyses were repeated separately for each subgroup by gender and by parental education level.

Results
Table 2 presents mean scores and p90 cut-offs for parent and teacher ratings for the total group, by gender, and by parental education level. Teachers reported a lower level of psychosocial problems than parents did for all scales (all significant at p<0.01). Parents and teachers reported a significantly higher level of difficulties in males than in females on the total difficulties score and on four of the five subscales (p<0.05). Parents and teachers reported a significantly higher level of difficulties on the total difficulties score and on four of the five subscales in children with low parental education level than in the other groups by parental education level (p<0.05).

Factor Structure
Confirmatory factor analyses in 4325 complete cases with parent data and 4314 complete cases with teacher data tested whether the theoretical 5-factor model of the SDQ was confirmed, namely emotional problems, conduct problems, hyperactivity/inattention problems, peer problems, and prosocial behaviour. Fit indices for the parent report approached the preferred levels (χ² = 2249.57, p<0.001; CFI = 0.88; TLI = 0.92; and RMSEA = 0.05). Also, the fit indices for the teacher report approached the preferred levels (χ² = 1402.83, p<0.001; CFI = 0.89; TLI = 0.95; and RMSEA = 0.07) (Table 3).

Interrater Correlations
Interrater agreement between parent and teacher SDQ scores was determined with intra-class correlations (ICC) and Pearson correlations for all children for whom both a parent and a teacher report were present (n = 3,718). Correlations (ICC and Pearson) between the parent and teacher scores of complete cases in the total population were significant for all scales. The total difficulties and hyperactivity scales had an ICC ≥0.4 (p<0.001). The total difficulties score and three of the five subscales had a larger Pearson correlation than the meta-analytic mean of 0.27 [14] (Table 4).

Internal Consistency
Cronbach's alphas were calculated for each subscale. Cronbach's alphas for the total difficulties score and hyperactivity scale of the parent SDQ in the total population were ≥0.7. Cronbach's alphas for the total difficulties score and three of the five subscales of the teacher SDQ in the total population were ≥0.7 (Table 5). Cronbach's alphas did not improve substantially when items were deleted in either the parent or the teacher version.

Table 3. Goodness-of-fit indices of the SDQ by gender and by parental education level.

Concurrent and Divergent Validity
For all cases in which the SDQ and either the CBCL or TRF was present, concurrent and divergent validity of the parent and teacher SDQ were assessed by calculating the Pearson correlation between the SDQ and CBCL subscales and between the SDQ and TRF subscales. Generally, the hypothesized pattern of correlation coefficients for concurrent and divergent validity between the parent/teacher report of the SDQ and CBCL/TRF was present. However, the emotional problems scale of the parent SDQ also had a substantial correlation with the CBCL's thought problems subscale. The emotional symptoms scale of the teacher SDQ had a low correlation with the somatic complaints subscale of the TRF. Furthermore, the peer problem scale of both reports also showed substantial correlations with CBCL/TRF scales other than those hypothesized (Table 6).

Scale Differences by Child Gender and by Parental Education Levels
Factor structure. When confirmatory factor analyses were performed for each group separately, the original five-factor structure of the SDQ was confirmed and fit indices approached the preferred levels in all subgroups by gender and by parental education level (Table 3).
Interrater correlations. The r-to-z transformation showed that the ICCs for the total difficulties score and three of the four subscales were significantly higher for males than females (Table 4). In females, none of the scales had a moderate ICC and only two of the five subscales had a higher correlation than the meta-analytic mean (Table 3). The ICC for the prosocial behaviour scale was larger in low parental education compared to middle parental education level (p<0.05) but for all other scales there were no significant differences (Table 4).
Internal consistency. Calculation of the F-statistics between Cronbach's alphas for males and females showed that the alphas of the SDQ parent version were higher for males than females for conduct problems, hyperactivity, prosocial behaviour, and total difficulties score (p<0.05). For the SDQ teacher version, almost all Cronbach's alphas were higher for males than females (p<0.05) (Table 5). Cronbach's alphas did not improve substantially when items were deleted.
The F-statistics between Cronbach's alphas for low, middle, and high parental education level showed that alphas for peer problems and the total difficulties score of the parent SDQ were higher for low parental education than for both other groups (p<0.05). The alpha for hyperactivity of the parent SDQ was higher among low parental education level than high parental education level (p<0.05). With the exception of emotional symptoms and impact score, alphas of the teacher SDQ for low parental education were generally higher than for middle or high parental education level (p<0.05) (Table 5). For all groups, alphas did not improve substantially when items were deleted.
Concurrent and divergent validity. When Pearson correlations between the SDQ and CBCL/TRF were calculated for each subgroup by gender and separately by parental education level, the pattern for males and females appeared to be similar to that in the total population. Only for females did the emotional problems scale of the parent SDQ also have substantial correlations with the externalizing scale of the CBCL (data not shown).
The pattern for subgroups by parental education level was similar to that in the total population (data not shown).

Discussion
The present study, conducted in a community sample of Dutch children aged 5-6 years, is, to our knowledge, the first to investigate the psychometric properties (factor structure, interrater reliability, internal consistency, and concurrent and divergent validity) of the parent and teacher SDQ with an additional focus on differences by child gender and by parental education level. The results show that, in general, reliability and validity of the parent and teacher version of the SDQ in this age group are satisfactory, but there are concerns regarding reliability of the subscales. The reliability and validity of the teacher SDQ are better in all samples than those of the parent report, and both versions of the SDQ perform slightly better in males and in children of parents with a low education level.
Mean SDQ scores for males and the sub-group with low parental education level were less favourable than for all other subgroups. This is in line with other reports [4,9,12,18,26-30]. Furthermore, other studies found higher mean scores in younger children compared with older children [18,26,31]. In the present study, mean scores were also higher compared to a group of Dutch children aged 10-14 years [30]. It seems that SDQ mean total difficulties scores are slightly higher and, consequently, less favourable for younger children than for older children.
The original five-factor structure of the SDQ, as hypothesized by Goodman et al. [2], was reproduced in a sample of parents and teachers of 5 to 6 year old children. This five-factor model was also confirmed when the data were split by child gender and by parental education level. This is in line with other research [8,10]. Van Leeuwen et al. [10] also tested a three-factor solution, but this did not improve model fit. Additional analyses in our population using a three-factor solution also did not show improved model fit.
Interrater agreement was acceptable for the total difficulties score and three subscales in the total sample and in the subsamples by gender and parental education level, but not for the conduct problem and prosocial behaviour scales. This is in line with research among older children [3,10]. It is possible that these behaviours are more difficult to observe and rate for parents, for example, because teachers see children interact more with other children in the classroom. Another explanation is that these behaviours are more influenced by the setting (e.g., classroom versus at home) or that subjective norms of parents and teachers differ more on these types of behaviour.
Internal consistency for the total difficulties score and the hyperactivity/inattention scale of the parent SDQ and teacher SDQ was acceptable. Internal consistency of the parent SDQ was not acceptable for the four other subscales. Internal consistency for the teacher SDQ was generally higher than for the parent SDQ. Only the alphas of the conduct problems and peer problems scales of the teacher SDQ were lower than 0.7. In the present study, a similar pattern was found by gender and by parental education level. Our findings are comparable to studies on older children where weighted mean alphas for almost all subscales of the parent SDQ were smaller than 0.7 and weighted mean alphas for the teacher SDQ on conduct problems and peer problems were lower than 0.7 [3]. Because the scales contain just five items, it should be kept in mind that scales with a small number of items are generally less reliable than scales with more items [32]. Another explanation for the lower reliability of the subscales is that the items are less unidimensional than assumed. For instance, the conduct problems scale asks about aggressive behaviour as well as rule-breaking behaviour.
For all scales except the peer problems scale, concurrent and divergent validity of the parent and teacher SDQ was acceptable and implies that, as hypothesized, the SDQ scores correlate with CBCL/TRF scores. However, our data should be interpreted with caution due to the small sample sizes in the subgroups by gender and by parental education level. The concurrent validity found in this study is slightly lower than that found by Goodman et al. [5] but is similar to that found in children aged 8-16 years in the Netherlands [18] and in children aged 5-8 years in Flanders [5,12].
Finally, there were differences in validity and reliability between subgroups by gender and by parental education level. The outcomes of reliability and validity measures of the parent and teacher SDQ are better in males than in females. When analyzed by parental education level, we found better internal consistency for parents with a low education level. However, differences between subgroups by gender and by parental education level were small, and conclusions on the acceptability of the psychometric properties stayed the same for all subgroups.
It should be acknowledged that the present study has a few shortcomings. First, among parents non-response was more likely when children had an elevated score on the total difficulties score of the teacher SDQ (p<0.05). It is possible that these children were already receiving care and the parents did not wish to participate in this study; however, the effect size was very small and did not influence our results (Eta = 0.09). Teacher non-response was higher when parental education was middle to high. Parents were allowed to raise objections about scores on the teacher report; perhaps higher educated parents are more likely to raise objections than lower educated parents. Also, higher educated parents gave their children lower total difficulties scores than low educated parents. However, the effect size was again small (Eta = 0.03). Also, because no measure was included to validate the prosocial behaviour scale, we could only investigate the divergent validity and not the concurrent validity of this positively phrased subscale. Finally, because this study did not include a retest, the test-retest reliability could not be investigated.
A strength of the study is the large sample of young children for whom parent and teacher versions of the SDQ (including the impact scale) were available. This large sample was compiled in the preventive youth healthcare setting; therefore, the questionnaires (as filled out by parents and teachers) were used in the preventive child healthcare system and were not anonymous. Theoretically, this could have caused lower or higher mean outcomes, interrater agreement, and reliability than in the case of an anonymous questionnaire. Thus, generalizing our findings to an anonymous research setting probably requires caution. Finally, our study was conducted in a sample of Dutch children only. Reviews indicate that the reliability and validity of the SDQ in Western countries is comparable [3,33]. Although Dutch children seem to have lower mean scale scores, we expect that our results can be generalized to other young populations.
In general, reliability and validity of the total difficulties score of the SDQ were satisfactory in a population of parents and teachers of young Dutch children. Overall it seems that reliability and validity were comparable to findings in populations of older children; however, as also found in older children [3], concerns remain regarding the reliability of the subscales. Because most subscales have low internal consistency and some subscales have low interrater agreement, we recommend using only the total difficulties score for screening purposes. This means that child health professionals should only use the total difficulties score as an indicator for psychosocial problems and not the individual scores on the subscales. If necessary, the subscales could be further explored in the consultation with the parent and child to get an indication of the kind of problems. For epidemiological studies or outcome measures in research, we recommend using only the total difficulties score. Additionally, because of the low interrater agreement, we recommend using the parent and the teacher report in combination, because this gives a more complete picture of the child's psychosocial well-being.
Since we found similar validity and reliability in subgroups by gender and parental education level, the SDQ is suitable for large screening programs in the general population. To use the SDQ as a screening tool, cut-offs are needed. For Dutch children aged 7 to 12 years, cut-offs are available. As our findings indicate that mean scores for young children are higher than for older children, we recommend defining separate cut-offs for young children, as are available for British, Australian and American children [34].
In conclusion, the validity and reliability of the total difficulties score of the parent and teacher SDQ are satisfactory in all groups by informant, child gender, and parental education level. Our results support the use of the SDQ in younger age groups. However, some subscales are less reliable and we recommend using only the total difficulties score for screening purposes.