An important characteristic of a screening tool is its discriminant ability or the measure’s accuracy to distinguish between those with and without mental health problems. The current study examined the inter-rater agreement and screening concordance of the parent and teacher versions of SDQ at scale, subscale and item-levels, with the view of identifying the items that have the most informant discrepancies; and determining whether the concordance between parent and teacher reports on some items has the potential to influence decision making. Cross-sectional data from parent and teacher reports of the mental health functioning of a community sample of 299 students with and without disabilities from 75 different primary schools in Perth, Western Australia were analysed. The study found that: a) Intraclass correlations between parent and teacher ratings of children’s mental health using the SDQ at person level was fair on individual child level; b) The SDQ only demonstrated clinical utility when there was agreement between teacher and parent reports using the possible or 90% dichotomisation system; and c) Three individual items had positive likelihood ratio scores indicating clinical utility. Of note was the finding that the negative likelihood ratio or likelihood of disregarding the absence of a condition when both parents and teachers rate the item as absent was not significant. Taken together, these findings suggest that the SDQ is not optimised for use in community samples and that further psychometric evaluation of the SDQ in this context is clearly warranted.
Citation: Vaz S, Cordier R, Boyes M, Parsons R, Joosten A, Ciccarelli M, et al. (2016) Is Using the Strengths and Difficulties Questionnaire in a Community Sample the Optimal Way to Assess Mental Health Functioning? PLoS ONE 11(1): e0144039. https://doi.org/10.1371/journal.pone.0144039
Editor: Hanna Christiansen, Philipps University Marburg, GERMANY
Received: June 13, 2015; Accepted: November 12, 2015; Published: January 15, 2016
Copyright: © 2016 Vaz et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper.
Funding: This project was funded by a Doctoral scholarship provided by the Centre for Research into Disability and Society and the School of Occupational Therapy and Social Work, Curtin University, Perth, Australia. It was part of a larger study that was awarded the 2007 Social Determinants for Health Research award by Healthway Australia. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Methodological difficulties in the assessment of mental health problems in adolescence
Mental health problems are relatively common in children and youth. More than 75% of all cases of severe mental illnesses are estimated to occur prior to the age of 25 years [1,2]. Australian estimates suggest a prevalence of 14% mental illness in the 4–17 year age bracket ; only one in four of the identified cases were receiving professional help . Mental disorders account for around 22% of all disability-adjusted life years lost in established market economies such as Australia . Early detection of mental health problems in children and youth is crucial, as evidence shows that left undetected, mental health problems tend to increase in severity with age and could be antecedents of chronic, complex, disabling and expensive complications in adult life [1,5–7]. Current screening methods rely on children and youth displaying certain symptoms, or impairments in everyday functioning, in order to identify them to be at risk and in need of further evaluation and potential treatment .
Children commonly rely on adults in their close environment to recognise their mental health problems ; the most common adults being their parent or teacher. Parent-reported barriers to accessing children’s mental health care can be categorised into: (i) structural barriers (e.g., lack of providers, long waiting lists, insurance or monetary constraints, transportation problems); (ii) identification barriers (i.e., parents’, teachers’, and medical care providers’ inability to identify children’s need for mental health services; denial of the severity or need for treatment of problem); and (iii) barriers related to perceptions about mental health services (i.e., stigma related to seeking help, lack of trust in or negative experience of service providers) [9,10]. Furthermore, parents who reported barriers were more likely to have parent stressors, schedule constraints, and to be divorced compared with parents who did not report barriers. Given these barriers, parents frequently seek teachers’ opinions of their child’s mental health functioning prior to contacting formal health care services . Consequently, teachers have been recognised as an increasingly important stakeholder in detecting mental health problems in children and supporting child mental health [12,13].
Available research examining teachers’ abilities to detect mental health problems in their students suggest that teachers tend to have low confidence in their ability to recognise and support students’ mental health problems and their knowledge base of mental health [12–14]. Moreover, teachers tend to seek help for students with behaviours that are disruptive to other students and as a result affect their academic performance, rather than students with internalising problems [11,14]. Studies that compare parents’ abilities with teachers’ abilities to detect mental health problems in children suggest that parents usually rate their child’s problems as more important and severe than teachers . As a result, mental health professionals tend to regard teachers’ reports on hyperactivity, for example , and mothers’ report on conduct and internalising problems to be most reliable.
To date, in the absence of gold standard measures for assessing mental health problems in children and youth, a multi-informant multimodal approach is couched as best-practice [17–22]. The literature has consistently demonstrated that informants are inconsistent in their assessment of child and adolescent mental health functioning, irrespective of the method of clinical assessment [17,18,23–26]. For example, a recent review on the psychometric properties of one of the most widely used mental health screening tool in children and youth, namely the Strength and Difficulties Questionnaire , reported poor to moderate weighted mean parent-teacher (inter-rater) agreement correlations (total difficulties = 0.44; hyperactivity/inattention = 0.47; emotional symptoms = 0.28; conduct problems = 0.34; peer problems = 0.35) . Even when attempts were made to reduce informant discrepancies through mitigation by a senior clinician; ratings have, at best, resulted in modest levels of agreement (r = 0.19–0.52).
Disagreements between parents’ and teachers’ ratings of a child’s behaviour could be explained by the fact that children behave differently in different contexts [15,30]. Hence, these discrepancies may reflect variation in the circumstances under which the child expresses disruptive behaviour symptoms . It is also likely that parents and teachers use different benchmarks when evaluating these behaviours. For example, teachers’ ratings may be influenced by the level of difficulties experienced by the child in relation to those of other children in the class, whilst comparisons with siblings might have more bearing on parents’ ratings. Also, teachers are exposed to a large number of children and hence, a much wider and diverse comparison group . Teachers may therefore be better equipped than parents to distinguish behaviours that are symptomatic of mental health problems. Thus, informant discrepancies may reflect biases in reporting, measurement error, or variability in symptomatology across settings .
The pattern of agreement-disagreement between parents’ and teachers’ ratings of a child’s mental health can provide a more holistic description of the child as it combines different views . If so, the pattern of agreement-disagreement may give an insight into how each informant provides multidimensional information that reflects the child’s functioning in different contexts. Further research into the pattern of agreement-disagreement between informant ratings at item response level could shed some light into the cause of discrepancy . The current study set out to critically examine the pattern of agreement-disagreement between parents’ and teachers’ ratings of early adolescents’ mental health functioning using a community used screening tool—the Strength and Difficulties Questionnaire (SDQ) .
The tool at the centre of enquiry: The Strength and Difficulties Questionnaire (SDQ)
The Strength and Difficulties Questionnaire (SDQ) is a short, user friendly, easy to use measure of competencies and problem behaviours of children and youth [27,28]. The SDQ items and subscales were developed with reference to the main nosological categories recognised by contemporary classification systems of child mental disorders such as the Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM- IV)  and the International Classification of Diseases, 10th edition (ICD—10) . The questionnaire consists of 25 screening items that measure both psychosocial problems (i.e., emotional problems, conduct problems, hyperactivity–inattention, and peer problems) and strengths (i.e., prosocial behaviour) in children and youths aged 3–16 years [27,33,34]. It has an impact supplement that assesses chronicity, distress, social impairment and burden to others. Three dimensions of impact can be calculated; namely, perceived difficulties (is there a problem), impact score (distress and social incapacity on the child), and a burden rating (do symptoms impose a burden) .
The SDQ uses a multi-informant approach and is suitable for use in studies involving general community populations, in which the majority of children are healthy. Having multiple informants reporting on the SDQ is valuable due to the situational nature of psychosocial problems, particularly in children [35–37]. There are informant-rated versions, which can be completed by either the parents or teachers of children and adolescents aged 2–4 years, 4–10 years and 11–17 years; and a self-report version, which can be completed by adolescents aged 11–17 years. The present study employed the parent and teacher SDQ and impact supplement for 11–17 year olds.
Psychometric quality and utility of screeners–understanding relevant indices.
Although the psychometric quality of mental health screeners has yet to be evaluated, quality measures should demonstrate adequate reliability and validity [8,38]. A recent literature review drawing on the psychometric properties of the parent and teachers versions of the SDQ in 4 to 12-year olds reported satisfactory pooled internal consistency for the total difficulties score (parent: α = 0.80; N = 53,691; teacher: α = 0.82; N = 21, 866) . All subscales of the parent version reported internal consistency values below the recommended benchmark (prosocial behaviour: α = 0.67; emotional problems: α = 0.66; conduct problems: α = 0.58; and peer problems: α = 0.53), with the exception of the hyperactivity-inattention subscale (α = 0.76) . All subscales of the teacher version were reported to have adequate internal consistency, with the exception of the peer problems subscale (α = 0.63). This means that despite the SDQ being used widely in practice and research, caution ought to be exercised when using SDQ subscales that do not meet the recommended reliability guidelines. The inter-rater agreement between parent and teacher ratings of the SDQ from eight studies by weighted mean correlations range from 0.26–0.47 .
Use of SDQ as a mental health screener.
Another important characteristic of a screening tool is its discriminant ability; that is, the measure’s accuracy to distinguish between those with and without mental health problems. The SDQ has been used in epidemiological, developmental, and clinical research in many countries and has been translated into more than 60 languages [28,39–44]. The increased use of the SDQ has been accompanied by increased research on its psychometric properties. Validation studies on the SDQ have used community-based [42,43,45–48] and clinical samples [37,49,50].
The evidence to date on the discriminative ability (i.e., screening ability) of the parent and teacher versions of the SDQ in detecting mental health problems, is better in clinical samples when compared to community populations . For example, combined parent and teacher reports in UK samples have been shown to have sensitivity values of 62.1% and 82.2%, in detecting mental health disorders in community and clinical samples, respectively [36,39]. When only parent reports were used, sensitivity dropped to 29.8% in a community sample and to 51.4% in the clinical sample [36,39]. In the case of only using teacher reports, sensitivity values of 34.5% and 59.8% have been documented in community and clinical samples, respectively [36,39]. The SDQ sensitivity was lowest for detecting anxiety in community samples (parent and teacher combined = 45.4%; parent = 38.8%; teacher = 15.9%). Positive predictive value (PPV) in a community sample has been shown to range from 35% (hyperactivity disorders) to 86% (emotional disorders) and negative predictive value (NPV) ranged from 83% to 98% .
The level of agreement between SDQ generated diagnoses (multi-informant format, parent, teacher and self-report) and clinical team diagnoses made by a community child and adolescent mental health service (regarded as gold standard) in an Australian sample has been found to be moderate; ranging from 0.39 to 0.56 . The level of agreement (Kendall’s Tau-b) between the SDQ generated diagnoses and the independent clinician’s diagnoses were low to moderate in range (emotional problems, r = 0.26, to 0.43, hyperactivity disorder). Concurrent agreements between clinical team and the independent clinician ratings were higher (emotional problems, r = 0.45, to 0.65, hyperactivity disorder). The probable or 90% dichotomisation system was used to measure sensitivity of SDQ diagnoses. The ‘probable’ dichotomisation level classifies approximately 90% of a population-based sample as having a negative test, while the ‘possible’ dichotomisation level gives a ‘test negative’ for approximately 80% of the same sample  The sensitivity of SDQ diagnoses was generated using the probable or 90% dichotomisation was 36% for emotional disorders, 44% for hyperactivity disorders, and 93% for conduct disorders. In contrast, the sensitivity of combined possible and probable SDQ diagnoses was 81% for emotional disorders, 93% for hyperactivity disorders, and 100% for conduct disorders. False negatives; that is, children who had a definite disorder but who were rated unlikely by the SDQ algorithm (multi-informant format), were rare for conduct disorders (n = 0, N = 130) and hyperactivity disorders (n = 2, N = 130), but more frequent for emotional disorders (n = 7, N = 130) .
The SDQ screening accuracy works best when it is completed by all three informants; namely parents, teachers and young people aged 11 years and older . If it is impractical or uneconomical to obtain data from all informants, parents’ and teachers’ reports have been shown to have equal predictive value, although their relative value depends on the type of mental health problem . The screening accuracy of the SDQ varies by mental health problem and rater . For conduct and hyperactivity disorders, self-report data are of less screening value than data from either parents or teachers. In the case of emotional disorders, self-report information are about as useful as teacher data, but less useful than parent data . Consequently, the parent and teacher report combinations are most often used in research .
In summary, although the SDQ is labelled as the most widely used screening measure of mental health problems in children and youth; the parent and teacher versions of the measure have poor concordance, questionable internal consistency, and inadequate sensitivity in a community sample. No study to date has examined the inter-rater agreement and screening concordance of the parent and teacher versions of the SDQ at item-level with the view of identifying the items that have the most informant discrepancies; and determining whether the concordance between parent and teacher reports on some items has the potential to influence decision making. In addressing the gap, this study aimed to:
- examine the reliability of the teacher and parent versions of the SDQ in a community sample of young adolescents;
- explore the inter-rater agreement-disagreement between parent and teacher ratings on the SDQ both at scale, subscale, and item level; and
- identify whether concordance between parent and teacher reports on the SDQ (scales and subscales, and items) has the potential of identifying young adolescents at-risk of mental health problems.
This cross-sectional study is part of a larger longitudinal study concerning the factors associated with student adjustment across the primary-secondary transition [52,53]. Parent and teacher reports of the mental health functioning of a community sample of 299 students with and without disabilities from 75 different primary schools across metropolitan Perth and other major city centres across Western Australia were collected. The study's cohort comprised 29% Catholic Education, 47% Government, and 24% Independent schools, which was different to the profile of all schools in Western Australia at the time of data collection (15%, 72%, and 13%, respectively). The school post code was used to calculate its socio-economic index (SEIFA Index), using the Commonwealth Department of Education, Employment, and Workplace Relations measure of relative socio-economic advantage and disadvantage . The SEIFA decile was used as the measure of mean school socioeconomic status (SES), with a lower decile number meaning that the school was located in an area that was relatively more disadvantaged than other areas. Approximately 35% of the sample came from schools that fell into the 1–8 decile bands; 44% came from schools in the 9th decile band; and 21% came from schools in band 10. This means that the sample was over-representative of higher SES band schools.
The mean age of the students was 11.9 years (SD = 0.45 years, median = 12 years). There was a nearly even split by gender (boys = 48.2%; n = 144). Household income data from 294 families were retrieved. The majority of the students (60%, n = 179) came from mid-range households; under one-third of the students (30.3%, n = 89) were from high-SES households and 8.8% (n = 26) were from low-SES groupings . Informed written consent was obtained from school principals, parents, and teachers. All participants were made aware that they were not obliged to participate in the study and were free to withdraw from the study at any time without justification or prejudice. Ethics approval was obtained from Curtin University Health Research Ethics Committee in Western Australia (WA) (HR 194/2005).
Measurement tool: The Strength and Difficulties Questionnaire (SDQ)
The 25-item teacher and parent versions of the Strengths and Difficulties Questionnaire (SDQ) were used to record each informant’s perception of four problem domains/subscales and one pro-social domain/subscale (each consisting of five items) [27,33,34]. The problem subscales include emotional symptoms, conduct problems, hyperactivity/inattention, and peer problems. Each item on the SDQ is scored on a 3-point ordinal scale with 0 = not true; 1 = somewhat true; and 2 = certainly true, with higher scores indicating larger problems (except in the case of pro-social behaviour in which a higher score indicates more positive behaviour). The SDQ total difficulties score is computed by summing the four problem behaviour subscales. Subscale scores range from 0–10, while the total difficulties SDQ score ranges from 0–40.
Data were managed and analysed using SPSS Version 20.0 and SAS Version 9.2 software packages. Less than 0.8% of data were missing at scale levels. The estimation maximisation algorithm and Little’s chi-square statistic revealed that the data were missing completely at random [56,57]. Replacement of missing data was undertaken using the guidelines recommended by the SDQ developers wherein, if at least three of the five SDQ items in a scale were completed, the remaining two scores were replaced by their mean. When more than three items were missing in a scale level, scores were excluded from the analysis. Independent samples t-tests confirmed that the profiles of those whose data were missing for various questions were similar to those who responded. The following analyses were undertaken:
- Descriptive overview: Means and standard deviations were calculated to provide a descriptive overview of the problems reported by both parents and teachers.
- The nature of agreement of teachers’ scores relative to parents (gold standard) was measured using the Bland-Altman Limits of Agreement (LOA) plots [58–60]. The LOA are based on the normal distribution, and bracket approximately 95% of differences between the ratings of teachers and parents. The plot of difference against mean was used to investigate any possible relationship between the measurement error and the true value for each total score.
- Intraclass Correlation Coefficient [ICC, Absolute agreement]: To attain interval level scores for each participant and each SDQ item, parent and teacher SDQ raw scores were subjected to Rasch analysis using the Winsteps programme (version 184.108.40.206) . The Rasch model enables the researcher to examine simultaneously: (i) whether or not the items define a single unidimensional construct (strengths and difficulties in this instance); (ii) the relative difficulty of each test item; and (iii) the relative strengths and difficulties score of each person . In addition to estimates of the relative difficulty of items and abilities of people, the Rasch analysis yields goodness-of-fit statistics expressed in mean square (MnSq) and standardised values. Prior to further calculations, we examined the goodness-of-fit statistics for people and items to ensure that they were within an acceptable range set a priori (MnSq < 1.4; standardised value < 2) ; this ensured that the measured scores were true interval-level measures. The resulting person and item measure scores were then entered into SPSS to test if the data were normally distributed (using the Kolmogorov-Smirnov test for normality). As the data were normally distributed, individual child and item inter-rater reliability between parent and teacher ratings of the children’s mental health status (using the SDQ) was calculated using ICC (2,1) [absolute agreement, two-way random effects model, single measures]. Reliability refers to the degree to which participants can be distinguished from each other, despite measurement error . An ICC between 0.4 and 0.7 is generally taken to indicate fair agreement, while values higher than this indicate excellent agreement .
- Percentage of agreement, and overall classification accuracy index at item level, using the raw data
- Cohen’s weighted Kappa coefficient, percentage of agreement, and overall classification accuracy index at scale level (using the “possible or 80% dichotomisation system”): The diagnostic algorithm was used to allocate the subscales and total SDQ scale scores into three categories and indicate the risk of difficulties, namely: ‘unlikely’, ‘possible’ and ‘probable’. The weights for the calculation of weighted Kappa were obtained from the column scores using the Fleiss-Cohen method (SAS v 9.2). Essentially, more weight is given to measurements that are in closer agreement. To calculate the values for screening efficiency in terms of sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (LR+), negative likelihood ratio (LR-), and diagnostic odds ratio (ORD), the three risk categories were reduced to two categories (‘test negative’, and ‘test positive’) [37,49]. In the first instance, the categories unlikely and possible were labelled ‘test negative’, and the third category probable was labelled ‘test positive’ (hereafter, referred to as the “probable dichotomisation”). In the second calculation, only the category unlikely was labelled ‘test negative’, and the second and third categories, possible and probable, were labelled ‘test positive’ (hereafter referred to as the “possible or 90% dichotomisation system”).
- Cohen’s weighted Kappa coefficient, percentage of agreement, and overall classification accuracy index at scale level (using the “probable or 90% dichotomisation system”).
- Screening efficiency of teachers’ ratings relative to parents’ ratings, using the “probable or 90% dichotomisation” at item level.
Descriptive overview of the reported symptoms in the students
The mental health scores of the students in the current study were better than Australian population norms for the 7–17 year old age group . The internal consistencies of the total SDQ scale in the current sample were below recommended standards for reliable use in a clinical setting . The parent version of the hyperactivity/ inattention subscale merely met the benchmark criterion (Table 1).
Measure of the nature of agreement of teachers’ scores relative to parents (gold standard), using Bland—Altman Limits of Agreement (LOA) plots
As shown in Figs 1–6, the Bland-Altman plots are organised so that the frequencies of data points on each particular dot on the graph are represented by the diameter of the ring around each point (larger number of points corresponds to a ring with a larger diameter). The horizontal lines on the graph show the mean bias (the middle solid line) with its 95% confidence interval. The upper and lower LOA are represented by the dashed lines, while the dotted line shows the line of complete agreement. In many cases, the confidence interval for the bias excludes the dotted line, suggesting a consistent bias. The width of the LOA indicates the degree of discrepancy between the teacher and parent ratings.
The Bland—Altman plots showed:
- systematic differences in teacher and parent ratings on the hyperactivity and emotional scale ratings; and
- a linear relationship between measurement errors as estimated by differences in the size of the measurement for conduct problems, peer problems, and total difficulty ratings.
Given the existence of systematic error and wide LOA relative to the range of scores, the agreement between informants was explored using the possible (80% dichotomisation) and probable (90% dichotomisation) systems (Table 2).
Intraclass correlation coefficient (ICC, Absolute agreement)
The ICC between parent and teacher ratings of children’s mental health using the SDQ at the person level was .44 (95% CI: .34–.53). This showed fair parent-teacher inter-rater reliability at the individual child level. The comparable ICC calculated at item level was .96 (95% CI: .91–.98); suggesting excellent parent-teacher inter-rater reliability for SDQ items.
Percentage of agreement, and overall classification accuracy index at item level, using the raw data
By using the parents’ ratings of the child’s mental health as the reference category, the percentage of teachers who obtained the same ratings as the parents were computed, using raw data (Table 3).
As shown in Table 3, item level agreement of teachers’ scores relative to parents ranged from 98.32% (item 22) to 45.61% (item 21). Eleven of the 25 items (items 3, 7, 10, 11, 12, 14, 18, 19, 22, 24) had agreement scores greater than 70%. Inter-rater agreement values for items 16 and 21were both less than 50%.
Overall, parents and teachers agreed on scoring the child as ‘not having a particular behaviour in question (rating of 0)’–with over 90% agreement in scores found for seven of 25 items (items 3, 5, 10, 13, 18, 19, 22), and less than 50% agreement found on items 20 and 21. Agreement between parents’ and teachers’ ratings of symptom severity was less than 50% for almost all items (except items 1 and 20). Weighted Kappa coefficients were in the poor to fair category, with three items having Kappa values over 0.3 (items 7, 15, 22).
Cohen’s weighted Kappa coefficient, percentage of agreement, and overall classification accuracy index at scale level (using the “possible or 80% dichotomisation system”)
Given that we administered the SDQ to a community sample, where the majority of students were assumed to have no mental health problems, we were interested in determining whether the 80% (Table 4) or 90% (Table 5) dichotomisation system would be most beneficial to identify cases at risk for further assessment. In all situations, the parents’ rating of the child’s mental health was treated as the true report; and the percentage of teachers who obtained the same rating as parents were computed (Tables 4 and 5). Cohen’s weighted Kappa, percentage of agreement and index of overall agreed classification accuracy indices were computed to this end.
An overview of the number and percentage of caseness identified by teacher and parent ratings on the SDQ subscales and total SDQ score is presented in Table 4, using the ‘possible’ dichotomisation system. The overall agreement between teacher and parent ratings was in the poor to moderate category (Kappa = .18–.36). Across the board, parents were more likely to identify their child as having problems that impacted on their overall functioning.
The conduct disorders category had a LR+ value indicative of a potentially useful test. Therefore, based on parent and teacher reports, the likelihood of a positive test result in a child having a conduct disorder was 7.17 times larger than in a chid without a conduct disorder, which is high enough to be interpreted as having the potential to alter clinical decisions. None of the categories were in the LR- interval for potentially useful tests. That means that the likelihood of a child without a diagnosis having a negative test result was too low to be interpreted as having a potential to alter clinical decisions. None of the SDQ categories had ORD in the range for potentially useful tests, as indicated by recommended guidelines . Given the limited usefulness of teacher and parent ratings on the other SDQ domains and total SDQ scores, using the “possible or 80% dichotomisation system”; further analyses were conducted to determine whether a more stringent categorisation (90% dichotomisation) could improve the screening efficiency of the tool.
Cohen’s weighted Kappa coefficient, percentage of agreement, and overall classification accuracy index at the scale level (using the “probable or 90% dichotomisation system”)
Table 5 presents a descriptive overview of SDQ domain caseness based on parent and teacher ratings, and agreement/ screening efficiency of teacher ratings relative to parent ratings using the probable (90%) dichotomisation system (N = 299)
As shown in Table 5, the overall agreement between teacher and parent ratings was poor (Kappa = .17–.40). Teachers noted a higher proportion of students to have conduct problems. Parents on the other hand found a higher percentage of students to have peer, emotional, and hyperactive problems that impacted on the overall functioning of the child. When using the 'probable' dichotomisation, the categories: peer problems; conduct disorders; hyperactive-inattention; and total difficulties were all in the LR+ range for potentially useful tests. This means that when there is agreement between parent and teacher reports (as reflected by LR+ values, the likelihood of a child being at risk of those problem behaviours after a positive test was between 7.35 and 13.36 times more likely than in a child without identified problem behaviours. None of the categories were in the LR- range for potentially useful tests. The ORD result for the peer problem category was in the range for potentially useful tests as indicated by recommended guidelines . Given the clinical usefulness of the “probable or 90% dichotomisation system”, further analyses were undertaken to explicitly identify individual SDQ items that had the largest potential to alter clinical decisions.
Screening efficiency of teachers’ ratings relative to parents’ ratings using the “probable or 90% dichotomisation” at the item level
Overall, 19 of the 25 items (items 1, 2, 3, 4, 7, 9, 10, 11, 12, 13, 14, 17, 18, 19, 20, 22, 23, 24) showed agreement scores greater than 70%. Agreement between parents and teachers about the level of intensity of symptoms continued to be less than chance (i.e., < 50%).
Four items: item 3 = often complains of headaches, stomach aches or sickness (emotional domain); item 5 = easily distracted, concentration wanders (emotional domain); item 13 = often unhappy, depressed or tearful (emotional domain); and item 14 = generally liked by other children (peer problems domain) were in the LR+ range for potentially useful tests (items in this instance). This means that the likelihood of a child having a positive test index that warrants further investigation when parents and teachers flag these items as being of concern is between 7.28–8.87 times higher than chance would occur in an individual without the condition.
The SDQ is one of the most common screening tools used in both educational clinical settings to flag potential mental health problems in children and adolescents [7,28]. The tool’s originator suggests that the SDQ can be used for screening; as part of a clinical assessment; as a treatment outcome measure; and as a research tool [27,46]. A recent study questioned the reliability of some of the subscales of the SDQ . The current study aimed to examine the inter-rater agreement and screening concordance of the parent and teacher versions of the SDQ at scale, subscale and item levels to determine if some items have the potential to influence clinical decision making.
Internal consistency estimates using parent and teacher forms
The raw SDQ scores of the sample were better that the Australian population norms for the 7–17 year old age group [51,53]. Consistent with the review by Stone et al. , the internal consistencies for several subscales failed to meet the recommended threshold for reliable use in a community sample. Whilst the ICCs between parent and teacher ratings of children’s mental health using the SDQ at the person level was fair for individual children; the comparable ICC calculated at item level was excellent, suggesting that the SDQ is reliable.
Screening is often the first step in determining who is eligible for further assessment, and can be used to identify those who are likely to benefit from immediate interventions because they are considered to be at risk [66,67]. The utility of a mental health screener may vary due to the prevalence of a disorder; and the population or setting (clinical versus community) [8,38]. A clinical population is likely to have a higher prevalence of psychosocial problems than a community population. Therefore, when used in a clinical population, the SDQ should inform us about types of problems, their duration, and perception of impact. A community population is likely to have only a few people with psychosocial problems; hence, the SDQ should be very sensitive in detecting those in the community who are at-risk of having psychosocial problems .
Congruence between parent and teacher reports on the SDQ
Consistent with previous research , the results suggest that using the SDQ with parents only or teachers only is not recommended. Excellent parent teacher inter-rater reliability values were recorded at the item level; however, this was only the case for children demonstrating no problems on an item (i.e., a score of 0). Congruence between parents and teachers for children demonstrating any behaviour (i.e., a score of 1 or 2) was low. In addition, the weighted Kappa values were moderate to low. Weighted Kappa values are very sensitive to skewed distributions, as is the case in the present data, so the generally low Kappa values were expected. Even so, the overall congruence between parent and teacher reports was poor. Importantly, this was the case using both the ‘possible’ and ‘probable’ dichotomisation criteria.
An important consideration is the positive predictive value (PPV) value, which reflects the proportion of cases where both the parent and teacher were in agreement that the child has probable or possible mental health problems. The PPV is determined by the sensitivity and specificity of the test and prevalence; in this case the number of children identified by parents to have problems on any item at either of the two dichotomisation levels. Because the prevalence was generally low (ranging from 5–10% in most cases), a PPV of 0.4 would be considered acceptable . Using the “possible or 80% dichotomisation”, PPV values were acceptable for the total difficulties score, as well as the conduct problems, emotional symptoms, and peer problems subscales. At the “probable or 90% dichotomisation” level, the PPV values were acceptable for the total difficulties score and the emotional symptoms and peer problems subscales. Given that the study comprised a community sample, within which the prevalence rate of mental health was at most 14%, the use of the 80% dichotomisation was most appropriate. However, when using the 90% dichotomisation at item level, most PPVs were significantly lower; suggesting that where a child is reported to have a problem on an item level by the parent (as a reference point) he or she is unlikely to be reported as having the same problem by the teacher. It should be noted, however, that the NPVs were generally very high, suggesting that the parent was not reporting a problem at the item level, which the teacher is very likely to agree. Therefore, consistent with the basic patterns of parent-teacher congruence, parents and teachers had excellent agreement when the child did not have emotional and behavioural problems. Unfortunately, the level of agreement deteriorated dramatically when parents and teachers rated their children as being somewhat and certainly sure of exhibiting emotional and behavioural difficulties. So the issue at hand relates to the false positives (i.e., the parents reported a potential mental health problem but the teacher did not). The pertinent question that follows is: “Is a low PPV problematic in screening tools?” Given that the SDQ was designed to identify children at risk of mental health problems, the low PPV and sensitivity means that the measure is not optimised for use in this community sample for identifying at risk children and youth. Further research examining the optimal contexts within which parents’ versus teachers’ report problem behaviours as an indicator of potential mental health problems is warranted.
Clinical Utility of the SDQ
At both the domain and item level the specificity of the SDQ was excellent; however, sensitivity was generally poor. Given that the SDQ has been designed as a screening tool to identify at risk children [27,36], this poor sensitivity is concerning. The sensitivity of teacher and parent concordance in the current study are however similar to those reported in prior studies involving community samples . Although well established, sensitivity and specificity have some deficiencies in clinical use. This is mainly due to the fact that sensitivity and specificity are population measures; that is, they summarise the characteristics of the test across a population . The present study computed ratios to obtain: a) the likelihood of the adolescent having a mental health problem given a positive test result by both parents and teachers (positive likelihood ratio, LR+); and b) the negative likelihood ratio or the likelihood of the adolescent to not have a mental health problem given a negative test result by both parents and teachers (negative likelihood ratio, LR-). A LR+ value of 7 or greater is generally indicative of the clinical utility of a scale or item . Using the 80% dichotomisation, only the conduct problems subscale, and none of the individual SDQ items, reached this threshold. However, the total difficulties score and the subscales (with the exception of emotional symptoms) reached the threshold for clinical utility at the 90% dichotomisation level. Moreover, three individual items also had LR+ scores indicating clinical utility: item 3 (often complains of headaches, stomach aches or sickness), item 13 (often unhappy, depressed or tearful), and item 14 (generally liked by other children; negative coded item). Specifically, if these items are flagged by both the teacher and parent this may indicate the probable presence of mental health problems that warrant further assessment.
Taken together, these findings suggest that when evaluating concordance between parent and teacher reports on the SDQ, using the 90% dichotomisation system has the most clinical utility, at scale, subscale and item levels. However, this was only the case when there was agreement between parent and teacher reports (as reflected by LR+ values). Additionally, LR+ scores did not reach the threshold of 7 for the emotional symptoms subscale. This may reflect that the internalising symptoms may be difficult for both parents and teachers to observe . Based on the current findings, if only parent or teacher report versions of the SDQ were administered in community samples, it appears unlikely that the SDQ will reliably identify children at risk of mental health problems. Future research should examine the clinical utility of the self-report version of the SDQ with regard to emotional symptoms, as well as examining the clinical utility of various combinations of reports to best identify students at risk of mental health problems.
Limitations and Directions for Future Research
The current study had some major limitations which should be noted. First, the data was cross-sectional. The SDQ was the only mental health measure administered and clinical assessments were not conducted. Therefore, it was not possible to determine whether the teacher or parent report was a better indicator of child mental health and behaviour problems than clinical observations, or whether this differed as a function of symptomatology (e.g., internalising vs. externalising). Future research should examine this using prospective study designs that incorporate both a clinical assessment and administer additional outcome measures. Secondly, the sample was over-represented by students from higher mean school socio-economic sectors, which may limit the generalisation of the findings to the wider population of Australian school children. Thirdly, missing values were replaced using mean values as recommended by the SDQ developers. Using the mean value replacement technique could have resulted in biased estimates and specifically underestimated the standard errors . Expectation maximisation has been recommended to overcome some of the limitation of mean substitution and should be used in the future. Also, the self-report version of the SDQ was not completed by the children in the study. Teachers and parents may not identify internalising symptoms unless they manifest in some form of observable behaviour . Consistent with this, the emotional symptoms subscale was the only subscale that did not meet the LR+ threshold. Future research should examine concordance between parent, teacher, and youth reports on the SDQ in community samples and compare the utility of these in identifying potential mental health problems.
The SDQ is one of the most widely used screening tools internationally in both clinical and community samples. Consistent with a recent review , internal consistencies did not reach recommended thresholds for the total difficulties score (teacher report), as well as the conduct problems (parent report), peer problems (both parent and teacher reports), and prosocial behaviour (parent report) subscales. Moreover, given that the purpose of the SDQ as a screening measure is to identify children at risk of mental health and behavioural problems, the low PPVs is of concern.
In the current community sample, the SDQ only demonstrated clinical utility when there was agreement between teacher and parent reports using the 90% dichotomisation system. Moreover, three individual items also had LR+ scores indicating clinical utility: item 3 (often complains of headaches, stomach aches or sickness), item 13 (often unhappy, depressed or tearful), and item 14 (generally liked by other children). Specifically, if these items are flagged by both teacher and parent this may indicate the probable presence of mental health problems and warrant further assessment. Further research is needed to learn more about the relationship of items to each other and the contribution of each item to its subscale score and its contribution to the overall difficulties score. Of note was the finding that the negative likelihood ratio or likelihood of disregarding the absence of a condition when both parents and teachers rated the item as absent was not significant. There is a need for further research to identify in which contexts parent and teacher reports might independently show clinical utility. Taken together, these findings suggest that the SDQ is not optimised for use in community samples and that further psychometric evaluation of the SDQ in this context is clearly warranted.
Conceived and designed the experiments: SV TF. Performed the experiments: SV. Analyzed the data: SV RP RC. Contributed reagents/materials/analysis tools: SV. Wrote the paper: SV RC MB RP AJ MC TF MF. Critically reviewed the manuscript and provided expert feedback: AJ.
- 1. Hickie IB, Groom G, Davenport T (2004) Investing in Australia’s future: The personal, social and economic benefits of good mental health. Canberra: Mental Health Council of Australia.
- 2. Kessler RC, Berglund P, Demler O, Jin R, Merikangas KR, et al. (2005) Lifetime prevalence and age-of-onset distributions of DSM-IV disorders in the National Comorbidity Survey Replication. Archives of general psychiatry 62: 593–602. pmid:15939837
- 3. Sawyer MG, Arney FM, Baghurst PA, Clark JJ, Graetz RJ, et al. (2001) The mental health of young people in Australia: Key findings from the child and adolescent component of the national survey of mental health and well-being. Australian and New Zealand Journal of Psychiatry 35: 806–814. pmid:11990891
- 4. Murray CJL, Lopez AD (1996) The global burden of disease: a comprehensive assessment of mortality and disability from diseases, injury and risk factors in 1990 and projected to 2020. Harvard School of Public Health, Boston.: World Health Organisation and World Bank, Harvard School of Public Health.
- 5. Bricker D, Davis MS, Squires J (2004) Mental health screening in young children. Infants & Young Children 17: 129–144.
- 6. Patel V, Flisher AJ, Hetrick S, McGorry P (2007) Mental health of young people: a global public health challenge. The Lancet 369: 1302–1313.
- 7. Lane LK, Parks RJ, Kalberg RJ, Carter EW (2007) Systematic Screening at the Middle School Level: Score Reliability and Validity of the Student Risk Screening Scale. Journal of Emotional and Behavioral Disorders 15: 209–222.
- 8. Levitt JM, Saka N, Hunter Romanelli L, Hoagwood K (2007) Early identification of mental health problems in schools: The status of instrumentation. Journal of School Psychology 45: 163–191.
- 9. Sayal K, Tischler V, Coope C, Robotham S, Ashworth M, et al. (2010) Parental help-seeking in primary care for child and adolescent mental health concerns: qualitative study. The British Journal of Psychiatry 197: 476–481. pmid:21119154
- 10. Owens PL, Hoagwood K, Horwitz SM, Leaf PJ, Poduska JM, et al. (2002) Barriers to Children's Mental Health Services. Journal of the American Academy of Child & Adolescent Psychiatry 41: 731–738.
- 11. Rothì DM, Leavey G (2006) Mental Health Help-Seeking and Young People: A Review. Pastoral Care in Education 24: 4–13.
- 12. Loades ME, Mastroyannopoulou K (2010) Teachers’ Recognition of Children’s Mental Health Problems. Child and Adolescent Mental Health 15: 150–156.
- 13. Reinke WM, Stormont M, Herman KC, Puri R, Goel N (2011) Supporting children's mental health in schools: Teacher perceptions of needs, roles, and barriers. School Psychology Quarterly 26: 1.
- 14. Rothì DM, Leavey G, Best R (2008) On the front-line: Teachers as active observers of pupils’ mental health. Teaching and Teacher Education 34: 1217–1231.
- 15. Strickland J, Hopkins J, Keenan K (2012) Mother-Teacher Agreement on Preschoolers’ Symptoms of ODD and CD: Does Context matter? Journal of abnormal child psychology 40: 933–943. pmid:22661105
- 16. Loeber R, Green SM, Lahey BB (1990) Mental health professionals' perception of the utility of children, mothers, and teachers as informants on childhood psychopathology. Journal of Clinical Child Psychology 19: 136–143.
- 17. De Los Reyes A, Kazdin AE (2004) Measuring informant discrepancies in clinical child research. Psychological Assessment 16: 330–334. pmid:15456389
- 18. De Los Reyes A, Kazdin AE (2005) Informant discrepancies in the assessment of childhood psychopathology: a critical review, theoretical framework, and recommendations for further study. Psychological Bulletin 131: 483–509. pmid:16060799
- 19. Verhulst FC, Dekker MC, van der Ende J (1997) Parent, teacher and self-reports as predictors of signs of disturbance in adolescents: whose information carries the most weight? Acta Psychiatrica Scandinavica 96: 75–81. pmid:9259229
- 20. Mash EJ, Hunsley JE (2005) Developing guidelines for the evidence-based assessment of child and adolescent disorders [Special issue]. Journal of Clinical Child and Adolescent Psychology 34.
- 21. Richters JE (1992) Depressed mothers as informants about their children: A critical review of the evidence for distortion. Psychological Bulletin 112: 485–499. pmid:1438639
- 22. Mash EJ, Hunsley J (2005) Evidence-Based Assessment of Child and Adolescent Disorders: Issues and Challenges. Journal of Clinical Child & Adolescent Psychology 34: 362–379.
- 23. Achenbach TM (2006) As Others See Us: Clinical and Research Implications of Cross-Informant Correlations for Psychopathology. Current Directions in Psychological Science 15: 94–98.
- 24. De Los Reyes A, Henry DB, Tolan PH, Wakschlag LS (2009) Linking Informant Discrepancies to Observed Variations in Young Children’s Disruptive Behavior. Journal of Abnormal Child Psychology 37: 637–652. pmid:19247829
- 25. Koenig K, Los Reyes A, Cicchetti D, Scahill L, Klin A (2009) Group Intervention to Promote Social Skills in School-age Children with Pervasive Developmental Disorders: Reconsidering Efficacy. Journal of Autism and Developmental Disorders`39 8.
- 26. De Los Reyes A, Kazdin AE (2006) Conceptualizing Changes in Behavior in Intervention Research: The Range of Possible Changes Model. Psychological Review 113: 554–583. pmid:16802881
- 27. Goodman R (1997) The strengths and difficulties questionnaire: A research note. Journal of Child Psychology and Psychiatry 38: 581–586. pmid:9255702
- 28. Stone LL, Otten R, Engels R, Vermulst AA, Janssens JM (2010) Psychometric properties of the parent and teacher versions of the strengths and difficulties questionnaire for 4-to 12-year-olds: a review. Clinical child and family psychology review 13: 254–274. pmid:20589428
- 29. Nguyen N, Whittlesey S, Scimeca K, Digiacomo D, Bui B, et al. (1994) Parent-Child Agreement in Prepubertal Depression: Findings with a Modified Assessment Method. Journal of the American Academy of Child and Adolescent Psychiatry 33: 1275–1283. pmid:7995793
- 30. Youngstrom E, Loeber R, Stouthamer-Loeber M (2000) Patterns and correlates of agreement between parent, teacher, and male adolescent ratings of externalizing and internalizing problems. Journal of consulting and clinical psychology 68: 1038. pmid:11142538
- 31. American Psychiatric Association (1994) Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: American Psychiatric Association.
- 32. World Health Organization (1992) International classification of diseases 10th edition: Classification of mental and behavioural disorders. Geneva: WHO.
- 33. Goodman R (1999) The extended version of the strengths and difficulties questionnaire as a guide to child psychiatric caseness and consequent burden. Journal of Child Psychology and Psychiatry 40: 791–801. pmid:10433412
- 34. Goodman R, Ford T, Simmons H, Gatward R, Meltzer H (2003) Using the Strengths and Difficulties Questionnaire (SDQ) to screen for child psychiatric disorders in a community sample. International Review of Psychiatry 15: 166–172. pmid:12745328
- 35. Achenbach TM, McConaughy SH, Howell CT (1987) Child/adolescent behavioral and emotional problems: Implications of cross-informant correlations for situational specificity. Psychological Bulletin 101: 213–232. pmid:3562706
- 36. Goodman R, Ford T, Simmons H, Gatward R, Meltzer H (2000) Using the Strengths and Difficulties Questionnaire (SDQ) to screen for child psychiatric disorders in a community sample. The British Journal of Psychiatry 177: 534–539. pmid:11102329
- 37. Goodman R, Renfrew D, Mullick M (2000) Predicting type of psychiatric disorder from Strengths and Difficulties Questionnaire (SDQ) scores in child mental health clinics in London and Dhaka. Eur Child Adolesc Psychiatry 9: 129–134. pmid:10926063
- 38. Glover TA, Albers CA (2007) Considerations for evaluating universal screening assessments. Journal of School Psychology 45: 117–135.
- 39. Goodman R, Ford T, Corbin T, Meltzer H, Goodman R, et al. (2004) Using the Strengths and Difficulties Questionnaire (SDQ) multi-informant algorithm to screen looked-after children for psychiatric disorders. European Child and Adolescent Psychiatry 13: 25–31.
- 40. Marzocchi GM, Capron C, Di Pietro M, Tauleria ED, Duyme M, et al. (2004) The use of the Strengths and Difficulties Questionnaire (SDQ) in Southern European countries. European Child & Adolescent Psychiatry 13: ii40–ii46.
- 41. Mathai J, Anderson P, Bourne A (2002) The Strengths and Difficulties Questionnaire (SDQ) as a screening measure prior to admission to a Child and Adolescent Mental Health Service (CAMHS). Advances in Mental Health 1: 235–246.
- 42. Obel C, Heiervang E, Rodriguez A, Heyerdahl S, Smedje H, et al. (2004) The strengths and difficulties questionnaire in the Nordic countries. European child & adolescent psychiatry 13: ii32–ii39.
- 43. Woerner W, Becker A, Rothenberger A (2004) Normative data and scale properties of the German parent SDQ. European Child & Adolescent Psychiatry 13: ii3–ii10.
- 44. Vostanis P (2006) Strengths and Difficulties Questionnaire: research and clinical applications. Current Opinion in Psychiatry 19: 367–372. pmid:16721165
- 45. Hawes DJ, Dadds MR (2004) Australian data and psychometric properties of the Strengths and Difficulties Questionnaire. Australian and New Zealand Journal of Psychiatry 38: 644–651. pmid:15298588
- 46. Goodman R (2001) Psychometric Properties of the Strengths and Difficulties Questionnaire. Journal of the American Academy of Child & Adolescent Psychiatry 40: 1337–1345.
- 47. Marzocchi G, Capron C, Di Pietro M, Duran Tauleria E, Duyme M, et al. (2004) The use of the Strengths and Difficulties Questionnaire (SDQ) in Southern European countries. European Child & Adolescent Psychiatry 13: ii40–ii46.
- 48. Bourdon KH, Goodman R, Rae DS, Simpson G, Koretz DS (2005) The Strengths and Difficulties Questionnaire: U.S. Normative Data and Psychometric Properties. Journal of the American Academy of Child and Adolescent Psychiatry 44: 557–564. pmid:15908838
- 49. Mathai J, Anderson P, Bourne A (2004) Comparing psychiatric diagnoses generated by the Strengths and Difficulties Questionnaire with diagnoses made by clinicians. Australian and New Zealand Journal of Psychiatry 38: 639–643. pmid:15298587
- 50. Brøndbo P, Mathiassen B, Martinussen M, Heiervang E, Eriksen M, et al. (2011) The strengths and difficulties questionnaire as a screening instrument for norwegian child and adolescent mental health services, application of UK scoring algorithms. Child and Adolescent Psychiatry and Mental Health 5: 1–10.
- 51. Mellor D (2005) Normative data for the strengths and difficulties questionnaire in Australia. Australian Psychologist 40: 215–222.
- 52. Vaz S (2010) Factors affecting student adjustment as they transition from primary to secondary school: A longitudinal investigation. Perth: Curtin University.
- 53. Vaz S, Parsons R, Falkmer T, Passmore AE, Falkmer M (2014) The Impact of Personal Background and School Contextual Factors on Academic Competence and Mental Health Functioning across the Primary-Secondary School Transition. PloS ONE 9: e89874. pmid:24608366
- 54. Australian Bureau of Statistics [ABS] (2011) 2033.0.55.001—Census of Population and Housing: Socio-Economic Indexes for Areas (SEIFA). Canberra: ABS.
- 55. Australian Bureau of Statistics [ABS] (2001) ABS 2001 Census Dictionary (Cat. No. 2901.0).
- 56. Tabachnick B, Fidell L (2007) Using multivariate statistics. Boston, MA: Pearson Education Inc. and Allyn & Bacon.
- 57. Meyers LS, Gamst G, Guarino AJ (2006) Applied multivariate research: Design and implication. CA: Sage Publications, Inc..
- 58. Bland JM, Altman DG (1996) Measurement error. BMJ 312: 1654. pmid:8664723
- 59. Bland JM, Altman DG (2003) Applying the right statistics: Analyses of measurement studies. Ultrasound Obstet Gynecol 22: 85–93. pmid:12858311
- 60. Hamilton C, Stamey J (2007) Using Bland-Altman to assess agreement between two medical devices-don't forget the confidence intervals! Journal of Clinical Monitoring and Computing 21: 331–333. pmid:17909978
- 61. Linacre JM (2009) Winsteps programme (version 220.127.116.11). Beaverton; Oregon.
- 62. Bond TG, Fox CM (2007) Applying the Rasch model: Fundamental measurement in the human sciences. Mahwah, NJ: Lawrence Erlbaum.
- 63. Streiner DL, Norman GR (2003) Health measurement scales. A practical guide to their development and use. New York: Oxford University Press.
- 64. Vincent WJ (1999) Statistics in Kinesiology Champaign, IL: Human Kinetic.
- 65. Fischer J, Bachmann L, Jaeschke R (2003) A readers' guide to the interpretation of diagnostic test properties: clinical example of sepsis. Intensive Care Medicine 29: 1043–1051. pmid:12734652
- 66. Fleming MF (2002) Screening, assessment and intervention for substance use disorders in general health care settings. Substance Abuse 23: 47–65. pmid:23580987
- 67. Babor TF, Sciamanna CN, Pronk NP (2004) Assessing multiple risk behaviors in primary care: Screening issues and related concepts. American journal of preventive medicine 27: 42–53.
- 68. Youngstrom E, Loeber R, Stouthamer-Loeber M (2000) Patterns and correlates of agreement between parent, teacher, and male adolescent ratings of externalizing and internalizing problems. Journal of Consulting and Clinical Psychology 68: 1038–1050. pmid:11142538
- 69. Attia J (2003) Moving beyond sensitivity and specificity: using likelihood ratios to help interpret diagnostic tests. Australian prescriber 26.
- 70. Enders CK (2010) Applied Missing Data Analysis: Guilford Press.