Diagnostic efficiency and validity of the DSM-oriented Child Behavior Checklist and Youth Self-Report scales in a clinical sample of Swedish youth

The Child Behavior Checklist (CBCL) and Youth Self-Report (YSR) are widely used measures of psychiatric symptoms and have more recently been adapted to the DSM. The incremental validity of adding the scales to each other has not been studied. We validated the DSM subscales for affective, anxiety, attention deficit/hyperactivity (ADHD), oppositional defiant (ODD), conduct problems (CD), and obsessive-compulsive disorder (OCD) in consecutively referred child and adolescent psychiatric outpatients (n = 267) against LEAD DSM-IV diagnoses based on the K-SADS-PL and subsequent clinical work-up. Receiver operating characteristic analyses showed that the diagnostic efficiency of most scales was moderate, with an area under the curve (AUC) between 0.70 and 0.90, except for CBCL CD, which had high accuracy (AUC>0.90). This is in line with previous studies showing the acceptable utility of the CBCL DSM scales and the YSR affective, anxiety, and CD scales. YSR ODD and OCD had low accuracy (AUC<0.70). Logistic regression analyses mostly revealed incremental validity for adding the adolescent version to the parent version (or vice versa). Youth and parent ratings contributed equally to predicting depression and anxiety disorders, while parent ratings were a stronger predictor for ADHD, although the youth ADHD rating also contributed. Adding young people as informants for ODD and OCD or adding the parent for CD did not improve accuracy. The findings for depression, anxiety disorders, and ADHD support using more than one informant when conducting screening in a clinical context.


Introduction
Recent systematic reviews report that in any given year approximately 13-25% of youth suffer from mental disorders that cause significant functional impairment in important domains of everyday life such as family, school, and socializing with peers [1,2]. This brings about high costs and suffering for the individual, family, and society as a whole [3,4] and calls for effective assessment and treatment for these young people. However, only a small proportion of youth with mental disorders receive adequate treatment [5]. This is especially true for internalizing disorders, which are greatly underdiagnosed and undertreated [6][7][8][9]. Thus, identifying and treating pediatric mental disorders is important and may reduce the risk of impairment, severity, and recurrence of psychopathology in the future [6,10,11]. Standardized diagnostic interviews (SDIs) are considered to be the gold standard for diagnostic assessment [12]. Brief continuous psychometric measures are more time-efficient and thus less expensive. They can be suitable for screening or as part of clinical intake procedures to capture a wide range of symptoms in a cost-efficient fashion [13]. However, it is important to evaluate the diagnostic efficiency of screening instruments using representative samples, such as consecutive treatment-seeking children [14,15].
Collecting and combining data from multiple informants (e.g., parents and children) can increase the accuracy of screening and is recommended because informant discrepancy is common [16][17][18][19][20][21], particularly for subjective symptoms and behavior outside the home [22,23].
The Child Behavior Checklist (CBCL) and Youth Self-Report (YSR) are widely used measures of psychiatric symptoms in young people, measuring a range of problem areas [24]. They are often used as part of clinical intake procedures and can screen for psychiatric disorders without any additional cost for the clinic or the family. The syndrome scales of the CBCL/YSR, derived by factor analysis, have only shown modest concordance with the Diagnostic and Statistical Manual of Mental Disorders (DSM) [25][26][27][28]. For instance, each syndrome scale is related to multiple DSM disorders (e.g., the anxious/depressed component is related to both depressive disorders and anxiety disorders), making it impossible to tease apart whether a child's symptoms are congruent with depressive disorders, anxiety disorders, or both [28]. This lack of concordance is suboptimal as treatment options are based on DSM or International Classification of Diseases (ICD) diagnoses.
The authors of the CBCL/YSR have attempted to overcome this limitation by developing DSM-oriented scales (DOSs) based on expert consensus, choosing pre-existing items corresponding with the DSM criteria. This has resulted in the following DOSs: affective, anxiety, somatic, attention deficit/hyperactivity (ADHD), oppositional defiant (ODD), conduct problems (CD), and obsessive-compulsive disorder (OCD). Several studies have investigated the concurrent validity of the DOSs in clinical samples and compared this with syndrome scales. Ebesutani and colleagues [29,30] showed that DOSs are not superior to the original syndrome scales, while Bellina [31] showed weaker correspondence between the DOS and DSM diagnoses compared with syndrome scales except for the ADHD scale, which outperformed the older attention problems scale. Further, Aebi [32] showed better correspondence between the affective DOS and DSM-IV diagnosis of major depressive disorder than the older syndrome scales. Most studies have reported acceptable correspondence between the affective DOS and a depressive disorder diagnosis [33][34][35][36], the anxiety DOS and anxiety disorders [34][35][36], and the ADHD, ODD, and CD scales and their corresponding disorders [35,36]. Some evidence also exists for the OCD scale but not in purely clinical samples [37,38].
However, we are not aware of any existing study that has combined data from multiple informants to evaluate whether this increases the accuracy of the DOSs. The present study addresses this lack of data on the diagnostic efficiency of the DOSs when combining data from multiple informants. The aim of the current study was to evaluate the concurrent and discriminant validity of the DOSs in a large consecutive help-seeking sample at CAP clinics by comparing diagnosis-specific DOS scores between children with and without the corresponding disorder, and by using receiver operating characteristic (ROC) analyses to examine the screening efficiency of the DOSs. Secondary aims were to examine gender differences in mean scores, as found in our previous papers using the same data [18,19], and to evaluate the incremental validity of the DOSs by combining data from multiple informants.

Participants
In all, we included 307 CAP outpatients who consecutively sought treatment at four CAP clinics in southern Sweden from January 2010 to March 2013. Further information can be retrieved from our previous publications on this sample [12,18,19]. Briefly, our exclusion criterion was insufficient proficiency in Swedish by the patient or the parent. Forty cases were discarded due to protocol violations in the Schedule for Affective Disorders and Schizophrenia for School-Age Children-Present and Lifetime Version (K-SADS-PL) interview. One clinician used leading questions or did not ask both parent and child questions about all symptom areas and another clinician failed to sufficiently report data. The data from the remaining 267 cases are reported. These cases had a mean age of 12.1 (SD 3.2, range 6.1-17.8) years. The proportion of children 6-12 years was 57.7% (n = 154). There were slightly more boys (n = 150, 56.2%) than girls. The CBCL was filled out by 263 (98.5%) parents of these patients. Mothers' CBCL data were used in the parental CBCL in 240 (89.9%) cases; fathers' ratings were used only when those were the only data available (23 cases, 8.6%). The YSR was filled out by 139 (52.1%) young people, as it was only distributed to patients aged 11-17 years. Both YSR and CBCL ratings were available for 137 (51.3%) patients.

Measures and procedures
A comprehensive description of measures and procedures can be found in a previous report [12]. Briefly, the semi-structured interview K-SADS-PL was used by resident MDs following a training program. The K-SADS interviews with both parents and patients yielded DSM-IV diagnoses, which were then further evaluated by using a Longitudinal Expert All Data (LEAD) process, commonly viewed as the gold standard for evaluating semi-structured interviews. This process considers all information brought in through diagnostic procedures, the level of impairment, and the treatment outcome across a suitable period [39][40][41][42]. To be eligible for LEAD, the record had to cover at least six months of follow-up from the K-SADS-PL and include a minimum of three further visits or significant information from a teacher or an assessment by a senior clinician. In the LEAD work, the assessors had access to the K-SADS-PL interview as well as subsequent information from the medical records. All these data were retrieved by using a structured form. Thus, the re-evaluation of the K-SADS diagnoses was systematic and included oral reports and report forms from teachers and other informants, psychological assessments, and the outcome of pharmacological and psychological treatment. The observation time that yielded new diagnostic information was 1.2 (SD 0.6) years with a range of 0.1-3.1 years. For further information about the reliability of this process, see [12]. The LEAD procedure and clinical records were blind to the CBCL and YSR. The Ethical Review Board at Lund University approved the study. Patients aged 15 years and above and parents consented to the study in writing.
Achenbach and Rescorla (2001) constructed a new scoring system for the CBCL and YSR based on the DSM diagnostic criteria, the DSM-oriented scales, which are used in the current study.
The scales are affective problems, anxiety problems, attention-deficit/hyperactivity (ADHD) problems, oppositional defiant problems (ODD), and conduct problems (CD). We also examined OCD problems [38]. The internal consistency was as follows: affective (CBCL α = 0.82, YSR α = 0.80), anxiety (CBCL α = 0.82, YSR α = 0.73), ADHD (CBCL α = 0.84, YSR α = 0.76), ODD (CBCL α = 0.84, YSR α = 0.61), CD (CBCL α = 0.81, YSR α = 0.82), and OCD (CBCL α = 0.77, YSR α = 0.76).
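As an illustration of how an internal consistency coefficient such as those above is obtained, the following is a minimal sketch of Cronbach's α in Python. It is not the software used in the study; the function name `cronbach_alpha` and the ratings are hypothetical and serve only to show the arithmetic.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of item ratings."""
    k = len(item_scores[0])  # number of items in the scale
    # Variance of each item across respondents
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of the summed scale score across respondents
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical 0-2 ratings from five respondents on a four-item scale
ratings = [
    [0, 0, 1, 0],
    [1, 1, 1, 2],
    [2, 2, 1, 2],
    [0, 1, 0, 0],
    [2, 1, 2, 2],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 2))
```

Because α depends only on the ratio of summed item variances to total-score variance, using population or sample variances gives the same result as long as the choice is consistent.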

Statistics
T-tests were conducted to examine gender and diagnostic group differences on the DOS mean scores. Receiver operating characteristic (ROC) analyses were conducted to examine the concurrent validity of the CBCL/YSR DOSs versus a LEAD diagnosis [14,43]. Generally, the area under the curve (AUC) is judged to represent low accuracy between 0.50 and 0.70, moderate accuracy between 0.70 and 0.90, and high accuracy above 0.90 [44]. The agreement between LEAD diagnoses and cut-off scores for the CBCL/YSR DOSs was also evaluated by using the Kappa statistic: poor agreement = less than 0.20; fair agreement = 0.20-0.40; moderate agreement = 0.40-0.60; good agreement = 0.60-0.80; and very good agreement = 0.80-1.00 [45]. We also conducted a series of multivariate logistic regression analyses to evaluate the concurrent and discriminant validity of the CBCL/YSR DOSs. In addition, sequential logistic regression analyses were conducted to examine whether adding an informant (child or parent) would increase how accurately children with a disorder could be identified based on the relevant CBCL/YSR DOS. Sequential logistic regression was only used for participants 11 years or older since the YSR was not administered to younger participants.
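The AUC and the interpretation bands described above can be sketched as follows. This is an illustrative Python example, not the analysis code of the study: the AUC is computed as the proportion of case/non-case pairs in which the case has the higher score (ties counted as one half), the scores are hypothetical, and the handling of values falling exactly on a band boundary is an assumption.

```python
def auc(scores_pos, scores_neg):
    """AUC as the probability that a random case outscores a random non-case."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(scores_pos) * len(scores_neg))

def accuracy_band(a):
    """AUC bands from the text; boundary handling is an assumption."""
    if a > 0.90:
        return "high"
    if a >= 0.70:
        return "moderate"
    return "low"

def kappa_band(k):
    """Kappa bands from the text; lower bounds are treated as inclusive (assumption)."""
    if k < 0.20:
        return "poor"
    if k < 0.40:
        return "fair"
    if k < 0.60:
        return "moderate"
    if k < 0.80:
        return "good"
    return "very good"

# Hypothetical scale scores for diagnosed and non-diagnosed participants
example_auc = auc([8, 6, 7, 3], [2, 4, 3, 5, 1])
print(example_auc, accuracy_band(example_auc))
```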

Sample characteristics
The frequency of psychiatric disorders for the total sample and by gender is displayed in Table 1. The most prevalent disorders were ADHD (53%), anxiety disorders (36%), and depressive disorders (29%), while the least prevalent were OCD (5%) and conduct disorders (4%). Table 2 displays the means and standard deviations (SDs) for the CBCL and YSR for all DOSs. Participants diagnosed with a specific disorder (e.g., any depressive disorder) scored significantly higher on the corresponding DOS (e.g., affective) compared with participants without that disorder. However, we did not find any significant differences between participants with and without a diagnosis for the YSR ODD and OCD subscales. We observed gender-specific differences for the CBCL ADHD scale, where parents scored boys significantly higher than girls. In contrast, girls scored significantly higher than boys on the YSR affective, anxiety, and OCD scales.

Diagnostic efficiency
First, we conducted a series of ROC analyses to evaluate how efficiently the DOSs predicted the presence of a corresponding LEAD diagnosis (Table 3). All predictions except YSR ODD and OCD were significant. We observed that CBCL CD predicted a diagnosis of CD with high accuracy. We observed moderate accuracy for the other DOSs in predicting the presence of their corresponding LEAD diagnoses. Second, we selected the most efficient cut-off scores, balancing false-positive and false-negative results, by maximizing the efficiency kappa κ(0.5) [29,30]. Then, we evaluated the sensitivity and specificity of these cut-off scores (Table 3). For the CBCL and YSR affective scales, the Kappa [κ(0.5)] showed moderate agreement with their corresponding LEAD (any depressive disorder) diagnoses. The same was true for CBCL ADHD and OCD. All the other agreements were fair, except for YSR ODD and OCD, which showed poor agreement. Sensitivity ranged from 50% for YSR OCD to 81% for CBCL ADHD. The corresponding specificity ranged from 70% for the CBCL affective DOS, CBCL/YSR ADHD, and YSR ODD to 95% for CBCL CD (Table 3). More detailed results of the ROC analyses can be found in the supplemental tables (see S1 File), which present each cut-off from 90% sensitivity to 90% specificity with kappa, positive and negative diagnostic likelihood ratios, and positive and negative predictive values.
Table 2. Means, standard deviations, and independent t-test as per diagnostic group for CBCL and YSR and boys and girls.
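The cut-off selection described in this section can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the procedure as implemented in the study: it assumes that κ(0.5) coincides with Cohen's kappa for the 2×2 table of screen result against diagnosis, that a score at or above the cut-off counts as screen-positive, and that the data contain at least one case and one non-case. All function names and data are hypothetical.

```python
def confusion(scores, diagnoses, cutoff):
    """Count TP, FP, FN, TN when 'score >= cutoff' is treated as screen-positive."""
    tp = fp = fn = tn = 0
    for s, d in zip(scores, diagnoses):
        positive = s >= cutoff
        if positive and d:
            tp += 1
        elif positive and not d:
            fp += 1
        elif not positive and d:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def cohen_kappa(tp, fp, fn, tn):
    """Chance-corrected agreement between screen result and diagnosis."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    if expected == 1.0:
        return 0.0  # degenerate table: no variation to agree on
    return (observed - expected) / (1 - expected)

def best_cutoff(scores, diagnoses):
    """Return the cut-off with the highest kappa, with its sensitivity and specificity."""
    best = None
    for cutoff in sorted(set(scores)):
        tp, fp, fn, tn = confusion(scores, diagnoses, cutoff)
        k = cohen_kappa(tp, fp, fn, tn)
        if best is None or k > best["kappa"]:
            best = {
                "cutoff": cutoff,
                "kappa": k,
                "sensitivity": tp / (tp + fn),
                "specificity": tn / (fp + tn),
            }
    return best

# Hypothetical scale scores and diagnoses (1 = disorder present)
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
diagnoses = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
result = best_cutoff(scores, diagnoses)
print(result)
```

Scanning every observed score as a candidate cut-off is feasible here because the scales have a small number of possible values; ties in kappa are resolved in favour of the lowest cut-off, which is a design choice of this sketch.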

Concurrent and discriminant validity
We also conducted a series of multivariate logistic regression analyses to evaluate the concurrent and discriminant validity of each subscale. Thus, we aimed to verify whether only the corresponding subscale of the DOS, and not the other subscales, was associated with a particular LEAD diagnosis (Table 4). The odds ratios (ORs) showed that the CBCL affective, anxiety, ADHD, and ODD scales all predicted the presence of their corresponding LEAD diagnoses. However, the ADHD DOS also significantly but negatively predicted the presence of a LEAD depression diagnosis. Likewise, the CBCL affective scale also significantly negatively predicted the presence of a LEAD ADHD diagnosis. The YSR affective, anxiety, and ADHD DOSs predicted their corresponding LEAD diagnoses. However, YSR ODD did not predict the presence of the ODD diagnosis. As with the CBCL findings, we observed that the YSR ADHD and CD DOSs negatively predicted LEAD (any depression). Similarly, the YSR CD predicted LEAD anxiety. We did not analyze the data for CD or OCD due to too few diagnoses.

Incremental validity
We evaluated the possible benefit of adding the DOS child report (YSR) to the parent report (CBCL) and vice versa in predicting LEAD diagnoses. We used a sequential logistic regression analysis to evaluate whether the DOSs would predict the presence of a depressive disorder, anxiety disorder, ADHD, ODD, CD, and OCD. First, we entered the parent report and then added the adolescent report. Second, we started with the adolescent report and then added the parent report. In this way, we evaluated the unique contribution of each informant to the other (Table 5). We found adequate goodness of fit for all analyses (Hosmer-Lemeshow p>0.05). For the affective scale, in the single variable models, both the YSR and the CBCL DOSs predicted the presence of depressive disorders (OR = 6.54 for YSR and OR = 5.29 for CBCL), explaining 24% of the variance (R²). We observed significant benefits of adding the CBCL to the YSR (Δχ² = 16.172, p<0.001) and vice versa (Δχ² = 16.113, p<0.001). In the final model, both the CBCL and the YSR predicted the presence of depressive disorders (Table 5), explaining 36% of the variance. The DOS for anxiety predicted the presence of any anxiety disorder (OR = 4.88 for CBCL and OR = 4.56 for YSR), explaining 16% of the variance in the single variable models. Both scales demonstrated significant benefits when added to each other (Δχ² = 9.422, p<0.05 for adding the CBCL and Δχ² = 9.017, p<0.05 for adding the YSR), explaining 24% of the variance. The DOS for ADHD also predicted the presence of ADHD (OR = 8.18 for CBCL and OR = 4.39 for YSR, explaining 28% and 16% of the variance, respectively) in single variable models. Both scales showed significant benefits of adding an informant (Δχ² = 24.40, p<0.001 for adding the CBCL and Δχ² = 9.19, p<0.05 for adding the YSR), explaining 35% of the variance. We observed a significant OR (OR = 7.41 for CBCL and OR = 2.76 for YSR) when predicting ODD in the single informant (variable) models.
However, only the CBCL carried significant benefits when added to the YSR (Δχ² = 11.435, p<0.001). Both scales had significant ORs when predicting the presence of CD (OR = 15.13 for CBCL and OR = 16.35 for YSR), but only the YSR carried significant benefits when added to the CBCL (Δχ² = 4.91, p<0.05). Both scales predicted the presence of OCD (OR = 52.29 for CBCL and OR = 5.79 for YSR), but only the CBCL carried significant benefits when added to the YSR (Δχ² = 17.74, p<0.001).
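The sequential logistic regression reported above can be illustrated with a small self-contained Python sketch. It is not the statistical software used in the study. It fits a one-informant and a two-informant model by Newton-Raphson and compares them with a likelihood ratio test, assuming that each informant enters as one dichotomized (cut-off based) predictor so that the chi-square difference has one degree of freedom. The cell counts, function names, and variable names are hypothetical.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def predict(beta, row):
    """Predicted probability of a diagnosis for one design-matrix row."""
    return 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row))))

def fit_logit(X, y, iters=25):
    """Newton-Raphson logistic fit; returns (coefficients, log-likelihood)."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for r, yi in zip(rows, y):
            mu = predict(beta, r)
            for j in range(p):
                grad[j] += (yi - mu) * r[j]
                for k in range(p):
                    hess[j][k] += mu * (1.0 - mu) * r[j] * r[k]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    loglik = sum(
        math.log(predict(beta, r)) if yi else math.log(1.0 - predict(beta, r))
        for r, yi in zip(rows, y)
    )
    return beta, loglik

def lr_test(ll_reduced, ll_full):
    """Chi-square difference and p-value for one added predictor (df = 1)."""
    delta = 2.0 * (ll_full - ll_reduced)
    return delta, math.erfc(math.sqrt(max(delta, 0.0) / 2.0))

# Hypothetical cells: (CBCL positive, YSR positive, n subjects, n diagnosed)
cells = [(0, 0, 20, 2), (1, 0, 10, 4), (0, 1, 10, 4), (1, 1, 10, 8)]
data, y = [], []
for cbcl, ysr, n, cases in cells:
    for i in range(n):
        data.append((cbcl, ysr))
        y.append(1 if i < cases else 0)

beta_parent, ll_parent = fit_logit([(c,) for c, _ in data], y)  # step 1: CBCL only
beta_both, ll_both = fit_logit(data, y)                          # step 2: add YSR
delta_chi2, p_value = lr_test(ll_parent, ll_both)
print(math.exp(beta_parent[1]), delta_chi2, p_value)
```

Under the one-degree-of-freedom assumption, the same `lr_test` arithmetic applied to a reported difference such as Δχ² = 4.91 gives a p-value below 0.05, which is consistent with the significance level stated in the text.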

Discussion
In the current study, we evaluated the concurrent and incremental validity of the CBCL and YSR DOSs against several internalizing and externalizing DSM diagnoses based on the LEAD gold standard [12,39]. This is the first study to evaluate the incremental validity of the CBCL added to YSR DOSs and vice versa.
In this sample of newly referred child and adolescent psychiatric outpatients, the parent reports (CBCL DOSs) showed moderate accuracy in predicting the presence of the corresponding disorder (AUC 0.75-0.89), while the CBCL CD DOS predicted the presence of CD in the sample with high accuracy (AUC = 0.93). The child reports (YSR DOSs) predicted the corresponding LEAD disorder with moderate accuracy. However, the accuracy of the youth ODD and OCD DOSs was low and not significant, in contrast to the corresponding parent reports. The scales also showed incremental validity when added to each other. However, adding the child as an informant did not increase diagnostic efficiency for ODD and OCD.
The low accuracy for the YSR ODD subscale is at odds with previous studies examining youth in the general population [30] or incarcerated adolescents [46]. Further, the YSR ODD scale had weak internal consistency (α = 0.61), supporting the inadequacy of this subscale in a clinical population. The diagnostic efficiency of the self-report (YSR) OCD scale has not been investigated previously. The low accuracy of the YSR OCD subscale is in line with studies of other self-report instruments for obsessive and compulsive symptoms in young people [47].
Table 5. Sequential logistic regression to test the effects of child and parent report on the DOS scales (using the most optimal cut-off scores) for the prediction of LEAD diagnoses.
Cut-off values were chosen based on maximizing efficiency. We found moderate agreement between our cut-offs and LEAD diagnoses for the CBCL affective, ADHD, ODD, and OCD scales (Kappa 0.40-0.51) and just slightly below moderate agreement for anxiety and CD (0.35, 0.38). All these cut-off values rendered acceptable sensitivity and specificity (e.g., affective scale: 75% sensitivity and 70% specificity) for screening in a clinical setting. The Kappa for the YSR scales showed moderate agreement with any depression but only fair agreement with anxiety, ADHD, and CD. However, we found poor agreement between the YSR OCD and ODD subscales and the corresponding LEAD diagnoses, reflecting the low AUC levels. Thus, most cut-off scores from the ROC analyses (based on the point where both sensitivity and specificity are optimal), especially those for the affective DOS (both CBCL and YSR) and the CBCL ADHD and OCD scales, can be used with confidence provided that the sample is similar to the sample in our analyses.

We found clear evidence for both concurrent and discriminant validity of the DOSs for anxiety (CBCL and YSR) and for ODD (CBCL). Surprisingly, the affective and ADHD subscales (CBCL and YSR) each predicted their corresponding diagnosis but also inversely predicted the other diagnosis (any depression or ADHD). It is remarkable that both the CBCL and the YSR DOSs indicated a lower chance of depression with a high score on ADHD and vice versa despite the established comorbidity between these disorders. However, in this enriched clinical sample, patients with depression had clinically important comorbidity with ADHD but still a significantly lower rate of ADHD than those without depression (35% vs. 61%, p<0.001). We are not aware of any previous studies that have investigated the divergent validity of the DOSs in a similar manner. The prevalence of both ADHD and any depressive disorder was high in this sample and the majority of the young people had at least one comorbid disorder [12], which faithfully reflects the clinical situation and makes screening and differential diagnostics more complicated.
Overall, we found good evidence that adding the parent as an informant, or vice versa, increases diagnostic precision. This is in line with a study of screening for depression with the Mood and Feelings Questionnaire (MFQ), where a combination of parent and patient ratings was better than either rating alone [48]. When those data were analyzed separately by gender, adding parent ratings contributed significantly for adolescent girls but, surprisingly, not for boys [19], which would be important to examine further. However, adding the child as an informant to information from parents did not increase diagnostic efficiency for the ODD and OCD DOSs, which corresponds to the findings of the ROC analyses. In addition, for the CD DOS, adding parental information to information from the child did not increase diagnostic efficiency, whereas adding the child as an informant to the parent did. This is not surprising, as parents do not always have full knowledge of their adolescents' disruptive behaviors.
The results also revealed that boys scored significantly higher on the CBCL ADHD scale. We did not find any gender differences in the other CBCL DOSs. Parents' ratings for depression and anxiety were similar across gender while girls' ratings were higher, which is in line with our findings from the Mood and Feelings Questionnaire (MFQ) [19] and the Screen for Child Anxiety Related Emotional Disorders [18]. Our MFQ study also showed that parents' and girls' reports correlated highly. However, the girls scored consistently higher, suggesting that girls express affective symptoms more markedly [19].

Strengths and limitations
The main strength of this study was the large sample of participants from a specialized CAP clinical population. All patients were new referrals without prior contact with psychiatric services. Thus, they had not received any prior psychiatric diagnosis, assessment, or psychoeducation. This recruitment is ecologically suitable for testing the screening efficiency of the CBCL/YSR ahead of receiving a diagnosis. LEAD diagnoses were of high quality, as they were based both on a semi-structured interview and on further clinical work-up and observations as well as expert consensus by two senior consultants (TI and HJ). The expert consensus work was independent of the ASEBA scores, as no information from the scales was included in the clinical records. There were adequate numbers for ADHD (n = 60), anxiety disorders (n = 48), and depression (n = 61) on the YSR self-report for analyzing concurrent and incremental validity of the DOSs.
However, there were some limitations as well. First, although this was a sizable study, the number of patients in some diagnostic groups was small. For instance, we had only 11 participants with CD and 12 with OCD, which limited our analysis strategy, especially for convergent and divergent validity using logistic regression. Second, LEAD diagnoses based on enhancing the K-SADS with information from clinical records are still at risk of including spurious variation.

Conclusion
In a child and adolescent outpatient psychiatric setting, the subscales of CBCL and YSR for ADHD, anxiety disorders, depression, and conduct disorders and the CBCL subscales for ODD and OCD can be used for screening or for enhancing diagnostic assessment. Adding self-report to parent-report and vice versa improves the prediction and is recommended for youths. YSR self-report for OCD and ODD should not be used.