Prediction of child and adolescent outcomes with broadband and narrowband dimensions of internalizing and externalizing behavior using the child and adolescent version of the Strengths and Difficulties Questionnaire

The Strengths and Difficulties Questionnaire (SDQ) is a frequently used screening instrument for behavioral problems in children and adolescents. There is an ongoing controversy—not only in educational research—regarding the factor structure of the SDQ. Research results speak for a 3-factor as well as a 5-factor structure. The narrowband scales (5-factor structure) can be combined into broadband scales (3-factor structure). The question remains: Which factors (narrowband vs. broadband) are better predictors? With the prediction of child and adolescent outcomes (academic grades, well-being, and self-belief), we evaluated whether the broadband scales of internalizing and externalizing behavior (3-factor structure) or narrowband scales of behavior (5-factor structure) are better suited for predictive purposes in a cross-sectional study setting. The sample includes students in grades 5 to 9 (N = 4642) from the representative German Health Interview and Examination Survey for Children and Adolescents (KiGGS study). The results of model comparisons (broadband scale vs. narrowband scales) did not support the superiority of the broadband scales with regard to the prediction of child and adolescent outcomes. There is no benefit from subsuming narrowband scales (5-factor structure) into broadband scales (3-factor structure). The application of narrowband scales, providing a more differentiated picture of students’ academic and social situation, was more appropriate for predictive purposes. For the purpose of identifying students at risk of struggling in educational contexts, using the set of narrowband dimensions of behavior seems to be more suitable.


Introduction
Internalizing and externalizing behavior problems are considered a substantial risk factor for students' social and academic well-being [1]. Both dimensions are consistently associated with

Broadband and narrowband scales of behavior
Some studies on the psychometric properties of the SDQ [13,25,26,28,29] have confirmed the original 5-factor structure (emotional symptoms, peer problems, conduct problems, hyperactivity, and prosocial behavior) proposed by Goodman [10]. At the same time, several studies have highlighted potential model fit shortcomings of the 5-factor structure [30-33]. In response to these concerns, several authors have proposed a 3-factor model as a possible alternative to the original 5-factor model [13,33]. In the revised 3-factor model, the narrowband subscales of emotional symptoms and peer problems are combined into the broadband scale for internalizing behavior, and the narrowband subscales of hyperactivity and conduct problems are subsumed into the broadband scale for externalizing behavior (the narrowband subscale of prosocial behavior remains unchanged). Recent research has provided partial support for the appropriateness of the revised 3-factor model. Studies comparing the two factor structures (3 vs. 5) in terms of model fit have documented the superiority of the 3-factor structure [34-36], the superiority of the 5-factor structure [13,37-39], or the adequacy of both factor structures [13,25,28,33]. Goodman et al. ([7] p. 1189) conclude that "there may be no single best set of subscales to use in the SDQ; rather, the optimal choice may depend in part upon one's study population and study aims." Subsuming narrowband subscales into broadband scales reflects a clinical perspective and represents the well-known hierarchical model of child and adolescent psychopathology [40,41]. For example, the SDQ narrowband scales of hyperactivity and conduct problems describe distinct psychopathological phenomena that often co-occur [42], which is why both scales form a broadband externalizing scale.
The broadband perspective on child and adolescent behavior and emotions leads to a two-dimensional taxonomy of psychopathology distinguishing between internalizing and externalizing behavior. The question of whether child and adolescent psychopathology is best described by narrowband or broadband measures is the subject of ongoing debate [43-45]: "[. . .] the once plausible goal of identifying homogeneous populations for treatment and research resulted in narrow diagnostic categories that did not capture clinical reality, symptom heterogeneity within disorders, and significant sharing of symptoms across multiple disorders. The historical aspiration of achieving diagnostic homogeneity by progressive subtyping within disorder categories no longer is sensible [. . .]" ([32] p. 12). However, it must also be considered that different narrowband dimensions of behavior (e.g., aggressive vs. non-aggressive rule-breaking behavior) are related to different etiological factors [46]. Conversely, different narrowband dimensions of behavior might explain educational outcome variables to varying degrees; for example, conduct problems are associated with arithmetic skills (r = -.20), while the association is stronger for hyperactivity (r = -.38) [14]. Subsuming these narrowband dimensions of behavior into a broader category of behavior might therefore lead to a loss of information; that is, differentiated effects (between narrowband scales) in predicting educational outcomes are not described by a broadband scale. The usage of narrowband scales could provide a nuanced description of the association between dimensions of behavior and child and adolescent outcomes. prediction of outcomes is sparse. The question remains: Which factors (narrowband vs. broadband) are better predictors? In addition, the vast majority of previous research using the SDQ has focused on parent- or teacher-reported student behavior.
There is a lack of studies that examine the associations between self-reports of students on the SDQ and relevant outcomes. However, children and adolescents can be considered experts of their own well-being [47] and consequently might depict a valid and important source of information on their own behavior.
In the present study, therefore, we examined how behavioral and emotional problems, measured by means of the different self-report SDQ scores (narrowband and broadband scales), are associated with child and adolescent outcomes, such as measures of academic success (grades), well-being (school, friends, and family), and self-belief (self-esteem and self-efficacy). We thereby assume that narrowband scales of behavior are more informative predictors of outcomes than broadband scales of behavior. This comparison (narrowband vs. broadband dimensions of behavior) seems of particular importance for emphasizing the need to differentiate behavioral problems when examining associations with child and adolescent outcomes.

Study design and participants
The analyzed cross-sectional sample was obtained from the baseline of the German Health Interview and Examination Survey for Children and Adolescents (KiGGS study) [48]. The KiGGS study is a nationally representative health survey comprising children and adolescents. The survey's main objective is to obtain information on key physical and mental health indicators, risk factors, health service utilization, health behavior, and living conditions of children and adolescents in Germany. Study participants were not recruited in schools (non-nested data structure) but randomly selected from the official registers of local residents. The data were collected from 2003 to 2006. The KiGGS sample consists of 17641 children and adolescents aged 0 to 17 years. The children were given a physical examination and the parents as well as the children and adolescents themselves (from age 11 on) were interviewed via written questionnaires. The study was approved by the Charité/Universitätsmedizin Berlin ethics committee and the Federal Office for the Protection of Data.
The present analysis represents a secondary data analysis with a focus on behavior/emotions and child and adolescent outcomes such as measures of well-being (school, friends, and family), academic success (grades), and self-belief (self-esteem and self-efficacy), which are factors of interest to professionals in educational contexts. We will focus on the child/adolescent-reported data and refer to children and adolescents of compulsory education age. Compulsory education in Germany usually ends with the completion of grade 9 (usually 15-year-old adolescents). The survey's questionnaires were addressed to children and adolescents from the age of 11 years onwards (usually children in grade 5 and above). The present sample therefore includes children and adolescents with a minimum age of 11 years in grades 5 to 9 (N = 4642; 52% boys; age in years: M = 13.46, SD = 1.47, Min = 11.00, Q1 = 12.17, Md = 13.42, Q3 = 14.67, Max = 17.92). The distribution of the children and adolescents across grades 6 to 9 is nearly equal (approximately 21.5% in each grade, but 14% in grade 5). The proportion of children and adolescents in grade 5 (usually 10- and 11-year-olds) is small, as individuals under the age of 11 are not included in the present sample.

Measures
Child and adolescent behavior and emotions. The German self-report version of the Strengths and Difficulties Questionnaire (SDQ) was used to assess child and adolescent behavior and emotions [10,49]. This questionnaire (25 items) quantifies emotional symptoms, peer problems, conduct problems, hyperactivity, and, as a dimension of strength, prosocial behavior (original narrowband scales). Emotional symptoms and peer problems can be combined into a broadband internalizing behavior subscale, while conduct problems and hyperactivity can be subsumed into a broadband externalizing behavior subscale [13]. The prosocial behavior subscale was not considered in the present study, because the focus was on a comparison of the two broadband subscales with the underlying narrowband subscales. SDQ items are rated on a three-point scale (0 for "not true," 1 for "somewhat true," and 2 for "certainly true"). High subscale scores indicate elevated behavioral problems. The children and adolescents with the highest subscale scores, which are the upper 10% of the normative sample, can be categorized as "abnormal" and are considered to be at risk for psychiatric disorders [10,50]. We therefore refer to these individuals as at-risk children and adolescents. A conservative classification rule [51], which minimizes false positive cases by selecting a cutoff value below 10%, was used to identify at-risk children and adolescents (emotional symptoms ≥ 6, peer problems ≥ 5, conduct problems ≥ 5, hyperactivity ≥ 7, internalizing behavior ≥ 9, and externalizing behavior ≥ 10; calculations based on KiGGS baseline data, Table 1). With regard to the self-report version used in the KiGGS study (data at hand), the internal consistencies (Table 2) for the subscales of peer problems and conduct problems are insufficient (α and ω ≤ .50), but moderate for the subscales of emotional symptoms, hyperactivity, and internalizing and externalizing behavior (α and ω ≥ .60).
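The scoring and classification rule described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the dictionary layout, and the example respondent are hypothetical, and the cutoffs are the ones reported in the text, read as "greater than or equal to."

```python
# Cutoffs as reported in the text; a score at or above the cutoff counts as at-risk.
CUTOFFS = {
    "emotional": 6, "peer": 5, "conduct": 5, "hyperactivity": 7,
    "internalizing": 9, "externalizing": 10,
}

def broadband_scores(narrow):
    """Return the narrowband sum scores plus the two broadband sums."""
    scores = dict(narrow)
    scores["internalizing"] = narrow["emotional"] + narrow["peer"]
    scores["externalizing"] = narrow["conduct"] + narrow["hyperactivity"]
    return scores

def at_risk_flags(scores):
    """Flag each scale whose score reaches its cutoff."""
    return {scale: scores[scale] >= cut for scale, cut in CUTOFFS.items()}

# Hypothetical respondent (narrowband sum scores, each ranging from 0 to 10).
example = {"emotional": 6, "peer": 2, "conduct": 4, "hyperactivity": 5}
scores = broadband_scores(example)
flags = at_risk_flags(scores)
print(scores["internalizing"], flags["emotional"], flags["internalizing"])
```

In this made-up case the respondent reaches the cutoff for emotional symptoms but stays below the cutoff on the internalizing broadband scale.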
Child and adolescent outcomes. Health-related quality of life. The self-report version of the KINDL-R is a brief questionnaire to measure the health-related quality of life of children and adolescents [52]. The subscales school (e.g., "doing my schoolwork was easy"), friends (e.g., "I played with friends"), self-esteem (e.g., "I was proud of myself"), and family (e.g., "I got on well with my parents") describe the students' well-being related to daily school life, friendship, family life, and feelings of self-worth. Each subscale consists of 4 items. Items are rated on a five-point scale (1 for "never," 2 for "seldom," 3 for "sometimes," 4 for "often," and 5 for "all the time"). High scores indicate a positive quality of life in the specific domain. With regard to the KiGGS study (data at hand), the internal consistencies (Table 2) of the mentioned subscales are mediocre (α and ω range from .53 to .69). The subscales physical and emotional well-being are not used in the present study.
School grades. The school grades (math and German) received on the last report card (half-year term) were reported by the parents. Germany uses a 6-point grading scale ranging from 1 (excellent) to 6 (insufficient). The grades were reversed so that higher values indicate better academic performance.
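The text does not state the exact reversal formula. A minimal sketch, assuming the usual transformation of subtracting the grade from 7 (the function name is hypothetical), could look like this:

```python
def reverse_grade(grade):
    """Reverse a German school grade (1 = excellent ... 6 = insufficient).

    Assumption: reversal is done as 7 - grade, so that 6 becomes the best value.
    """
    if grade not in (1, 2, 3, 4, 5, 6):
        raise ValueError("German grades range from 1 to 6")
    return 7 - grade

reversed_grades = [reverse_grade(g) for g in [1, 2, 3, 4, 5, 6]]
print(reversed_grades)
```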
General self-efficacy. The general self-efficacy scale is a 10-item questionnaire that was designed to assess optimistic self-beliefs in coping with a variety of difficult demands in life [53] (e.g., "I can always manage to solve difficult problems if I try hard enough"). In the KiGGS study, the scale was only used in adolescents aged 14 years and older (N = 1750). Items are rated on a four-point scale (1 for "not at all true," 2 for "hardly true," 3 for "moderately true," and 4 for "exactly true"). High scores indicate stronger self-efficacy. With regard to the KiGGS study (data at hand), the internal consistency (Table 2) of the scale is good (α and ω > .80).

Statistical analysis strategy
Ordinary least squares regression models will be formulated with regard to the prediction of child and adolescent outcomes (outcomes regressed on SDQ subscales). The term "prediction" and cognate terms are used here in a statistical sense and shall not be confused with the concept of predictive validity, which describes the ability of a measure to forecast outcomes in the future [54]. To judge the statistical predictive performance of the different subscales of the SDQ (broadband vs. narrowband), the regression model with the broadband subscale (model 1: outcome regressed on broadband subscale, e.g., internalizing behavior) will be compared to the regression model with both underlying narrowband subscales jointly as predictors in one regression model (e.g., model 2: outcome regressed on emotional symptoms and peer problems). This model comparison (narrowband vs. broadband) will be conducted with regard to each predicted outcome and separately for internalizing and externalizing behavior (internalizing behavior vs. emotional symptoms and peer problems; externalizing behavior vs. conduct problems and hyperactivity). To evaluate the predictive performance of the different models (predictive performance of the broadband and narrowband subscales), we report two goodness-of-fit indices for each regression model. The adjusted R² is the proportion of variance in the outcome that is predictable from the predictors (SDQ subscales), adjusted for the number of predictors. The Akaike Information Criterion (AIC) takes into account both model complexity (total number of estimated model parameters) and goodness of model fit (maximized likelihood) and balances these two [55]. The individual AIC values are not interpretable. However, the smaller the AIC value, the better the model fit. Consequently, models with less complexity (fewer predictors) along with a high goodness of fit are deemed to be good models. If the difference in AIC values between the models is less than 3 (model with broadband scale vs. model with underlying narrowband scales), then the model with the higher AIC value is almost as good as the model with the smaller AIC value [55]. For the application of AIC model selection in the fields of psychology and psychometrics, see Vrieze [56].
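The model comparison described above can be sketched in a few lines. The analyses themselves were run in R; the following is only an illustrative pure-Python version on made-up toy data with continuous predictors, and all function and variable names are hypothetical. The AIC is computed from the Gaussian log-likelihood and counts the regression coefficients plus the residual variance as parameters, which is the convention R uses for linear models.

```python
import math

def ols_rss(X, y):
    """Fit OLS with an intercept via the normal equations; return (coefficients, RSS)."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix of the normal equations (X'X | X'y).
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(p):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    beta = [M[i][p] / M[i][i] for i in range(p)]
    rss = sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    return beta, rss

def fit_indices(X, y):
    """Return (adjusted R^2, AIC) for an OLS model of y on the columns of X."""
    n = len(y)
    beta, rss = ols_rss(X, y)
    k = len(beta)  # coefficients including the intercept
    mean_y = sum(y) / n
    tss = sum((yi - mean_y) ** 2 for yi in y)
    adj_r2 = 1 - (rss / tss) * (n - 1) / (n - k)
    loglik = -n / 2 * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    aic = 2 * (k + 1) - 2 * loglik  # +1 for the residual variance
    return adj_r2, aic

# Toy data (not from KiGGS): the outcome depends on emotional symptoms only.
emotional = [0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4]
peer = [3, 0, 4, 1, 5, 2, 0, 6, 2, 5, 1, 3]
noise = [0.1, -0.2, 0.15, -0.05, 0.2, -0.1, 0.05, -0.15, 0.1, -0.1, 0.05, -0.05]
outcome = [-0.5 * e + d for e, d in zip(emotional, noise)]

# Model 1: broadband sum score; model 2: both narrowband scores jointly.
adj_broad, aic_broad = fit_indices([[e + p] for e, p in zip(emotional, peer)], outcome)
adj_narrow, aic_narrow = fit_indices([[e, p] for e, p in zip(emotional, peer)], outcome)
print(round(adj_broad, 2), round(adj_narrow, 2), aic_broad - aic_narrow >= 3)
```

Because the toy outcome is driven by only one of the two narrowband scores, the narrowband model is favored here by construction; the sketch shows the mechanics of the comparison, not the study's results.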
The SDQ subscales are used as dummy variables. The reference is the at-risk category ("abnormal"). Therefore, the intercept (constant) of each regression model is interpretable as the expected average outcome for the at-risk children and adolescents. Since all the outcomes are standardized (M = 0, SD = 1), the intercept represents the average outcome for the at-risk group as a deviation from the overall sample mean in units of standard deviation. The regression parameters (B) for all the other SDQ subscale scores are interpretable as the average difference in the outcomes (in units of standard deviation) between the at-risk group and the children and adolescents with the particular SDQ subscale score. These types of analyses emphasize the clinical category "abnormal" (at-risk). In some additional regression analyses, we will use the SDQ subscales as continuous predictors. If the SDQ subscale is a continuous predictor, the intercept represents the average outcome for children without behavioral problems (SDQ subscale score equals zero) as a deviation from the overall sample mean in units of standard deviation, and the regression parameter (B) is the average change (slope) of the outcome (in units of standard deviation) when the SDQ score increases by one unit. All statistical analyses were conducted in R 3.6.0.
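The interpretation of the dummy-coded coefficients can be made concrete with a small sketch. It relies on the fact that, with a single categorical predictor, the OLS intercept equals the mean of the reference group and each coefficient equals the difference between a group mean and the reference mean. The data, the function names, and the use of the sample standard deviation for standardization are assumptions made for illustration only.

```python
from statistics import mean, stdev

def standardize(values):
    """z-standardize (M = 0, SD = 1); assumes the sample SD (n - 1) is used."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def dummy_coefficients(scores, outcome, cutoff):
    """Intercept and B values for one SDQ scale coded as dummy variables.

    All scores at or above the cutoff form the at-risk reference category;
    every lower score is its own category.
    """
    groups = {}
    for s, y in zip(scores, outcome):
        key = "at-risk" if s >= cutoff else s
        groups.setdefault(key, []).append(y)
    intercept = mean(groups["at-risk"])
    b = {k: mean(v) - intercept for k, v in groups.items() if k != "at-risk"}
    return intercept, b

# Hypothetical internalizing scores (cutoff 9) and a raw outcome.
scores = [0, 0, 1, 1, 2, 2, 9, 10]
z = standardize([5, 4, 4, 3, 3, 2, 1, 0])
intercept, b = dummy_coefficients(scores, z, cutoff=9)
print(round(intercept, 2), {k: round(v, 2) for k, v in b.items()})
```

In this made-up example the intercept is negative (the at-risk group lies below the sample mean) and all B values are positive, because every lower score category has a higher average outcome than the at-risk group.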

Model fit and measurement invariance of the 3- and 5-factor structures of the SDQ.
Confirmatory factor analyses (weighted least square mean and variance adjusted estimation) reveal an appropriate model fit (RMSEA < 0.08, for details see [57]) for both the 3- and 5-factor structures of the SDQ, although the 5-factor structure shows a better model fit (RMSEA = .05, CFI = .89, TLI = .88, χ² = 3250.87, df = 265, p = .00) than the 3-factor structure (RMSEA = .06, CFI = .85, TLI = .83, χ² = 4540.10, df = 272, p = .00). However, CFI (< .90) and TLI (< .95) do not indicate good fit for either the 3- or the 5-factor structure. Both the 3- and 5-factor structures meet the standards for metric invariance [58,59] across gender and age groups (multigroup confirmatory factor analysis; comparison of metric and configural model: difference in the models' CFI ≤ .01), which can be interpreted to indicate that the measured dimensions of behavior (narrowband and broadband scales) manifest in the same way across boys and girls as well as different age groups (quartile age groups in years: [11,12.2], (12.2,13.4], (13.4,14.7], and (14.7,17.9]). As the goal of the present paper is to compare the statistical predictive performance of the broadband (3-factor structure) and narrowband (5-factor structure) scales of behavior and as it is not the goal to highlight differences between boys and girls or different age groups, sex and age are not considered as predictors of the child and adolescent outcomes.

Descriptive results
Based on the SDQ narrowband subscales, the proportion of at-risk children and adolescents ranges between 5.3% and 9.2% (conduct problems: 5.3%; emotional symptoms: 6.2%; peer problems: 6.5%; hyperactivity: 9.2%), while the SDQ broadband subscales reveal a proportion of 7.9% (internalizing behavior) and 10% (externalizing behavior) of at-risk children and adolescents (Table 1). The correlations, means, and standard deviations of the SDQ subscales and the child and adolescent outcomes are displayed in Table 2. All outcomes are positively associated. Comparatively strong positive correlations are observed between the grades in math and German (r = .47), as well as between the KINDL-R subscales of family and school (r = .35). The general self-efficacy scale is likewise considerably correlated with the KINDL-R subscales of self-esteem (r = .40) and friends (r = .32). The different SDQ subscales were negatively correlated with all outcomes, which means that increased behavioral problems measured by means of the different SDQ subscales are associated with lower values for the outcomes, indicating adverse outcomes. The KINDL-R friends subscale is highly correlated with the SDQ subscales of peer problems (r = -.49) and internalizing behavior (r = -.48). The grades (math and German) were only weakly correlated with the SDQ subscales of peer problems and internalizing behavior (r ranges from -.04 to -.10). Also, the correlation between the German grades and emotional symptoms is close to zero (r = -.03). Another small correlation is between the KINDL-R subscale for friends and hyperactivity (r = -.08).

Main results: Predictive performance of the SDQ subscales
Internalizing behavior vs. emotional symptoms and peer problems. Each child and adolescent outcome is regressed on the different SDQ subscales, which are the broadband subscale for internalizing behavior (model 1) and both underlying narrowband subscales, i.e., emotional symptoms and peer problems jointly as predictors in one regression model (model 2). The regression coefficients (B) and model fit parameters (R² and AIC) are displayed in Table 3.
With regard to the results of models 1 and 2, it can be stated that the associations between the outcomes and the SDQ subscales are negative, which means that increased behavioral problems as indicated by high SDQ subscale scores are associated with lower values of the outcomes, indicating adverse outcomes. Taken as a whole, the at-risk children and adolescents have the lowest average outcome values (the intercept ranges from -1.61 to -0.11).
With reference to the model fit parameters, the narrowband subscales of emotional symptoms and peer problems (model 2: jointly as predictors in one regression model) outperform the broadband subscale of internalizing behavior (model 1) in the prediction of all outcomes (comparable R² and lower AIC values), except for the prediction of general self-efficacy (AIC value favors the predictive performance of the broadband subscale).
However, the predictive performance of the narrowband and broadband scales is poor with regard to the prediction of the grades (math and German), i.e., the proportion of explained variance is close to zero (R² ≤ .01). Therefore, it is hard to judge the predictive superiority of one of the SDQ subscales (with regard to grade prediction), although the AIC values favor the predictive performance of the narrowband subscales of emotional symptoms and peer problems (model 2). Besides this, model 2 offers a deeper insight into the magnitude of the effect sizes of the two narrowband subscales. For example, in the prediction of the KINDL-R school subscale, the regression coefficients for emotional symptoms are remarkably higher (B ranges from 0.37 to 1.30) than the coefficients for peer problems (B ranges from 0.04 to 0.32). Likewise, in the prediction of the KINDL-R family subscale and in the prediction of the math grade, the emotional symptoms subscale shows noticeably higher coefficients than the peer problems subscale (family: B ranges from 0.22 to 0.86 vs. 0.06 to 0.26, math: B ranges from 0.04 to 0.34 vs. 0.03 to 0.06). The situation is reversed in the case of predicting the KINDL-R friends subscale, i.e., higher coefficients are observable for peer problems (B ranges from 0.62 to 1.66), while lower coefficients are detected for emotional symptoms (B ranges from 0.23 to 0.69). Also in the prediction of the German grade, the peer problems subscale shows higher coefficients than the emotional symptoms subscale (B ranges from 0.02 to 0.30 vs. -0.10 to 0.00). This detailed information about the differences in effect sizes between the two narrowband subscales (model 2) is not depicted when the broadband subscale for internalizing behavior is used as a predictor (model 1).
The results are almost the same when the SDQ subscales are considered as continuous predictors (Table 4). With regard to the model fit parameters, the narrowband subscales of emotional symptoms and peer problems (model 2: jointly as predictors in one regression model) outperform the broadband subscale for internalizing behavior (model 1) in the prediction of all outcomes (comparable or lower R² and lower AIC values), except for the prediction of self-esteem and general self-efficacy (AIC values favor the predictive performance of the broadband subscale). Differences in effect sizes (B) between the two narrowband subscales (model 2) are apparent.
Externalizing behavior vs. conduct problems and hyperactivity. Each child and adolescent outcome is regressed on the different SDQ subscales, which are the broadband subscale for externalizing behavior (model 1) and both underlying narrowband subscales, i.e., conduct problems and hyperactivity jointly as predictors in one regression model (model 2). The regression coefficients (B) and model fit parameters (R² and AIC) are displayed in Table 5.
With regard to the results of models 1 and 2, it can be stated that the associations between the outcomes and the SDQ subscales are negative, which means that increased behavioral problems as indicated by high SDQ subscale scores are associated with lower values of the outcomes, indicating adverse outcomes.
With regard to the model fit parameters, the narrowband subscales for conduct problems and hyperactivity (model 2: jointly as predictors in one regression model) outperform the broadband subscale of externalizing behavior (model 1) in the prediction of all outcomes (comparable or higher R² and lower AIC values).
In addition, model 2 offers a deeper insight into the magnitude of the effect sizes of the two narrowband subscales. For example, in the prediction of the KINDL-R friends subscale, the regression coefficients for conduct problems are remarkably higher (B ranges from 0.12 to 0.57) than the coefficients for hyperactivity (B ranges from -0.05 to 0.20). Similarly, in the prediction of the KINDL-R family subscale, the conduct problems subscale shows considerably higher coefficients (B ranges from 0.20 to 1.14) than the hyperactivity subscale (B ranges from 0.10 to 0.52). The situation is reversed in the case of predicting the math grade, i.e., higher coefficients are observable for hyperactivity (B ranges from 0.17 to 0.72), while lower coefficients are detected for conduct problems (B ranges from 0.18 to 0.31). In addition, in the prediction of general self-efficacy, the hyperactivity subscale shows noticeably higher coefficients (B ranges from 0.02 to 0.97) than the conduct problems subscale (B ranges from -0.07 to 0.33). This detailed information about the differences in effect sizes between the narrowband subscales (model 2) is not depicted when the broadband subscale for externalizing behavior is used as a predictor (model 1).
If the SDQ subscales are considered as continuous predictors (Table 6), the broadband subscale for externalizing behavior (model 1) outperforms the narrowband subscales (model 2) in the prediction of the KINDL-R subscales of school and self-esteem as well as in the prediction of the German grade (comparable R² and lower AIC values). In these predictions (KINDL-R school and self-esteem subscales as well as the German grade), there are no differences in effect sizes (B) between the two narrowband subscales, but they are apparent in the predictions of the other outcomes (KINDL-R subscales for friends and family, as well as math grades and general self-efficacy).

Discussion
For the first time, the SDQ broadband and narrowband scales were compared with regard to their criterion validity in predicting child and adolescent outcomes. The results of the study indicated the relevance of the different SDQ subscales for the description of students' socioemotional and academic situation. At the same time, the results could not support a superiority of the broadband subscales with regard to prediction of the outcomes. This interpretation can be described for the internalizing and externalizing behavior subscales (except for the prediction of general self-efficacy, where the internalizing behavior scale shows the best model fit). If the SDQ scales are considered as continuous predictors, the broadband scale for internalizing behavior outperforms the narrowband scales in the prediction of general self-efficacy and self-esteem. The same holds true for the prediction of the KINDL-R subscales of school and self-esteem, as well as for the prediction of the German grade, where the continuous externalizing subscale shows the best model fit. At any rate, in all cases where a continuous broadband scale outperforms the underlying narrowband scales, the difference in AIC values between the models is less than 3 [55], i.e., the models with narrowband scales are almost as good as the models with the broadband scales. The use of sum scores for the narrowband subscales of emotional symptoms and peer problems is more informative with respect to the range of predicted outcomes. The models using the narrowband scales indicate that there are differences between emotional symptoms and peer problems with regard to their effect on different outcome variables. This information is not depicted through the use of the broadband subscale of internalizing behavior, which might therefore lead to a loss of information. 
At the same time, it must be stated that children and adolescents might exhibit symptomatic behaviors related to emotional symptoms but not have major peer problems or vice versa [60]. Similar observations can be made for conduct problems and hyperactivity [61]. It can be assumed that different categories of behavioral problems (e.g., conduct problems and attention deficit/hyperactivity) also go hand in hand with the development of different outcomes over time [62,63]. Similarly, different conditions and predictors might lead to either the one or the other narrowband behavior. Moreover, different developmental trajectories become clear when focusing on different narrowband behavior problems (e.g., emotional problems and peer problems) [60,64,65]. Aggregating scores of narrowband into broadband scales might therefore run the risk of blending distinctive behaviors associated with different developmental outcomes. This evidence seems to strengthen the assumptions of Tandon et al. ([66] p. 593) who argue that: "[. . .] a major shift, and advance in this area, has been the study of more discrete differentiated disorders instead of lumping of all internalizing symptoms into one broad category of the two-dimensional internalizing versus externalizing taxonomy of childhood psychopathology." Therefore, the differentiation between different narrow facets of internalizing behavior seems to be of particular importance in the description of child and adolescent psychopathology and the prediction of relevant outcomes. With respect to the broadband subscale of externalizing behavior, these assumptions can also be partially supported. In line with similar previous findings, differences between the narrowband subscales of conduct problems and hyperactivity with regard to their effect on the outcomes can also be described for most predictions.

Limitations
In this context, however, it must be noted that the chosen outcome variables do not fully describe the levels of educational development of children and adolescents. Further research is desirable that applies in-depth assessment of educational outcomes, such as domain-specific academic achievement, social integration, and cognitive or self-regulation processes, and that uses a multi-informant approach (data reported by educators and teachers are especially relevant). Compared to school grades, a domain-specific assessment (e.g., reading comprehension) would provide a more detailed picture of academic performance. Besides this, the reporting of school grades from the last report card by the parents might be prone to recall bias. In addition, some of the chosen outcome variables (KINDL-R scales school and friends) show only low internal consistencies and might therefore not be reliable. Unsatisfactory internal consistency can also be described with regard to some of the SDQ scales used (conduct problems and peer problems). However, it is important to note that poor reliability does not necessarily affect the goodness of predictive analysis [67].

Conclusion
Despite the aforementioned limitations of the study at hand, the results shed light on the predictive abilities of different subscales of the SDQ. In addition to previous studies [13,25,28], these insights can be used for a further discussion of the advantages and disadvantages of different factor structures of the SDQ. In the sense of predicting educational outcomes, no advantage of the broadband scales (resulting from the 3-factor structure) becomes clear. The application of narrowband scales (resulting from the 5-factor structure), providing a more differentiated picture of the socio-emotional and academic situation of students, seems to be more appropriate for the prediction of child and adolescent outcomes. This interpretation is of course limited, as it refers to a selection of criteria and needs to be replicated with further educational outcome variables. Furthermore, future research should examine the validity of these results when using parent- or teacher-reported student behavior. In addition, the finding that differentiation of behavioral problems might be a benefit for the description of educational outcomes should be replicated using behavior assessment tools other than the SDQ. Nonetheless, for the purpose of identifying students at risk of struggling in educational contexts, using the set of narrowband dimensions of behavior seems to be more appropriate in educational research and practice.

Implications
The study at hand indicated the need to focus on narrowband behaviors in educational practice in order to gain the most differentiated insights into possible predictors of the emotional-social as well as academic development of children and adolescents. It becomes clear that different narrowband behaviors are associated with different outcome variables to varying degrees. Subsuming behaviors into broadband categories in educational practice might mean that students who would have been identified as at-risk because of salient narrowband behavior are not identified as at-risk in broadband categories (classification accuracy). Extending these results of our analysis, the question arises whether, in educational practice, focusing on narrowband behaviors might also be the most appropriate approach with regard to educational planning. Consequently, Casale et al. [68] argue that the early identification of at-risk students and the provision of individualized intervention might be a key advantage of applying universal screening procedures (e.g., the SDQ) in schools. Gaining insights into specific behaviors might offer the most detailed information on educational needs, which might be addressed in subsequent behavioral interventions. Subsuming behaviors might, however, lead to a loss of relevant information.