Screening Accuracy and Clinical Application of the Brief Infant-Toddler Social and Emotional Assessment (BITSEA)

Background The Brief Infant-Toddler Social and Emotional Assessment (BITSEA) is a promising questionnaire for the early detection of psychosocial problems in toddlers. The screening accuracy and clinical application were evaluated. Methods In a community sample of 2-year-olds (N = 2060), screening accuracy of the BITSEA Problem scale was examined regarding a clinical CBCL1.5-5 Total Problem score. For the total population and subgroups by child’s gender and ethnicity Receiver Operating Characteristic (ROC) curves were calculated, and across a range of BITSEA Problem scores, sensitivity, specificity, likelihood ratio’s, diagnostic odds ratio and Youden’s index. Clinical application of the BITSEA was examined by evaluating the relation between the scale scores and the clinical decision of the child health professional. Results The area under the ROC curve (95% confidence interval) of the Problem scale was 0.97(0.95–0.98), there were no significant differences between subgroups. The association between clinical decision and BITSEA Problem score (B = 2.5) and Competence score (B = −0.7) was significant (p<0.05). Conclusions The results indicate that the BITSEA Problem scale has good discriminative power to differentiate children with and without psychosocial problems. Referred children had less favourable scores compared to children that were not referred. The BITSEA may be helpful in the early detection of psychosocial problems.


Introduction
Preventive child health care has always focused on the detection of physical conditions. Recently, its focus has been expanded to mental health issues [1], offering an opportunity for the early detection of psychosocial problems among preschool children. In the Netherlands, preventive child health care for young children is delivered through community well-child clinics that are free of charge and provide routine developmental assessment and vaccinations [2]. One approach for facilitating early recognition and identification of psychosocial problems is to use parentcompleted questionnaires as part of routine primary care visits (i.e. well-child visits) [3]. Early detection instruments of psychosocial problems, intended for use in preventive child health care, should have adequate psychometric properties (i.e. reliability and validity), and should also be short, easy to administer, score, and interpret [4,5]. Furthermore, early detection instruments should be able to correctly discriminate children with and without psychosocial problems. Of course early detection will not be without errors, but should be as accurate as possible as to minimize the expenses associated with over-referrals and under-detection [4]. The identification of cutpoints and their concomitant accuracy is therefore important, since this enables child health professionals to determine how many responses that indicate psychosocial problems, reliably indicating the actual presence of psychosocial problems. When cutpoints are identified, indices for screening accuracy (e.g. sensitivity and specificity) can be established for the questionnaire.
In the setting of preventive child health care general early detection instruments are warranted, since the aim is to early detect a broad range of possible psychosocial problems. The Child Behavior Checklist 1.5-5 (CBCL1.   [6] and Infant-Toddler Social and Emotional Assessment (ITSEA) [7,8] are early detection instruments that are well-validated and measure a broad range of psychosocial problems, and in the case of the ITSEA also delays in competencies. However both instruments are too extensive to apply in the context of well-child visits. The ITSEA has been reported to have adequate psychometric properties [7,8], and exists in a shorter version; the Brief Infant-Toddler Social and Emotional Assessment (BITSEA). The BITSEA comprises 42 items that measure psychosocial problems as well as delays in the acquisition of competencies in toddlers (1-3 year olds). Previous studies have shown that the BITSEA has adequate reliability and validity [9][10][11].
Previous studies have evaluated the sensitivity and specificity of the BITSEA [9,12]. One study, performed in Turkey among a community sample of 462 children, examined the sensitivity and specificity of only the BITSEA Competence scale relative to children treated in a child psychiatry outpatient clinic with an autism diagnosis. In this study, the sensitivity was 72%-93% and specificity was 76%-85%, depending on the cutpoint chosen [12]. In another study, performed in the United States of America, the sensitivity and specificity was examined in a community sample of 1280 children. In this study, the BITSEA Problem scale had, relative to the CBCL1.5-5 a sensitivity of 93.2% and a specificity of 78.0%. The BITSEA Competence scale was examined relative to low ITSEA Competence, and had a sensitivity of 68.9% and a specificity of 95.1% [9]. BITSEA Problem scale cutpoints were chosen at scores of $75 th percentile and for the BITSEA Competence scale cutpoints were chosen at scores of ,15 th percentile [13].
In the present study we aim to evaluate the screening accuracy more extensively than prior studies. The screening accuracy will be evaluated with multiple indices (i.e. area under the curve, sensitivity, specificity, likelihood ratio's, diagnostic odds ratio and Youden's index) by calculating receiver operating characteristic (ROC) curves of the BITSEA Problem scale relative to a CBCL Total Problem score in the clinical range. In our study we present indices of screening accuracy for a range of BITSEA Problem scores, because different cutpoints might be chosen in different settings (e.g. clinical application versus epidemiological research). The screening accuracy of the BITSEA Competence scale was not evaluated with a reference group of children with a CBCL Total Problem score in the clinical range, since the CBCL Total Problem score does not measure competencies. Previous studies showed differences in mean BITSEA scores between boys and girls (with boys scoring less favourably) [9][10][11] and between native and non-native children (with non-native children scoring less favourably) [11], therefore the screening accuracy will also be evaluated in subgroups by gender and ethnicity.
Furthermore, we evaluated the clinical application of the BITSEA Problem and Competence scale, by comparing BITSEA scores with the clinical decision of the child health professionals. We hypothesised that the clinical decision of the child health professional is associated with on the one hand higher BITSEA Problem scale scores, as high BITSEA Problem scale scores are expected to be indicative of problems, and on the other hand lower BITSEA Competence scale scores, as low BITSEA Competence scale scores are expected to be indicative of problems (i.e. delays in competencies).

Ethics Statement
Only anonymous data were used and the questionnaires were completed on a voluntary basis by the parents. Parents received written information on these questionnaires and were free to refuse to participation. Observational research with data does not fall within the ambit of the Dutch Act on research involving human subjects [14] and does not require the approval of an ethics review board. Informed consent was obtained for the use of the CBCL, since this data collection was not part of the routine health examinations. The Medical Ethics Committee of the Erasmus Medical Centre Rotterdam declared to have no objection ('formal waiver') regarding the study protocol and consent procedures.

Design and Participants
The data was gathered between April 2010 and April 2011 by child health care organizations in the context of routine health examinations in the Rotterdam area, the Netherlands. Before coming to the well-child visit, parents were invited to complete the BITSEA and CBCL1.5-5. Child health professionals were trained to score the answers given by parents on the BITSEA and use the cutpoint as identified in the US [15] in their assessment whether children are at risk for, or currently experiencing, psychosocial problems. Child health professionals were blind for the answers on the CBCL. Parents of 3170 2-year old children that attended the well-child visit handed in the questionnaire (95.5% of all parents that attended the well-child visit). Of these parents, 2184 (68.9%) parents gave informed consent for participation in the study. Children were excluded from analyses if there were too many missing items on the BITSEA Problem (.5) and Competence (.2) scales [15] (n = 50) and CBCL1.5-5 (.8) (n = 74), leaving a study population of 2060 (94.3%) children (mean age: 23.7 months, SD = 0.7; 49.6% boys, 33.8% non-native). None of the children were under treatment of a mental health professional at the time of inclusion. Details on the design of the study are described elsewhere [11,16]. The community sample consists of 43 (2.1%) children (mean age: 23.9, SD = 0.7; 51.2% boys, non-native 69.8%) that scored in the clinical range of the CBCL Total Problem score (raw score.60).

Measures
The BITSEA is designed for children aged 12 months to 36 months and consists of 42 items with three response options ('not true/rarely', 'somewhat true/sometimes', 'very true/often'). There are two multi-item scales, a Problem scale (31 items) and a Competence scale (11 items). The Problem scale assesses socialemotional/behavioral problems such as aggression, defiance, overactivity, negative emotionality, anxiety, and withdrawal. The Competence scale assesses social-emotional abilities such as empathy, prosocial behaviors, and compliance [13]. Responses should be summed for each scale, a high score on the Problem scale or a low score on the Competence scale is less favourable [15]. Previous studies showed that the BITSEA has good reliability and validity [9][10][11]. A study in the Netherlands yielded for the BITSEA Problem and Competence scale respectively an internal consistency (Cronbach's alpha) of 0.76 and 0.63 a test-retest reliability (intraclass correlation, ICC) of 0.75 and 0.61; parentchildcare provider interrater reliability (ICC) of 0.30 and 0.17; and significant positive correlations (Problem scale) and significant negative correlations (Competence scale) with the CBCL1.5-5 Total Problem score [11].
The CBCL1.5-5 is a well-validated [6] 100-item questionnaire designed for children aged 18 months to 5 years and has two domains (Internalising and Externalising) that are combined to give a Total Problem score. Answers are given on a 3-point scale ('not true', 'somewhat or sometimes true' and 'very or often true'). Children with a Total Problem score greater than 60, score in the clinical range of the CBCL1.5-5.
Ethnicity of the child was determined based on parental country of birth: a child was considered native if both parents were born in the Netherlands [17].
Clinical decision was measured as the decision of the child health professional whether or not to refer the child to a more specialized mental health professional and/or to request a followup consultation, as recorded in the electronic medical file.
Hereinafter, we will only mention 'referral' as clinical decision, but it also entails the request for follow-up consultation.

Analyses
Mean and median BITSEA scores and CBCL1.5-5 Total Problem score. Differences in mean BITSEA Problem and BITSEA Competence scores and CBCL Total Problem score between subgroups by child gender and ethnicity are tested with independent sample t-tests. Differences in median and distribution of BITSEA Problem and BITSEA Competence scores and CBCL Total Problem score between children with and without a clinical CBCL Total Problem score were tested with a Mann-Whitney U test for the total population and for subgroups by child gender and ethnicity. A Mann-Whitney U test was appropriate since the subgroups were small and the assumption of normality could not be met.
Screening accuracy. Screening accuracy of the BITSEA Problem scale was evaluated by calculating receiver operating characteristics (ROC) curves, with a reference group that consists of children with a CBCL Total Problem score in the clinical range. Area under the ROC curve was examined, along with, for a range of BITSEA Problem scale scores; sensitivity, specificity, positive test likelihood ratio (LHR + ) and negative test likelihood ratio (LHR 2 ), diagnostic odds ratio (OR) and Youden's index. All indices for screening accuracy were evaluated for the total sample as well as for boys and girls and for native and non-native children separately.
The ROC curve is a plot of sensitivity as a function of 1specificity for all possible cutpoints of the BITSEA. The greater the area under the curve (AUC), the more discriminative power the BITSEA has in differentiating children with and without psychosocial problems. An AUC$0.90 indicates high accuracy; 0.70#AUC,0.90 indicates moderate accuracy; 0.50#AUC, 0.70 indicates low accuracy; and AUC = 0.50 is chance level accuracy [18]. We examined the 95% confidence intervals of the AUCs to evaluate whether the screening accuracy differed significantly between subgroups.
The Youden index was calculated, which is defined as the maximum vertical distance between the ROC curve and the diagonal or chance line and is calculated as Youden's index = sensi-tivity+specificity21. The higher the Youden index, the more optimal a cutpoint is [19].
Sensitivity is the proportion of true positives that are correctly identified by the test; specificity is the proportion of true negatives that are correctly identified by the test. To further investigate the correctness of classification, likelihood ratios were calculated. LHR + = sensitvitiy/(12specificity) is the ratio of the probability of a positive test result if the outcome is positive (true positive) to the probability of a positive test result if the outcome is negative (false positive); LHR 2 = (12sensitivity)/specificity is the ratio of the probability of a negative test result if the outcome is positive (false negative) to the probability of a negative test result if the outcome is negative (true negative). LHR + .7.00 and LHR 2 ,0.30 indicate high screening accuracy [20].
The OR = sensitivity*specificity/((12sensitivity)*(12specificity)) = LHR + / LHR 2 of a test is the ratio of the odds of a positive test result when having the 'disorder' relative to the odds of a positive test result when not having the 'disorder'. The values of OR ranges from zero to infinity, with higher values indicating better discriminatory test performance. OR.20.00 indicate high screening accuracy [20].
The AUC, Youden's index, sensitivity, specificity, LHR + , LHR 2 and OR are independent of prevalence of the 'disorder', as opposed to the positive predictive value and negative predictive value [20].
Clinical Application. The clinical application of the BIT-SEA was explored by evaluating the relation between BITSEA Problem and Competence scores and the decision of the child health professional to refer to another mental health professional.
Because the observations within child health professional were not independent from each other, a multilevel regression analyses was used to evaluate the relation between the clinical decision as independent variable and the BITSEA scale scores as dependent variable, corrected for confounding effects of child's gender and ethnicity. The effect sizes of the differences in mean BITSEA scale scores between children that were and were not referred were defined as Cohen's d = [mean(worried)-mean(not worried)]/SD worried. [21] A small effect is defined as 0.20#d,0.50; a medium effect is defined as 0.50#d,0.80; and a large effect is defined as d$0.80. Additionally, frequencies of referral for children scoring in the clinical range of the BITSEA were evaluated. Cutpoints are set at the 25 th percentile for the Problem scale; and at 15 th percentile for the Competence scale, as is specified in the manual of the BITSEA [15].
Data regarding the clinical decision of the child health professional were available for 1579 (76.7%) children (combined data of 481 (23.3%) children were unavailable due to missing patient-codes and differences in registration methods between child health care organizations). Significant differences between the group with complete data and the group with incomplete data were found only for the mean CBCL Total Problem score (p = 0.04). Similar differences were not observed for mean BITSEA Problem score, mean BITSEA Competence score, age of the child and parents, child gender and ethnicity, country of birth of the parents, person who completed the questionnaire and family composition. The effect size of the significant differences in CBCL Total Problem score between the group with complete data and the group with incomplete data, however, was small and was taken as an indication that the data were 'missing at random' (mean CBCL Total Problem score, Cohen's d = 0.10).
Multilevel regression analyses were performed in SAS software version 9.2 (SAS Institute Inc., 2009). All other analyses were performed in SPSS 19.0 (SPSS Inc. 2010).

Mean and Median BITSEA Scores and CBCL1.5-5 Total Problem Score
Mean BITSEA Problem scale score, BITSEA Competence scale score and CBCL Total Problem score and standard deviations for the total population and subgroups by child gender and ethnicity are presented in Table 1. All mean scores differed between boys and girls and native and non-native children (p,0.01), except for the mean CBCL Total Problem score between boys and girls (p = 0.96). Table 2 presents the median scores and 25 and 75 percentile for children with and without a CBCL Total Problem score in the clinical range, for the total subpopulations as well as for subgroups by child gender and ethnicity. Between the subpopulations of children with and without a CBCL Total Problem score in the clinical range, all distributions and median scores differed significantly (p,0.05), except for the median BITSEA Competence score between girls (p = 0.18) and native children (p = 0.22). Within the subpopulation of children without a CBCL Total Problem score in the clinical range, all distributions and median scores differed between subgroups of gender and ethnicity (p,0.01), except for the CBCL Total Problem score distribution (p = 0.96) and median (p = 0.87) between boys and girls. Within the subpopulation of children with a CBCL Total Problem score in the clinical range, all distributions and median scores did not differ between subgroups of gender and ethnicity (p.0.05), except for the Competence score distribution (p = 0.03) between boys and girls.
In Table 3 are the AUC and sensitivity, specificity, LHR+, LHR-, OR and Youden's index presented for a range of BITSEA Problem score cutpoints, for the total population and for subgroups by gender and ethnicity.
The Youden index indicated the same optimal BITSEA Problem scale cutpoint for boys and girls (score 14), whereas a different optimal cutpoint was indicated by the Youden index for native children (score 17) and non-native children (score 14).

Clinical Application
Of the 1579 children with complete data of both the parent and the child health professional, child health professionals referred 96  Table 4.
Of the children with a score in the clinical range on the Problem scale or Competence scale or CBCL respectively 9.5%; 7.4% and 26.7% were referred.

Discussion
The present study evaluated the screening accuracy of the BITSEA Problem scale for a large community sample in comparison to a subsample of children with a CBCL Total Problem score in the clinical range. Furthermore, we evaluated the clinical application of the BITSEA Problem and Competence scale. Our results indicate that the BITSEA Problem scale has  high screening accuracy and that BITSEA scores were less favourable for children that were referred.

Screening Accuracy
The BITSEA Problem scale has a good screening accuracy when compared to a group of children with a CBCL Total Problem score in the clinical range (i.e. AUC.90). The study performed in the US [9] found comparable sensitivity for the BITSEA Problem scale as in our study, whereas the specificity in our study was higher.
In our study the Youden index yielded the same optimal cutpoint for boys and girls (score 14). In the US-study, for the age range 24-29 months, score 14 was also identified as the cutpoint for boys, whereas the cutpoint for girls was 13. These different Table 3. Screening accuracy for a range of BITSEA Problem scores, relative to a CBCL Total problem score in the clinical range. results between studies might be attributed to different characteristics of the study populations and to the different methods of indicating a cutpoint. Also, as opposed to the US-study, in our study completion of the BITSEA was not anonymous, since the answers were used by the child health professional to assess the child's development. In our study we found different optimal cutpoints for native and non-native children, where native children differed from the other (sub)samples in cutpoint as indicated by the Youden index; score 17. The mean BITSEA Problem scores differed significantly between boys and girls and native and non-native children, but the difference in mean BITSEA Problem scores between native and non-native children was larger (effect size = (mean non-native 2 mean native )/sd non-native = 0.43) than the difference in mean BIT-SEA Problem scores between boys and girls (effect size = (mean boys 2mean girls )/sd boys = 0.15), which might explain the different optimal cutpoints between native and non-native children and not between boys and girls. The outcome that the screening accuracy of the BITSEA is the same for native and non-native children is valuable. However, the application of different cutpoints for different ethnic groups in preventive child health care may not be desirable, since it is difficult to determine whether the different distribution and mean BITSEA scores can be attributed to the actual amount or seriousness of problems, or that it reflects cultural differences (e.g. in interpretation of behavior, or question items). Moreover, the composition of ethnic groups may change over time, which would mean the (continuous) evaluation and adjustment of cutpoints.
The BITSEA Competence scale was excluded from screening accuracy analyses because the content of the items of the BITSEA Competence scale do not resemble the content of the items on the CBCL Total Problem score. This is supported by the low and nonexistence of correlations between the mean BITSEA Competence scores and the CBCL Total Problem score found in prior studies [9][10][11]. The decision not to include the BITSEA Competence scale in the analyses seems also (partly) justified by the results of the present study that the median BITSEA Competence score did not differ between girls and native children with and without a CBCL Total Problem score in the clinical range.

Clinical Application
The BITSEA Problem and Competence score were significant, respectively positively and negatively, associated with the child health professional's decision whether or not referral was required. These results indicate that scores were less favourable for children who were referred, compared to children who were not referred. However, the difference in mean BITSEA Problem and Competence scores were small (0.20#d,0.50).
The child health professionals referred 7.4-9.5% of the children with a score in the clinical range on either BITSEA scale and 26.7% of the children that score in the clinical range of the CBCL1.5-5. The frequency of children that were referred was relatively small (n = 96). Moreover, only 30 (2%) children of whom we had complete data of the parent and child health professional, had a score in the clinical range of the CBCL1.5-5 Total Problem score, of which 8 were referred. These small frequencies might have caused a power problem. Other studies found percentages, comparable to our referral frequencies on the CBCL1.5-5. In one study child health professionals referred (or requested a follow-up consultation) 22.4% children with a high score on both the parent and teacher completed SDQ (.P90) [22]. In two other studies child health professionals referred (or requested a follow-up consultation) 19% of the children with a score in the clinical range on the CBCL [23] and ITSEA [24]. However, in these latter two studies the child health professional was blind to the questionnaire score, as were the professionals in our study for the CBCL1.5-5, this might also partly explain the difference in frequencies. Not all children with a score in the clinical range on an early detection instrument are referred, possibly because the problematic emotions or behaviors are mild or are considered to be temporarily (e.g. after a major life event). Then, the child health professional may offer advice about how to cope with the circumstances instead of referring the child to more specialized care [15]. Also, the degree of concerns the parents have about their child's development is likely to play a role in the clinical decision of the child health professionals, since child health providers are found to be more likely to refer when parents are concerned about their child's behavior [25,26].

Strengths and Limitations
A major strength of our study is that the analyses of screening accuracy were performed on a large and diverse community sample, which adds to the power of the study. Additionally, the answers on the BITSEA were not anonymous, since the child health professional used the BITSEA to assess the child's  development during the well-child visit, this could be seen as either a strength or a limitation. The parents could have completed the BITSEA more seriously; on the other hand it could also have led to more socially desirable answers.
Our study also has some limitations. First, in our sample a low percentage (i.e. 2.1%) of children had a CBCL Total Problem score in the clinical range, whereas based on the literature a higher percentage (i.e. 6.5-12.5%) was expected [23,24,27]. This might be indicative of a response bias: not all parents with children with (serious) psychosocial problems may come to the well-child visit, possibly because they are already under treatment of a specialized mental health professional, or because they did not wish to participate in the study. Different cutpoints might be the result of the response bias, as opposed to when the sample consisted of more children with CBCL Total Problem scores in the clinical range. However, the percentage of parents that attended the wellchild visit and also completed the questionnaire is quite high (i.e. 95.5%), indicating that the sample is a good reflection of the population in the Rotterdam area that make use of the preventive child health care, may complete the questionnaire in the future and on whom the cutpoints should be applied. However, as a consequence, the subgroups of child gender and ethnicity in the 'clinical range sample' are quite small. This does not lead to large confidence intervals since the confidence intervals are calculated based on the large total study population.
Another limitation of our study is the use of children with a CBCL Total Problem score in the clinical range, a subsample of the community sample, as a reference group for the ROC analyses. This excluded the possibility to evaluate the screening accuracy of the BITSEA Competence scale. Moreover, the criterion-related validity of the CBCL (criterion in this case being the presence of psychosocial problems) might limit the quality of findings on screening accuracy. However, the CBCL1.5-5 is a well validated questionnaire and often used as a gold standard for research and clinical work among broad-band early detection instruments for psychosocial problems.
The study was performed in the Netherlands with a Dutch population; this might hamper generalizations to populations of other cultures. However, no difference in screening accuracy was found between native and non-native Dutch children. Moreover, a previous study also showed no difference between native and nonnative Dutch children in reliability and validity of the BITSEA [11], suggesting that the BITSEA performs equally well for samples of different cultures.

Future Research
We recommend future studies to evaluate the screening accuracy of both the BITSEA Problem and Competence scale with a reference group of children with a broad range of psychosocial problems who are diagnosed by a mental health professional. Additionally, evaluating the clinical application of the BITSEA in a larger sample, and including the concerns of parents regarding their child's development in these analyses, might provide more insight in the value of the BITSEA in the preventive child health care. Furthermore, we recommend evaluating the application of the determined BITSEA cutpoints by child health professionals and the subsequent adherence of the referrals by parents.

Conclusion
The BITSEA Problem scale shows a good screening accuracy with regard to psychosocial problems as indicated by the CBCL1.5-5, for the total population and for subgroups of child gender and ethnicity. Furthermore, the clinical application of the BITSEA was as hypothesised; less favourable scores for children that were referred, compared to children that were not referred. These results indicate that the BITSEA may be suitable for use in the preventive child health care.