Screening for Autism Spectrum Disorders with the Brief Infant-Toddler Social and Emotional Assessment

Objective: Using parent-completed questionnaires in (preventive) child health care can facilitate the early detection of psychosocial problems and psychopathology, including autism spectrum disorders (ASD). A promising questionnaire for this purpose is the Brief Infant-Toddler Social and Emotional Assessment (BITSEA). We evaluated the screening accuracy with regard to ASD of the BITSEA Problem and Competence scales and of a newly calculated Autism score. Method: Data collected between April 2010 and April 2011 from a community sample of 2-year-olds (N = 3127) were combined with data from a sample of preschool children diagnosed with ASD (N = 159). For the total population and for subgroups by child gender, the area under the receiver operating characteristic (ROC) curve was examined, and sensitivity, specificity, positive and negative likelihood ratios, diagnostic odds ratio and Youden's index were reported across a range of BITSEA Problem, Competence and Autism scores. Results: The area under the ROC curve (95% confidence interval [95%CI]) was 0.90 (0.87–0.92) for the Problem scale, 0.93 (0.91–0.95) for the Competence scale and 0.95 (0.93–0.97) for the Autism score. For the total population, the screening accuracy of the Autism score was significantly better than that of the Problem scale. The screening accuracy of the Competence scale was significantly better for girls (AUC = 0.97; 95%CI = 0.95–0.98) than for boys (AUC = 0.91; 95%CI = 0.88–0.94). Conclusion: The results indicate that the BITSEA scales and the newly calculated Autism score have good discriminative power to differentiate children with and without ASD. Therefore, the BITSEA may be helpful in the early detection of ASD, which could have beneficial effects on the child's development.


Introduction
Preventive child health care offers a systematic opportunity for the early detection of psychosocial problems and psychopathology, such as autism spectrum disorders (ASD), among toddlers. In the Netherlands, preventive child health care for children aged 0-4 years is delivered free of charge through community well-child clinics that provide routine developmental assessments and vaccinations (i.e. well-child visits) [1].
ASD represents a set of neurodevelopmental disorders characterized by impairments in reciprocal social interaction and communication and by restricted, stereotyped patterns of behavior [2]. In the Diagnostic and Statistical Manual of Mental Disorders, 4th edition, text revision (DSM-IV-TR), ASDs are part of the pervasive developmental disorders and are classified into three main categories: autistic disorder, Asperger's disorder and pervasive developmental disorder-not otherwise specified [2]. Studies report ASD prevalence rates of about 1.0% [3,4].
Abnormal functioning indicative of ASD starts before 3 years of age [2]. On average, the first symptoms that arouse parental concern in children eventually diagnosed with ASD appear before the second birthday. However, the average age at ASD diagnosis is approximately three years, and diagnosis often occurs later [5]. These findings suggest that it should be possible to detect and diagnose ASD earlier. Early detection of ASD is important because early access to interventions may improve children's outcomes [6,7], and a diagnosis may enhance parents' understanding of, and coping with, their child's impairments [8].
One approach for facilitating early identification of ASD is population-based screening of children during well-child visits using parent-completed questionnaires [9,10]. Several instruments have been developed for the early detection of ASD, of which the Checklist for Autism in Toddlers (CHAT) [11] and the Modified Checklist for Autism in Toddlers (M-CHAT) [12] are advocated by autism support organizations [13]. However, early detection instruments used in a preventive health care setting should cover a broad range of psychosocial problems, since limited time and capacity in preventive child health care make it undesirable to screen for each psychosocial problem separately. Also, psychosocial problems have been shown to co-occur [14,15], and individual problems may apply to more than one disorder [16]. In addition to measuring problem domains, it is crucial to measure competence domains. Delays in the acquisition of competencies are strongly related to a wide range of psychosocial problems later in life [17] and are often prodromal signs of developmental disorders such as ASD [18].
The Brief Infant-Toddler Social and Emotional Assessment (BITSEA) [19] is a short (42-item), promising questionnaire that measures both problems (Problem scale) and delays in the acquisition of competencies (Competence scale) in 1- to 3-year-olds, and that also contains items designed to measure ASD symptoms. The BITSEA is not designed to diagnose ASD, but it may be useful as a screener for identifying children with this disorder [20]. All correlations were significant (p<0.01). The mean BITSEA score was compared between a group of parents who worried about the development of their child and a group who did not. The Problem and Competence scores were significantly less favourable in the group of parents who worried than in the group who did not (effect sizes 0.93 and 0.52, respectively). The sensitivity and specificity of the BITSEA have also been evaluated in several studies [19,26,27]. One study, conducted in the United States [19], examined its sensitivity and specificity in a community sample of 1280 children. In this study, children with scores in the clinical range on the Child Behavior Checklist (CBCL1.5-5) [28] and the Infant-Toddler Social and Emotional Assessment (ITSEA) [29,30] were used as reference groups for the evaluation of the Problem scale; sensitivities of 93.2% and 78.1% and specificities of 78.0% and 88.8%, respectively, were found. The Competence scale was evaluated against a group of children with a score in the clinical range on the ITSEA and had a sensitivity of 68.9% and a specificity of 95.1%. Problem scale cutpoints were set at scores ≥75th percentile and Competence scale cutpoints at scores <15th percentile [31]. In a Turkish study [26] of a community sample of 462 children, only the sensitivity and specificity of the Competence scale were examined, relative to children with an autism diagnosis who were treated in a child psychiatry outpatient clinic (n = 35).
In this study, the sensitivity was 72%-93% and the specificity was 76%-85%, depending on the cutpoint chosen. A Dutch study [27] evaluated the screening accuracy of the BITSEA Problem scale more extensively than prior studies: screening accuracy was evaluated with multiple indices (i.e. area under the curve, sensitivity, specificity, likelihood ratios, diagnostic odds ratios and Youden's index) by calculating receiver operating characteristic (ROC) curves of the BITSEA Problem scale relative to the CBCL Total Problem scale. Indices of screening accuracy were presented for a range of BITSEA Problem scores, because different cutpoints might be chosen in different settings (e.g. clinical application versus epidemiological research). In that study, the screening accuracy of the BITSEA Competence scale was not evaluated, because the CBCL Total Problem score, which served as the reference, does not measure competencies.
In the present study, we aim to evaluate the screening accuracy of both the BITSEA Problem and Competence scales with regard to an ASD diagnosis. Additionally, we evaluate the screening accuracy of the BITSEA items that are specifically intended to signal ASD, since little is known about the performance of these items in the detection of ASD. Previous studies showed differences in mean BITSEA scores between boys and girls (with boys scoring less favourably) [19,22,23]; therefore, screening accuracy is also evaluated in subgroups by child gender.

Ethics Statement
Regarding data collection in the community sample, only anonymous data were used, and the questionnaires were completed by the parents on a voluntary basis. Parents received written information on these questionnaires and were free to refuse participation. Observational research with such data does not fall within the ambit of the Dutch Act on research involving human subjects [32] and does not require the approval of an ethics review board. The Medical Ethics Committee of the Erasmus Medical Centre Rotterdam declared that it had no objection ('formal waiver') to the study protocol and consent procedures. The Medical Ethical Committee of the University Medical Centre St. Radboud Nijmegen approved the study protocol of the ASD-study. We are prepared to make the data available upon request.

Design and participants
For the present study, data from two separate samples were combined. First, data from a community sample of 2-year-old children were used. These data were gathered between April 2010 and April 2011 by child health care organizations in the context of routine health examinations in the Rotterdam area, the Netherlands. Parents of 3170 children who attended the well-child visit handed in the questionnaire (95.5% of all parents who attended the well-child visit). Children were excluded from the analyses if there were too many missing items on both BITSEA scales [20] (n = 43), leaving a study population of 3127 (94.2%) children. No children in the community sample were under the care of a mental health professional at the time of inclusion. Details on the design and participants of the community sample are described elsewhere [23].
Second, data from a sample of children diagnosed with ASD (i.e. the ASD-sample) were used. Children aged 12-40 months were recruited in the DIANE-study (Diagnosis and Intervention of Autism in the Netherlands) [33] at Karakter Child and Adolescent Psychiatry University Center Nijmegen, the Netherlands. Children with a positive score on the Early Screening of Autistic Traits Questionnaire [34] and/or for whom there were major concerns regarding social and communicative development entered the study between spring 2004 and spring 2007. Parents of the ASD-sample completed the ITSEA (i.e. a more comprehensive measure that includes the BITSEA items) at home before their first diagnostic visit, and all children underwent an extensive psychiatric assessment (i.e. administration of the Autism Diagnostic Observation Schedule and the Autism Diagnostic Interview-Revised, observations of standardised parent-child play, and standardised assessment of cognitive and language skills). Details on the design and participants of the ASD-sample are described elsewhere [35]. For the purpose of this study, answers on the BITSEA items were extracted from the larger pool of ITSEA items. Children were excluded from the analyses if they did not receive a diagnosis (n = 29), if they received a diagnosis other than ASD (n = 69) (i.e. false positives), if there were too many missing items on the BITSEA scales [20] (n = 19), or if they were younger than 12 months (n = 2), leaving a study population of 159 (57%) children.

Measures
The BITSEA, designed for 1- to 3-year-old children, consists of 42 items with three response options ('not true/rarely' (0), 'somewhat true/sometimes' (1), 'very true/often' (2)) and comprises two multi-item scales: a Problem scale (31 items) and a Competence scale (11 items). The Problem scale assesses social-emotional/behavioral problems such as aggression, defiance, overactivity, negative emotionality, anxiety and withdrawal. The Competence scale assesses social-emotional abilities such as empathy, prosocial behaviors and compliance [31]. Responses are summed for each scale; a high score on the Problem scale and/or a low score on the Competence scale is less favourable [20]. The BITSEA also contains 17 items that are specifically included for the early detection of ASD, belonging to either the Problem scale (9 items) or the Competence scale (8 items). The autism items reflect problem behaviors that are typical of children with ASD (e.g. put things in a special order over and over) and competencies in which deficits are often present in children with ASD (e.g. points to show you something far away) [20]. Although these items formally do not constitute a separate scale, we calculated an Autism score analogous to the Problem scale score, which showed good internal consistency (Cronbach's alpha = 0.77). Answers on the autism items belonging to the Competence scale were reversed before all autism items were summed, so that a higher Autism score represents more problems and fewer competencies. Children with more than 3 missing items were excluded from the analyses (n = 48); all excluded children were part of the community sample.
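The Autism score described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' scoring procedure: the positions of the 17 autism items are passed in as arguments because the actual BITSEA item numbers are not listed here (any positions used with it are hypothetical), and skipping missing answers rather than prorating them is an assumption, since the text only states that children with more than 3 missing items were excluded. A textbook Cronbach's alpha is included for the internal-consistency check.

```python
from statistics import variance
from typing import List, Optional, Sequence


def autism_score(
    responses: Sequence[Optional[int]],
    problem_items: Sequence[int],
    competence_items: Sequence[int],
    max_missing: int = 3,
) -> Optional[int]:
    """Sum the autism items, reversing those that belong to the Competence scale.

    responses: answers coded 0, 1 or 2, with None for a missing answer.
    problem_items / competence_items: 0-based positions of the autism items
    (hypothetical here; the real positions come from the BITSEA manual).
    Returns None when more than `max_missing` autism items are missing.
    """
    items = [(i, False) for i in problem_items] + [(i, True) for i in competence_items]
    values: List[int] = []
    missing = 0
    for idx, reverse in items:
        answer = responses[idx]
        if answer is None:
            missing += 1
            continue
        if answer not in (0, 1, 2):
            raise ValueError(f"item {idx}: invalid answer {answer!r}")
        # Reverse competence items (0 <-> 2) so a higher score means more concern.
        values.append(2 - answer if reverse else answer)
    if missing > max_missing:
        return None
    # Assumption: missing items are skipped, not imputed or prorated.
    return sum(values)


def cronbach_alpha(item_scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for complete cases; item_scores[r][k] = respondent r, item k."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    total_var = variance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("total score has zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```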
Items on standard socio-demographic variables were included: child age and gender.

Analyses
Demographic characteristics and mean BITSEA scores. Differences in mean BITSEA scores and child age between the community sample and the ASD-sample were tested with independent-samples t-tests. Differences in gender composition between the community sample and the ASD-sample were tested with chi-square tests.
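The two group comparisons can be sketched with the Python standard library alone. The text does not say whether equal variances were assumed or whether a continuity correction was applied, so this sketch assumes Student's pooled-variance t statistic and an uncorrected Pearson chi-square for a 2 × 2 sample-by-gender table. The p-value uses the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable; no p-value is computed for t.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Student's independent-samples t statistic (equal variances assumed)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


def chi_square_2x2(table: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) and its df = 1 p-value.

    table: [[boys_community, girls_community], [boys_asd, girls_asd]]
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For df = 1: P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p
```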
Screening accuracy. Screening accuracy was evaluated by calculating receiver operating characteristic (ROC) curves, with a reference group consisting of children with a diagnosis of ASD. In addition to the area under the ROC curve, the following indices were calculated for a range of Problem scale, Competence scale and Autism scores: sensitivity, specificity, positive likelihood ratio (LHR+), negative likelihood ratio (LHR−), diagnostic odds ratio (OR) and Youden's index. All indices of screening accuracy were evaluated for the total sample as well as for boys and girls separately.
The ROC curve is a plot of sensitivity as a function of 1 − specificity for all possible cutpoints of the BITSEA. The greater the area under the curve (AUC), the more discriminative power the BITSEA has in differentiating children with and without ASD. An AUC > 0.90 indicates high accuracy; 0.70 ≤ AUC < 0.90 indicates moderate accuracy; 0.50 ≤ AUC < 0.70 indicates low accuracy; and AUC = 0.50 indicates chance-level accuracy [36]. We examined the 95% confidence intervals of the AUCs to evaluate whether screening accuracy differed significantly between subgroups, with non-overlapping intervals taken to indicate a significant difference.
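The empirical AUC can be computed without drawing the curve: it equals the probability that a randomly chosen child with ASD scores higher than a randomly chosen community child, with ties counted as one half (the Mann-Whitney formulation). The sketch below assumes that higher scores indicate more concern; for the Competence scale, where lower scores are less favourable, the scores can be negated first. It does not compute the confidence intervals used in the study.

```python
from typing import Sequence


def empirical_auc(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Nonparametric AUC: P(case > control) + 0.5 * P(case == control).

    cases: scores of children with ASD; controls: scores of community children.
    Higher scores are assumed to indicate more concern.
    """
    if not cases or not controls:
        raise ValueError("both groups must be non-empty")
    credit = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                credit += 1.0
            elif x == y:
                credit += 0.5  # ties count half
    return credit / (len(cases) * len(controls))
```

For the Competence scale one would call `empirical_auc([-s for s in asd_scores], [-s for s in community_scores])`, where both list names are hypothetical. The pairwise loop is quadratic, but for 159 × 3127 children it is only about 500,000 comparisons.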
To determine the optimal cutpoint, Youden's index was used [37]. At a given cutpoint, Youden's index is the vertical distance between the ROC curve and the diagonal (chance) line and is calculated as Youden's index = sensitivity + specificity − 1; the cutpoint at which this index is maximal was taken as optimal.
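A scan over candidate cutpoints that keeps the one with the largest Youden's index can be sketched as follows. The convention that a score equal to the cutpoint counts as positive (score ≥ cutpoint for the Problem scale and Autism score, score ≤ cutpoint for the Competence scale) is an assumption, as is keeping the first cutpoint when two give the same index; the text does not specify either.

```python
from typing import Optional, Sequence, Tuple


def youden_optimal_cutpoint(
    cases: Sequence[float],
    controls: Sequence[float],
    higher_is_worse: bool = True,
) -> Tuple[Optional[float], float]:
    """Return (cutpoint, J) maximizing J = sensitivity + specificity - 1.

    A child screens positive when score >= cutpoint (higher_is_worse=True,
    e.g. Problem scale) or score <= cutpoint (higher_is_worse=False,
    e.g. Competence scale). Ties in J keep the first cutpoint examined.
    """

    def positive(score: float, cut: float) -> bool:
        return score >= cut if higher_is_worse else score <= cut

    best_cut: Optional[float] = None
    best_j = float("-inf")
    # Every observed score is a candidate cutpoint.
    for cut in sorted(set(cases) | set(controls)):
        sens = sum(positive(x, cut) for x in cases) / len(cases)
        spec = sum(not positive(y, cut) for y in controls) / len(controls)
        j = sens + spec - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j
```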
Sensitivity is the proportion of children with ASD who are correctly identified as positive by the test; specificity is the proportion of children without ASD who are correctly identified as negative. To further investigate the correctness of classification, likelihood ratios were calculated. LHR+ = sensitivity/(1 − specificity) is the ratio of the probability of a positive test result when the outcome is positive (true positive) to the probability of a positive test result when the outcome is negative (false positive); LHR− = (1 − sensitivity)/specificity is the ratio of the probability of a negative test result when the outcome is positive (false negative) to the probability of a negative test result when the outcome is negative (true negative). LHR+ > 7.00 and LHR− < 0.30 indicate high screening accuracy [38].
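The definitions above translate directly into code once a cutpoint has turned scores into a 2 × 2 table of true and false positives and negatives. The sketch below is illustrative; the `high_lr_accuracy` helper simply encodes the LHR+ > 7.00 and LHR− < 0.30 criteria cited in the text, and returning infinity when a denominator is zero is a convention chosen here.

```python
import math
from typing import NamedTuple


class Indices(NamedTuple):
    sensitivity: float
    specificity: float
    lhr_pos: float
    lhr_neg: float


def screening_indices(tp: int, fn: int, fp: int, tn: int) -> Indices:
    """Sensitivity, specificity and likelihood ratios from a 2 x 2 table.

    tp/fn: children with ASD screening positive/negative;
    fp/tn: children without ASD screening positive/negative.
    """
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    # Convention used here: a zero denominator yields infinity.
    lhr_pos = sens / (1 - spec) if spec < 1 else math.inf
    lhr_neg = (1 - sens) / spec if spec > 0 else math.inf
    return Indices(sens, spec, lhr_pos, lhr_neg)


def high_lr_accuracy(ix: Indices) -> bool:
    """Criteria cited in the text [38]: LHR+ > 7.00 and LHR- < 0.30."""
    return ix.lhr_pos > 7.0 and ix.lhr_neg < 0.30
```

With invented counts tp = 90, fn = 10, fp = 20, tn = 80, sensitivity is 0.90 and specificity 0.80, giving LHR+ ≈ 4.5 and LHR− ≈ 0.125, which would not meet the high-accuracy criterion for LHR+.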
The diagnostic OR of a test, OR = (sensitivity × specificity)/((1 − sensitivity) × (1 − specificity)) = LHR+/LHR−, is the ratio of the odds of a positive test result when having the 'disorder' to the odds of a positive test result when not having the 'disorder'. OR values range from zero to infinity, with higher values indicating better discriminatory test performance; OR > 20.00 indicates high screening accuracy [38].
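The identity OR = LHR+/LHR− follows because dividing sensitivity/(1 − specificity) by (1 − sensitivity)/specificity gives (sensitivity × specificity)/((1 − sensitivity) × (1 − specificity)). The small sketch below computes the diagnostic OR from sensitivity and specificity and, equivalently, as the cross-product (tp × tn)/(fp × fn) of the 2 × 2 table; returning infinity for a zero denominator is a convention chosen here.

```python
import math


def diagnostic_odds_ratio(sens: float, spec: float) -> float:
    """OR = (sens * spec) / ((1 - sens) * (1 - spec)); infinite if the denominator is 0."""
    denom = (1 - sens) * (1 - spec)
    return math.inf if denom == 0 else (sens * spec) / denom


def dor_from_counts(tp: int, fn: int, fp: int, tn: int) -> float:
    """Same quantity as the cross-product (tp * tn) / (fp * fn) of the 2 x 2 table."""
    return math.inf if fp * fn == 0 else (tp * tn) / (fp * fn)
```

For example, sensitivity 0.90 and specificity 0.80 give an OR of about 36, the same value as LHR+/LHR− = 4.5/0.125.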
The AUC, Youden's index, sensitivity, specificity, LHR+, LHR− and OR are independent of the prevalence of the 'disorder', as opposed to the positive and negative predictive values; therefore, the latter were not evaluated in this study [38].

Results
The demographic characteristics of the multiethnic community sample and the ASD-sample are presented in Table 1. Compared with the community sample, the ASD-sample consisted of older children (t = 58.3, p < 0.001) and more boys (χ² = 50.2, p < 0.001).

Mean BITSEA scores
The mean Problem and Competence scale scores and the Autism score are presented in Table 1. Compared with children in the community sample, children in the ASD-sample scored less favourably on the Problem scale (t = 28.1, p < 0.001), the Competence scale (t = 29.9, p < 0.001) and the Autism score (t = 37.3, p < 0.001).

Screening accuracy
ROC curves of the Problem and Competence scale scores and the Autism score are presented in Figure 1. Table 2 presents the AUC, sensitivity, specificity, LHR+, LHR−, OR and Youden's index for a range of BITSEA Problem and Competence scale scores, for the total population and for subgroups by child gender. The Youden index indicated the same optimal cutpoint for the total population and for boys and girls on the Problem scale (score 13) and on the Competence scale (score 15).
Table 3 presents the AUC, sensitivity, specificity, LHR+, LHR−, OR and Youden's index for a range of Autism scores, for the total population and for subgroups by child gender. The AUC was 0.95 (0.93-0.97), and screening accuracy did not differ significantly between girls (AUC = 0.97; 95%CI = 0.95-0.99) and boys (AUC = 0.93; 95%CI = 0.91-0.96). The Youden index indicated different optimal cutpoints for the total population (score 10), for boys (score 9) and for girls (score 8).
Using the cutpoints with the highest Youden index, 16.1% (Problem scale), 10.1% (Competence scale) and 6.9% (Autism score) of the children in the community sample scored in the concern range for ASD.
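The concern levels above are the share of community children whose score falls on the positive side of the chosen cutpoint. The sketch assumes that a score equal to the cutpoint counts as positive (score ≥ cutpoint for the Problem scale and Autism score, score ≤ cutpoint for the Competence scale); the text does not spell out this convention.

```python
from typing import Sequence


def concern_rate(
    scores: Sequence[float], cutpoint: float, higher_is_worse: bool = True
) -> float:
    """Percentage of children screening positive at the given cutpoint."""
    if not scores:
        raise ValueError("no scores")
    flagged = sum(
        1 for s in scores if (s >= cutpoint if higher_is_worse else s <= cutpoint)
    )
    return 100.0 * flagged / len(scores)
```

For example, with invented Problem scores [10, 12, 13, 14, 20] and cutpoint 13, `concern_rate` returns 60.0.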

Discussion
The present study evaluated the screening accuracy of the Problem and Competence scales and the newly calculated Autism score by comparing a community sample with a sample of children with an ASD diagnosis. Our results indicate that the Problem and Competence scales and the Autism score have high screening accuracy for detecting ASD (i.e. AUC > 0.90).
In our study, we present the sensitivity and specificity for a range of BITSEA scores, because different cutpoints might be chosen in different settings (e.g. clinical application versus epidemiological research). For comparison with the results of other studies, we discuss the sensitivity and specificity at the optimal cutpoint as indicated by the Youden index. In comparison with the prior Dutch study [27] on the screening accuracy of the BITSEA Problem scale with regard to a CBCL Total Problem score in the clinical range, we found similar Multiple values for the sensitivity and specificity of the BITSEA are reported in the US-study, because different indicators were used to classify a 'clinical group', and also in the Turkish study, because a range of BITSEA cutpoints was applied. The US-study [19] found mean sensitivity and specificity for the Problem scale comparable to those in our study. However, for the Competence scale, the US-study found a lower sensitivity and a slightly higher specificity than our study. The Turkish study [26] found slightly higher mean sensitivity and lower mean specificity for the Competence scale than our study. However, the different methods used to determine sensitivity and specificity (i.e. different indicators of a 'clinical group' and different methods to determine cutpoints) make it difficult to compare results across these studies. The Youden index yielded the same cutpoints for boys and girls on the Problem and Competence scales. These results differ from those of the US-study [19], where the cutpoints on the Problem scale in children aged 24-29 months differed between boys (score 14) and girls (score 13), as did the cutpoints on the Competence scale (girls, score 15; boys, score 14). The Turkish study [26] found the same cutpoint (score 12) on the Competence scale in children aged 24-35 months for both boys and girls.
These differences between studies might be attributed to different characteristics of the study populations. Also, the ASD sample in the Turkish study (n = 35) was much smaller than ours (n = 159). The screening accuracy of the newly calculated Autism score did not differ significantly between boys and girls; however, the scores with the highest Youden's index differed between boys (score 9) and girls (score 8). Even though the Autism score consists of fewer items (17 items), its screening accuracy for ASD in the total population was better than that of the Problem scale (31 items), but not better than that of the Competence scale (11 items). The Autism score is formally not a separate BITSEA scale, and the findings of the present study imply that calculating the Autism score is unnecessary when the Competence score is known. It was to be expected that the screening accuracy of the Autism score would be at least as good as that of the Competence scale, since the Autism score includes 8 of the 11 Competence items. However, adding the autism items from the Problem scale did not further improve the screening accuracy of the Autism score.

Limitations and strengths
Our study has some limitations. First, the BITSEA scores for the ASD-sample are based on BITSEA items that were extracted from the larger pool of ITSEA items, since parents of children in the ASD-sample completed the ITSEA.
Second, since typically developing children are expected to acquire more competencies with age, previous studies have found higher Competence scores in older than in younger children [19,22]. Our community sample was homogeneous with regard to age (M = 23.7 months, SD = 0.7). Therefore, it may not be appropriate to generalise our findings on the screening accuracy of the Competence scale to children of other ages.
Third, the ASD-sample differed significantly from the community sample with regard to child gender (more boys) and age (older children). These characteristics may have influenced mean BITSEA scale scores: previous studies have found that mean BITSEA scores are less favourable for boys [19,22,23] and that mean Competence scores increase with age [19,22]. Therefore, differences in mean BITSEA scores between the community sample and the ASD-sample may not be attributable solely to ASD, but also to the demographic characteristics of the samples.
To compensate for these differences between the samples, we applied propensity score matching post hoc. This yielded a sample of 900 matched cases, 750 children in the community sample and 150 in the ASD-sample, with a statistically equal boy/girl ratio (community sample: 74.5% boys; ASD-sample: 80.0% boys). There was still a significant (p < 0.001) difference in age between the matched groups (community sample: M = 28.9, SD = 7.5; ASD-sample: M = 31.8, SD = 6.4); however, the effect size (Cohen's d) was small: 0.38 [39]. We recalculated the AUCs of the ROC curves for the matched sample, and no significant differences (i.e. the confidence intervals overlapped) were found compared with our prior results (data not shown).
Fourth, we do not have follow-up data on ASD diagnoses in the community sample. However, given an estimated ASD prevalence of 1% [3,4], approximately 31 of the 3127 children may be expected to receive a diagnosis of ASD. It is difficult to estimate exactly how this affects our results. However, if the effect were substantial (i.e. a community sample with definitely no children with ASD would lead to other results), the mean BITSEA scores of that community sample would be more favourable than in the present study. This would mean an even larger difference in BITSEA scores compared with the ASD-sample, possibly leading to a larger AUC and better sensitivity and specificity than we found in the present study. Consequently, this limitation is more likely to cause us to underestimate than to overestimate the 'true' screening accuracy.
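Cohen's d for two independent groups is the mean difference divided by the pooled standard deviation. The sketch below uses the common (n − 1)-weighted pooled SD; the text does not state which variant was used for the reported value, so this is one standard formulation rather than a reproduction of the authors' calculation.

```python
import math


def cohens_d(m1: float, sd1: float, n1: int, m2: float, sd2: float, n2: int) -> float:
    """Cohen's d for two independent groups using the (n - 1)-weighted pooled SD."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m2 - m1) / math.sqrt(pooled_var)


# Matched-sample age summaries as reported in the text (rounded values).
d = cohens_d(28.9, 7.5, 750, 31.8, 6.4, 150)
```

With these rounded inputs, d comes out at about 0.40, close to but not identical with the reported 0.38; the exact inputs or SD convention behind the published value are not stated.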
A strength of our study is that the analyses were performed on a large community sample and a large ASD-sample, which adds to the power of the study. Moreover, children in the ASD-sample were diagnosed by experienced clinicians on the basis of extensive multidisciplinary diagnostic procedures.
Another strength of our study is that parents completed the questionnaire before their child's diagnostic evaluation, so their answers could not have been biased by knowledge of an ASD diagnosis.

Future research
This study evaluated the screening accuracy of the BITSEA for ASD specifically. We recommend that future studies evaluate the screening accuracy of the BITSEA for a broader range of psychosocial problems.

Conclusions
Both the Problem and Competence scales and the Autism score have good screening accuracy with regard to ASD for the total population and for boys and girls separately. The Autism score has no added value over the existing Competence scale: for ASD screening, the Competence score is just as effective as the Autism score. Furthermore, the BITSEA is a short questionnaire that has been shown in earlier research to have good reliability and validity. As mentioned in the Introduction, early detection instruments used in preventive health care should cover a broad range of psychosocial problems. The BITSEA might therefore precede more extensive evaluation of ASD with other instruments (e.g. the M-CHAT) by more specialized mental health care providers when BITSEA scores indicate concern for ASD. The results of this study indicate that the BITSEA is suitable for use in (preventive) child health care for the early identification of ASD.