Detection of pediatric developmental delay with machine learning technologies

Shin-Bo Chen; Chi-Hung Huang; Sheng-Chin Weng; Yen-Jen Oyang

doi:10.1371/journal.pone.0324204

Abstract

Objective

Accurately identifying children who will develop developmental delay (DD) is challenging for therapists, yet it is important because recent studies have reported that children who underwent early intervention achieved more favorable outcomes than those who did not. In this study, we investigated how the frequencies of three types of therapy received by a child, namely physical therapy, occupational therapy, and speech therapy, can be exploited to predict whether the child has DD. The effectiveness of this approach is of high interest because these features can be obtained at essentially no cost; a prediction model built on them can therefore be used to screen children who may develop DD before advanced and costly diagnostic procedures are carried out.

Methods

This study was based on a dataset comprising the records of 2,552 outpatients (N = 34,862 visits, mean age = 72.34 months) collected at a hospital in Taiwan from 2012 to 2016. We built three types of machine learning based prediction models, namely deep neural network (DNN), support vector machine (SVM), and decision tree (DT) models, to evaluate the effectiveness of the proposed approach.

Results

Experimental results reveal that, when a high level of sensitivity is desired, the DT models outperformed the DNN and SVM models in terms of the F1 score, which is the harmonic mean of the sensitivity and the positive predictive value. In particular, the DT model developed in this study delivered a sensitivity of 0.902 and a positive predictive value of 0.723.

Conclusions

What has been learned from this study is that the frequencies of the therapies a child has received provide valuable information for predicting whether the child has DD. Given the performance observed in the experiments and the fact that these features can be obtained essentially without cost, it is conceivable that prediction models built on them can be widely used in clinical practice and significantly improve the treatment outcomes of children who develop DD.

Citation: Chen S-B, Huang C-H, Weng S-C, Oyang Y-J (2025) Detection of pediatric developmental delay with machine learning technologies. PLoS One 20(5): e0324204. https://doi.org/10.1371/journal.pone.0324204

Editor: Zeheng Wang, Commonwealth Scientific and Industrial Research Organisation, AUSTRALIA

Received: August 28, 2024; Accepted: April 21, 2025; Published: May 20, 2025

Copyright: © 2025 Chen et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The dataset employed in this study is available in file "Supporting data".

Funding: Our work was partially supported by National Taiwan University grant #FD107016. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Abbreviations: DD, developmental delay; ASD, autism spectrum disorder; DT, decision tree; SVM, support vector machine; DNN, deep neural network; OTS, occupational therapy service; PTS, physical therapy service; STS, speech therapy service; OPD, outpatient department

Introduction

Developmental delay (DD) refers to a distinct set of early childhood developmental disabilities, and it is primarily diagnosed by assessing a child’s behavioral and mental capacities [1]. Rehabilitation physicians employ various methods, strategies, and diagnostic tools to diagnose and treat DD, including classification techniques. However, these classification methods are often subjective, time-consuming, and prone to inconclusive results. Moreover, they fail to clarify the underlying causes and are ineffective for early detection [2]. Early intervention significantly improves a child’s likelihood of reaching their full potential, with studies reporting that children who receive early intervention achieve more favorable outcomes than those who do not. Therefore, accurate classification of DD is crucial for providing effective early intervention services that ensure positive outcomes for children with DD [3].

Machine learning has been applied to develop novel computational methods that draw on mathematical learning, statistical estimation, and information theory [4]. These methods automatically identify meaningful patterns within large datasets. A key advantage of machine learning is its ability to generate highly accurate and reliable predictions from data comprising multiple variables. Additionally, machine learning can support causal inference from non-experimental datasets [5].

Psychiatric studies have demonstrated that machine learning methods can be used to diagnose autism spectrum disorder (ASD) [6], classify attention deficit hyperactivity disorder (ADHD) based on altered event-related potentials [7], and predict psychosis onset through free speech analysis [8]. For example, Bishop-Fitzpatrick et al. employed machine learning to analyze the lifetime health problems of adults with ASD and accurately predicted cardiovascular, urinary, and respiratory conditions [9].

Studies have reported that most cases of DD gradually resolve over time [10]. However, few studies have applied machine learning methods to identify predictive factors and to optimize rehabilitation therapy frequency for improved outcomes.

The novelty of our approach lies in utilizing therapy frequency as a predictive factor for DD classification. Machine learning methods enable valuable patterns to be automatically extracted from large data sets, and this capability is difficult to achieve with traditional statistical methods. This study proposes that machine learning can be used to identify key predictors of DD outcomes and thereby provide support for the development of personalized interventions and effective therapeutic strategies.

Several researchers have used classification methods to diagnose ASD. Altay and Ulas [11] employed linear discriminant analysis (LDA) and K-nearest neighbors (K-NN) and found that LDA achieved higher precision than K-NN. Büyükoflaz and Öztürk compared classification methods and reported more favorable outcomes with random forest than with K-NN, naïve Bayes, and radial basis function (RBF) networks [12]. These findings underscore the importance of selecting appropriate algorithms to achieve accurate DD classification.

Integrating brain imaging data into machine learning models has produced promising results for DD classification. For example, Dvornek et al. combined phenotypic data with resting-state functional magnetic resonance imaging (rs-fMRI) and applied deep learning techniques for ASD classification [13]. Similarly, Liao et al. proposed a model based on community structure and deep learning and achieved higher accuracy than traditional methods [14]. Dekhil et al. integrated anatomical and functional information from structural MRI (sMRI) and functional MRI (fMRI) and successfully applied their model to distinguish autism from typical development [15]. These findings highlight the potential of incorporating brain imaging data to enhance DD classification.

Methods based on cortical measures and functional connectivity patterns have emerged as a valuable means of understanding DD. For instance, Jiao et al. employed surface-based morphometry to classify ASD and identified cortical thickness as a key predictive feature [16]. Heinsfeld et al. objectively classified patients with ASD using functional connectivity patterns derived from functional brain imaging and identified the relevant neural structures [17]. Such research suggests that deep learning techniques can effectively differentiate individuals with ASD from those with typical development, reinforcing the utility of cortical measures and functional connectivity patterns in understanding DD.

Notable advances have been made in the development of algorithms for DD classification. Bone et al. developed highly effective, adaptable, and reliable algorithms for DD classification that outperformed existing methods. Their algorithms, which could weight sensitivity and specificity separately, yielded promising results when applied to Autism Diagnostic Interview-Revised and Social Responsiveness Scale scores [18]. Jin et al. demonstrated the feasibility of using machine learning methods to identify infants at high risk of ASD as early as 6 months of age. Their multi-kernel support vector machine (SVM) classification system, which incorporated white matter tract and whole-brain integration features, achieved higher accuracy than single-scale parameter networks [19]. Outside the DD domain, Kim et al. showed that an SVM system outperformed conventional statistical methods in predicting the prognosis of Class III malocclusion treatment [20]. These advances have produced promising methods for improving the early detection of DD and the associated intervention strategies.

Researchers have applied numerous methods for predicting DD. Table 1 presents a comprehensive summary of the existing machine learning based predictors for identifying patients who develop DD.


Table 1. A summary of the existing machine learning based predictors for identifying patients who may develop DD.

https://doi.org/10.1371/journal.pone.0324204.t001

In summary, the studies addressed above primarily relied on DD symptoms to develop machine learning based prediction models. However, limitations in the accuracy and availability of diagnostic data may affect the reliability of these approaches [21–24]. In this study, we propose that the frequencies of three types of therapy received by a child, namely physical therapy, occupational therapy, and speech therapy, can be exploited to predict whether the child has DD. The effectiveness of this approach is of high interest because these features can be obtained with high accuracy and essentially without cost. Therefore, a prediction model built on them can be used to screen children who may develop DD before advanced and costly diagnostic procedures are carried out. Given the performance observed, we anticipate that the proposed prediction models can be widely used in clinical practice and significantly improve the treatment outcomes of children who develop DD.

Methods

Data collection and outcome measurement

In the present study, all patients included in the clinical group had previously been diagnosed according to the criteria established in the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition, Text Revision (DSM-5-TR) [25–26]. For example, the DSM-5-TR defines autism spectrum disorder (ASD) as involving persistent deficits in social communication and social interaction across multiple contexts (Criterion A). Assessments of comorbid psychiatric diagnoses and the development of treatment plans were completed by child psychiatrists. Rehabilitation therapists assisted the main caregivers of the participants in providing sociodemographic and rehabilitation clinical information and in completing several forms.

Assessments of DD symptoms were conducted by a rehabilitation physician using the Rehabilitation Developmental Evaluation Form. In the outpatient department (OPD) of the study hospital, children with DD and child and adolescent psychiatry patients typically received rehabilitation therapy, and a data form was used to update their medical service records, which included the frequencies of occupational therapy service (OTS), physical therapy service (PTS), and speech therapy service (STS). The dataset used in this study comprises the medical records of outpatients who visited the rehabilitation clinic of the study hospital with suspected DD between January 1, 2012, and December 31, 2016. The Institutional Review Board of the En Chu Kong Hospital reviewed the above documents and approved the study on 2024/07/23 (ECK-IRB Number: ECKIRB1130501); this approval is valid until 2025/07/22. To protect patient confidentiality, no subject names were collected, and each patient was anonymously assigned a study ID.

S1 Table in the Supporting information lists the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) codes used to define DD. The patient records extracted from the dataset included age, gender, and the frequencies of OTS, PTS, and STS received. Specific DD problems and disabilities were determined through a comprehensive literature review and a consensus reached by rehabilitation physicians and child psychiatry specialists. The International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM) codes used to identify the various types of DD are presented in S2 Table. Fig 1 illustrates the participant selection process as a flow diagram. This study identified 2,552 outpatients aged under 12 years who made one or more OPD visits. Among these patients, 1,719 (67.4%) had DD. The total number of OPD visits was 34,862.

Table 2 presents the demographic and clinical characteristics of the patients with DD. Owing to the strict workflow employed by the hospital, the outpatient medical records from which our dataset was derived are highly accurate and include minimal missing data and few unmeasured confounding variables.


Fig 1. Flow diagram for generating the study dataset.

https://doi.org/10.1371/journal.pone.0324204.g001


Table 2. Demographic and clinical characteristics of patients with DD (n = 2552).

https://doi.org/10.1371/journal.pone.0324204.t002

The present study was approved by the ethics committee of En Chu Kong Hospital prior to data collection. Informed consent was waived by the committee because of the de-identified, non-interventional design of the present study.

Experimental procedures

The present study extracted information from OPD records, including demographic characteristics (gender and age) and the frequencies of therapy services used (OTS, PTS, and STS). The patients were divided into two groups, a DD group and a non-DD group. Fig 2 presents the experimental procedure employed to assess the performance of the prediction models.
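The per-patient therapy frequencies used as features can be derived from visit-level OPD records. The sketch below is a minimal illustration, not the authors' pipeline. It assumes a hypothetical record layout (an anonymized `patient_id`, a `service` code of `"OTS"`, `"PTS"`, or `"STS"`, and a `visit_date`) and assumes that the "average number of services received by a patient in one year" (see Feature selection) means the total number of sessions divided by the number of distinct calendar years in which the patient had at least one visit. Other denominators, such as follow-up duration in years, are equally plausible.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

SERVICES = ("OTS", "PTS", "STS")


@dataclass
class Visit:
    patient_id: str    # anonymized study ID (hypothetical field name)
    service: str       # "OTS", "PTS", or "STS" (hypothetical coding)
    visit_date: date


def annual_frequencies(visits):
    """Return {patient_id: {service: average sessions per year}}.

    Assumption: the denominator is the number of distinct calendar
    years in which the patient had at least one recorded visit.
    """
    counts = defaultdict(lambda: dict.fromkeys(SERVICES, 0))
    years = defaultdict(set)
    for v in visits:
        if v.service not in SERVICES:
            raise ValueError(f"unknown service code: {v.service!r}")
        counts[v.patient_id][v.service] += 1
        years[v.patient_id].add(v.visit_date.year)
    return {
        pid: {s: n / len(years[pid]) for s, n in per_service.items()}
        for pid, per_service in counts.items()
    }


# Made-up visits for one patient spanning two calendar years.
visits = [
    Visit("P001", "OTS", date(2013, 3, 1)),
    Visit("P001", "OTS", date(2013, 5, 2)),
    Visit("P001", "PTS", date(2014, 1, 10)),
]
print(annual_frequencies(visits))
# {'P001': {'OTS': 1.0, 'PTS': 0.5, 'STS': 0.0}}
```

Services a patient never received default to a frequency of 0, so every patient ends up with the same three numeric features.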


Fig 2. The experimental procedure.

https://doi.org/10.1371/journal.pone.0324204.g002

Feature selection

In this study, we included four features in our dataset: gender and the frequencies of OTS, PTS, and STS. The frequency of a particular therapy service was defined as the average number of sessions of that service received by a patient per year. We then conducted chi-squared tests to determine whether each feature was associated with the outcome variable [27–29]. For gender, we carried out the chi-squared test of independence. For the frequencies of OTS, PTS, and STS, we carried out the chi-squared goodness-of-fit test under the null hypothesis that the average frequency among patients with DD equals that among patients without DD [30]. Table 3 shows the p-values obtained from these tests. Accordingly, we included gender and the frequencies of OTS, PTS, and STS to build the prediction models.
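The two kinds of chi-squared test can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' code. The gender test is taken to be the Pearson test of independence on a 2×2 table (sex × DD status) without continuity correction. For the goodness-of-fit test, it is assumed that, under the null hypothesis of equal average frequency, the total number of sessions should split between the DD and non-DD groups in proportion to group size; this treats individual sessions as independent counts, which real therapy data may violate. Both statistics have one degree of freedom, so the p-value follows from the identity P(χ²₁ > x) = erfc(√(x/2)). All counts below are made up.

```python
import math


def chi2_sf_df1(x):
    """P(X > x) for a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def chi2_independence_2x2(table):
    """Pearson chi-squared test of independence for a 2x2 table
    (rows: e.g. male/female; columns: DD/non-DD), no continuity correction."""
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat, chi2_sf_df1(stat)


def chi2_gof_equal_mean(sessions_dd, n_dd, sessions_non_dd, n_non_dd):
    """Goodness-of-fit test of H0: the mean number of sessions per patient is
    the same in both groups, i.e. total sessions split in proportion to group
    size. Assumes sessions are independent counts."""
    total = sessions_dd + sessions_non_dd
    n = n_dd + n_non_dd
    observed = (sessions_dd, sessions_non_dd)
    expected = (total * n_dd / n, total * n_non_dd / n)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, chi2_sf_df1(stat)


# Illustrative (made-up) counts, not the study data.
print(chi2_independence_2x2([[10, 20], [30, 40]]))  # statistic ≈ 0.794, p ≈ 0.373
print(chi2_gof_equal_mean(200, 20, 100, 10))        # (0.0, 1.0): equal means
```

With `scipy` available, `scipy.stats.chi2_contingency(table, correction=False)` should reproduce the first test; the plain-Python version keeps the arithmetic visible.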


Table 3. Feature selection with the Chi-square Test.

https://doi.org/10.1371/journal.pone.0324204.t003

Development of prediction models and performance evaluation

In this study, we investigated the prediction performance of three categories of machine learning models, namely DT models [31–33], SVM models [20], and DNN models [34]. DT models are preferred by many clinicians because of the explicit decision rules output by the algorithm. SVM and DNN models, on the other hand, are among the most advanced machine learning models and can generally outperform DT models owing to the non-linear transformations invoked in the prediction process. However, these non-linear transformations also make it almost impossible for a user to comprehend how a prediction is made, and as a result many clinicians are reluctant to trust models that work like a black box. It is therefore of interest to compare the performance of these alternative categories of models. If the DT models perform comparably to the more advanced models, as observed in our recent studies [35,36], then the DT models are preferable because of the explicit decision rules they produce.

To obtain a comprehensive picture of how each category of prediction models performed, we employed alternative parameter settings to generate prediction models with different performance characteristics. Table 4 summarizes the software packages and alternative parameter settings employed to build the prediction models. We then conducted 10-fold cross-validation to evaluate the performance characteristics of each prediction model generated [37–39]. The performance metrics considered in this study are accuracy, sensitivity, specificity, positive predictive value (PPV, also known as precision), and F1 score. The F1 score, the harmonic mean of the sensitivity and the PPV, is commonly employed in machine learning research and is increasingly employed in biomedical research [40]. Furthermore, for each category of prediction models (the DT, SVM, and DNN models), we evaluated its overall performance based on the area under the receiver operating characteristic (ROC) curve. To generate the ROC curve, we selected, at each level of sensitivity, the prediction model that delivered the highest F1 score.
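The metrics listed above follow directly from the binary confusion matrix with DD as the positive class. The sketch below is illustrative; in particular, pooling the confusion-matrix counts over the folds (rather than averaging per-fold metrics) is one common convention and an assumption here, since the text does not specify which was used. As a worked check of the F1 definition, the DT model reported in the Abstract, with sensitivity 0.902 and PPV 0.723, gives F1 = 2(0.902)(0.723)/(0.902 + 0.723) ≈ 0.803.

```python
def screening_metrics(tp, fp, tn, fn):
    """Metrics from a binary confusion matrix, with DD as the positive class."""
    sensitivity = tp / (tp + fn)   # recall: fraction of DD cases detected
    specificity = tn / (tn + fp)   # fraction of non-DD cases correctly rejected
    ppv = tp / (tp + fp)           # precision: fraction of positive calls that are DD
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * sensitivity * ppv / (sensitivity + ppv)  # harmonic mean of sensitivity and PPV
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "ppv": ppv, "f1": f1}


def pooled_cv_metrics(fold_counts):
    """Sum (tp, fp, tn, fn) over the cross-validation folds, then compute the metrics once."""
    tp, fp, tn, fn = [sum(c[i] for c in fold_counts) for i in range(4)]
    return screening_metrics(tp, fp, tn, fn)


# Made-up counts for two folds; a real 10-fold run would supply ten tuples.
folds = [(45, 5, 40, 10), (45, 5, 40, 10)]
print(pooled_cv_metrics(folds))
```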


Table 4. Software packages and parameter settings employed to build the models.

https://doi.org/10.1371/journal.pone.0324204.t004

Results

Fig 4 shows the ROC curves [41,42] and the corresponding areas under the curves (AUCs) of the DT, SVM, and DNN models. Table 5 shows the detailed performance data of the models that delivered sensitivities at the 0.80 and 0.90 levels. The DNN and DT models outperformed the SVM models in terms of AUC. On the other hand, as shown in Table 5, when a high level of sensitivity is desired, the DT models significantly outperformed the DNN and SVM models in terms of the F1 score, the harmonic mean of the sensitivity (also called recall) and the PPV (also called precision).
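The ROC construction described in the Methods (keeping, at each sensitivity level, the model with the highest F1 score) and the resulting AUC can be sketched as follows. Binning sensitivities by rounding to two decimal places and integrating with the trapezoidal rule, with the curve anchored at (0, 0) and (1, 1), are assumptions made for illustration; the text does not state how sensitivity levels were defined or how the area was computed. The candidate models are made up.

```python
def best_f1_per_sensitivity(models, ndigits=2):
    """For each sensitivity level (rounded to `ndigits`), keep the model with the highest F1.

    `models` is an iterable of dicts with keys "sensitivity", "specificity", and "f1".
    """
    best = {}
    for m in models:
        level = round(m["sensitivity"], ndigits)
        if level not in best or m["f1"] > best[level]["f1"]:
            best[level] = m
    return list(best.values())


def roc_auc(models):
    """Trapezoidal area under the ROC curve through the models' operating points."""
    points = {(0.0, 0.0), (1.0, 1.0)}
    for m in models:
        points.add((1.0 - m["specificity"], m["sensitivity"]))  # (FPR, TPR)
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Made-up candidate models produced by different parameter settings.
candidates = [
    {"sensitivity": 0.901, "specificity": 0.50, "f1": 0.70},
    {"sensitivity": 0.899, "specificity": 0.60, "f1": 0.80},
    {"sensitivity": 0.80, "specificity": 0.70, "f1": 0.75},
]
selected = best_f1_per_sensitivity(candidates)  # keeps the 2nd and 3rd models
print(len(selected), roc_auc(selected))
```

Selecting by F1 rather than by specificity means the curve traces the operating points a clinician would actually pick, which is why it can differ from a conventional threshold-sweep ROC curve.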


Fig 3. The structure of the DT model generated by feeding our dataset into the software package and with cp and prior set to 0.01 and 0.55, respectively.

https://doi.org/10.1371/journal.pone.0324204.g003


Table 5. Detailed performance characteristics of alternative prediction models.

https://doi.org/10.1371/journal.pone.0324204.t005


Fig 4. ROC curves of the DNN, DT, SVM models.

https://doi.org/10.1371/journal.pone.0324204.g004

Based on the data shown in Table 5, the DT model that delivered a sensitivity at the 0.90 level is arguably the preferred choice, for two reasons. First, the PPV of this DT model is significantly higher than the PPVs of the SVM and DNN models that delivered the same level of sensitivity. Therefore, in clinical applications, this DT model should produce substantially fewer false positives than the SVM and DNN models at the same level of sensitivity. Second, the PPV of this DT model is almost the same as that of the DT model that delivered a sensitivity at the 0.80 level. Accordingly, the subsequent discussion focuses on the DT model that delivered a sensitivity at the 0.90 level.
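The link between PPV and the number of false positives at a fixed sensitivity can be made explicit. Writing P for the number of children who actually have DD, Se for the sensitivity, and TP and FP for the numbers of true and false positives:

```latex
\mathrm{TP} = \mathrm{Se}\cdot P, \qquad
\mathrm{PPV} = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}}
\;\Longrightarrow\;
\mathrm{FP} = \mathrm{TP}\,\frac{1-\mathrm{PPV}}{\mathrm{PPV}}
            = \mathrm{Se}\cdot P\,\frac{1-\mathrm{PPV}}{\mathrm{PPV}}
```

For the same Se and P, FP therefore falls as PPV rises. With the PPV of 0.723 reported for the DT model, (1 − 0.723)/0.723 ≈ 0.383, i.e. about 38 false positives per 100 true positives; a hypothetical model with a PPV of 0.60 at the same sensitivity would give (1 − 0.60)/0.60 ≈ 0.667, or about 67 false positives per 100 true positives.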

Fig 3 shows the structure of the DT model generated by feeding our dataset into the software package with cp and prior set to 0.01 and 0.55, respectively. According to our performance evaluation, this DT model should deliver a sensitivity at the 0.90 level and a PPV above 0.70. The top-down path following the red arrows illustrates how the prediction is made for a subject with sex = female, f_OTS = 60, f_PTS = 30, and f_STS = 50. The prediction is positive, i.e., the subject is predicted to have DD, because the path ends at a red node. Conversely, a subject is predicted to be negative if the path corresponding to the subject's feature values ends at a blue node. The n+ and n- values associated with each node specify, respectively, the percentages of positive and negative subjects among all subjects that meet the criteria along the path to that node. Consequently, a user can estimate the probability that a subject is positive or negative by examining the n+ and n- values of the leaf node into which the subject's feature values fall.
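The top-down traversal and the leaf-level probability can be sketched as a small Python routine. The tree below is hypothetical: its split order, thresholds, and leaf percentages are invented for illustration and are not those of Fig 3. Sex is encoded as a 0/1 variable `sex_male` (an assumption), and the predicted class is taken as the majority class at the leaf, whereas in the fitted model the node colour may also depend on the prior setting (0.55) used to grow the tree.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Internal node: go left when subject[feature] < threshold, otherwise go right.
    feature: Optional[str] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    # Leaf node: percentages of positive (n+) and negative (n-) training subjects.
    n_pos: float = 0.0
    n_neg: float = 0.0

    def is_leaf(self) -> bool:
        return self.feature is None


def predict(node: Node, subject: dict):
    """Follow the top-down path; return (label, estimated P(DD), list of decisions)."""
    path = []
    while not node.is_leaf():
        goes_left = subject[node.feature] < node.threshold
        path.append(f"{node.feature} {'<' if goes_left else '>='} {node.threshold}")
        node = node.left if goes_left else node.right
    p_dd = node.n_pos / (node.n_pos + node.n_neg)
    label = "DD" if node.n_pos > node.n_neg else "non-DD"  # majority class at the leaf
    return label, p_dd, path


# Hypothetical tree with invented thresholds; NOT the tree shown in Fig 3.
EXAMPLE_TREE = Node(
    feature="f_OTS", threshold=20,
    left=Node(n_pos=30, n_neg=70),
    right=Node(
        feature="f_STS", threshold=40,
        left=Node(n_pos=55, n_neg=45),
        right=Node(
            feature="sex_male", threshold=0.5,
            left=Node(n_pos=70, n_neg=30),   # female
            right=Node(n_pos=85, n_neg=15),  # male
        ),
    ),
)

subject = {"sex_male": 0, "f_OTS": 60, "f_PTS": 30, "f_STS": 50}
print(predict(EXAMPLE_TREE, subject))
# ('DD', 0.7, ['f_OTS >= 20', 'f_STS >= 40', 'sex_male < 0.5'])
```

Returning the list of decisions alongside the probability mirrors the interpretability argument in the text: the clinician sees both the estimate and the rules that produced it.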

Discussion

In this study, we investigated how the frequencies of therapies can be exploited to build machine learning based prediction models for identifying children with developmental delay. Based on the experimental results, it is conceivable that the proposed approach can be widely used in clinical practice, for several reasons. First, the performance of the prediction models developed in this study should be acceptable to most physicians. For example, based on our experimental results, we anticipate that the DT model in Table 5 with sensitivity at the 0.90 level can identify about 90% of the subjects who will develop DD, while about 72% of the subjects predicted to be positive are true positives. Second, the features employed to build the prediction models can be obtained at essentially no cost. Therefore, the prediction models can be used to screen subjects who may develop DD before advanced and costly diagnostic procedures are carried out.

The experimental results also demonstrate that, for the applications targeted by this study, we do not need to trade performance for the interpretability of the prediction model. The F1 scores presented in Table 5 show that the DT models that delivered sensitivities at the 0.90 and 0.80 levels outperformed the DNN and SVM models that delivered sensitivities at the same levels. In most applications, advanced machine learning models such as DNN and SVM models typically outperform DT models because of the non-linear transformations they invoke; however, these transformations also make it almost impossible for a user to figure out how a prediction is made. For our application, this trade-off does not arise.

The DT structure shown in Fig 3 illustrates how a user can examine the tree to determine the decision rules the prediction model follows. Furthermore, the proportions of positive and negative subjects at each leaf node indicate how likely a subject meeting the criteria along the path to that leaf is to develop DD. For example, the probability that a subject with sex = female, f_OTS = 60, f_PTS = 30, and f_STS = 50 develops DD is 0.73. In clinical practice, a physician can combine this probability with his or her clinical experience to make the final diagnosis.

In summary, the major finding of this study is that the frequencies of the therapies a child has received provide valuable information for predicting whether the child has DD. Given the performance observed in the experiments and the fact that these features can be obtained essentially without cost, it is conceivable that prediction models built on them can be widely used in clinical practice and significantly improve the treatment outcomes of children who develop DD. Although the study was based on a dataset collected at a hospital in Taiwan, we anticipate that the proposed method can be used to build accurate prediction models for populations in other countries and of different racial and ethnic groups.

Limitations

Several limitations of this study should be noted. First, this retrospective study relies on data extracted from the outpatient department (OPD) database for children under 12 years old; consequently, the findings may not generalize to other age groups. Second, the prediction models were developed solely from data collected at a hospital in Taiwan, and their applicability to other hospitals has not been validated. Third, the dataset was derived from OPD clinical records, so these patients were likely to already have DD conditions. Finally, there were considerably more male than female patients, which is consistent with previous findings [43,44]; therefore, stratified sampling based on gender was not carried out.

Conclusions and future works

As this study has revealed that the frequencies of the therapies a child has received provide valuable information for predicting whether the child has DD, it is conceivable that more accurate prediction models can be built by integrating these features with other clinical assessment scores and the records of advanced medical examinations such as brain imaging and electroencephalograms. This study has also revealed that, for this application, the performance of the DT models compares favorably with that of advanced machine learning models such as SVM and DNN models. Because DT structures explicitly exhibit the decision rules employed to make predictions, physicians can combine these rules with their clinical experience to make final diagnoses. In future studies, it will be of interest to investigate the association between a child's age and the frequency of clinical therapy, in particular whether therapy frequency peaks within a specific age range. Such findings may support the principle of early intervention and carry significant clinical and therapeutic implications. Furthermore, identifying the age at which a child with DD is diagnosed, the specific clinical therapy received within rehabilitation, and the period of highest therapy concentration can improve our understanding of the most critical timeframe for early intervention.

Supporting information

S1 Table. The ICD-9-CM (International Classification of Diseases, 9th Revision, Clinical Modification) codes of developmental delay (DD).

https://doi.org/10.1371/journal.pone.0324204.s001

(DOCX)

S2 Table. The ICD-10-CM (International Classification of Diseases, 10th Revision, Clinical Modification) codes of developmental delay (DD).

https://doi.org/10.1371/journal.pone.0324204.s002

(DOCX)

Acknowledgments

We deeply appreciate Chi-Hung Huang's efforts in the early stage of this study, while he served at En Chu Kong Hospital, New Taipei City, Taiwan. We also deeply appreciate Dr. Sheng-Chin Weng for his professional comments on locating the outpatient department records, while he served at En Chu Kong Hospital, New Taipei City, Taiwan.

References

1. Choi WW, McBride CA, Bourke C, Borzi P, Choo K, Walker R, et al. Long-term review of sutureless ward reduction in neonates with gastroschisis in the neonatal unit. J Pediatr Surg. 2012;47(8):1516–20. pmid:22901910
2. Katuwal G. Machine learning-based autism detection using brain imaging. Rochester (NY): Rochester Institute of Technology. 2017. https://scholarworks.rit.edu
3. Smythe T, Zuurmond M, Tann CJ, Gladstone M, Kuper H. Early intervention for children with developmental disabilities in low and middle-income countries - the case for action. Int Health. 2021;13(3):222–31. pmid:32780826
4. Waynforth D. A machine learning algorithm predicting infant psychomotor developmental delay using medical and social determinants. Reprod Med. 2023;4(3):106–17.
5. Hair JF Jr, Sarstedt M. Data, measurement, and causal inferences in machine learning: opportunities and challenges for marketing. J Mark Theory Pract. 2021;29(1):65–77.
6. Wall DP, Kosmicki J, Deluca TF, Harstad E, Fusaro VA. Use of machine learning to shorten observation-based screening and diagnosis of autism. Transl Psychiatry. 2012;2(4):e100. pmid:22832900
7. Mueller A, Candrian G, Kropotov J, Ponomarev V, Baschera G. Classification of ADHD patients on the basis of independent ERP components using a machine learning system. Nonlinear Biomed Phys. 2010;4:1.
8. Bedi G, Carrillo F, Cecchi GA, Slezak DF, Sigman M, Mota NB, et al. Automated analysis of free speech predicts psychosis onset in high-risk youths. NPJ Schizophr. 2015;1:15030. pmid:27336038
9. Bishop-Fitzpatrick L, Movaghar A, Greenberg JS, Page D, DaWalt LS, Brilliant MH, et al. Using machine learning to identify patterns of lifetime health problems in decedents with autism spectrum disorder. Autism Res. 2018;11(8):1120–8. pmid:29734508
10. Herring S, Gray K, Taffe J, Tonge B, Sweeney D, Einfeld S. Behaviour and emotional problems in toddlers with pervasive developmental disorders and developmental delay: associations with parental mental health and family functioning. J Intellect Disabil Res. 2006;50(Pt 12):874–82. pmid:17100948
11. Altay O, Ulas M. Prediction of the autism spectrum disorder diagnosis with linear discriminant analysis classifier and K-nearest neighbor in children. In: 2018 6th International Symposium on Digital Forensic and Security (ISDFS). IEEE. 2018.
12. Büyükoflaz F, Öztürk A. Early autism diagnosis of children with machine learning algorithms. In: 2018 26th Signal Processing and Communications Applications Conference (SIU). IEEE. 2018.
13. Dvornek NC, Ventola P, Duncan JS. Combining phenotypic and resting-state fMRI data for autism classification with recurrent neural networks. In: 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018); 2018; Washington (DC), USA: IEEE; 2018.
14. Liao D, Lu H. Classify autism and control based on deep learning and community structure on resting-state fMRI. In: 2018 IEEE 10th International Conference on Advanced Computational Intelligence (ICACI); 2018; Xiamen, China: IEEE; 2018.
15. Dekhil O, Ismail M, Shalaby A, et al. A novel CAD system for autism diagnosis using structural and functional MRI. In: 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017); 2017; Melbourne (VIC), Australia: IEEE; 2017.
16. Jiao Y, Lu Z. Predictive models for ASD based on multiple cortical features. In: 2011 IEEE 8th International Conference on Fuzzy Systems and Knowledge Discovery; 2011; Shanghai, China: IEEE; 2011. p. 1611–5.
17. Heinsfeld AS, Franco AR, Craddock RC, Buchweitz A, Meneguzzi F. Identification of autism spectrum disorder using deep learning and the ABIDE dataset. Neuroimage Clin. 2017;17:16–23. pmid:29034163
18. Bone D, Bishop SL, Black MP, Goodwin MS, Lord C, Narayanan SS. Use of machine learning to improve autism screening and diagnostic instruments: effectiveness, efficiency, and multi-instrument fusion. J Child Psychol Psychiatry. 2016;57(8):927–37. pmid:27090613
19. Jin Y, Wee C-Y, Shi F, Thung K-H, Ni D, Yap P-T, et al. Identification of infants at high-risk for autism spectrum disorder using multiparameter multiscale white matter connectivity networks. Hum Brain Mapp. 2015;36(12):4880–96. pmid:26368659
20. Kim B-M, Kang B-Y, Kim H-G, Baek S-H. Prognosis prediction for Class III malocclusion treatment by feature wrapping method. Angle Orthod. 2009;79(4):683–91. pmid:19537866
21. Song C, Jiang Z-Q, Liu D, Wu L-L. Application and research progress of machine learning in the diagnosis and treatment of neurodevelopmental disorders in children. Front Psychiatry. 2022;13:960672. pmid:36090350
22. Megerian JT, Dey S, Melmed RD, Coury DL, Lerner M, Nicholls CJ, et al. Evaluation of an artificial intelligence-based medical device for diagnosis of autism spectrum disorder. NPJ Digit Med. 2022;5(1):57. pmid:35513550
23. Maharjan J, Garikipati A, Dinenno FA, Ciobanu M, Barnes G, Browning E, et al. Machine learning determination of applied behavioral analysis treatment plan type. Brain Inform. 2023;10(1):7. pmid:36862316
24. Tariq Q, Fleming SL, Schwartz JN, Dunlap K, Corbin C, Washington P, et al. Detecting Developmental Delay and Autism Through Machine Learning Models Using Home Videos of Bangladeshi Children: Development and Validation Study. J Med Internet Res. 2019;21(4):e13822. pmid:31017583
25. American Psychiatric Association. Diagnostic and statistical manual of mental disorders: DSM-5. 5th ed. Washington (DC): American Psychiatric Association. 2013.
26. El-Baz F, El-Aal MA, Kamal TM, Sadek AA, Othman AA. Study of the C677T and 1298AC polymorphic genotypes of MTHFR Gene in autism spectrum disorder. Electron Physician. 2017;9(9):5287–93. pmid:29038711
27. Remeseiro B, Bolon-Canedo V. A review of feature selection methods in medical applications. Comput Biol Med. 2019;112:103375. pmid:31382212
28. James G, Witten D, Hastie T, Tibshirani R. An introduction to statistical learning. New York: Springer. 2013.
29. Cho S, Hong H, Ha B. A hybrid approach based on the combination of variable selection using decision trees and case-based reasoning using the Mahalanobis distance for bankruptcy prediction. Expert Syst Appl. 2010;37(5):3482–8.
30. Kallenberg W, Oosterhoff J, Schriever B. The number of classes in chi-squared goodness-of-fit tests. J Am Stat Assoc. 1985;80(392):959–68.
31. Maimon O, Rokach L. Data mining with decision trees: theory and applications. 2nd ed. Hackensack (NJ): World Scientific. 2014.
32. Bai J, Li Y, Li J, Jiang Y, Xia S. Rectified decision trees: towards interpretability, compression, and empirical soundness. arXiv. https://arxiv.org/abs/1903.05965. 2019. Accessed 2025 March 17.
33. Tharwat A. Classification assessment methods. Appl Comput Inform. 2020;17(2):168–92.
34. Desai VS, Crook JN, Overstreet GAJ. A comparison of neural networks and linear scoring models in the credit union environment. Eur J Oper Res. 1996;95(1):24–37.
35. Ho T-S, Weng T-C, Wang J-D, Han H-C, Cheng H-C, Yang C-C, et al. Comparing machine learning with case-control models to identify confirmed dengue cases. PLoS Negl Trop Dis. 2020;14(11):e0008843. pmid:33170848
36. Chiu H-YR, Hwang C-K, Chen S-Y, Shih F-Y, Han H-C, King C-C, et al. Machine learning for emerging infectious disease field responses. Sci Rep. 2022;12(1):328. pmid:35013370
37. Fushiki T. Estimation of prediction error by using K-fold cross-validation. Stat Comput. 2011;21(2):137–46.
38. Wong TT, Yeh PY. Reliable accuracy estimates from k-fold cross-validation. IEEE Trans Knowl Data Eng. 2019;32(8):1586–94.
39. Munsch N, Martin A, Gruarin S, Nateqi J, Abdarahmane I, Weingartner-Ortner R, et al. Diagnostic Accuracy of Web-Based COVID-19 Symptom Checkers: Comparison Study. J Med Internet Res. 2020;22(10):e21299. pmid:33001828
40. Lundberg SM, Lee S. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst. 2017;30:4765–74.
41. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143(1):29–36. pmid:7063747
42. Fawcett T. An introduction to ROC analysis. Pattern Recognit Lett. 2006;27(8):861–74.
43. Russell G, Steer C, Golding J. Social and demographic factors that influence the diagnosis of autistic spectrum disorders. Soc Psychiatry Psychiatr Epidemiol. 2011;46(12):1283–93. pmid:20938640
44. Fombonne E. Epidemiology of pervasive developmental disorders. Pediatr Res. 2009;65(6):591–8. pmid:19218885

Supporting information

S1 Table. The ICD-9-CM (International Classification of Disease, 9th Revision, Clinical Modification) codes of developmental delay (DD).

S2 Table. The ICD-10-CM (International Classification of Disease, 10th Revision, Clinical Modification) codes of developmental delay (DD).
