Abstract
Intravenous immunoglobulin (IVIG) has been established as the first-line therapy for Kawasaki disease (KD). However, approximately 10%–20% of pediatric patients exhibit IVIG resistance. Current machine learning (ML) models demonstrate suboptimal predictive performance in KD treatment response prediction, primarily due to their limited ability to effectively process categorical variables and interpret tabular clinical data. This study aims to develop an interpretable transformer-based clinical prediction model for IVIG-resistant KD and validate its clinical utility. This retrospective study analyzed clinical records of KD patients from the Affiliated Hospital of North Sichuan Medical College (Nanchong, China) between January 1, 2014 and December 31, 2024. A cohort of 1,578 pediatric KD cases was divided into training and validation sets. Six machine learning algorithms - Random Forest (RF), AdaBoost, Light Gradient Boosting Machine (LightGBM), eXtreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), and Tabular Prior-data Fitted Network version 2.0 (TabPFN-V2) - were implemented with five-fold cross-validation to optimize model hyperparameters. Model performance was evaluated using seven metrics: accuracy, precision, recall, F1-score, Matthews correlation coefficient (MCC), area under the receiver operating characteristic curve (ROC-AUC), and area under the precision-recall curve (PR-AUC). The top-performing model was subsequently subjected to interpretability analysis through Shapley Additive Explanations (SHAP) to elucidate feature contributions. The transformer-based TabPFN-V2 model demonstrated the best predictive performance, achieving a validation set accuracy of 0.97, with precision 0.98, recall 0.97, F1-score 0.98, MCC 0.95, ROC-AUC 0.99, and PR-AUC 0.99.
Global interpretability analysis through the kernel SHAP methodology identified the ten most influential predictive features, ranked by significance: coronary artery lesions (CAL), aspartate aminotransferase (AST), C-reactive protein (CRP), whether the case was incomplete KD (KDtype), neutrophil count (N), platelet count (PLT), albumin (ALB), age, white blood cell count (WBC), and hemoglobin (Hb). Local interpretability analysis revealed distinct correlation patterns with IVIG resistance: AST, CRP, and N demonstrated positive correlations, where elevated values corresponded to increased IVIG resistance risk; PLT and ALB showed negative correlations, with higher levels associated with reduced resistance probability. Notably, age and WBC demonstrated threshold effects, where optimal cutoff values enabled re-calibration of single-variable predictive scores. This threshold-dependent relationship suggests potential clinical utility in risk stratification protocols. The TabPFN-V2 model, leveraging an interpretable transformer architecture, demonstrates dual clinical utilities in KD management: (1) accurate prediction of IVIG resistance risk, and (2) data-driven support for personalized therapeutic decision-making. This framework enables probabilistic estimation of treatment resistance likelihood while providing transparent feature contribution analyses essential for developing patient-specific management protocols.
Citation: Chen G, Yang Z (2025) Clinical prediction of intravenous immunoglobulin-resistant Kawasaki disease based on interpretable Transformer model. PLoS One 20(7): e0327564. https://doi.org/10.1371/journal.pone.0327564
Editor: Benjamin M. Liu, Children’s National Hospital, George Washington University, UNITED STATES OF AMERICA
Received: May 9, 2025; Accepted: June 17, 2025; Published: July 9, 2025
Copyright: © 2025 Chen, Yang. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All data and code are available from https://github.com/zhuzhuchifei/kawasaki-disease-2025.
Funding: Nanchong Social Science Research Initiative: 14th Five-Year Plan Project 2025 (NC25C189), Yang, Nanchong Federation of Social Sciences and Humanities, www.ncskl.cn. 2024 Nanchong Social Science Planning: Healthy City Development Research Initiative (NC24JK18), Chen, Nanchong Federation of Social Sciences and Humanities, www.ncskl.cn. Affiliated Hospital of North Sichuan Medical College Research Development Plan (2023JC019), Yang, Affiliated Hospital of North Sichuan Medical College, https://www.hospital-nsmc.com.cn. Scientific Research Development Plan Project of North Sichuan Medical College (CBY21-QA55). Chen, North Sichuan Medical College, https://www.nsmc.edu.cn.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Kawasaki disease (KD), an acute systemic vasculitis predominantly affecting children under five years old, represents the leading cause of acquired pediatric heart disease due to its propensity to induce coronary artery lesions (CAL) [1]. While intravenous immunoglobulin (IVIG) administration remains the established first-line therapy, approximately 10%–20% of patients exhibit IVIG resistance [2], necessitating clinical reassessment of therapeutic strategies. Although retreatment with IVIG is frequently attempted in such cases, concerns persist regarding its therapeutic efficacy, potential adverse effects, and cost-effectiveness [3]. Importantly, IVIG resistance has been strongly associated with an elevated risk of CAL [4]. These clinical realities underscore the critical need to establish robust predictive models capable of early identification of high-risk patients, enabling pediatricians to implement timely therapeutic interventions.
Currently, the prediction of IVIG-resistant KD relies primarily on scoring systems, such as Egami, Kobayashi, and Sano [5], and on traditional machine learning (ML) models. However, conventional models face significant limitations in processing the high-dimensional and large-scale datasets characteristic of IVIG-resistant KD, which may compromise both prediction accuracy and clinical applicability [6–8]. These models frequently fail to account for complex interactions among multiple clinical and laboratory variables, potentially leading to unreliable risk stratification. Furthermore, existing ML implementations demonstrate suboptimal performance in handling categorical variables and tabular medical data [9, 10]. Specifically, conventional encoding methods for categorical variables, such as one-hot encoding, tend to generate high-dimensional sparse matrices, creating computational challenges that hinder effective model training and ultimately result in diminished predictive performance. These limitations underscore the need for more efficient analytical frameworks specifically optimized for medical tabular data processing, which could substantially enhance both predictive accuracy and clinical decision-making in KD management.
TabPFN presents a novel Transformer-based architecture specifically designed to enhance traditional ML approaches in tabular data processing. The model implements a bidirectional hierarchical attention mechanism that dynamically learns inter-feature relationships, integrating categorical and numerical feature processing through dual-path representation learning. This architecture boosts both computational efficiency and predictive accuracy while maintaining interpretability through its structured attention patterns [11]. The architecture combines In-Context Learning (ICL) with Bayesian principles through a probabilistic fusion framework, establishing an adaptive inference mechanism that enables efficient Bayesian integration in deep neural networks. The core innovation of TabPFN lies in its reformulation of posterior approximation as a supervised learning task. Specifically, the model employs a methodological framework in tabular data learning where it: 1) draws function samples from predefined prior distributions, 2) generates synthetic datasets by extracting feature-label pairs, 3) intentionally masks subsets of labels to create missing-value scenarios, and 4) optimizes its parameters to perform Bayesian inference by predicting complete probability distributions conditioned on the observable data patterns. The TabPFN-V2 architecture demonstrates substantial advancements through pre-training on approximately 130 million synthetic tabular prediction datasets while achieving state-of-the-art inference efficiency. This next-generation model incorporates optimized computational paradigms specifically engineered for 2025 deployment scenarios.
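The reformulation of posterior approximation as supervised learning can be written compactly. The block below is a sketch in the notation commonly used for prior-data fitted networks; the symbols (\varphi for a function drawn from the prior, D_{train} for the labeled context, q_\theta for the Transformer) are assumptions introduced here for illustration and do not appear in the original text.

```latex
% Posterior predictive distribution that the network is trained to approximate
p(y_{test} \mid x_{test}, D_{train})
  = \int p(y_{test} \mid x_{test}, \varphi)\, p(\varphi \mid D_{train})\, d\varphi

% Training objective: expected negative log-likelihood of the held-out (masked)
% label over synthetic datasets sampled from the prior p(D)
\mathcal{L}(\theta)
  = \mathbb{E}_{D_{train} \cup \{(x_{test},\, y_{test})\} \sim p(D)}
    \left[ -\log q_\theta(y_{test} \mid x_{test}, D_{train}) \right]
```

Minimizing this loss over many synthetic datasets drives q_\theta toward the posterior predictive distribution, which is why a single forward pass at inference time can stand in for explicit Bayesian integration.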
The opaque nature of ML models may undermine trust among patients and clinicians, consequently restricting their clinical implementation. To address this interpretability challenge, SHapley Additive exPlanations (SHAP) provides game-theoretic principles for quantifying individual feature contributions through optimal Shapley value estimation [12]. As a model-agnostic interpretation framework, Kernel SHAP extends the Local Interpretable Model-agnostic Explanations (LIME) paradigm by operationalizing Shapley values to decompose prediction outcomes into feature-specific attribution scores. This approach enables quantitative assessment of input variable influences across arbitrary ML architectures, offering clinically actionable insights through transparent visualization of feature-prediction interactions. Particularly in medical diagnostics, such interpretability frameworks have demonstrated particular efficacy in elucidating pathological decision pathways, thereby enhancing clinicians’ ability to validate model predictions and optimize therapeutic interventions. Therefore, this study seeks to develop and validate a clinical risk stratification model for IVIG resistant KD through implementation of an interpretable deep learning-based Transformer framework, incorporating state-of-the-art ML methodologies to enhance predictive performance and clinical translatability. This article is presented in accordance with the TRIPOD+AI reporting checklist [13]. For more information, see S1 Table.
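For reference, the Shapley value that SHAP estimates for each feature can be stated as follows. This is the standard game-theoretic definition; the symbols (F for the full feature set, S for a feature subset, f_x(S) for the model's expected output when only the features in S are known) are notation assumed here, not taken from the original text.

```latex
% Shapley value of feature i for the prediction at input x
\phi_i = \sum_{S \subseteq F \setminus \{i\}}
         \frac{|S|!\,(|F| - |S| - 1)!}{|F|!}
         \left[ f_x(S \cup \{i\}) - f_x(S) \right]

% Additivity: the attributions sum to the prediction minus the baseline \phi_0
f(x) = \phi_0 + \sum_{i \in F} \phi_i
```

Kernel SHAP approximates these values by fitting a weighted linear model over sampled feature subsets rather than enumerating all of them, which is what makes it applicable to arbitrary models.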
Materials and methods
Clinical features
This retrospective study analyzed clinical records of KD patients diagnosed at the Affiliated Hospital of North Sichuan Medical College in Nanchong, Sichuan Province, from January 1, 2014 to December 31, 2024. The data were accessed for research purposes on February 2, 2025, after the data collection had been completed. Diagnosis was established according to the 2017 American Heart Association (AHA) guidelines for KD [14]. The final cohort comprised 1,578 confirmed cases, including 170 cases (10.77%) of IVIG-resistant KD. Demographic data revealed a male predominance (937 males vs. 641 females; male-to-female ratio 1.5:1), with age at presentation ranging from 0 to 210 months (mean ± SD: 34.10 ± 25.53 months). Incomplete KD was diagnosed when patients presented with 2-3 major clinical features along with compatible laboratory findings, after excluding alternative diagnoses exhibiting exudative conjunctivitis, pharyngeal exudates, ulcerative stomatitis, vesiculobullous rash, generalized lymphadenopathy, or hepatosplenomegaly. In consideration of the potential confounding effects associated with Multisystem Inflammatory Syndrome in Children (MIS-C), individuals with laboratory-confirmed SARS-CoV-2 infection were excluded from the study population [15]. IVIG resistance was defined as either persistent fever (>36 hours) following standard therapy with high-dose IVIG (2 g/kg) and aspirin (30-50 mg/kg/day), or recrudescent fever occurring 2-7 days post-treatment. CAL were identified using echocardiographic criteria (Z-score >2.5 in any coronary segment). The study protocol received ethical approval from the Institutional Review Board of North Sichuan Medical College Affiliated Hospital (approval number: [2022ER235-1]). Written informed consent was secured from all participants' family members prior to study initiation, ensuring compliance with the ethical principles outlined in the Declaration of Helsinki.
Data collection
The standardized clinical dataset (S1 Appendix) of KD comprises 1,578 cases documenting sex, age at onset, disease classification (complete vs. incomplete KD), coronary artery lesions (CAL) status, recurrence history, and 10 laboratory parameters: C-reactive protein (CRP), erythrocyte sedimentation rate (ESR), white blood cell count (WBC), neutrophil count (N), platelet count (PLT), hemoglobin (Hb), albumin (ALB), total protein (TP), aspartate aminotransferase (AST), and alanine aminotransferase (ALT).
Statistical analysis
The data preprocessing and analytical procedures were implemented in Python 3.10 utilizing scikit-learn 1.4.2. A comparative evaluation of six ML algorithms was conducted: Random Forest and AdaBoost from scikit-learn, XGBoost 1.7.3, LightGBM 4.1.0, CatBoost 1.2, and TabPFN 2.0. Model performance was assessed through seven complementary metrics: accuracy, precision, recall, F1-score, Matthews Correlation Coefficient (MCC), and the areas under the receiver operating characteristic (ROC) and precision-recall (PR) curves. This study defined true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) for each predicted class. For instance, for the IVIG-resistant class, TP is the number of IVIG-resistant samples correctly predicted as resistant, TN is the number of IVIG-responsive samples correctly predicted as responsive, FP is the number of IVIG-responsive samples incorrectly predicted as resistant, and FN is the number of IVIG-resistant samples incorrectly predicted as responsive.
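The confusion-matrix definitions above can be made concrete with a short sketch. The function name `binary_metrics` and the toy labels are hypothetical and chosen only for illustration; they are not taken from the study's code or data.

```python
import math


def binary_metrics(y_true, y_pred, positive=1):
    """Count TP/TN/FP/FN for one positive class and derive summary metrics."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def safe_div(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    accuracy = safe_div(tp + tn, len(pairs))
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = safe_div(tp * tn - fp * fn, mcc_den)
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": accuracy, "precision": precision,
        "recall": recall, "f1": f1, "mcc": mcc,
    }


# Toy labels (1 = IVIG resistant, 0 = IVIG responsive); NOT study data.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

With an imbalanced outcome such as IVIG resistance, accuracy alone can look favorable even when few resistant cases are detected, which is why the counts are reported alongside precision, recall, and MCC.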
MCC represents a statistically robust metric that addresses critical limitations in class-imbalanced learning scenarios. Its confusion-matrix-based formulation demonstrates enhanced reliability in both small-sample regimes and skewed class distributions, while simultaneously mitigating stochastic outcome variations through explicit covariance normalization.
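For a binary outcome, MCC is computed from the four confusion-matrix counts as shown below. This is the standard definition, stated here for reference; the relation to the chi-square statistic of the 2×2 table (with n the total number of samples) is included as a known identity rather than something reported in the study.

```latex
\mathrm{MCC} =
  \frac{TP \cdot TN - FP \cdot FN}
       {\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}

% Relation to the chi-square statistic of the 2x2 contingency table
|\mathrm{MCC}| = \sqrt{\chi^2 / n}
```

MCC ranges from -1 to 1, where 1 indicates perfect agreement, 0 corresponds to chance-level prediction, and -1 indicates complete disagreement.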
To ensure robust generalization and mitigate overfitting, all evaluations employed stratified 5-fold cross-validation, with final model selection based on validation outcomes across all evaluation criteria (Fig 1). Continuous variables were compared using the Mann-Whitney U test, while categorical variables were compared using Pearson's chi-square test. A P value <0.05 was considered statistically significant.
Data processing methodology
Continuous variables were stratified based on their distributional characteristics. Parameters demonstrating approximate normality (ESR, WBC, N, PLT, Hb, ALB, TP) underwent missing value imputation using arithmetic means. Conversely, non-normally distributed variables (age, CRP, AST, ALT) were imputed with median values. Categorical variables (KDtype, CAL, Recurrence) were completed using mode values. To address class imbalance, this study implemented the Synthetic Minority Over-sampling Technique for Nominal and Continuous features (SMOTE-NC). This algorithm synthesizes new minority class instances through k-nearest neighbor interpolation in feature space, simultaneously preserving categorical variable integrity while augmenting underrepresented categories. The dataset underwent stratified partitioning into training (80%) and validation (20%) subsets, with randomization controlled by a fixed seed (42) to ensure reproducibility. This partitioning strategy maintained proportional class representation across subsets while permitting rigorous evaluation of model generalizability.
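The imputation rules and the stratified 80/20 split described above can be sketched with the Python standard library alone. The helper names (`impute`, `stratified_split`), the toy records, and the order of operations are assumptions made for illustration; the study's own code may differ. SMOTE-NC is not reproduced here; the imbalanced-learn package provides a `SMOTENC` class for that step.

```python
import random
import statistics


def impute(rows, column, strategy):
    """Fill None entries of one column in place using mean, median, or mode."""
    observed = [r[column] for r in rows if r[column] is not None]
    reducers = {
        "mean": statistics.mean,      # approximately normal variables
        "median": statistics.median,  # skewed continuous variables
        "mode": statistics.mode,      # categorical variables
    }
    fill = reducers[strategy](observed)
    for r in rows:
        if r[column] is None:
            r[column] = fill
    return fill


def stratified_split(rows, label, valid_fraction=0.2, seed=42):
    """Split rows so each class keeps roughly the same share in both subsets."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    by_class = {}
    for r in rows:
        by_class.setdefault(r[label], []).append(r)
    train, valid = [], []
    for members in by_class.values():
        members = members[:]
        rng.shuffle(members)
        n_valid = round(len(members) * valid_fraction)
        valid.extend(members[:n_valid])
        train.extend(members[n_valid:])
    return train, valid


# Toy records (NOT study data): 5 resistant and 15 responsive cases.
rows = [
    {"ALB": 30.0 + i, "CRP": 10.0 * i, "CAL": i % 2, "resistant": 1 if i < 5 else 0}
    for i in range(20)
]
rows[3]["ALB"] = None
rows[7]["CRP"] = None
rows[9]["CAL"] = None

impute(rows, "ALB", "mean")
impute(rows, "CRP", "median")
impute(rows, "CAL", "mode")
train, valid = stratified_split(rows, "resistant")
print(len(train), len(valid))
```

One design point worth noting: computing imputation statistics and applying oversampling on the training portion only, after the split, keeps information from the validation set out of model fitting. The sketch imputes before splitting purely for brevity.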
Model explanation
The core innovation of TabPFN-V2 lies in its two-stage learning framework: generating extensive synthetic tabular datasets followed by training a Transformer-based architecture to acquire meta-learning capabilities through exposure to diverse synthetic prediction tasks. Unlike conventional approaches that require manual feature engineering or problem-specific adaptations when confronting data challenges like missing values, TabPFN develops intrinsic problem-solving strategies by systematically addressing synthetically generated tasks that emulate real-world data imperfections. This pre-training paradigm enables rapid adaptation to new tabular prediction problems with minimal fine-tuning (Fig 2).
Fig 2. (A) Overview of TabPFN synthetic pre-training. (B) TabPFN architecture.
The SHAP 0.42.1 was employed for interpretable ML analysis of the optimal prediction model. Utilizing the kernel SHAP approximation framework, this study conducted both global interpretability analysis and local instance-level explanations. Feature importance ranking was determined by aggregating absolute SHAP values across the dataset, where each feature’s significance was quantified by its mean absolute Shapley value contribution. This integration of ML with game-theoretic interpretation establishes a mathematically grounded framework for explaining predictive model behavior, enhancing both transparency and scientific validity.
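The global ranking step, aggregating absolute SHAP values per feature, can be sketched as follows. The function name `rank_by_mean_abs_shap` and the small SHAP matrix are hypothetical; the values are invented for illustration and are not results from the study.

```python
def rank_by_mean_abs_shap(shap_matrix, feature_names):
    """Rank features by the mean absolute SHAP value across all samples."""
    n_samples = len(shap_matrix)
    importance = {
        name: sum(abs(row[j]) for row in shap_matrix) / n_samples
        for j, name in enumerate(feature_names)
    }
    # Largest mean |SHAP| first.
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


# Rows are patients, columns are features; invented values, NOT study output.
feature_names = ["AST", "CRP", "Sex"]
shap_matrix = [
    [0.30, -0.10, 0.01],
    [-0.20, 0.25, -0.02],
    [0.40, 0.05, 0.00],
]
ranking = rank_by_mean_abs_shap(shap_matrix, feature_names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Taking absolute values before averaging matters: a feature that pushes some predictions up and others down would otherwise cancel out and appear unimportant.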
Results
Clinical cohort characteristics
Table 1 presents the demographic characteristics and laboratory profiles of the 1,578 KD patients enrolled in the training and validation cohorts. The study population comprised 937 males (59.4%), with 378 cases (24.0%) exhibiting CAL, 422 cases (26.7%) diagnosed as incomplete KD, and 75 cases (4.8%) experiencing disease relapse. Comparative analysis revealed that the IVIG-resistant subgroup (n=170, 10.8%) displayed significantly elevated inflammatory markers and hepatic enzymes, including CRP, ESR, N, AST, and ALT. Conversely, this subgroup demonstrated younger age at onset and reduced PLT and ALB levels compared to IVIG-responsive counterparts. The three key categorical predictors of IVIG resistance in KD were Recurrence, CAL, and KDtype (Table 1).
Training set accuracy and IVIG responsiveness evaluation
Six ML algorithms (Random Forest, AdaBoost, XGBoost, LightGBM, CatBoost, and TabPFN-V2) were evaluated using 5-fold cross-validation for hyperparameter optimization. The training set accuracies varied notably across models: RF achieved 0.82, AdaBoost 0.80, LightGBM 0.84, XGBoost 0.81, CatBoost 0.92, and TabPFN-V2 attained a perfect classification accuracy of 1.00 (Table 2). This comparative analysis identified TabPFN-V2 as the top-performing model. Further evaluation of the TabPFN-V2 model on the IVIG-responsive subgroup demonstrated strong classification metrics: precision, recall, and F1-score all reached 0.98. The model's perfect training accuracy and strong subgroup performance suggest strong pattern recognition on this dataset, though the possibility of overfitting warrants further investigation in the validation set.
Validation set accuracy and IVIG resistance evaluation
The validation set was used to evaluate the trained models, and TabPFN-V2 again performed best. As shown in Table 3, this model achieved the highest accuracy (0.97), with no evidence of overfitting. In the IVIG resistance classification, TabPFN-V2 achieved a precision of 0.98, recall of 0.97, F1-score of 0.98, and MCC of 0.95. The model's discriminative power was further evidenced by a ROC-AUC of 0.99 (Fig 3) and a PR-AUC of 0.99 (Fig 4). Collectively, these performance metrics identify TabPFN-V2 as the best model for the current dataset, combining high predictive accuracy with stable generalization.
Fig 3. ROC curves: (A) Random Forest, (B) AdaBoost, (C) LightGBM, (D) XGBoost, (E) CatBoost, (F) TabPFN-V2.
Fig 4. PR curves: (A) Random Forest, (B) AdaBoost, (C) LightGBM, (D) XGBoost, (E) CatBoost, (F) TabPFN-V2.
TabPFN-V2 model interpretation with kernel SHAP method
The TabPFN-V2 model was analyzed using the kernel SHAP method to quantify feature importance in predicting IVIG resistance. Global interpretability results (Fig 5) reveal the relative influence of clinical variables through distinct horizontal bands, where each feature's contribution is visualized as color-coded data points across all patient cases. Red and blue markers correspond to high and low feature values, respectively, while the horizontal position of each point gives its SHAP value, with positive values increasing and negative values decreasing the predicted IVIG resistance risk. Among the analyzed predictors, AST, CRP, N, PLT, and ALB emerged as the most influential determinants. Notably, CAL and incomplete KD presentation demonstrated substantial predictive weight. In contrast, patient sex and disease recurrence exhibited minimal contributions to the model's predictions. This hierarchical feature importance profile provides actionable insights for clinical risk stratification while highlighting key biomarkers warranting closer monitoring.
Fig 5. CAL, coronary artery lesions; AST, aspartate transaminase; CRP, C-reactive protein; KDtype, incomplete Kawasaki disease; N, neutrophil count; PLT, platelet count; ALB, albumin; ALT, alanine aminotransferase; WBC, white blood cell; Hb, hemoglobin; TP, total protein; ESR, erythrocyte sedimentation rate; SHAP, SHapley Additive exPlanations.
Interaction network analysis
In Fig 6, the SHAP scores for AST, CRP, and N demonstrate a positive correlation with IVIG resistance, showing a progressive increase as biomarker values rise. However, this relationship exhibits a non-linear pattern where the contribution value plateaus beyond specific thresholds despite continued biomarker elevation. Notably, the analysis identifies critical inflection points (AST >100 U/L, CRP >110 mg/L, and N >12 × 10⁹/L) corresponding to SHAP values exceeding baseline (SHAP >0.0). These threshold parameters demonstrate significant predictive value and may serve as foundational criteria for developing a single-index IVIG resistance scoring system.
Fig 6. (A) The X-axis represents the range of values of the AST feature, and the Y-axis represents the SHAP value of AST, that is, its effect on the model output. (B) Feature dependence plot of CRP. (C) Feature dependence plot of N. AST, aspartate transaminase; CRP, C-reactive protein; N, neutrophil count.
The SHAP dependence plots in Fig 7 reveal negative correlations of PLT and ALB with IVIG resistance risk. As these parameters increase in value, their corresponding SHAP values decline progressively, indicating a lower predicted risk of IVIG resistance at higher measurement ranges. This inverse relationship supports PLT and ALB as protective factors. Notably, the SHAP threshold analysis identifies clinically relevant cutoff values at the points where the SHAP value crosses zero. Specifically, PLT <300 × 10⁹/L and ALB <38 g/L emerge as critical thresholds for elevated IVIG resistance risk. These biomarkers demonstrate particular prognostic value in the subnormal range, potentially serving as key components for developing a weighted prognostic scoring system to stratify IVIG resistance risk.
Fig 7. (A) The X-axis represents the range of values of the PLT feature, and the Y-axis represents the SHAP value of PLT, that is, its effect on the model output. (B) Feature dependence plot of ALB. PLT, platelet count; ALB, albumin.
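The cutoffs reported for Figs 6 and 7 can be expressed as simple single-variable checks. The sketch below is purely illustrative: the names `THRESHOLDS` and `flag_risk_indicators` and the example patient are hypothetical, the flags are unweighted, and this is not a validated clinical score or the scoring system the study envisions.

```python
# Cutoffs as reported in the text for Figs 6 and 7; units noted per entry.
THRESHOLDS = {
    "AST": (">", 100.0),  # U/L
    "CRP": (">", 110.0),  # mg/L
    "N": (">", 12.0),     # x10^9/L
    "PLT": ("<", 300.0),  # x10^9/L
    "ALB": ("<", 38.0),   # g/L
}


def flag_risk_indicators(patient):
    """Return the names of measured variables that lie beyond their cutoff."""
    flags = []
    for name, (direction, cutoff) in THRESHOLDS.items():
        value = patient.get(name)
        if value is None:
            continue  # skip variables that were not measured
        if direction == ">" and value > cutoff:
            flags.append(name)
        elif direction == "<" and value < cutoff:
            flags.append(name)
    return flags


# Hypothetical patient values, NOT taken from the study dataset.
example = {"AST": 120.0, "CRP": 80.0, "N": 13.5, "PLT": 350.0, "ALB": 35.0}
flags = flag_risk_indicators(example)
print(flags)
```

Any practical score built on such flags would need weights and external validation; the sketch only shows how the SHAP zero-crossing points translate into per-variable rules.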
While variables such as age and WBC in Fig 8 demonstrate relatively lower feature importance compared to other predictors, the SHAP interpretation reveals clinically meaningful value ranges: age between 25-50 months and WBC levels of 12-15 × 10⁹/L. These identified thresholds offer valuable insights for developing a clinical scoring system, particularly in establishing evidence-based cutoff points for risk stratification.
Fig 8. (A) The X-axis represents the range of values of the age feature, and the Y-axis represents the SHAP value of age, that is, its effect on the model output. (B) Feature dependence plot of WBC. WBC, white blood cell.
Conversely, biochemical markers including ALT, Hb, TP, and ESR demonstrated negligible contribution to model performance, suggesting their potential exclusion from future predictive models in similar clinical contexts (Fig 9).
Fig 9. (A) The X-axis represents the range of values of the ALT feature, and the Y-axis represents the SHAP value of ALT, that is, its effect on the model output. (B) Feature dependence plot of Hb. (C) Feature dependence plot of TP. (D) Feature dependence plot of ESR. ALT, alanine aminotransferase; Hb, hemoglobin; TP, total protein; ESR, erythrocyte sedimentation rate.
Discussion
This study developed a predictive model for IVIG resistant KD by employing six ML algorithms to identify critical risk factors. Among the evaluated models (RF, AdaBoost, LightGBM, XGBoost, CatBoost, and TabPFN-V2), the TabPFN-V2 algorithm demonstrated superior predictive performance. SHAP value analysis revealed coronary artery injury and incomplete KD diagnosis as key risk factors for IVIG resistance. Biomarker analysis identified positive correlations between IVIG resistance and elevated levels of AST, CRP, and neutrophil count, with higher values exhibiting greater predictive importance. Conversely, PLT and ALB showed negative correlations, where increased values corresponded to reduced IVIG resistance risk, establishing them as protective factors. Furthermore, threshold optimization for age and WBC parameters enabled refinement of single-variable predictive models, enhancing their clinical applicability.
Existing predictive frameworks for IVIG resistance in KD have demonstrated limited discriminative performance across heterogeneous cohorts [16–18]. ML analysis revealed that the TabPFN-V2 algorithm significantly outperformed conventional models in internal validation, achieving superior predictive accuracy across multiple evaluation metrics including MCC=0.95, AUC=0.99, and PRC=0.99. This advancement enables early identification of high-risk pediatric patients during the critical therapeutic window, potentially facilitating timely escalation to adjuvant therapies such as corticosteroid administration or biologic agents. However, the generalizability of these findings requires verification through multicenter prospective studies with diverse ethnic populations.
The development of TabPFN originates from the inherent limitations of traditional ML methods in handling tabular data, particularly structural heterogeneity across features and complexity in raw data representations. Released in 2025, TabPFN-V2 represents a significant advancement in tabular data processing, expanding its capabilities beyond classification tasks to include native support for regression analysis. This upgraded architecture maintains its intrinsic handling of missing values and outliers while eliminating the need for manual feature engineering. On small-to-medium scale datasets (up to 10,000 samples and 500 features), the model achieves high prediction accuracy with substantially reduced computational requirements. Built upon a generative Transformer foundation, TabPFN-V2 introduces multifunctional capabilities including adaptive fine-tuning, synthetic data generation, probabilistic density estimation, and production of reusable feature embeddings. Its training paradigm leverages millions of synthetically generated datasets to develop generalized pattern recognition abilities. Through enhanced cross-domain modeling proficiency, TabPFN-V2 establishes a new paradigm for automated ML, offering potential for accelerating scientific discovery and optimizing high-stakes decision-making across industries.
The clinical application of ML models frequently encounters challenges due to their opaque nature and lack of interpretability, raising concerns about their reliability in predicting disease occurrence within medical practice. To address these limitations, this study implemented SHAP value estimation through the TabPFN-V2 architecture and employed kernel SHAP methodology to generate visual explanations of prediction outcomes, subsequently developing a predictive scoring system for IVIG resistance. Prior explainable ML studies furnish a rigorous methodological scaffold and operational protocols that systematically inform IVIG therapeutic optimization research [19].
This investigation identified CAL and incomplete KD as critical predictors of IVIG resistance. The findings suggest that greater inflammatory severity in KD pathogenesis correlates with three clinically significant outcomes: diminished therapeutic response to IVIG, lower diagnostic recognition rates, and increased probability of treatment resistance - particularly in cases with atypical presentations. These insights emphasize the need for enhanced clinical vigilance in detecting incomplete KD manifestations.
The analysis further revealed three key biochemical predictors: elevated AST, CRP, and neutrophil count. These findings align with established pathological mechanisms. First, CRP and neutrophil elevations reflect systemic inflammation, consistent with prior studies linking these markers to infectious morbidity [20, 21]. Second, AST elevation specifically indicates cardiovascular compromise associated with CAL. Notably, PLT and ALB emerged as protective factors against IVIG resistance, corroborating conventional diagnostic models [5] while highlighting discrepancies in how current scoring systems represent the significance of CRP, N, and AST. The model optimization identified critical thresholds for risk stratification: age 25-50 months and WBC 12-15 × 10⁹/L. These parameters demonstrate significant predictive value for IVIG resistance, potentially informing future diagnostic criteria development. The kernel SHAP analysis provides clinically actionable insights. First, it quantifies variable contributions on a unified scale. Second, it enables transparent interpretation of ML outputs. Third, it facilitates personalized risk assessment beyond generic algorithmic predictions. This explanatory framework enhances clinical decision-making, although the practical utility of kernel SHAP in medical contexts requires further validation.
This study has several limitations that warrant careful consideration. First, while the cohort size exceeds those reported in prior studies, it remains suboptimal relative to the data requirements of modern ML algorithms, potentially limiting model generalizability. Second, the absence of a systematic methodology for feature selection introduces susceptibility to data noise, which may compromise model efficacy through spurious correlations. Third, despite TabPFN-V2's established versatility in tabular data processing, the single-institution provenance of the dataset raises concerns regarding clinical translatability, necessitating rigorous external validation across multi-center cohorts with geographic and demographic diversity. Fourth, the current study was constrained to fundamental laboratory parameters and omitted more comprehensive analyses of clinically relevant biomarkers, including additional liver function markers, Epstein-Barr virus serological profiles, immunological indicators, and echocardiographic parameters. The absence of these multidimensional data is a significant methodological limitation that merits systematic exploration in subsequent research. Finally, regarding model interpretability, although SHAP analysis was implemented, the computational cost of kernel SHAP-based interpretation presents substantial challenges for clinical deployment, demanding considerable time and hardware resources. This underscores the need to investigate emerging interpretability frameworks specifically optimized for medical applications, such as SHAP-IQ, which may offer pediatricians more clinically actionable insights through enhanced visualization capabilities.
Conclusion
In conclusion, the findings demonstrate that CAL and incomplete KD presentation are significant clinical risk factors for IVIG resistance. Through ML analysis, this study identified elevated AST, CRP, and neutrophil count, along with decreased PLT and ALB levels, as independent predictors of treatment non-responsiveness. The transformer-based TabPFN-V2 prediction model demonstrated superior predictive performance, achieving a ROC-AUC of 0.99 in the validation set. Using an interpretable ML framework based on kernel SHAP analysis, this study confirmed the model’s ability to generate clinically meaningful risk stratification. This decision-support tool holds substantial translational value, enabling early identification of high-risk KD patients and assisting clinicians in optimizing therapeutic strategies through personalized IVIG resistance risk assessment.
Supporting information
S1 Table. TRIPOD+AI checklist for the reporting of prediction model studies
https://doi.org/10.1371/journal.pone.0327564.s001
(DOC)
S1 Appendix. Data and codes are available at:
https://doi.org/10.1371/journal.pone.0327564.s002
(DOCX)
References
- 1. Kuo H-C, Liu S-F, Lin P-X, Yang KD, Lin B-S. Near infrared spectroscopy detects change of tissue hemoglobin and water levels in Kawasaki disease and coronary artery lesions. Children (Basel). 2022;9(3):299. pmid:35327671
- 2. Matsuguma C, Wakiguchi H, Suzuki Y, Okada S, Furuta T, Ohnishi Y, et al. Dynamics of immunocyte activation during intravenous immunoglobulin treatment in Kawasaki disease. Scand J Rheumatol. 2019;48(6):491–6. pmid:31272272
- 3. Shaanxi Provincial Diagnosis and Treatment Center of Kawasaki Disease, Clinical Research Center for Childhood Diseases of Shaanxi Province, Children’s Hospital of Shaanxi Provincial People’s Hospital, Editorial Board of Chinese Journal of Contemporary Pediatrics, Expert Committee of Advanced Training for Pediatrician, China Maternal and Children’s Health Association, General Pediatrician Group, Society of Pediatrician, Chinese Doctor Association. Shaanxi provincial diagnosis and treatment center of Kawasaki Disease. Zhongguo Dang Dai Er Ke Za Zhi. 2021;23(9):867–76. pmid:34535199
- 4. Kuo H-C, Lin M-C, Kao C-C, Weng K-P, Ding Y, Han Z, et al. Intravenous immunoglobulin alone for coronary artery lesion treatment of Kawasaki disease: a randomized clinical trial. JAMA Netw Open. 2025;8(4):e253063. pmid:40178858
- 5. Lim YT, Kwon JE, Kim YH. Evaluating the performance of Egami, Kobayashi and Sano scores in predicting IVIG resistance in infant Kawasaki disease. BMC Pediatr. 2024;24(1):606. pmid:39342137
- 6. Ahmed N, Pal P, Azad SM, Ghosh A, Banerjee P, Sarkar SD. Risk factors in IVIG-resistant Kawasaki disease and correlation with Japanese scoring systems - a study from Eastern India. Clin Rheumatol. 2023;42(1):145–50. pmid:36053472
- 7. Kuniyoshi Y, Tokutake H, Takahashi N, Kamura A, Yasuda S, Tashiro M. Comparison of machine learning models for prediction of initial intravenous immunoglobulin resistance in children with Kawasaki disease. Front Pediatr. 2020;8:570834. pmid:33344380
- 8. Wang Y, Cao Y, Li Y, Zhu F, Yuan M, Xu J, et al. Development of an immunoinflammatory indicator-related dynamic nomogram based on machine learning for the prediction of intravenous immunoglobulin-resistant Kawasaki disease patients. Int Immunopharmacol. 2024;134:112194. pmid:38703570
- 9. Deng L, Zhao J, Wang T, Liu B, Jiang J, Jia P, et al. Construction and validation of predictive models for intravenous immunoglobulin-resistant Kawasaki disease using an interpretable machine learning approach. Clin Exp Pediatr. 2024;67(8):405–14. pmid:39048087
- 10. Wang T, Liu G, Lin H. A machine learning approach to predict intravenous immunoglobulin resistance in Kawasaki disease patients: a study based on a Southeast China population. PLoS One. 2020;15(8):e0237321. pmid:32853226
- 11. Hollmann N, Müller S, Purucker L, Krishnakumar A, Körfer M, Hoo SB, et al. Accurate predictions on small data with a tabular foundation model. Nature. 2025;637(8045):319–26. pmid:39780007
- 12. Yano Y, Kanegae H, Node K, Mizuno A, Nishiyama A, Rakugi H, et al. The associations of the national health and productivity management program with corporate profits in Japan. Epidemiol Health. 2022;44:e2022080. pmid:36177978
- 13. Collins GS, Moons KGM, Dhiman P, Riley RD, Beam AL, Van Calster B, et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ. 2024;385:e078378. pmid:38626948
- 14. McCrindle BW, Rowley AH, Newburger JW, Burns JC, Bolger AF, Gewitz M, et al. Diagnosis, treatment, and long-term management of Kawasaki disease: a scientific statement for health professionals from the American Heart Association. Circulation. 2017;135(17):e927–99. pmid:28356445
- 15. Liu BM, Martins TB, Peterson LK, Hill HR. Clinical significance of measuring serum cytokine levels as inflammatory biomarkers in adult and pediatric COVID-19 cases: a review. Cytokine. 2021;142:155478. pmid:33667962
- 16. Liu J, Zhang J, Huang H, Wang Y, Zhang Z, Ma Y, et al. A machine learning model to predict intravenous immunoglobulin-resistant Kawasaki disease patients: a retrospective study based on the Chongqing population. Front Pediatr. 2021;9:756095. pmid:34820343
- 17. Lam JY, Song M-S, Kim G-B, Shimizu C, Bainto E, Tremoulet AH, et al. Intravenous immunoglobulin resistance in Kawasaki disease patients: prediction using clinical data. Pediatr Res. 2024;95(3):692–7. pmid:36797460
- 18. Cheon EJ, Kim GB, Park S. Predictive modeling of consecutive intravenous immunoglobulin treatment resistance in Kawasaki disease: a nationwide study. Sci Rep. 2025;15(1):903. pmid:39762543
- 19. Feng J, Liang J, Qiang Z, Hao Y, Li X, Li L, et al. A hybrid stacked ensemble and Kernel SHAP-based model for intelligent cardiotocography classification and interpretability. BMC Med Inform Decis Mak. 2023;23(1):273. pmid:38017460
- 20. Yang Y, Hu X. The predictive values of MMP-9, PLTs, ESR, and CRP levels in Kawasaki disease with cardiovascular injury. Evid Based Complement Alternat Med. 2022;2022:6913315. pmid:36159562
- 21. Farias JS, Villarreal EG, Savorgnan F, Acosta S, Flores S, Loomba RS. The use of neutrophil-lymphocyte ratio for the prediction of refractory disease and coronary artery lesions in patients with Kawasaki disease. Cardiol Young. 2023;33(8):1409–17. pmid:37012661