Abstract
Background
This study aimed to develop and validate machine learning (ML) models for predicting the risk of cognitive frailty in community-dwelling elderly adults with stroke.
Methods
This study included 2,325 stroke survivors from the 2018 and 2020 waves of the China Health and Retirement Longitudinal Study (CHARLS). We examined 22 candidate variables covering the sociodemographic, physical, psychological, cognitive, and social domains. LASSO regression was used to identify predictive factors. Eight machine learning models (Logistic Regression, Decision Tree, XGBoost, Support Vector Machine, k-Nearest Neighbors, Naïve Bayes, Random Forest, and LightGBM) were then compared to identify the best model for predicting cognitive frailty among stroke survivors. SHapley Additive exPlanations (SHAP) values were used to interpret the contributions of the variables.
Results
A total of 2,325 stroke patients were included in the study, of whom 688 (29.59%) were classified as having cognitive frailty. Of the eight models evaluated, XGBoost (AUC = 0.810) and Random Forest (AUC = 0.795) showed the highest predictive performance for post-stroke cognitive frailty. The key predictors identified were education, nutritional status, physical exercise, Instrumental Activities of Daily Living (IADL), and age, with corresponding SHAP values of 0.28, 0.18, 0.16, 0.21, and 0.32, respectively. The SHAP values indicated that age and education level are the most important factors in predicting the risk of cognitive frailty in this population.
Conclusion
This study developed eight machine learning risk prediction models for post-stroke cognitive frailty, and the XGBoost algorithm performed best. Because the optimized XGBoost model relies on readily available clinical and demographic indicators, it may serve as a practical tool for the early screening of cognitive frailty risk among community-dwelling older stroke survivors, particularly in primary care settings. The model can help clinicians devise targeted interventions to slow disease progression. It also provides a foundation for future prospective studies of the mechanisms underlying cognitive frailty in stroke populations. Further external validation is needed to confirm its generalizability across clinical contexts.
Citation: Zuo S, Liu N, Wang J, Li J, Zhu X, Jia Y (2026) Development and validation of a prediction model for long-term cognitive frailty risk in stroke patients based on CHARLS data. PLoS One 21(3): e0340715. https://doi.org/10.1371/journal.pone.0340715
Editor: Marina De Rui, University Hospital of Padova, ITALY
Received: June 27, 2025; Accepted: December 25, 2025; Published: March 25, 2026
Copyright: © 2026 Zuo et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Data cannot be shared publicly because of the access restrictions of the CHARLS database. Data are available from the CHARLS (http://charls.pku.edu.cn) Institutional Data Access / Ethics Committee (contact via the Ethics Review Board of Peking University: irb@pku.edu.cn) (IRB00001052-11015) for researchers who meet the criteria for access to confidential data.
Funding: This study was supported by the National Natural Science Foundation of China (Project title: Implementation of science-driven agent simulation to construct psychological and behavioural intervention procedures for stroke survivors based on stepped wedge randomized controlled trial, Grant No. 82260281); the Guizhou Science and Technology Plan Project and Guizhou Science and Technology Cooperation (Qiankehe) Foundation (Project title: Scientific study on the Construction of shared Medical Model and Optimization Mechanism for patients with Stroke Depression in low Resource area, Key Project No. ZK [2024] 069); the Science and Technology Fund Project of the Guizhou Provincial Health Commission (Grant No. gzwkj2023-588); and the Zunyi Medical University ‘12345’ Future Teaching Masters Funding (First batch Project Code: 301216). All of the above funding was awarded to Liu Ning. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Cognitive frailty is a heterogeneous syndrome defined by concurrent cognitive impairment and physical frailty in the absence of Alzheimer’s disease or other dementias [1,2]. It is considered potentially reversible and comprises two subtypes: reversible cognitive frailty and potentially reversible cognitive frailty [3]. The interaction between cognitive decline and physical deterioration poses a substantial risk, especially for middle-aged and older adults after stroke [4,5]. Reported prevalence of cognitive frailty after stroke ranges from 50% to 70% [6–8]. During post-stroke rehabilitation, cognitive domains such as attention, memory, and executive function frequently worsen [2,9]. Cognitive frailty impairs independent living and increases risks of depression, anxiety, dependence, reduced quality of life, and higher healthcare costs, thereby imposing additional societal burden [10–14]. Emerging evidence indicates that the reversibility of cognitive frailty depends on its stage, with earlier interventions offering greater potential for recovery [15–18]. Nevertheless, post-stroke rehabilitation remains largely focused on physical recovery, while cognitive assessment and intervention are insufficiently addressed [19]. Consequently, there is an urgent need to clarify risk factors and mechanisms of post-stroke cognitive frailty, develop accurate predictive methods, and design personalized strategies for diagnosis, treatment, and follow-up.
Machine learning (ML) has shown substantial promise in medicine and healthcare because of its ability to recognize patterns and generate predictions [20]. It applies to diverse data types and excels at handling large, high-dimensional, multi-source heterogeneous datasets [21,22]. By extracting latent patterns from complex data, ML models can help optimize individualized treatment strategies and support dynamic monitoring and management of health states. Recently, ML approaches have gained prominence in forecasting complex geriatric syndromes, particularly cognitive disorders. Several models have been developed to identify people at risk of cognitive impairment and dementia. For example, Li et al. [23] developed and validated a risk prediction model for cognitive impairment in older adults; Dong et al. [24] constructed a clinical model for cognitive impairment six months after stroke; and Wei et al. [25] built a machine-learning–based prediction model for poststroke dementia. These efforts have largely targeted cognitive impairment and dementia following stroke. However, few models have been developed to detect the risk of cognitive frailty among stroke survivors in community settings.
This study used data from the China Health and Retirement Longitudinal Study (CHARLS). CHARLS used multi-stage probability-proportional-to-size (PPS) sampling at the county and village stages. The survey covers 150 counties in 28 provinces, including autonomous regions and municipalities directly under the central government. The baseline survey was conducted in 2011, with national follow-ups in 2013, 2015, 2018, and 2020. CHARLS collects multidimensional data, including psychosocial assessments (the CES-D depression scale), biomarkers (e.g., cystatin C), and socioeconomic factors, which makes it well suited to ML analyses [26]. To date, ML studies using CHARLS have examined cognitive impairment in community-dwelling older adults [27,28]. However, none have developed an interpretable model specifically for predicting post-stroke cognitive frailty. This study aims to fill that gap.
Although ML models can achieve robust predictive performance, the contributions of individual variables are difficult to interpret, and this hinders clinical application [29]. The SHapley Additive exPlanations (SHAP) method builds on the principles of optimal credit allocation and local explanation, and it improves interpretability by visualizing feature importance [30]. Accordingly, this study aims to use CHARLS data to develop and validate an interpretable ML model and to apply SHAP for visual interpretation. The objective is to predict long-term cognitive frailty risk (years post-stroke) in community-dwelling stroke survivors accurately and promptly.
Methods and materials
Research population
This study is a retrospective analysis of data from the 2018 and 2020 waves of the China Health and Retirement Longitudinal Study (CHARLS). The original CHARLS protocol was approved by the Ethics Review Board of Peking University (approval number: IRB00001052–11015), and all participants provided written informed consent at enrollment.
We included individuals aged 60 years and older who reported a physician diagnosis of stroke during these survey years. The exclusion criteria were as follows: (1) no history of stroke in 2018 or 2020; (2) diagnosis of Alzheimer’s disease, dementia, or severe psychiatric disorders; (3) severe aphasia or dysarthria that hindered reliable cognitive assessment; (4) significant visual or auditory impairments that interfered with cognitive testing; and (5) severe systemic diseases, such as advanced heart failure or end-stage liver disease. After applying these criteria, a total of 2,325 stroke survivors were included in the final analysis. A flowchart illustrating participant selection, from the original CHARLS sample to the final cohort, is presented in Fig 1.
Outcome: Cognitive frailty
Cognitive frailty is defined by international consensus as the coexistence of physical frailty and mild cognitive impairment without dementia [1]. Physical frailty was assessed using the Fried frailty phenotype (FP) [31], which has five components: unintentional weight loss, slow walking speed, weak grip strength, self-reported exhaustion, and low physical activity. Each component was scored 0 (absent) or 1 (present), giving a total score of 0–5. A score of 3 or more indicated frailty, 1–2 indicated pre-frailty, and 0 indicated no frailty. Cognitive impairment was assessed using the Mini-Mental State Examination (MMSE) [32] and the Clinical Dementia Rating (CDR) [33]. Participants were classified as cognitively impaired if their education-adjusted MMSE score was below 18 (no formal education), below 21 (1–6 years of education), or below 25 (more than 6 years of education). Participants with Alzheimer’s disease or other dementias (CDR ≥ 1.0) were excluded. Such severe, irreversible cognitive impairment differs from the mild, non-dementia cognitive impairment required for a diagnosis of cognitive frailty.
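The classification rule above can be sketched as follows. This is a minimal illustration, not the authors' code. The function and argument names are hypothetical (they are not CHARLS variable names), and education is assumed to be passed as completed years. The sketch counts only a Fried score of 3 or more as frailty, because the text does not say whether pre-frailty also qualifies.

```python
# Hypothetical sketch of the cognitive frailty rule; names are illustrative only.

def fried_score(weight_loss, slow_gait, weak_grip, exhaustion, low_activity):
    """Number of Fried phenotype components present (each argument is 0/1 or bool)."""
    return sum(int(bool(c)) for c in (weight_loss, slow_gait, weak_grip, exhaustion, low_activity))


def frailty_status(score):
    """Map a 0-5 Fried score to the categories used in the text."""
    if score >= 3:
        return "frail"
    if score >= 1:
        return "pre-frail"
    return "non-frail"


def mmse_cutoff(years_of_education):
    """Education-adjusted MMSE threshold; scores below it count as impaired."""
    if years_of_education <= 0:
        return 18  # no formal education
    if years_of_education <= 6:
        return 21  # 1-6 years of education
    return 25      # more than 6 years of education


def cognitive_frailty(fried, mmse, years_of_education, cdr):
    """True/False for cognitive frailty; None when CDR >= 1.0 (dementia, excluded)."""
    if cdr >= 1.0:
        return None
    is_frail = frailty_status(fried) == "frail"
    is_impaired = mmse < mmse_cutoff(years_of_education)
    return is_frail and is_impaired


# Example: three Fried components, MMSE 19 with 4 years of schooling, CDR 0.5.
print(cognitive_frailty(fried_score(1, 1, 1, 0, 0), mmse=19, years_of_education=4, cdr=0.5))  # prints True
```

Returning `None` for CDR ≥ 1.0 keeps the excluded participants distinct from those who are simply not cognitively frail.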
Data extraction
During development of the cognitive frailty risk prediction model for stroke survivors, candidate variables were screened using the literature [34,35], clinical experience, and post-stroke cognitive impairment (PSCI) scales [36,37]. Correlation analyses were performed to assess the relationship between each variable and cognitive decline. In total, 22 predictive factors were selected from the CHARLS dataset for a comprehensive evaluation. The same set of variables was used in the validation phase to ensure methodological consistency and to evaluate predictive stability.
- Demographic information: sex, age, education level, residence (urban/rural), living arrangement (alone vs. with others), sleep quality (good vs. poor).
- Lifestyle and activity indicators: physical exercise (structured moderate-intensity activities such as square dancing, brisk walking, tai chi, swimming, or ball games ≥2 times/week, ≥ 30 minutes/session, excluding household chores); intellectual activities (e.g., reading, writing, calligraphy, photography, painting, musical instruments, handicrafts, stock trading, card/mahjong games, chess, internet use, or attending university for older adults ≥2 times/week); social activities (visiting or socializing with friends, volunteering, or club participation >2 times/week); smoking; drinking.
- Health status and self-reported indicators: chronic pain, history of falls, self-rated life satisfaction, self-rated health, and optimism about the future.
- Physical and clinical measures: number of chronic diseases, body mass index (BMI), waist circumference, and nutritional status. Nutritional status was assessed with the Mini Nutritional Assessment-Short Form (MNA-SF), built from CHARLS items on food frequency, appetite, and weight change. An MNA-SF score below 12 indicates malnutrition, and a score of 12 or above indicates adequate nutrition. Depression was evaluated with the 10-item Center for Epidemiologic Studies Depression Scale (CES-D-10), also from CHARLS. A CES-D-10 score of 10 or above indicates depressive symptoms, and a score below 10 indicates none. Instrumental Activities of Daily Living (IADL) capacity was assessed with a simplified IADL scale adapted to the daily living circumstances of older adults. Respondents who could not complete, or needed help with, at least one of the six IADL items were classified as having IADL disability. Those who could complete all six items independently were classified as having intact IADL capacity.
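The coding rules for several of the binary predictors listed above can be sketched as below. This is an illustrative sketch, not the authors' code. The function names are hypothetical, the scale scores are assumed to be already computed from the CHARLS items, and the exercise rule applies the frequency and duration thresholds given for physical exercise.

```python
# Hypothetical helpers; thresholds follow the text, names are illustrative.

def regular_exercise(sessions_per_week, minutes_per_session):
    """Structured moderate-intensity exercise >= 2 times/week and >= 30 min/session."""
    return sessions_per_week >= 2 and minutes_per_session >= 30


def malnourished(mna_sf_score):
    """MNA-SF below 12 is coded as malnutrition; 12 or above as adequate nutrition."""
    return mna_sf_score < 12


def depressive(cesd10_score):
    """CES-D-10 score of 10 or above is coded as depressive symptoms."""
    return cesd10_score >= 10


def iadl_disability(independent_on_items):
    """Six booleans (True = done independently); any False means IADL disability."""
    if len(independent_on_items) != 6:
        raise ValueError("expected answers for all six IADL items")
    return not all(independent_on_items)


print(malnourished(11), depressive(10), iadl_disability([True] * 5 + [False]))
```

Keeping each threshold in its own function makes the cut-offs easy to audit against the scale definitions.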
Data processing
Variables with ≤30% missing values were imputed using the mice package in R, with predictive mean matching for continuous variables and logistic regression for categorical variables. Outliers were assessed and adjusted according to clinically acceptable ranges. Continuous variables were normalized, and categorical variables were one-hot encoded. For feature selection, we applied least absolute shrinkage and selection operator (LASSO) regression to the training set. We used 10-fold cross-validation to choose the regularization parameter (λ) under the “1-standard-error (1-SE) criterion.” This widely used approach favors parsimony and predictive stability by choosing the largest λ whose cross-validation error lies within one standard error of the minimum [38,39]. To evaluate the robustness of the selected variables, we conducted a sensitivity analysis with elastic net regression using α values from 0.1 to 1, where α = 1 corresponds to LASSO. The retained features showed stable predictive contributions across regularization schemes. This combined strategy reduces collinearity and overfitting while preserving the most prognostically relevant variables for subsequent model development [40].
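For reference, the elastic net penalty that glmnet adds to the negative mean log-likelihood is λ[(1 − α)/2·‖β‖₂² + α‖β‖₁]. Setting α = 1 gives LASSO, and smaller α values mix in a ridge term. The sketch below computes that penalty and applies the 1-SE rule to a cross-validation curve. The function names and CV error values are hypothetical. The selection logic follows the definition of `lambda.1se` in `cv.glmnet`, but this is a minimal illustration, not the glmnet implementation.

```python
def elastic_net_penalty(beta, lam, alpha):
    """glmnet-style elastic net penalty (intercept excluded from beta).

    lam * ((1 - alpha) / 2 * ||beta||_2^2 + alpha * ||beta||_1);
    alpha = 1 is the LASSO penalty, alpha = 0 the ridge penalty.
    """
    l1 = sum(abs(b) for b in beta)
    l2_squared = sum(b * b for b in beta)
    return lam * ((1 - alpha) / 2 * l2_squared + alpha * l1)


def lambda_min_and_1se(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from a cross-validation curve.

    lambda_min minimizes the mean CV error; lambda_1se is the largest lambda
    whose mean CV error is within one standard error of that minimum.
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    lambda_1se = max(lam for lam, err in zip(lambdas, cv_mean) if err <= threshold)
    return lambdas[i_min], lambda_1se


# Made-up CV curve (lambda decreasing, as glmnet orders it).
lambdas = [0.1, 0.05, 0.02, 0.01, 0.005]
cv_mean = [1.20, 1.10, 1.05, 1.04, 1.06]
cv_se = [0.015] * 5
print(lambda_min_and_1se(lambdas, cv_mean, cv_se))  # (0.01, 0.02)
```

Because the 1-SE λ is larger than the error-minimizing λ, it shrinks more coefficients to zero. This is why the criterion tends to yield sparser predictor sets.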
Model development
Eight machine learning algorithms were employed to develop prediction models: logistic regression (LR), decision tree (DT), support vector machine (SVM), extreme gradient boosting (XGBoost), k-nearest neighbors (KNN), naïve Bayes (NB), random forest (RF), and light gradient boosting machine (LightGBM). Participants were randomly assigned to training (70%) and testing (30%) sets through stratified sampling to reduce bias. The models were constructed using the training set and subsequently validated internally with the testing set. All analyses were conducted in R (version 4.4.2). The workflow for model construction is illustrated in Fig 2.
LR: Logistic Regression; DT: Decision Tree; SVM: Support Vector Machine; LGBM: Light Gradient Boosting Machine; XGBoost: eXtreme Gradient Boosting; RF: Random Forest; NB: Naive Bayes; KNN: k-nearest neighbors; AUC: Area Under the Curve; SHAP: SHapley Additive exPlanations; Se: Sensitivity; SP: Specificity; PPV: Positive Predictive Value; NPV: Negative Predictive Value; Re: Recall.
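The stratified 70/30 split described above can be sketched as follows. Each outcome class is shuffled and split separately, so the training and testing sets keep roughly the same cognitive frailty prevalence. The sketch assumes stratification on the outcome and uses a hypothetical seed and function name. The text does not say which R function performed the split.

```python
import random


def stratified_split(labels, train_frac=0.7, seed=2025):
    """Return (train_idx, test_idx), splitting each outcome class separately."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(train_frac * len(indices))
        train.extend(indices[:n_train])
        test.extend(indices[n_train:])
    return sorted(train), sorted(test)


labels = [1] * 30 + [0] * 70  # toy data with 30% prevalence
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 70 30
```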
We assessed the performance of each model on the test set by generating receiver operating characteristic (ROC) curves and computing the area under the curve (AUC), sensitivity, specificity, recall, F1 score, and accuracy. The F1 score is the harmonic mean of precision and recall, so it balances the two, and values closer to 1 indicate better performance [41]. To improve interpretability, we calculated SHAP values using the “SHAPforxgboost” R package. A predictor’s SHAP value is its marginal contribution to the model’s output, averaged with Shapley weights over all possible subsets of the other predictors. We created SHAP summary plots to visualize overall feature importance and SHAP dependence plots to show how individual features affect predictions [30].
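The threshold-based metrics above all follow from the confusion matrix, and the AUC can be computed from ranks alone. The sketch below is a minimal, dependency-free illustration with hypothetical function names, where 1 denotes cognitive frailty. In the R workflow described here, pROC computes the ROC curve and AUC. The text does not state the classification threshold used to turn predicted probabilities into labels.

```python
def _ratio(num, den):
    """Safe division: NaN when the denominator is zero."""
    return num / den if den else float("nan")


def classification_metrics(y_true, y_pred):
    """Metrics from 0/1 labels and 0/1 predictions (1 = cognitive frailty)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = _ratio(tp, tp + fn)  # also called recall
    precision = _ratio(tp, tp + fp)    # positive predictive value (PPV)
    return {
        "sensitivity": sensitivity,
        "specificity": _ratio(tn, tn + fp),
        "ppv": precision,
        "npv": _ratio(tn, tn + fn),
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
        "accuracy": _ratio(tp + tn, len(y_true)),
    }


def auc(y_true, scores):
    """Rank-based AUC: share of (positive, negative) pairs ranked correctly, ties = 0.5.

    This equals the Mann-Whitney U statistic divided by n_pos * n_neg.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))


m = classification_metrics([1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0])
print(round(m["f1"], 3))                           # 0.667
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))     # 0.75
```

The AUC is threshold-free, while the other metrics depend on the cut-off chosen for the predicted probabilities.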
Statistical analysis
Normally distributed variables were reported as mean ± standard deviation and analyzed using independent-samples t-tests. Non-normally distributed variables were presented as median (interquartile range) and assessed with the Mann–Whitney U test. Categorical variables were expressed as counts and percentages, with group differences evaluated using chi-square or Fisher’s exact tests. Correlations were analyzed using Pearson correlation coefficients. LASSO logistic regression was performed with the glmnet package, and ROC curves were generated using the pROC package. All statistical analyses were conducted in R software (version 4.4.2), which was selected for its compatibility with the latest stable versions of ML and SHAP-related packages. A two-sided p-value <0.05 was deemed statistically significant.
Results
Demographic information about the participants
This study involved 2,325 stroke survivors, among whom 688 (29.59%) were classified as having cognitive frailty (CF). The baseline characteristics of the study population, categorized by CF status, are detailed in Table 1. Participants were randomly assigned to a development set (n = 1,627) and a validation set (n = 698). Within the development set, 493 participants (30.30%) exhibited CF, while the validation set included 195 participants (27.94%) with CF. A comparative analysis of patient characteristics between the two sets is presented in Table 2. In the development set, CF was significantly associated with age, education, sex, marital status, permanent residence, instrumental activities of daily living (IADL), body mass index (BMI), nutritional status, chronic pain, social activity, living arrangement, and physical exercise (all p < 0.05). Similar associations were noted in the validation set.
Model development and performance
We applied LASSO regression to the 22 candidate variables. Using the 1-standard-error (1-SE) criterion from 10-fold cross-validation, we identified five predictors: education, nutritional status, physical exercise, instrumental activities of daily living (IADL), and age. In the LASSO coefficient profile (Fig 3A) and the cross-validation curve (Fig 3B), these variables kept nonzero coefficients across a range of regularization strengths. This indicates sustained contributions to prediction and supports the robustness of their selection. To assess stability, we conducted a sensitivity analysis using elastic net regression with α values from 0.1 to 1, where α = 1 corresponds to LASSO. The coefficient stability plot (Fig 3C) shows consistent coefficient trajectories for the selected variables across all α values, with no sign reversals or substantial changes in magnitude. The overlap between the LASSO-selected variables and those identified by elastic net exceeded 80% across α values. This suggests that the reduction from 22 candidates to five reflects consistent predictive relevance rather than random fluctuation.
(A) Coefficient paths versus λ: changes in each coefficient as λ varies; variables with non-zero coefficients at the optimal λ are the key CF predictors. (B) Selection of the optimal λ in LASSO: cross-validated deviance versus log(λ); the optimal λ (1-SE criterion), identified by 10-fold cross-validation, is marked by the vertical dashed line. (C) Coefficient stability across elastic net α: trajectories of the LASSO-retained variables’ coefficients across α values (0.1–1; α = 1 corresponds to LASSO) in elastic net regression.
Using these predictors, various machine learning models were developed, including logistic regression (LR), decision tree (DT), XGBoost, support vector machine (SVM), k-nearest neighbors (KNN), naive Bayes (NB), random forest (RF), and LightGBM. The receiver operating characteristic (ROC) curves for all models are presented in Fig 4(A). Among these models, XGBoost achieved the highest predictive performance, with an area under the curve (AUC) of 0.810 (95% CI: 0.788–0.831), followed by RF (AUC = 0.795, 95% CI: 0.772–0.819), LightGBM (AUC = 0.780, 95% CI: 0.756–0.804), SVM (AUC = 0.777, 95% CI: 0.752–0.801), LR (AUC = 0.776, 95% CI: 0.752–0.800), KNN (AUC = 0.772, 95% CI: 0.748–0.796), NB (AUC = 0.766, 95% CI: 0.740–0.791), and DT (AUC = 0.763, 95% CI: 0.739–0.788). Model accuracy, sensitivity, and specificity are illustrated in Fig 4(B) and detailed in Table 3. XGBoost demonstrated the best overall performance, achieving the highest accuracy of 0.810, with a sensitivity of 0.84 and a specificity of 0.74, indicating a robust balance between identifying true cases of CF and minimizing false positives.
(A) ROC curves. (B) Heat map of performance metrics. Se: Sensitivity; SP: Specificity; PPV: Positive Predictive Value; NPV: Negative Predictive Value; Re: Recall; LR: Logistic Regression; DT: Decision Tree; SVM: Support Vector Machine; LGBM: Light Gradient Boosting Machine; XGBoost: eXtreme Gradient Boosting; RF: Random Forest; NB: Naive Bayes; KNN: k-nearest neighbors.
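The text does not state how the 95% CIs for the AUC were computed. For a standard ROC curve, pROC's `ci.auc()` uses the DeLong method by default. As an illustration of one common alternative, the sketch below computes a percentile bootstrap CI. The resample count, seed, and function names are assumptions, and this is not presented as the authors' procedure.

```python
import random


def auc(y_true, scores):
    """Rank-based AUC (ties count 0.5); labels are 0/1."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap CI for the AUC, resampling participants with replacement."""
    if len(set(y_true)) != 2:
        raise ValueError("both outcome classes are required")
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y_b = [y_true[i] for i in idx]
        if 0 < sum(y_b) < n:  # skip resamples that contain only one class
            stats.append(auc(y_b, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((1 - level) / 2 * n_boot)]
    upper = stats[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper


print(bootstrap_auc_ci([1, 0, 1, 0, 1, 0, 1, 0], [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.4, 0.5]))
```

Resamples with only one class are skipped because the AUC is undefined for them. With a sample as large as the one analyzed here, such resamples would essentially never occur.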
Predictor importance and SHAP analysis
To enhance interpretability, SHAP analysis was conducted on the XGBoost model, and the results are shown in Fig 5. The feature importance ranking plot (Fig 5A) lists the predictors in descending order of impact: education (1.073), nutritional status (0.432), age (0.405), IADL (0.285), and physical exercise (0.180). Education therefore has the greatest influence on predicted cognitive frailty, followed by nutritional status, whereas physical exercise has the least influence among the five predictors. The summary plot (Fig 5B) shows the relationship between each predictor and the model output. Higher education and better nutritional status (purple, negative contributions) are associated with a lower predicted risk of cognitive frailty. Older age, impaired IADL, and insufficient physical exercise (yellow, positive contributions) raise the predicted risk. To illustrate individual-level predictions, a waterfall plot (Fig 5C) and a force plot (Fig 5D) are provided for a representative participant without cognitive frailty. For this individual, the model’s base value (the average prediction across all training samples) was −0.908. Education (value = 3) made the largest negative contribution (−0.984), substantially reducing the predicted risk, followed by nutritional status (value = 1, contribution = −0.335). Conversely, physical exercise (value = 1), IADL (value = 1), and age (value = 2) made positive contributions (+0.158, +0.243, and +0.314, respectively), slightly increasing the risk. The cumulative effect of these predictors yielded a final predicted value of −0.172, indicating a low probability of developing cognitive frailty for this individual.
(A) XGBoost model feature importance ranking bar plot. (B) XGBoost model summary plot. (C) XGBoost model waterfall plot. (D) XGBoost model force plot. Each bar represents a feature. Purple indicates a negative contribution to the prediction (reducing risk), and yellow indicates a positive contribution (increasing risk).
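The individual-level SHAP explanation described above rests on the efficiency property of Shapley values: the base value plus the feature contributions equals the model's output for that individual. The brute-force sketch below computes exact Shapley values for a small toy model by enumerating every subset of features. It uses simplifying assumptions and is not the SHAPforxgboost/TreeSHAP computation. "Absent" features are set to a single hypothetical reference profile, so the base value is the model output at that reference rather than the training-set average. The toy model's coefficients are invented. For a binary XGBoost model, SHAP values are typically on the log-odds (margin) scale, which the logistic function converts to a probability.

```python
from itertools import combinations
from math import exp, factorial


def shapley_values(f, x, reference):
    """Exact Shapley values of f at x, with absent features set to `reference`."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take the individual's values; others the reference.
        z = [x[j] if j in coalition else reference[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


def to_probability(log_odds):
    """Logistic transform from the log-odds (margin) scale to a probability."""
    return 1.0 / (1.0 + exp(-log_odds))


# Toy log-odds model with an interaction term (coefficients are made up).
def toy_model(z):
    education, age_group, iadl_impaired = z
    return (-0.5 - 0.6 * education + 0.4 * age_group + 0.5 * iadl_impaired
            + 0.2 * age_group * iadl_impaired)


x = [3, 2, 1]           # hypothetical individual
reference = [1, 1, 0]   # hypothetical reference profile
phi = shapley_values(toy_model, x, reference)
base = toy_model(reference)
print(round(base + sum(phi), 6), round(toy_model(x), 6))
print(round(to_probability(toy_model(x)), 3))
```

Enumeration costs 2ⁿ model calls per feature, which is fine for five predictors. TreeSHAP exists because it obtains exact values for tree ensembles without this exponential cost.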
Discussion
This study developed and validated a machine learning model based on the extreme gradient boosting (XGBoost) algorithm to predict the long-term risk of cognitive frailty among older stroke survivors (aged 60 years and above) in China, using data from the China Health and Retirement Longitudinal Study. The model demonstrated robust predictive performance and identified several key predictors, including age, education, nutritional status, physical exercise, and IADL. These findings add to the literature on machine learning for geriatric risk stratification and the prediction of cognitive frailty.
Currently, most machine learning prediction studies in stroke populations focus on post-stroke cognitive impairment or on frailty. Research specifically addressing cognitive frailty, particularly among community-dwelling stroke survivors, remains relatively limited. Hu et al. (n = 6,718) used CLHLS data to develop four machine learning models (logistic regression, random forest, XGBoost, and Bayesian networks) to predict the risk of cognitive impairment among cognitively normal older adults in Chinese communities [42]. In their study, the combination of Bayesian networks and random forests using four selected predictors achieved the highest accuracy (0.834). In 2023, Ji et al. (n = 397) [43] developed nine machine learning models to predict post-stroke cognitive impairment and found the Gaussian Naive Bayes (GNB) model to be the most effective (AUC = 0.919). Lee et al. [44] also constructed multiple machine learning models to assess the risk of post-stroke cognitive impairment (PSCI) in patients with acute ischemic stroke (AIS). These models focused on PSCI. Although cognitive frailty and PSCI may overlap in clinical practice, they differ in definition, severity, treatment strategies, and intervention methods. Because cognitive frailty is a pre-dementia state, its early identification and accurate prediction are crucial for implementing preventive measures and interventions that reduce the risk of cognitive impairment [1]. To our knowledge, few studies have developed and interpreted machine learning models specifically for cognitive frailty in post-stroke populations, particularly using large-scale community datasets such as CHARLS.
The SHAP summary plot illustrates the importance of each variable in predicting cognitive frailty, enhancing the model’s transparency and interpretability. It shows that education and nutritional status are the two most influential factors. Their SHAP values span a broad range, suggesting that variation in these variables can substantially change the predicted risk of cognitive frailty. Our data indicate that older stroke patients with low educational attainment face an elevated risk of cognitive frailty. An empirical analysis [45] found that educational level, a key indicator of cognitive reserve, is positively correlated with the preservation of cognitive function after stroke. In addition, a meta-analysis [46] reported that individuals with more than 12 years of education have a 38% lower risk of post-stroke cognitive impairment (PSCI) than those with fewer than 6 years of education. Targeted screening and cognitive interventions are therefore essential for individuals with low educational attainment. Similarly, malnutrition may contribute substantially to the pathogenesis of cognitive frailty in older stroke patients. A study conducted outside China [47] examined the relationship between nutritional status and cognitive frailty in 5,414 community-dwelling individuals aged 55 years and older without dementia. It found a notably high prevalence of malnutrition among those with cognitive frailty. A meta-analysis [48] further indicates that the risk of cognitive frailty in malnourished older adults is 3.06 times that of their well-nourished counterparts. Previous research [49] has also shown that nutrients such as folic acid, flavonoids, and vitamin D protect cognitive function in older adults, and that deficiencies in these nutrients significantly increase the risk of cognitive impairment.
Stroke patients often experience disease-related inflammation and inadequate nutrient intake, which lead to losses of subcutaneous fat and muscle mass and markedly increase the risk of frailty [50]. The nutritional status of stroke patients should therefore be monitored closely. Regular use of Mini Nutritional Assessment (MNA) tools, together with timely dietary support, can play a crucial role in mitigating cognitive frailty.
Regular exercise and the maintenance of functional abilities are inversely related to cognitive frailty. An intervention study involving elderly individuals with cognitive frailty demonstrated that moderate exercise can enhance both cognitive and physical functions in this population [51]. Previous research has indicated that exercise promotes brain remodeling, delays brain atrophy, and improves muscle function, thereby indirectly mitigating cognitive decline [52]. Engaging in scientifically guided exercise holds significant potential for enhancing cognitive function and addressing physical frailty. Additionally, studies reveal that patients with mild cognitive impairment (MCI) frequently encounter challenges in performing complex daily activities, such as financial management and medication adherence, and a decline in these capabilities may signal an increased risk of dementia [53,54]. Consequently, rehabilitation following a stroke should prioritize the recovery of both instrumental activities of daily living and cognitive function, while promoting daily engagement to help postpone cognitive frailty. Age is another critical factor; data indicate that approximately 30% of stroke patients over 65 years old experience cognitive decline within three months of onset, with this figure rising to 50% among those aged 80 and above [55,56]. The interplay of stroke and advanced age may intensify cognitive frailty, leading to a synergistic amplification effect. In clinical practice, it is essential to enhance cognitive monitoring of elderly stroke patients. Standardized tools, such as the Montreal Cognitive Assessment (MoCA), should be integrated into routine evaluations to promote the early identification and intervention of cognitive frailty.
Clinical and research significance
At the clinical level, this model functions as an early risk-screening tool that helps community and clinical staff identify stroke survivors at high risk of cognitive frailty and intervene in a timely manner. For patients with low educational attainment, malnutrition, reduced physical activity, or other functional decline, clinicians can implement targeted multidimensional measures, including nutritional support, cognitive training, and exercise guidance. At the research level, this study lays a foundation for better understanding and management of post-stroke cognitive frailty. By applying explainable machine learning to long-term risk prediction among stroke survivors in Chinese communities, it addresses a specific research gap for this population. The model combines robust predictive performance with interpretable outputs. SHAP shows the direction and relative importance of key predictors and generates hypotheses for mechanistic studies. Furthermore, the use of a large, nationally representative longitudinal cohort may strengthen the generalizability of the findings and provides a methodological reference for related research.
Limitations and future research directions
Although we used a nationally representative dataset and advanced machine learning techniques, this study has several limitations. First, cognitive frailty was defined using the Fried phenotype and MMSE/CDR scores, which may not fully capture its multifaceted nature, including physical function, neurobiological changes, and psychosocial factors. Second, because different stages of cognitive frailty could not be distinguished, intervention windows for high-risk groups may be difficult to identify precisely. Third, the CHARLS database does not include critical variables such as stroke severity, lesion location, and time since stroke onset. In addition, stroke history was self-reported without imaging verification, which may lead to underreporting of minor strokes, misclassification of other cerebrovascular events, and recall bias. Fourth, the absence of data from the acute phase (the first several months after stroke) restricts the model to long-term screening for cognitive frailty risk rather than acute-phase prediction. Future research should prioritize collecting acute-phase data to develop stage-specific prediction models. Furthermore, variables such as stroke subtype and location were excluded from the analysis because of insufficient sample size. Fifth, this study used a cross-sectional design; although associations between the predictors and cognitive frailty were identified, causal relationships could not be established, and unmeasured confounding may exist. Future cohort studies are needed to elucidate the causal pathways between these predictors and cognitive frailty. Sixth, the model has undergone only internal validation and has not yet been evaluated externally. Caution is therefore warranted when applying it to populations outside China or to specific groups, such as stroke patients in institutional care, and external validation studies are needed.
Finally, because the data are cross-sectional and partly self-reported, the model provides risk estimates but cannot support causal inference. Future research should verify stroke diagnoses and explore the underlying mechanisms by integrating medical records with imaging data.
Conclusion
Our results show that machine learning models, particularly XGBoost, can effectively predict cognitive frailty among community-dwelling elderly stroke survivors. The main predictors of cognitive frailty were educational attainment, nutritional status, physical exercise, IADL, and age. However, the model was developed only for elderly stroke survivors living in the community, and the clinical value of its predictions for guiding intervention requires further verification.
Acknowledgments
We thank Peking University for the open data resources and all investigators who participated in the study. We also appreciate the constructive comments from the editors and reviewers during the revision of our manuscript.
References
- 1. Kelaiditi E, Cesari M, Canevelli M, van Kan GA, Ousset P-J, Gillette-Guyonnet S, et al. Cognitive frailty: Rational and definition from an (I.A.N.A./I.A.G.G.) international consensus group. J Nutr Health Aging. 2013;17(9):726–34. pmid:24154642
- 2. Sugimoto T, Arai H, Sakurai T. An update on cognitive frailty: Its definition, impact, associated factors and underlying mechanisms, and interventions. Geriatr Gerontol Int. 2022;22(2):99–109. pmid:34882939
- 3. Ruan Q, Yu Z, Chen M, Bao Z, Li J, He W. Cognitive frailty, a novel target for the prevention of elderly dependency. Ageing Res Rev. 2015;20:1–10. pmid:25555677
- 4. El Husseini N, Katzan IL, Rost NS, Blake ML, Byun E, Pendlebury ST, et al. Cognitive impairment after ischemic and hemorrhagic stroke: A scientific statement from the American Heart Association/American Stroke Association. Stroke. 2023;54(6):e272–91. pmid:37125534
- 5. Rost NS, Brodtmann A, Pase MP, van Veluw SJ, Biffi A, Duering M, et al. Post-stroke cognitive impairment and dementia. Circ Res. 2022;130(8):1252–71. pmid:35420911
- 6. Rostamian S, Mahinrad S, Stijnen T, Sabayan B, de Craen AJM. Cognitive impairment and risk of stroke: A systematic review and meta-analysis of prospective cohort studies. Stroke. 2014;45(5):1342–8. pmid:24676778
- 7. Burton L, Tyson SF. Screening for cognitive impairment after stroke: A systematic review of psychometric properties and clinical utility. J Rehabil Med. 2015;47(3):193–203. pmid:25590458
- 8. Donnelly N-A, Sexton E, Merriman NA, Bennett KE, Williams DJ, Horgan F, et al. The prevalence of cognitive impairment on admission to nursing home among residents with and without stroke: A cross-sectional survey of nursing homes in Ireland. Int J Environ Res Public Health. 2020;17(19):7203. pmid:33019730
- 9. Povroznik JM, Ozga JE, Vonder Haar C, Engler-Chiurazzi EB. Executive (dys)function after stroke: Special considerations for behavioral pharmacology. Behav Pharmacol. 2018;29(7):638–53. pmid:30215622
- 10. Orgeta V, McDonald KR, Poliakoff E, Hindle JV, Clare L, Leroi I. Cognitive training interventions for dementia and mild cognitive impairment in Parkinson’s disease. Cochrane Database Syst Rev. 2020;2(2):CD011961. pmid:32101639
- 11. Roesner K, Scheffler B, Kaehler M, Schmidt-Maciejewski B, Boettger T, Saal S. Effects of physical therapy modalities for motor function, functional recovery, and post-stroke complications in patients with severe stroke: A systematic review update. Syst Rev. 2024;13(1):270. pmid:39468642
- 12. Godin J, Armstrong JJ, Wallace L, Rockwood K, Andrew MK. The impact of frailty and cognitive impairment on quality of life: employment and social context matter. Int Psychogeriatr. 2019;31(6):789–97. pmid:30421692
- 13. Rutherford BR, Brewster K, Golub JS, Kim AH, Roose SP. Sensation and psychiatry: Linking age-related hearing loss to late-life depression and cognitive decline. Am J Psychiatry. 2018;175(3):215–24. pmid:29202654
- 14. Ma Q, Li R, Wang L, Yin P, Wang Y, Yan C, et al. Temporal trend and attributable risk factors of stroke burden in China, 1990-2019: an analysis for the Global Burden of Disease Study 2019. Lancet Public Health. 2021;6(12):e897–906. pmid:34838196
- 15. Mazza E, Ferro Y, Pujia R, Mare R, Maurotti S, Montalcini T, et al. Mediterranean diet in healthy aging. J Nutr Health Aging. 2021;25(9):1076–83. pmid:34725664
- 16. Mantovani E, Zucchella C, Schena F, Romanelli MG, Venturelli M, Tamburin S. Towards a redefinition of cognitive frailty. J Alzheimers Dis. 2020;76(3):831–43. pmid:32568197
- 17. Yuan M, Xu C, Fang Y. The transitions and predictors of cognitive frailty with multi-state Markov model: A cohort study. BMC Geriatr. 2022;22(1):550. pmid:35778705
- 18. Facal D, Burgo C, Spuch C, Gaspar P, Campos-Magdaleno M. Cognitive frailty: An update. Front Psychol. 2021;12:813398. pmid:34975703
- 19. Hoffmann T, Bennett S, Koh C-L, McKenna K. A systematic review of cognitive interventions to improve functional ability in people who have cognitive impairment following stroke. Top Stroke Rehabil. 2010;17(2):99–107. pmid:20542852
- 20. Bennett M, Kleczyk EJ, Hayes K, Mehta R. Evaluating similarities and differences between machine learning and traditional statistical modeling in healthcare analytics. IntechOpen. 2022.
- 21. Das A, Dhillon P. Application of machine learning in measurement of ageing and geriatric diseases: a systematic review. BMC Geriatr. 2023;23(1):841. pmid:38087195
- 22. Woodman RJ, Mangoni AA. A comprehensive review of machine learning algorithms and their application in geriatric medicine: Present and future. Aging Clin Exp Res. 2023;35(11):2363–97. pmid:37682491
- 23. Li W, Zeng L, Yuan S, Shang Y, Zhuang W, Chen Z, et al. Machine learning for the prediction of cognitive impairment in older adults. Front Neurosci. 2023;17:1158141. pmid:37179565
- 24. Dong Y, Ding M, Cui M, Fang M, Gong L, Xu Z, et al. Development and validation of a clinical model (DREAM-LDL) for post-stroke cognitive impairment at 6 months. Aging (Albany NY). 2021;13(17):21628–41. pmid:34506303
- 25. Wei Z, Li M, Zhang C, Miao J, Wang W, Fan H. Machine learning-based predictive model for post-stroke dementia. BMC Med Inform Decis Mak. 2024;24(1):334. pmid:39529118
- 26. Zhao Y, Hu Y, Smith JP, Strauss J, Yang G. Cohort profile: The China Health and Retirement Longitudinal Study (CHARLS). Int J Epidemiol. 2014;43(1):61–8. pmid:23243115
- 27. Ye X, Wang X, Wang Y, Lin H. Predicting cognitive function among Chinese community-dwelling older adults: A supervised machine learning approach. Prev Med. 2025;196:108307. pmid:40349986
- 28. Pu L, Pan D, Wang H, He X, Zhang X, Yu Z, et al. A predictive model for the risk of cognitive impairment in community middle-aged and older adults. Asian J Psychiatr. 2023;79:103380. pmid:36495830
- 29. Stiglic G, Kocbek P, Fijacko N, Zitnik M, Verbert K, Cilar L. Interpretability of machine learning‐based prediction models in healthcare. WIREs Data Min Knowl Discov. 2020;10(5):e1379.
- 30. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems. 2017;30.
- 31. Cesari M, Gambassi G, van Kan GA, Vellas B. The frailty phenotype and the frailty index: different instruments for different purposes. Age Ageing. 2014;43(1):10–2. pmid:24132852
- 32. Cardoso S, Barros R, Marôco J, de Mendonça A, Guerreiro M. Different MMSE domains are associated to cognitive decline and education. Appl Neuropsychol Adult. 2024;31(4):533–9. pmid:35234096
- 33. O’Bryant SE, Lacritz LH, Hall J, Waring SC, Chan W, Khodr ZG, et al. Validation of the new interpretive guidelines for the clinical dementia rating scale sum of boxes score in the national Alzheimer’s coordinating center database. Arch Neurol. 2010;67(6):746–9. pmid:20558394
- 34. Binning L, Basquill C, Tvrda L, Quinn TJCD. Epidemiology and outcomes associated with cognitive frailty and reserve in a stroke population: systematic review and meta-analysis. 2024;30:1–12.
- 35. Filler J, Georgakis MK, Dichgans M. Risk factors for cognitive impairment and dementia after stroke: A systematic review and meta-analysis. Lancet Healthy Longev. 2024;5(1):e31–44. pmid:38101426
- 36. Lee M, Yeo N-Y, Ahn H-J, Lim J-S, Kim Y, Lee S-H, et al. Prediction of post-stroke cognitive impairment after acute ischemic stroke using machine learning. Alzheimers Res Ther. 2023;15(1):147. pmid:37653560
- 37. Zhou H, Han Y, Xie D, Zheng K, Zhou Z, Zhu H, et al. Predictors and clinical implications of post-stroke cognitive impairment: A retrospective study. Sci Rep. 2025;15(1):24198. pmid:40624177
- 38. Hu J-Y, Wang Y, Tong X-M, Yang T. When to consider logistic LASSO regression in multivariate analysis? Eur J Surg Oncol. 2021;47(8):2206. pmid:33895026
- 39. Nalbantov G, Bonvalot S. Reply to “When to consider logistic LASSO regression in multivariate analysis?”. Eur J Surg Oncol. 2021;47(8):2207. pmid:33895024
- 40. Morris TP, White IR, Royston P. Tuning multiple imputation by predictive mean matching and local residual draws. BMC Med Res Methodol. 2014;14:75. pmid:24903709
- 41. Hsu C-W, Lin C-J. A comparison of methods for multiclass support vector machines. IEEE Trans Neural Netw. 2002;13(2):415–25. pmid:18244442
- 42. Hu M, Shu X, Yu G, Wu X, Välimäki M, Feng H. A risk prediction model based on machine learning for cognitive impairment among Chinese community-dwelling elderly people with normal cognition: Development and validation study. J Med Internet Res. 2021;23(2):e20298. pmid:33625369
- 43. Ji W, Wang C, Chen H, Liang Y, Wang S. Predicting post-stroke cognitive impairment using machine learning: A prospective cohort study. J Stroke Cerebrovasc Dis. 2023;32(11):107354. pmid:37716104
- 44. Lee M, Yeo N-Y, Ahn H-J, Lim J-S, Kim Y, Lee S-H, et al. Prediction of post-stroke cognitive impairment after acute ischemic stroke using machine learning. Alzheimers Res Ther. 2023;15(1):147. pmid:37653560
- 45. Mirza SS, Portegies MLP, Wolters FJ, Hofman A, Koudstaal PJ, Tiemeier H, et al. Higher education is associated with a lower risk of dementia after a stroke or TIA. The Rotterdam Study. Neuroepidemiology. 2016;46(2):120–7. pmid:26794600
- 46. Contador I, Alzola P, Stern Y, de la Torre-Luque A, Bermejo-Pareja F, Fernández-Calvo B. Is cognitive reserve associated with the prevention of cognitive decline after stroke? A systematic review and meta-analysis. Ageing Res Rev. 2023;84:101814. pmid:36473672
- 47. Chye L, Wei K, Nyunt MSZ, Gao Q, Wee SL, Ng TP. Strong relationship between malnutrition and cognitive frailty in the Singapore Longitudinal Ageing Studies (SLAS-1 and SLAS-2). J Prev Alzheimers Dis. 2018;5(2):142–8. pmid:29616708
- 48. Vatanabe IP, Pedroso RV, Teles RHG, Ribeiro JC, Manzine PR, Pott-Junior H, et al. A systematic review and meta-analysis on cognitive frailty in community-dwelling older adults: risk and associated factors. Aging Ment Health. 2022;26(3):464–76. pmid:33612030
- 49. Scarmeas N, Anastasiou CA, Yannakoulia M. Nutrition and prevention of cognitive impairment. Lancet Neurol. 2018;17(11):1006–15. pmid:30244829
- 50. Damanti S, Senini E, De Lorenzo R, Merolla A, Santoro S, Festorazzi C, et al. Acute sarcopenia: Mechanisms and management. Nutrients. 2024;16(20):3428. pmid:39458423
- 51. Yoon DH, Lee J-Y, Song W. Effects of resistance exercise training on cognitive function and physical performance in cognitive frailty: A randomized controlled trial. J Nutr Health Aging. 2018;22(8):944–51. pmid:30272098
- 52. Dulac MC, Aubertin-Leheudre M. Exercise: An important key to prevent physical and cognitive frailty. J Frailty Aging. 2016;5(1):3–5. pmid:26980362
- 53. Sugimoto T, Ono R, Kimura A, Saji N, Niida S, Sakai T, et al. Impact of cognitive frailty on activities of daily living, cognitive function, and conversion to dementia among memory clinic patients with mild cognitive impairment. J Alzheimers Dis. 2020;76(3):895–903. pmid:32568192
- 54. Jekel K, Damian M, Wattmo C, Hausner L, Bullock R, Connelly PJ, et al. Mild cognitive impairment and deficits in instrumental activities of daily living: A systematic review. Alzheimers Res Ther. 2015;7(1):17. pmid:25815063
- 55. Salloway S, Chalkias S, Barkhof F, Burkett P, Barakos J, Purcell D, et al. Amyloid-related imaging abnormalities in 2 Phase 3 studies evaluating aducanumab in patients with early Alzheimer disease. JAMA Neurol. 2022;79(1):13–21. pmid:34807243
- 56. Aamodt EB, Alnæs D, de Lange A-MG, Aam S, Schellhorn T, Saltvedt I, et al. Longitudinal brain age prediction and cognitive function after stroke. Neurobiol Aging. 2023;122:55–64. pmid:36502572