Development and validation of an interpretable machine learning model for predicting left atrial thrombus or spontaneous echo contrast in non-valvular atrial fibrillation patients

  • Chaoqun Huang,

    Roles Formal analysis, Software, Writing – original draft

    Affiliation Department of Cardiovascular Medicine, The First Bethune Hospital of Jilin University, Changchun, Jilin Province, China

  • Shangzhi Shu,

    Roles Investigation, Software, Validation

    Affiliation Department of Cardiovascular Medicine, The First Bethune Hospital of Jilin University, Changchun, Jilin Province, China

  • Miaomiao Zhou,

    Contributed equally to this work with: Miaomiao Zhou, Zhenming Sun

    Roles Data curation, Formal analysis, Methodology

    Affiliation Department of Cardiovascular Medicine, The First Bethune Hospital of Jilin University, Changchun, Jilin Province, China

  • Zhenming Sun,

    Contributed equally to this work with: Miaomiao Zhou, Zhenming Sun

    Roles Data curation, Formal analysis, Methodology, Software

    Affiliation Department of Cardiovascular Medicine, The First Bethune Hospital of Jilin University, Changchun, Jilin Province, China

  • Shuyan Li

    Roles Supervision, Writing – review & editing

    li_sy@jlu.edu.cn

    Affiliation Department of Cardiovascular Medicine, The First Bethune Hospital of Jilin University, Changchun, Jilin Province, China

Abstract

Purpose

Left atrial thrombus or spontaneous echo contrast (LAT/SEC) are widely recognized as significant contributors to cardiogenic embolism in non-valvular atrial fibrillation (NVAF). This study aimed to construct and validate an interpretable predictive model of LAT/SEC risk in NVAF patients using machine learning (ML) methods.

Methods

Electronic medical records (EMR) data of consecutive NVAF patients scheduled for catheter ablation at the First Hospital of Jilin University from October 1, 2022, to February 1, 2024, were analyzed. A retrospective study of 1,222 NVAF patients was conducted. Nine ML algorithms combined with demographic, clinical, and laboratory data were applied to develop prediction models for LAT/SEC in NVAF patients. Feature selection was performed using the least absolute shrinkage and selection operator (LASSO) and multivariate logistic regression. Multiple ML classification models were integrated to identify the optimal model, and Shapley Additive exPlanations (SHAP) interpretation was utilized for personalized risk assessment. Diagnostic performances of the optimal model and the CHA2DS2-VASc scoring system for predicting LAT/SEC risk in NVAF were compared.

Results

Among 1,078 patients included, the incidence of LAT/SEC was 10.02%. Six independent predictors, including age, non-paroxysmal AF, diabetes, ischemic stroke or thromboembolism (IS/TE), hyperuricemia, and left atrial diameter (LAD), were identified as the most valuable features. The logistic classification model exhibited the best performance with an area under the receiver operating characteristic curve (AUC) of 0.850, accuracy of 0.812, sensitivity of 0.818, and specificity of 0.780 in the test set. SHAP analysis revealed the contribution of explanatory variables to the model and their relationship with LAT/SEC occurrence. The logistic regression model significantly outperformed the CHA2DS2-VASc scoring system, with AUCs of 0.831 and 0.650, respectively (Z = 7.175, P < 0.001).

Conclusions

ML proves to be a reliable tool for predicting LAT/SEC risk in NVAF patients. The constructed logistic regression model, along with SHAP interpretation, may serve as a clinically useful tool for identifying high-risk NVAF patients. This enables targeted diagnostic evaluations and the development of personalized treatment strategies based on the findings.

Introduction

Atrial fibrillation (AF) is one of the most prevalent arrhythmias, predisposing individuals to thromboembolic events, heart failure, and hospitalizations, while also diminishing quality of life and exercise capacity [1]. The prevalence of AF is increasing: 33.5 million people were affected in 2010 [2], 37.6 million in 2017 [3], and the number is projected to double by 2050 [4].

Systemic embolism, particularly stroke, is a pivotal complication of AF, with AF patients facing a fourfold higher risk of ischemic stroke [5]. While the CHA2DS2-VASc score is extensively used to stratify stroke risk in patients with non-valvular AF (NVAF) [6], its correlation with left atrial thrombus (LAT) formation is limited. Left atrial appendage thrombus is a subtype of LAT; the unique anatomical features and functional properties of the left atrial appendage render it the primary site of thrombus formation. LAT and spontaneous echo contrast (SEC) are recognized as significant contributors to cardiogenic embolism in NVAF [7]. However, transesophageal echocardiography (TEE), the gold standard for detecting LAT and SEC, is semi-invasive, carries rare but inherent risks, causes patient discomfort, and requires some form of sedation. It also demands specialized skills for accurate performance and interpretation [8]. Hence, a non-invasive and effective method capable of identifying LAT/SEC would hold substantial clinical value.

Machine learning (ML) represents an emerging frontier in medicine, comprising a powerful suite of algorithms for representing, adapting to, learning from, predicting, and analyzing data. ML is poised to shape the future of biomedical research, personalized medicine, and computer-aided diagnosis [9, 10]. While several studies have predicted LAT/SEC risk in NVAF [11–16], there is limited research on using ML to develop predictive models for thrombus formation in AF patients.

Therefore, this study pursues three primary objectives: firstly, to pinpoint the important variables for predicting LAT/SEC in NVAF patients; secondly, to identify the optimal performing ML model for predicting LAT/SEC, utilizing SHAP values to quantitatively visualize the relationships between risk factors and outcomes; and finally, to compare the ML-based prediction model with the conventional CHA2DS2-VASc scoring system.

Materials and methods

Materials

Subjects.

A total of 1,222 NVAF patients were scheduled to undergo catheter ablation in the Department of Cardiology, the First Hospital of Jilin University, from October 1, 2022, to February 1, 2024. Data for this study were obtained on June 1, 2024. Information that could identify individual participants was concealed during and after data collection.

Inclusion criteria.

The inclusion criteria were as follows: (1) patients with NVAF who underwent TEE; (2) informed consent and voluntary participation in the study; and (3) complete clinical data.

Exclusion criteria.

The exclusion criteria were as follows: (1) patients who were unable to cooperate or unwilling to participate; (2) patients with rheumatic heart disease and severe valvular heart disease; (3) patients with permanent AF; (4) patients unable to tolerate at least 3 months of oral anticoagulation therapy after catheter ablation; (5) patients who did not undergo TEE; and (6) patients with incomplete clinical data.

Methods

Grouping methods and diagnosis of LAT and SEC.

Patients were divided into two groups according to the presence or absence of LAT or SEC. LAT was defined as an echodense mass with tissue characteristics distinct from the left atrial endocardial wall, while SEC was characterized by an echogenic, swirling blood flow pattern at standard gain settings during the cardiac cycle [17].

Study indicators.

Demographic, clinical, and laboratory data were extracted from the EMR.

  1. Demographic data included age, sex, body mass index (BMI), AF type (paroxysmal or non-paroxysmal), hyperuricemia, history of hypertension, diabetes, ischemic stroke or thromboembolism (IS/TE), coronary heart disease (CHD), hypertrophic cardiomyopathy (HCM), presence of tumors, hypothyroidism, hyperthyroidism, and CHA2DS2-VASc score. Paroxysmal AF was defined as AF episodes terminating within 7 days (either spontaneously or with medical intervention), while non-paroxysmal AF referred to all other types of AF [6]. Hyperuricemia was defined as uric acid levels exceeding 420 μmol/L in men and 360 μmol/L in women [18].
  2. Laboratory parameters included creatinine (Cr), alanine aminotransferase (ALT), aspartate aminotransferase (AST), serum albumin (ALB), fasting blood glucose (FBG), total cholesterol (TC), triglycerides (TG), low-density lipoprotein cholesterol (LDL-c), high-density lipoprotein cholesterol (HDL-c), lymphocyte count (LY), monocyte count (MO), red blood cell count (RBC), hemoglobin (HGB), mean corpuscular volume (MCV), platelet count (PLT), and mean platelet volume (MPV).
  3. Echocardiographic parameters encompassed left ventricular ejection fraction (LVEF), left atrial diameter (LAD), left ventricular end-diastolic diameter (LVDD), mitral regurgitation area (MRA), tricuspid regurgitation area (TRA), interventricular septum thickness (IVST), and left ventricular posterior wall thickness (LVPWT).
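The two categorical definitions given in the list above (AF type and hyperuricemia) can be written as simple rules. The following is a minimal sketch, not the study's extraction code: the function names are hypothetical, and "within 7 days" is read as a duration of 7 days or less, which is an assumption.

```python
def is_hyperuricemic(uric_acid_umol_l: float, sex: str) -> bool:
    """Return True when serum uric acid exceeds the sex-specific cutoff.

    Cutoffs (umol/L) are the ones stated in the text: 420 for men, 360 for women.
    """
    cutoffs = {"male": 420.0, "female": 360.0}
    return uric_acid_umol_l > cutoffs[sex]


def af_type(episode_duration_days: float) -> str:
    """Classify AF by episode duration.

    Assumption: 'terminating within 7 days' is interpreted as <= 7 days.
    """
    return "paroxysmal" if episode_duration_days <= 7 else "non-paroxysmal"
```

For example, a uric acid level of 400 μmol/L counts as hyperuricemia in a woman but not in a man under these cutoffs.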

Feature selection.

Initially, LASSO regression, a widely used technique for feature selection, was performed in R (glmnet 4.1.8). LASSO adds a penalty that constrains the sum of the absolute values of the regression coefficients to be below a predetermined threshold; this shrinks the coefficients and sets some of them exactly to zero, thereby yielding a sparser model. As a biased (shrinkage) estimator, LASSO is particularly effective for datasets with complex covariance structures. The penalty parameter was tuned by 10-fold cross-validation, and features whose coefficients were shrunk to zero were eliminated. Subsequently, the features retained by LASSO were entered into a multivariate logistic regression analysis, and factors with P < 0.05 were identified as significant.
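To illustrate why an L1 penalty sets some coefficients exactly to zero, consider the special case of standardized, uncorrelated predictors, where the LASSO estimate is the soft-thresholded least-squares estimate. This is a sketch of the principle only, not the glmnet procedure used in the study; the coefficient values, the penalty value, and the choice of feature names are all invented for illustration.

```python
def soft_threshold(z: float, lam: float) -> float:
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical unpenalized coefficients (NOT taken from the study).
ols_coefficients = {"age": 0.40, "LAD": 0.55, "BMI": 0.02, "TG": -0.01, "diabetes": -0.20}
lam = 0.05  # hypothetical penalty strength

# With orthonormal predictors, each LASSO coefficient is the soft-thresholded OLS one.
lasso_coefficients = {name: soft_threshold(b, lam) for name, b in ols_coefficients.items()}

# Features with a nonzero coefficient survive the selection step.
selected = [name for name, b in lasso_coefficients.items() if b != 0.0]
```

In this toy case the two weak predictors are dropped while the remaining coefficients are shrunk by the penalty. With correlated predictors there is no closed form, and solvers such as glmnet iterate this thresholding step coordinate by coordinate.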

ML model establishment and development.

Nine algorithms were used to develop and compare prediction models. The characteristic factors selected by LASSO and multivariate logistic regression were used as inputs to the prediction models. The ML models, including eXtreme Gradient Boosting (XGBoost), Logistic Regression, Light Gradient Boosting Machine (LightGBM), RandomForest, Adaptive Boosting (AdaBoost), DecisionTree, Gradient Boosting (GBDT), Gaussian Naïve Bayes (GNB), and Complement Naïve Bayes (CNB), were constructed using Python (scikit-learn 0.22.1, xgboost 1.2.1, lightgbm 3.2.1). A bootstrap resampling technique was employed to train and validate the classification of the ML models. The patients were randomly divided into training and test sets (7:3). The validation dataset was used to evaluate and compare the performance of each model. The AUC, accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1 score were used to evaluate the ability of each model to predict LAT/SEC in patients with NVAF. Calibration curves were employed to evaluate the predictive power of the model, and a comprehensive assessment of the predictive model was conducted to validate its utility in decision support or broader simulation modeling.
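The threshold-based metrics listed above all derive from the four cells of a confusion matrix. A minimal sketch follows; the function name and the example counts are hypothetical and are not results from the study.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard diagnostic metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    ppv = tp / (tp + fp)                  # positive predictive value (precision)
    npv = tn / (tn + fn)                  # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for 100 patients with 10% prevalence of the outcome.
metrics = classification_metrics(tp=8, fp=20, tn=70, fn=2)
```

With a low-prevalence outcome such as this one, PPV can remain modest even when sensitivity and specificity are both fairly high, which is why the metrics are best read together.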

Model optimization and evaluation.

To ensure model stability, 10-fold cross-validation was employed to evaluate the predictive ability of the model. The training set was randomly divided into 10 groups, with 9 groups used for training in each iteration of the 10-fold cross-validation, and the remaining group designated as the validation set. During each training iteration, a 30% subset was randomly sampled from the training data to assess model performance. Subsequently, model discrimination was quantified using receiver operating characteristic (ROC) curve analysis, and predictive accuracy was evaluated using the obtained AUCs and calibration. Decision curve analysis (DCA) was used to estimate clinical utility and net benefit. Feature importance was assessed using SHAP, with higher absolute SHAP values indicating features that had a greater impact on the model’s prediction score. Additionally, SHAP was employed to explain the prediction for an individual sample.
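The fold construction described above can be sketched in a few lines. This is an illustration of plain k-fold splitting only, assuming a simple random shuffle; the function name, the seed, and the sample count of 100 are hypothetical, and the study's own pipeline may differ (for example, by stratifying on the outcome).

```python
import random


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0):
    """Yield (train_indices, validation_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # random assignment to folds
    folds = [indices[i::k] for i in range(k)]     # k roughly equal groups
    for i in range(k):
        validation = folds[i]                     # one group held out
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# Hypothetical training set of 100 patients split into 10 folds.
splits = list(k_fold_indices(100, k=10))
```

Each patient appears in exactly one validation fold, so averaging the per-fold AUCs uses every training observation once for validation.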

Bias control

The process of data collection should strictly ensure the comprehensiveness, accuracy, non-duplication, and clear definition of the collected data.

Study size

In prediction-model research, the effective sample size is determined by the number of outcome events, which should be 5–10 times the number of included variables. This study included 23 observational indicators; thus, the estimated number of outcome events should be at least 115. Preliminary trials indicated that the incidence of LAT/SEC is approximately 11%. Therefore, the required sample size for the training set should be at least 1,045 cases.
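The arithmetic behind this estimate can be reproduced directly, using the lower bound of 5 events per variable. The variable names below are illustrative.

```python
n_variables = 23            # observational indicators, as stated in the text
events_per_variable = 5     # lower bound of the 5-10 rule of thumb
expected_incidence = 0.11   # approximate LAT/SEC incidence from preliminary data

required_events = n_variables * events_per_variable      # 115 outcome events
required_sample = required_events / expected_incidence   # about 1045.5 patients
```

The quotient is approximately 1045.5, in line with the figure of about 1,045 cases given above.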

Statistical analysis

Continuous variables were presented as mean ± standard deviation or median with interquartile range and analyzed using the unpaired t-test or the Mann-Whitney U test, as appropriate. Categorical variables were expressed as absolute numbers (n) and relative frequencies (%) and analyzed using the Chi-squared test. Two-sided P values less than 0.05 were considered statistically significant. Statistical analysis was conducted using SPSS (version 25.0), R (version 4.2.3), Python (version 3.11.4), and MedCalc (version 22.009).

Ethics declarations

The study was approved by the Review Ethics Committee of the First Bethune Hospital of Jilin University (approval number: 2024–665), and informed consent was waived due to the retrospective nature of the study.

Results

Comparison of baseline data

During the study period, 1,222 consecutive NVAF patients were enrolled, and 144 patients were excluded due to incomplete data. Ultimately, a total of 1,078 patients were included in this study. The demographic, clinical, and laboratory data of the patients are summarized in Table 1. Among them, 15 patients had LAT and 93 had SEC, resulting in an overall incidence of LAT/SEC of 10.02% (108/1,078). The majority of participants were male (63.27%), and 43.6% had non-paroxysmal AF. The median age was 62 years, the median LVEF was 62%, and the median CHA2DS2-VASc score was 2. Compared with the control group, patients with LAT/SEC were older; had a higher prevalence of non-paroxysmal AF, diabetes, hyperuricemia, history of IS/TE, and tumor; had lower ALB and LVEF; had higher AST, ALT, FBG, Crea, and HGB; and had larger MCV, MPV, LAD, LVDD, MRA, TRA, and IVST. Other anthropometric, biochemical, and clinical parameters did not differ significantly between the two groups. Notably, in patients with HCM or cardiac amyloidosis in conjunction with AF, the risk of stroke is significantly increased, and current guidelines recommend that such patients routinely receive anticoagulation therapy regardless of their CHA2DS2-VASc score [6]. In this study, however, the prevalence of HCM did not differ significantly between the two groups, which may reflect the small number of HCM patients. Patients with cardiac amyloidosis could not be analyzed because the relevant data were unavailable.

Table 1. Comparison of baseline characteristics between the LAT/SEC and control cohorts.

https://doi.org/10.1371/journal.pone.0313562.t001

Feature selection and comparison of multiple classification models

We initially analyzed 37 variables excluding the stroke score. Among them, 23 variables with a P value of less than 0.1 in the univariate analysis were selected. These variables included age, BMI, non-paroxysmal AF, hypertension, diabetes, IS/TE, tumor, hyperuricemia, ALB, AST, ALT, FBG, Crea, HGB, MCV, MPV, LVEF, LAD, LVDD, MRA, TRA, IVST, and LVPWT. LASSO regression, a technique that can compress variable coefficients to prevent overfitting and resolve severe collinearity issues, was employed. The LASSO regression analysis (Fig 1) revealed that, at a lambda value with minimum mean square error of 0.007, the initial 23 independent variables were reduced to 13. These 13 variables included age, non-paroxysmal AF, diabetes, IS/TE, tumor, hyperuricemia, AST, ALT, MCV, MPV, LVEF, LAD, and IVST. Subsequently, to further control for the influence of confounding factors, these 13 independent variables underwent multivariate logistic regression analysis. Ultimately, only age, non-paroxysmal AF, diabetes, IS/TE, hyperuricemia, and LAD were determined as characteristic factors (P < 0.05), as shown in Table 2.

Fig 1. LASSO regression analysis.

(a) Selection of the tuning parameter (λ) by 10-fold cross-validation; vertical lines are drawn at the selected values, where the optimal lambda produces 5 nonzero coefficients. (b) LASSO coefficient profiles of the 23 features plotted against the log (λ) sequence. Vertical dotted lines are drawn at the minimum mean square error (λ = 0.007) and at one standard error from the minimum (λ = 0.026).

https://doi.org/10.1371/journal.pone.0313562.g001

Comprehensive analysis of classified multi-model

The performance of the 9 ML classification models in predicting LAT/SEC risk in both the training and validation sets is compared and summarized in Table 3 and Fig 2. On comprehensive evaluation across multiple indicators, the logistic regression model exhibited the most robust performance in predicting LAT/SEC among NVAF patients. Fig 2a and 2b depict the ROC curves of the different ML models in the training and validation sets, respectively. Notably, DecisionTree was more susceptible to overfitting, while logistic regression demonstrated relatively stable performance. Calibration curves (Fig 2c) were constructed to assess the accuracy of the models. Additionally, the forest plot (Fig 2d) illustrates the AUC for LAT/SEC prediction by each model, with error bars indicating the mean and standard deviation of the AUC.

Fig 2. Comprehensive analysis of ML models.

(a) ROC curve analysis of ML algorithms for the prediction of LAT/SEC in the training set. (b) ROC curve analysis of ML algorithms for the prediction of LAT/SEC in the validation set. (c) Calibration plots for predicting LAT/SEC in NVAF patients using various models. (d) Forest plot of the AUC of each model.

https://doi.org/10.1371/journal.pone.0313562.g002

Table 3. Predictive performance of 9 ML algorithms in training and validation sets for LAT/SEC in NVAF patients.

https://doi.org/10.1371/journal.pone.0313562.t003

Optimal model construction and evaluation

Logistic regression analysis and 10-fold cross-validation were conducted. The average AUC of the training set was 0.825 (95% CI 0.780–0.871), the average AUC of the validation set was 0.814 (95% CI 0.682–0.943), and the AUC of the test set was 0.850 (95% CI 0.780–0.920) (Table 4 and Fig 3a–3c). We also report the performance of CHA2DS2-VASc in predicting LAT/SEC across the training, validation, and test sets. Because the AUC of the validation set was only slightly lower than that of the test set, with a difference of less than 10%, the model fitting was considered successful. Additionally, the learning curve illustrated that both the training and validation sets exhibited strong fitting and high stability (Fig 3d). Furthermore, calibration plots (Fig 3e) were used to assess the accuracy of the model, revealing excellent concordance between the predicted probabilities of the logistic regression model and observed LAT/SEC rates. The decision curve analysis (DCA) for this model (Fig 3f) suggested that the ML model provided a greater net benefit than a treat-all or treat-none strategy at risk thresholds ranging from approximately 5% to 67%. Additionally, the Kolmogorov-Smirnov (KS) statistic reached its maximum absolute value (0.598) at a predicted probability of 0.104 (S1 Fig). This indicates strong model discrimination, with patients at higher risk of LAT/SEC when the predicted score exceeds 0.104. These findings underscore the suitability of the logistic regression model for the classification modeling task of the dataset.
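Two of the quantities reported above, the KS statistic and the net benefit used in DCA, can be computed from predicted probabilities with their standard definitions: KS is the largest gap between the true-positive and false-positive rates over all thresholds, and net benefit at threshold probability pt is TP/n minus FP/n weighted by pt/(1 − pt). The sketch below uses invented toy data, not the study's predictions, and the function names are hypothetical.

```python
def ks_statistic(y_true, y_prob):
    """Return (max |TPR - FPR|, threshold at which it occurs)."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best_ks, best_threshold = 0.0, None
    for threshold in sorted(set(y_prob)):
        tpr = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold) / n_pos
        fpr = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold) / n_neg
        if abs(tpr - fpr) > best_ks:
            best_ks, best_threshold = abs(tpr - fpr), threshold
    return best_ks, best_threshold


def net_benefit(y_true, y_prob, pt):
    """Net benefit of acting on patients whose predicted risk is >= pt."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= pt)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= pt)
    return tp / n - fp / n * pt / (1 - pt)


def net_benefit_treat_all(y_true, pt):
    """Net benefit of the reference strategy that treats every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


# Toy data (hypothetical): 7 patients without and 3 with the outcome.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_prob = [0.02, 0.03, 0.05, 0.06, 0.08, 0.09, 0.30, 0.12, 0.40, 0.60]

ks, ks_threshold = ks_statistic(y_true, y_prob)
nb_model = net_benefit(y_true, y_prob, pt=0.10)
nb_all = net_benefit_treat_all(y_true, pt=0.10)
```

A decision curve repeats the net benefit calculation over a range of pt values and plots the model against the treat-all and treat-none (net benefit of zero) strategies.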

Fig 3. Logistic regression model evaluation.

(a–c) The ROC curves of logistic regression using 10-fold cross-validation on the training set (a), validation set (b), and test set (c). (d) Learning curve. (e) Calibration plots for logistic regression. (f) Decision curve analysis graph showing the net benefit against threshold probabilities based on decisions from model outputs.

https://doi.org/10.1371/journal.pone.0313562.g003

Table 4. Diagnostic performance of the logistic regression model for the prediction of LAT/SEC risk in NVAF.

https://doi.org/10.1371/journal.pone.0313562.t004

Interpretation of the model using SHAP

To elucidate the predictive influence of selected variables, we employed SHAP to illustrate their impact on the formation of LAT/SEC within the model. In Fig 4a, the six most significant features in our model are depicted. Each feature’s importance is represented by colored dots, with red indicating high-risk values and blue representing low-risk values. Non-paroxysmal AF, diabetes, IS/TE, hyperuricemia, older age, and larger LAD were associated with increased formation of LAT/SEC in NVAF patients. Fig 4b displays the ranking of these six risk factors based on the average absolute SHAP value, providing insight into their relative importance within the predictive model. We also used SHAP values to demonstrate the relative weightings of each variable in predicting LAT/SEC across the remaining 8 ML models (S2 Fig). Additionally, we present a representative case to illustrate the model’s interpretability: an NVAF patient with LAT/SEC exhibiting a high SHAP predictive score (0.17), as depicted in Fig 4c.
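For a logistic regression model, SHAP values have a simple closed form on the log-odds scale when features are treated as independent: each feature contributes its coefficient times the deviation of the patient's value from the cohort mean. The sketch below illustrates that property only. The coefficients, means, intercept, and patient values are invented and are not the fitted model from this study, and the scale on which the study's SHAP scores are reported is not assumed here.

```python
# Hypothetical model parameters (NOT the study's fitted coefficients).
coefficients = {"age": 0.03, "LAD": 0.12, "non_paroxysmal_AF": 0.9}
feature_means = {"age": 62.0, "LAD": 40.0, "non_paroxysmal_AF": 0.4}
intercept = -9.0


def log_odds(x: dict) -> float:
    """Linear predictor of the logistic model."""
    return intercept + sum(coefficients[k] * x[k] for k in coefficients)


def linear_shap(x: dict) -> dict:
    """SHAP values for a linear model under feature independence (log-odds units)."""
    return {k: coefficients[k] * (x[k] - feature_means[k]) for k in coefficients}


# Hypothetical patient: older, enlarged left atrium, non-paroxysmal AF.
patient = {"age": 70.0, "LAD": 46.0, "non_paroxysmal_AF": 1.0}
shap_values = linear_shap(patient)
baseline = log_odds(feature_means)   # model output for an "average" patient
top_feature = max(shap_values, key=lambda k: abs(shap_values[k]))
```

The contributions sum to the difference between the patient's log-odds and the baseline, which is the additivity property that lets a plot such as Fig 4c decompose an individual prediction feature by feature.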

Fig 4. SHAP analysis of the model.

(a) Feature attributions in SHAP. Each line corresponds to a feature, with SHAP values plotted on the abscissa. Red dots denote higher values, while blue dots represent lower values. (b) Importance of variables depicted as bars, indicating their contribution to model predictions. (c) SHAP scores elucidate the predicted risk of LAT/SEC in an individual subject.

https://doi.org/10.1371/journal.pone.0313562.g004

Comparison of diagnostic performances between the optimal model and CHA2DS2-VASc scoring system in predicting LAT/SEC risk in NVAF

We conducted a comparative analysis of the diagnostic accuracy of the ML-based logistic regression model and the CHA2DS2-VASc scoring system for predicting LAT/SEC risk in NVAF patients (Table 5). The AUC of the CHA2DS2-VASc scoring system was 0.650 (95% CI 0.597–0.702), whereas the logistic regression model yielded a higher AUC of 0.831 (95% CI 0.790–0.868). The difference between the two AUCs was 0.181, which was statistically significant (Z = 7.175, P < 0.001). Fig 5 illustrates the ROC curves of both models, highlighting the superior performance of the ML-based logistic regression model over the CHA2DS2-VASc scoring system.
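The reported figures can be related through the usual form of a Z test for the difference between two AUCs measured on the same patients. The specific method is not named in the text; a paired comparison such as the DeLong test is assumed here, and the standard error below is back-calculated from the reported values rather than taken from the study.

```latex
Z = \frac{\mathrm{AUC}_{\mathrm{LR}} - \mathrm{AUC}_{\mathrm{CHA_2DS_2\text{-}VASc}}}{\mathrm{SE}_{\Delta}}
\quad\Longrightarrow\quad
\mathrm{SE}_{\Delta} = \frac{0.831 - 0.650}{7.175} = \frac{0.181}{7.175} \approx 0.025
```

Under that assumption, the difference of 0.181 is roughly seven standard errors from zero, consistent with the reported P < 0.001.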

Fig 5. ROC of the logistic regression model and CHA2DS2-VASc in predicting LAT/SEC risk in NVAF.

https://doi.org/10.1371/journal.pone.0313562.g005

Table 5. Performance comparison of the proposed ML-based logistic regression model with CHA2DS2-VASc in predicting LAT/SEC risk in NVAF.

https://doi.org/10.1371/journal.pone.0313562.t005

Discussion

While the CHA2DS2-VASc score is commonly used for stroke risk stratification in NVAF patients, its efficacy in predicting LAT/SEC, recognized as a major contributor to cardiogenic embolism in NVAF, is limited. A multicenter study reported a C-statistic of 0.643 for the CHA2DS2-VASc score in predicting LAT/SEC [12], and the corresponding value in this study was 0.650. Although TEE is the gold standard for detecting LAT and SEC, it requires special skills and is a relatively invasive method. Consequently, there is a pressing need for a non-invasive and effective method to identify NVAF patients at high risk of LAT/SEC, allowing for further diagnostic evaluations, such as TEE, in this population. This would hold significant clinical value.

Previous studies have reported varying incidence rates of LAT/SEC in NVAF patients, ranging from 4.3% to 32.8% [11–13, 15, 19–22]. This discrepancy could stem from inconsistent anticoagulant therapy or potential racial disparities among the enrolled patients. In our study, the prevalence of LAT/SEC was approximately 10%. Unfortunately, recent anticoagulation data for the patients were unavailable. Using LASSO and multivariate logistic regression analyses, we identified age, non-paroxysmal AF, diabetes, IS/TE, hyperuricemia, and LAD as closely associated with LAT/SEC. Among the ML algorithms examined, the logistic regression model exhibited superior performance. We developed a novel ML-based prediction model for assessing LAT/SEC risk in NVAF patients and refined its accuracy and clinical validity through automatic parameter adjustment and internal cross-validation. Evaluation using calibration and DCA curves reinforced the model’s utility for classifying the dataset. Notably, our ML-based logistic regression model significantly outperformed the CHA2DS2-VASc scoring system. In clinical practice, based on our newly developed predictive model, patients with NVAF who have a high SHAP prediction score (specifically, above 0.104) are at an increased risk of LAT/SEC. Clinicians should recommend that such patients undergo TEE to evaluate the presence of LAT/SEC. Additionally, comprehensive assessments of left atrial and left atrial appendage function, including parameters such as strain or fibrosis, should be performed. Finally, treatment strategies, including anticoagulation, catheter ablation, or LAA closure, should be determined based on the evaluation results.

However, interpreting ML prediction models comprehensively and visually presenting predictive results to clinicians has always been challenging. Additionally, an exploratory analysis of the ML model, as presented in the SHAP value plot, revealed the importance of six variables. These variables ranked in importance from high to low were LAD, non-paroxysmal AF, age, hyperuricemia, IS/TE, and diabetes. Notably, age, IS/TE, and diabetes were included in the CHA2DS2-VASc score, while LAD, non-paroxysmal AF, and hyperuricemia were not included. In the following, we delve into the impact of LAD, non-paroxysmal AF, and hyperuricemia on LAT and SEC.

Several studies have explored the predictive value of various factors in assessing the risk of LAT or SEC in patients with NVAF. One study identified an enlarged LAD as an independent risk factor for LAT/SEC among NVAF patients with low CHA2DS2-VASc scores [23]. Similarly, another study noted that the LAT/SEC group exhibited a larger LAD compared to the non-LAT/SEC group among NVAF patients, underscoring the significance of LAD in predicting LAT/SEC [20]. Additionally, a combined predictive model incorporating LAD, LVEF, serum uric acid, and brain natriuretic peptide was developed to evaluate the risk of cardiogenic stroke in NVAF patients, further emphasizing the importance of LAD in risk assessment [24]. Consistent with previous findings, our study identified LAD as the most valuable variable for predicting LAT/SEC, with median LAD measurements of 45 mm and 39 mm in the LAT/SEC group and control group, respectively. These findings highlight the critical role of LAD in predicting LAT/SEC in NVAF patients and underscore the necessity for comprehensive risk assessment models that integrate multiple variables to enhance risk prediction accuracy. However, it is important to recognize the limitations of LAD in accurately assessing left atrial size. More reliable parameters, such as left atrial volume or left atrial volume index, could provide a more accurate assessment and potentially improve the model’s reliability. Unfortunately, these data were not available for analysis in this study.

Non-paroxysmal AF emerged as the second most significant predictor variable. Clinically, AF presents in various patterns including first-diagnosed AF, paroxysmal AF, persistent AF, long-standing persistent AF, and permanent AF [6]. Clinicians often perceive non-paroxysmal AF to carry a higher stroke risk than paroxysmal AF because of prolonged exposure to the embolic-prone state. Notably, the widely accepted CHA2DS2-VASc scoring system does not account for AF patterns. Recent studies have increasingly emphasized the impact of AF patterns on stroke risk among AF patients. For instance, Vanassche et al. [25] analyzed a large cohort of non-anticoagulated AF patients and found AF pattern to be an independent predictor of stroke risk. The incidence of embolic events was significantly higher in persistent AF (3.0%/year) and permanent AF (4.2%/year) than in paroxysmal AF (2.1%/year). Similarly, an analysis of the ARISTOTLE database demonstrated significantly lower stroke and embolism rates in paroxysmal AF than in persistent or permanent AF [26]. Moreover, a meta-analysis involving 99,996 AF patients reported a higher incidence of thromboembolic events among patients with non-paroxysmal AF (HR = 1.38, P < 0.001) [27]. Consistently, several studies have independently concluded that non-paroxysmal AF is an independent risk factor for LAT/SEC in NVAF patients [13, 20, 28, 29]. These findings underscore the importance of considering non-paroxysmal AF in predicting both stroke and LAT/SEC.

Hyperuricemia emerged as a more significant predictor than diabetes and IS/TE in our final model. Numerous studies have linked hyperuricemia to endothelial or endocardial dysfunction caused by free radicals, leading to excessive proinflammatory effects [30, 31]. Previous research has established hyperuricemia as an independent risk factor for stroke [32–34]. Furthermore, studies have suggested that hyperuricemia may independently predict and refine the risk of left atrial stasis among NVAF patients, particularly those with a CHA2DS2-VASc score < 2 [14]. Consistent with these reports, in our study the prevalence of hyperuricemia was significantly higher in the LAT/SEC group (45.37%) than in the control group (25.57%). These findings collectively emphasize the importance of hyperuricemia as a critical factor in predicting LAT/SEC in NVAF patients.

Nevertheless, our study is subject to several limitations. Firstly, data regarding anticoagulation in NVAF patients were not available, potentially affecting the accuracy of the final prediction model. Secondly, the sample size was relatively small and the data were collected from a single institution rather than multiple centers, which limits the generalizability of the findings to broader populations. Additionally, despite achieving high consistency in the repeatability analysis within the training and testing sets, the possibility of segmentation uncertainty introduces potential errors.

Conclusions

The new model represents a potentially non-invasive and effective approach for predicting the risk of LAT/SEC in NVAF patients. In practical terms, it holds promise for clinical utility by aiding clinicians in identifying patients at high risk of LAT/SEC, thereby facilitating targeted diagnostic evaluations and the formulation of personalized treatment strategies based on the results.

Supporting information

S1 Fig. KS statistic plot in predicting LAT/SEC risk.

https://doi.org/10.1371/journal.pone.0313562.s001

(TIF)

S2 Fig. Feature attributions in SHAP of 9 ML classification models in predicting LAT/SEC risk.

https://doi.org/10.1371/journal.pone.0313562.s002

(TIF)

References

  1. January C.T., et al., 2019 AHA/ACC/HRS Focused Update of the 2014 AHA/ACC/HRS Guideline for the Management of Patients With Atrial Fibrillation: A Report of the American College of Cardiology/American Heart Association Task Force on Clinical Practice Guidelines and the Heart Rhythm Society in Collaboration With the Society of Thoracic Surgeons. Circulation, 2019. 140(2): p. e125–e151. pmid:30686041
  2. Chugh S.S., et al., Worldwide epidemiology of atrial fibrillation: a Global Burden of Disease 2010 Study. Circulation, 2014. 129(8): p. 837–47. pmid:24345399
  3. Wang L., et al., Trends of global burden of atrial fibrillation/flutter from Global Burden of Disease Study 2017. Heart, 2021. 107(11): p. 881–887. pmid:33148545
  4. Ball J., et al., Atrial fibrillation: profile and burden of an evolving epidemic in the 21st century. Int J Cardiol, 2013. 167(5): p. 1807–24. pmid:23380698
  5. Caliskan E., et al., Interventional and surgical occlusion of the left atrial appendage. Nat Rev Cardiol, 2017. 14(12): p. 727–743. pmid:28795688
  6. Joglar J.A., et al., 2023 ACC/AHA/ACCP/HRS Guideline for the Diagnosis and Management of Atrial Fibrillation: A Report of the American College of Cardiology/American Heart Association Joint Committee on Clinical Practice Guidelines. Circulation, 2024. 149(1): p. e1–e156. pmid:38033089
  7. Fatkin D., Kelly R.P., and Feneley M.P., Relations between left atrial appendage blood flow velocity, spontaneous echocardiographic contrast and thromboembolic risk in vivo. J Am Coll Cardiol, 1994. 23(4): p. 961–9. pmid:8106703
  8. Gianni C., et al., Transesophageal Echocardiography Following Left Atrial Appendage Electrical Isolation: Diagnostic Pitfalls and Clinical Implications. Circ Arrhythm Electrophysiol, 2022. 15(6): p. e010975. pmid:35617267
  9. Greener J.G., et al., A guide to machine learning for biologists. Nat Rev Mol Cell Biol, 2022. 23(1): p. 40–55. pmid:34518686
  10. Handelman G.S., et al., eDoctor: machine learning and the future of medicine. J Intern Med, 2018. 284(6): p. 603–619. pmid:30102808
  11. Çakır O.M., Low vitamin D levels predict left atrial thrombus in nonvalvular atrial fibrillation. Nutr Metab Cardiovasc Dis, 2020. 30(7): p. 1152–1160. pmid:32456946
  12. Han D., et al., Determinants of left atrial thrombus or spontaneous echo contrast in nonvalvular atrial fibrillation. Thromb Res, 2020. 195: p. 233–237. pmid:32799130
  13. Li Z., et al., Nomogram to Predict Left Atrial Thrombus or Spontaneous Echo Contrast in Patients With Non-valvular Atrial Fibrillation. Front Cardiovasc Med, 2021. 8: p. 737551. pmid:34722669
  14. Liu F.Z., et al., Predictive effect of hyperuricemia on left atrial stasis in non-valvular atrial fibrillation patients. Int J Cardiol, 2018. 258: p. 103–108. pmid:29467096
  15. Wang Z., et al., The Prognostic Nutritional Index May Predict Left Atrial Appendage Thrombus or Dense Spontaneous Echo Contrast in Patients With Atrial Fibrillation. Front Cardiovasc Med, 2022. 9: p. 860624. pmid:35571156
  16. Zeng D., et al., A nomogram for predicting left atrial thrombus or spontaneous echo contrast in non-valvular atrial fibrillation patients using hemodynamic parameters from transthoracic echocardiography. Front Cardiovasc Med, 2024. 11: p. 1337853. pmid:38390444
  17. Romero J., et al., Cardiac imaging for assessment of left atrial appendage stasis and thrombosis. Nat Rev Cardiol, 2014. 11(8): p. 470–80. pmid:24913058
  18. Borghi C., et al., Expert consensus for the diagnosis and treatment of patients with hyperuricemia and high cardiovascular risk: 2023 update. Cardiol J, 2024. 31(1): p. 1–14. pmid:38155566
  19. Angebrandt Belošević P., et al., Left Ventricular Ejection Fraction Can Predict Atrial Thrombosis Even in Non-High-Risk Individuals with Atrial Fibrillation. J Clin Med, 2022. 11(14). pmid:35887729
  20. Lin W.D., et al., Left atrial enlargement and non-paroxysmal atrial fibrillation as risk factors for left atrial thrombus/spontaneous Echo contrast in patients with atrial fibrillation and low CHA(2)DS(2)-VASc score. J Geriatr Cardiol, 2020. 17(3): p. 155–159. pmid:32280332
  21. Liu K., et al., Retrospective Study of 1255 Non-Anticoagulated Patients with Nonvalvular Atrial Fibrillation to Determine the Risk of Ischemic Stroke Associated with Left Atrial Spontaneous Echo Contrast on Transesophageal Echocardiography. Med Sci Monit, 2021. 27: p. e934795. pmid:34893576
  22. Segan L., et al., Identifying Patients at High Risk of Left Atrial Appendage Thrombus Before Cardioversion: The CLOTS-AF Score. J Am Heart Assoc, 2023. 12(12): p. e029259. pmid:37301743
  23. Kamili K., et al., Predictive value of lipoprotein(a) for left atrial thrombus or spontaneous echo contrast in non-valvular atrial fibrillation patients with low CHA(2)DS(2)-VASc scores: a cross-sectional study. Lipids Health Dis, 2024. 23(1): p. 22. pmid:38254171
  24. Song Z., et al., A Study of Cardiogenic Stroke Risk in Non-valvular Atrial Fibrillation Patients. Front Cardiovasc Med, 2020. 7: p. 604795. pmid:33244472
  25. Vanassche T., et al., Risk of ischaemic stroke according to pattern of atrial fibrillation: analysis of 6563 aspirin-treated patients in ACTIVE-A and AVERROES. Eur Heart J, 2015. 36(5): p. 281–7a. pmid:25187524
  26. Al-Khatib S.M., et al., Outcomes of apixaban vs. warfarin by type and duration of atrial fibrillation: results from the ARISTOTLE trial. Eur Heart J, 2013. 34(31): p. 2464–71. pmid:23594592
  27. Ganesan A.N., et al., The impact of atrial fibrillation type on the risk of thromboembolism, mortality, and bleeding: a systematic review and meta-analysis. Eur Heart J, 2016. 37(20): p. 1591–602. pmid:26888184
  28. Shi S., et al., Left Atrial Thrombus in Patients With Non-valvular Atrial Fibrillation: A Cross-Sectional Study in China. Front Cardiovasc Med, 2022. 9: p. 827101. pmid:35586655
  29. Uziȩbło-Życzkowska B., et al., Risk factors for left atrial thrombus in younger patients (aged < 65 years) with atrial fibrillation or atrial flutter: Data from the multicenter left atrial thrombus on transesophageal echocardiography (LATTEE) registry. Front Cardiovasc Med, 2022. 9: p. 973043.
  30. Papežíková I., et al., Uric acid modulates vascular endothelial function through the down regulation of nitric oxide production. Free Radic Res, 2013. 47(2): p. 82–8. pmid:23136942
  31. Zhao J., et al., Red blood cell distribution width and left atrial thrombus or spontaneous echo contrast in patients with non-valvular atrial fibrillation. Int J Cardiol, 2015. 180: p. 63–5. pmid:25438214
  32. Bos M.J., et al., Uric acid is a risk factor for myocardial infarction and stroke: the Rotterdam study. Stroke, 2006. 37(6): p. 1503–7. pmid:16675740
  33. Heo S.H. and Lee S.H., High levels of serum uric acid are associated with silent brain infarction. J Neurol Sci, 2010. 297(1–2): p. 6–10. pmid:20674933
  34. Hozawa A., et al., Serum uric acid and risk of ischemic stroke: the ARIC Study. Atherosclerosis, 2006. 187(2): p. 401–7. pmid:16239005