Machine learning-based model for predicting recanalization in isolated distal deep vein thrombosis and analysis of predictors

Yingjie Kuang; Jun Zhang; Zhen An; Chunxu Yang; Wenxu Guo; Xiaomin Liu; Yue Zhang

doi:10.1371/journal.pone.0349110

Abstract

Background

Isolated distal deep vein thrombosis (IDDVT) is common, yet tools for predicting poor recanalization remain limited. We aimed to develop and compare machine learning models for predicting poor recanalization in patients with IDDVT and to identify the most informative predictors.

Methods

A total of 1600 patients with IDDVT were retrospectively enrolled. The dataset was randomly divided into a development set (n = 1280) and an independent test set (n = 320) using stratified sampling. Six predictive models were developed and compared: logistic regression (LR), support vector machine (SVM), random forest (RF), multilayer perceptron (MLP), extreme gradient boosting (XGBoost), and a Voting Ensemble. Model training and hyperparameter tuning were performed in the development set using five-fold stratified cross-validation, and optimal classification thresholds were determined using the Youden index. Model performance was evaluated by discrimination, calibration, and classification metrics, with 95% confidence intervals estimated by bootstrap resampling (10,000 iterations). SHAP analysis was applied to interpret the final model.

Results

In the independent test set, all models showed acceptable to strong discrimination, with AUC values ranging from 0.808 to 0.908. XGBoost achieved the best overall performance, with an optimal threshold of 0.183, an AUC of 0.908 (95% CI, 0.855–0.952), a Brier score of 0.077 (95% CI, 0.058–0.096), an accuracy of 0.900 (95% CI, 0.866–0.931), a precision of 0.650 (95% CI, 0.529–0.767), a recall of 0.803 (95% CI, 0.686–0.906), an F1-score of 0.717 (95% CI, 0.615–0.806), and a specificity of 0.918 (95% CI, 0.884–0.950). The calibration intercept and slope of the XGBoost model were 0.149 (95% CI, −0.192 to 0.454) and 1.410 (95% CI, 1.098–1.809), respectively, indicating acceptable overall calibration. SHAP analysis identified D-dimer rate, provoking-factor-related variables, anticoagulant use, and age group as the most influential predictors.

Conclusion

Among six candidate models, XGBoost showed the best overall performance for predicting poor recanalization in patients with IDDVT. This study establishes an interpretable machine learning-based prediction framework focused specifically on poor recanalization in IDDVT and highlights the contribution of dynamic laboratory information, particularly D-dimer rate. The model may support early risk stratification and individualized follow-up planning, but external validation is required before routine clinical implementation.

Citation: Kuang Y, Zhang J, An Z, Yang C, Guo W, Liu X, et al. (2026) Machine learning-based model for predicting recanalization in isolated distal deep vein thrombosis and analysis of predictors. PLoS One 21(5): e0349110. https://doi.org/10.1371/journal.pone.0349110

Editor: Yufeng Zhou, Chongqing Medical University, CHINA

Received: November 21, 2025; Accepted: April 25, 2026; Published: May 8, 2026

Copyright: © 2026 Kuang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The code used in this study has been publicly deposited and can be accessed via the following DOI: 10.5281/zenodo.18988463. The data used in this study cannot be made publicly available because public sharing has not been approved by the Ethics Committee of Shandong University of Traditional Chinese Medicine Affiliated Hospital. De-identified data may be made available upon reasonable request from qualified researchers, subject to ethical review and approval. Requests for data access should be directed to the Ethics Committee of Shandong University of Traditional Chinese Medicine Affiliated Hospital at sdzyethics@163.com.

Funding: This study was supported through the Shandong Province Medical and Health Science and Technology Project (Grant No. 202304111444).

Competing interests: The authors have declared that no competing interests exist.

Introduction

Isolated distal deep vein thrombosis (IDDVT), a subtype of deep vein thrombosis (DVT), is defined as thrombosis confined to the infra-popliteal veins, including the anterior tibial, posterior tibial, and peroneal veins and the muscular venous plexus [1]. After the onset of venous thrombosis, successful recanalization and the reduction of residual vein occlusion are critical for clinical recovery. Prompt recanalization can reduce the risk of thrombus recurrence, pulmonary embolism (PE), and post-thrombotic syndrome (PTS) [2]. The risk profile of IDDVT differs from that of proximal deep vein thrombosis (PDVT): IDDVT is more strongly associated with transient provoking factors, such as recent surgery, immobilization of the affected limb, or long-distance travel, whereas PDVT is more closely linked to persistent provoking factors, including active cancer, congestive heart failure, respiratory insufficiency, and advanced age (>75 years) [3,4]. Therefore, findings derived from PDVT studies cannot be directly extrapolated to IDDVT. Previous studies have shown that the incidence of PTS after IDDVT may be as high as 17%, and poor venous recanalization after thrombosis is strongly associated with PTS [5]. Accordingly, studies specifically aimed at evaluating recanalization outcomes in IDDVT are warranted.

Clinical prediction models are widely used for the prevention and management of DVT. These models assess DVT risk factors and play a crucial role in reducing the incidence of initial thrombosis, preventing recurrence, and mitigating PTS [6–8]. The Wells score is the most widely used clinical prediction tool and was originally developed to estimate the probability of PE and DVT; however, its diagnostic accuracy for IDDVT is relatively limited [9]. Therefore, it is not well suited for assessing the risk of IDDVT. Existing prediction studies focusing on IDDVT have mainly addressed the occurrence of IDDVT itself or the development of pulmonary embolism, whereas relatively few studies have used venous recanalization in IDDVT as the primary endpoint [10–12]. Compared with traditional multivariable regression, machine learning methods may be better suited to clinical prediction tasks involving complex nonlinear associations, higher-order interactions, and heterogeneous predictors. In IDDVT, where recanalization is likely influenced by multiple interrelated clinical and laboratory factors, machine learning may provide complementary value for individualized risk prediction.

Accordingly, this study aimed to retrospectively analyze patient data to develop a machine learning-based model for predicting poor recanalization in IDDVT, identify the most informative predictors, and provide supportive evidence for clinical decision-making.

Methods

Study dataset construction

Data source and exclusion criteria.

All data in this study were obtained from the Affiliated Hospital of Shandong University of Traditional Chinese Medicine. The study cohort comprised patients diagnosed with IDDVT between January 1, 2020 and October 31, 2023. Exclusion criteria comprised: (1) concomitant venous thrombosis at other sites; (2) interrupted treatment courses; and (3) incomplete medical documentation.

Sample size determination.

This study was a retrospective clinical prediction model development study using machine learning algorithms. Sample size considerations were informed by published methodological guidance for clinical prediction modeling and by the TRIPOD+AI reporting recommendations [13]. Based on the preliminary poor recanalization rate (14.8%), 20 candidate predictors, and the expected model discrimination, the minimum sample size required for model development was estimated to be approximately 679 patients. Because the machine learning algorithms planned in this study are more complex than traditional regression models, they generally require more than this minimum. Accordingly, all 1,600 eligible patients with IDDVT were included in the final analysis, and this sample size was considered adequate for model development.
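The kind of calculation described above can be sketched with two widely cited minimum sample size criteria for binary-outcome prediction models (Riley and colleagues). This is an illustration only: the source does not state which formulas or which anticipated model fit were used, so the Cox-Snell R² value of 0.20 below is a hypothetical input and the sketch is not expected to reproduce the reported figure of 679. The function names are likewise illustrative.

```python
import math

def n_for_shrinkage(n_predictors, r2_cs, shrinkage=0.9):
    """Minimum n so that the expected uniform shrinkage factor is >= `shrinkage`."""
    return math.ceil(
        n_predictors / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage))
    )

def n_for_overall_risk(prevalence, margin=0.05):
    """Minimum n to estimate the overall event proportion within +/- `margin`."""
    return math.ceil((1.96 / margin) ** 2 * prevalence * (1 - prevalence))

def max_r2_cs(prevalence):
    """Largest Cox-Snell R^2 attainable at a given outcome prevalence."""
    ll_null = (prevalence * math.log(prevalence)
               + (1 - prevalence) * math.log(1 - prevalence))
    return 1 - math.exp(2 * ll_null)

# Inputs taken from the article: 20 candidate predictors, 14.8% preliminary event rate.
# ASSUMPTION: an anticipated Cox-Snell R^2 of 0.20 (not reported in the source).
n_shrink = n_for_shrinkage(n_predictors=20, r2_cs=0.20)
n_risk = n_for_overall_risk(prevalence=0.148)
print(n_shrink, n_risk, round(max_r2_cs(0.148), 3))
```

The required sample size is the largest value across the criteria considered, and it is sensitive to the anticipated R², which is why that input should be justified from prior studies whenever possible.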

Study design

The overall workflow of data splitting, model development, selection, and final evaluation is illustrated in Fig 1.

Fig 1. Workflow of machine learning model development, selection, and final evaluation.

https://doi.org/10.1371/journal.pone.0349110.g001

Group stratification.

Venous recanalization rates were determined using lower extremity venous Doppler ultrasound reports. Patients with a recanalization rate of <50% were assigned to the poor recanalization group, whereas those with a recanalization rate of ≥50% were assigned to the good recanalization group. The detailed methodology for calculating the recanalization rate is provided in S1 File. The poor recanalization group was coded as 1 and the good recanalization group as 0. For comparisons of baseline characteristics between groups, categorical variables were presented as counts (percentages) and analyzed using the chi-square test, with Fisher’s exact test applied when appropriate. Continuous variables were summarized and compared according to their distributional characteristics: variables with approximately normal distributions were presented as mean ± standard deviation and compared using Welch’s t-test, whereas non-normally distributed variables were presented as median (interquartile range) and compared using the Mann–Whitney U test. All tests were two-sided, and a P value < 0.05 was considered statistically significant. Within each outcome group (poor and good recanalization), patients were randomly allocated to the development and test sets in an 80:20 ratio, so that the outcome distribution was preserved in both sets.
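The outcome coding and the stratified 80:20 allocation described above can be sketched as follows. This is a minimal, dependency-free illustration rather than the authors' code (the article reports using scikit-learn); the function names and the random seed are assumptions, and only the group sizes (253 poor, 1,347 good) come from the article.

```python
import random

def code_outcome(recanalization_rate_percent):
    """Return 1 for poor recanalization (<50%) and 0 for good recanalization (>=50%)."""
    return 1 if recanalization_rate_percent < 50 else 0

def stratified_split(labels, test_fraction=0.2, seed=42):
    """Split indices so that each outcome class contributes ~test_fraction to the test set."""
    rng = random.Random(seed)  # hypothetical seed, fixed for reproducibility
    dev_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        dev_idx.extend(idx[n_test:])
    return sorted(dev_idx), sorted(test_idx)

# Outcome counts reported in the article: 253 poor (1) and 1,347 good (0).
labels = [1] * 253 + [0] * 1347
dev_idx, test_idx = stratified_split(labels)
print(len(dev_idx), len(test_idx))         # 1280 320
print(sum(labels[i] for i in test_idx))    # 51 poor-recanalization cases in the test set
```

Splitting within each outcome group keeps the event rate close to 15.8% in both sets, which matters when the event is relatively infrequent.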

Candidate predictors.

Candidate predictors were selected on the basis of the published literature, clinical experience, and research team consensus. They included sex, age, body mass index (BMI), thrombus location (left-sided, right-sided, or bilateral), outpatient/inpatient status, family history of venous thromboembolism (VTE), personal history of VTE, provoking factor type, international normalized ratio (INR), platelet count (PLT), fibrinogen (FIB), D-dimer, C-reactive protein (CRP), the rate of change in D-dimer levels, and anticoagulant therapy use. In our center, anticoagulant therapy for patients with IDDVT mainly consisted of subcutaneous low-molecular-weight heparin and oral rivaroxaban. In total, 20 candidate predictors were included in the analysis. Detailed calculation methods for some included predictors are provided in S2 File.

Data preprocessing.

Because several variables had missing values and the available complete-case sample remained sufficiently large for model development, a complete-case analysis was performed. Continuous variables were standardized to improve comparability across different measurement scales and value ranges, with all preprocessing performed within the model pipeline to avoid data leakage. Specifically, preprocessing parameters were fitted only on the training data during model development and then applied to the corresponding validation and test data. Categorical variables were one-hot encoded. Collinearity diagnostics were performed for all candidate predictors, with detailed results provided in S3 File. The results showed strong correlations among some variables, mainly among mutually exclusive dummy variables derived from the same multicategory variable, such as those representing provoking factor type and lesion location. This finding primarily reflects the structural characteristics of one-hot encoding rather than abnormal overlap among independent clinical factors. Accordingly, reference-category coding was used in the logistic regression model to reduce the influence of multicollinearity on parameter estimation and interpretation. Specifically, transient provoking factors were specified as the reference category for provoking factor type, and left-sided location as the reference category for lesion location, while the remaining categories were included as dummy variables.
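The two preprocessing ideas in this paragraph, fitting scaling parameters on training data only and using reference-category coding for multicategory variables, can be sketched as below. This is an illustrative stand-in written with the standard library, not the authors' scikit-learn pipeline; the class and function names and the example D-dimer values are hypothetical, while the location categories and the left-sided reference level follow the text.

```python
from statistics import mean, pstdev

class Standardizer:
    """Z-score scaler whose parameters come from the training data only."""

    def fit(self, values):
        self.mean_ = mean(values)
        self.std_ = pstdev(values) or 1.0  # guard against a constant column
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]

def dummy_code(value, categories, reference):
    """Reference-category coding: one 0/1 column per non-reference level."""
    return [1 if value == c else 0 for c in categories if c != reference]

# Hypothetical D-dimer values; the scaler never sees the held-out values when fitting.
train_d_dimer = [0.5, 1.0, 1.5, 2.0]
test_d_dimer = [1.25, 3.0]
scaler = Standardizer().fit(train_d_dimer)
print(scaler.transform(test_d_dimer))

# Lesion location with left-sided as the reference category, as in the article.
LOCATIONS = ["left", "right", "bilateral"]
print(dummy_code("left", LOCATIONS, "left"))       # [0, 0]
print(dummy_code("bilateral", LOCATIONS, "left"))  # [0, 1]
```

Dropping the reference level removes the exact linear dependence among dummy columns from the same variable, which is the structural collinearity the diagnostics detected.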

Predictive model construction.

To compare algorithms with different modeling characteristics and levels of complexity, six predictive models were selected. Logistic regression (LR) was included as a conventional and interpretable baseline model; support vector machine (SVM) was used because of its ability to handle complex decision boundaries; random forest (RF) and extreme gradient boosting (XGBoost) were included as tree-based ensemble methods capable of capturing nonlinear relationships and feature interactions; multilayer perceptron (MLP) was used as a neural-network-based approach; and a Voting Ensemble was included to examine whether combining multiple models could provide complementary predictive value. Within the development set, five-fold stratified cross-validation was used for hyperparameter tuning and internal performance assessment. For each model, the classification threshold was determined in the development set using the Youden index and then fixed for evaluation in the independent test set. The independent test set was reserved exclusively for final model evaluation.
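Threshold selection by the Youden index (J = sensitivity + specificity − 1) can be illustrated with a short sketch. The data below are made up for demonstration, the function name is hypothetical, and the source does not specify details such as tie handling or whether out-of-fold predictions were used, so those choices are assumptions.

```python
def youden_threshold(y_true, y_prob):
    """Return the probability cut-off that maximizes sensitivity + specificity - 1."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best_threshold, best_j = None, -1.0
    for t in sorted(set(y_prob)):
        # A case is predicted positive when its probability is >= t.
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= t)
        tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # ties keep the lowest threshold
            best_threshold, best_j = t, j
    return best_threshold, best_j

# Toy development-set predictions (1 = poor recanalization).
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.02, 0.05, 0.10, 0.15, 0.20, 0.30, 0.60, 0.90]
threshold, j = youden_threshold(y_true, y_prob)
print(threshold, round(j, 3))  # 0.2 0.8
```

With an infrequent outcome, the selected cut-off often falls well below 0.5, as with the threshold of 0.183 reported for XGBoost.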

Model performance evaluation.

Model performance was assessed using sensitivity (recall), specificity, precision, F1-score, accuracy, the area under the receiver operating characteristic curve (AUC), and the Brier score. Discrimination was primarily assessed using the AUC, with higher values indicating better ability to distinguish between patients with and without poor recanalization [14]. Calibration was assessed using calibration curves together with the calibration intercept and slope. SHapley Additive exPlanations (SHAP) analysis was used to visualize feature importance and direction of association through SHAP summary plots. For the final selected model, SHAP values were computed using the independent test set after applying the preprocessing pipeline fitted during model development. This approach improved model interpretability and provided intuitive explanations for individual predictions.
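For readers less familiar with these measures, the following sketch computes them from first principles on toy data. It is not the authors' evaluation code; the function names and example values are illustrative assumptions, and the AUC is computed through its pairwise-ranking (Mann-Whitney) interpretation.

```python
def auc(y_true, y_prob):
    """Probability that a random positive case is ranked above a random negative case."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def classification_metrics(y_true, y_prob, threshold):
    """Threshold-dependent metrics; predicted positive when probability >= threshold."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < threshold)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "accuracy": (tp + tn) / len(y_true),
        "f1": 2 * tp / (2 * tp + fp + fn) if tp else 0.0,
    }

# Toy predictions (1 = poor recanalization) evaluated at a fixed threshold of 0.20.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.02, 0.05, 0.10, 0.15, 0.20, 0.30, 0.60, 0.90]
metrics = classification_metrics(y_true, y_prob, threshold=0.20)
print(round(auc(y_true, y_prob), 3), round(brier_score(y_true, y_prob), 3))
print(metrics)
```

The AUC and Brier score use the predicted probabilities directly, whereas the remaining metrics depend on the chosen threshold.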

Ethics

This retrospective analysis utilized fully de-identified data, containing no personally identifiable information. The study protocol was approved by the Institutional Review Board of the Affiliated Hospital of Shandong University of Traditional Chinese Medicine (Approval No. 2024021-YJS). The requirement for informed consent was waived by the ethics committee due to the retrospective nature of the study and use of anonymized data.

Model development environment

All analyses were performed in a Jupyter Notebook environment using Python 3.11.5 for data processing, model development, and statistical analysis. The main Python packages used were pandas (2.1.4), numpy (1.24.3), scikit-learn (1.1.3), xgboost (1.7.3), shap (0.46.0), matplotlib (3.7.2), scipy (1.15.3), and statsmodels (0.14.6).

Results

Baseline characteristics of the study cohort

A total of 2,352 IDDVT patient records were initially screened for this study, among which 752 cases were excluded due to missing data. Ultimately, 1,600 IDDVT patients were included in the final analysis (Fig 2). Of these, 1,347 patients (84.2%) achieved good recanalization at the 1-month follow-up, while 253 patients (15.8%) showed poor recanalization outcomes. The baseline characteristics of the included patients are summarized in Table 1. The baseline characteristics were generally comparable between the development and test sets, with no marked imbalance observed, indicating that the random stratified split was reasonable (S1 Table).

Table 1. Baseline characteristics according to recanalization outcome.

https://doi.org/10.1371/journal.pone.0349110.t001

Fig 2. Flowchart of patient selection.

https://doi.org/10.1371/journal.pone.0349110.g002

Model development

This study developed and evaluated six prediction models, including LR, SVM, RF, MLP, XGBoost, and a Voting Ensemble. All models were trained and hyperparameter-optimized in the development set using five-fold stratified cross-validation, and the optimal classification threshold for each model was determined according to the Youden index. Because poor recanalization events were relatively infrequent in the study cohort, classification thresholds were determined within the development set to achieve an appropriate balance between sensitivity and specificity before being fixed for evaluation in the independent test set.

In the independent test set, the AUCs of the models ranged from 0.808 to 0.908, indicating overall good discrimination (Fig 3). XGBoost achieved the best performance, with an optimal threshold of 0.183, an AUC of 0.908 (95% CI: 0.855–0.952), a Brier score of 0.077 (95% CI: 0.058–0.096), an accuracy of 0.900 (95% CI: 0.866–0.931), a precision of 0.650 (95% CI: 0.529–0.767), a recall of 0.803 (95% CI: 0.686–0.906), an F1-score of 0.717 (95% CI: 0.615–0.806), and a specificity of 0.918 (95% CI: 0.884–0.950). SVM and LR also performed favorably; SVM showed better balance in classification performance, with an F1-score of 0.718 (95% CI: 0.614–0.808), whereas LR achieved a higher recall of 0.822 (95% CI: 0.712–0.922). RF and the Voting Ensemble showed intermediate performance, whereas MLP demonstrated relatively weaker overall performance (Table 2). The detailed classification results are further illustrated by the confusion matrices of all candidate models in the development and independent test sets, which are provided in S1 and S2 Figs, respectively.
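The 95% confidence intervals reported here were obtained by bootstrap resampling of the test set. A minimal percentile-bootstrap sketch is shown below; the source states only that 10,000 bootstrap iterations were used, so the percentile method, the handling of single-class resamples, the seed, and the function names are assumptions made for illustration.

```python
import random

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def bootstrap_ci(y_true, y_prob, metric, n_boot=10000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for metric(y_true, y_prob)."""
    rng = random.Random(seed)  # hypothetical seed
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples with one class only (AUC would be undefined)
        ps = [y_prob[i] for i in idx]
        stats.append(metric(ys, ps))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy test-set predictions; a smaller n_boot keeps the example fast.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.02, 0.05, 0.10, 0.15, 0.20, 0.30, 0.60, 0.90]
lower, upper = bootstrap_ci(y_true, y_prob, brier_score, n_boot=2000)
print(round(lower, 3), round(upper, 3))
```

Any metric with the signature `metric(y_true, y_prob)` can be passed in, so the same routine serves for the AUC or for threshold-based metrics wrapped in a small function.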

Table 2. Performance of machine learning models for predicting poor recanalization in the test set.

https://doi.org/10.1371/journal.pone.0349110.t002

Fig 3. Receiver operating characteristic curves of the candidate models.

https://doi.org/10.1371/journal.pone.0349110.g003

Calibration curve analysis showed that the predicted probabilities of the compared models were generally in good agreement with the observed event rates, although some differences were noted across models (Fig 4). Together with its lower Brier score, XGBoost demonstrated favorable overall calibration performance. Further calibration assessment showed that the calibration intercept of XGBoost was 0.149 (95% CI: −0.192 to 0.454), which was close to the ideal value of 0, indicating no evident systematic overestimation or underestimation of risk. The calibration slope was 1.410 (95% CI: 1.098–1.809), which was higher than the ideal value of 1, suggesting that the predicted probabilities may have been insufficiently extreme overall. Overall, XGBoost achieved acceptable calibration while maintaining strong discriminative ability.
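The calibration intercept and slope can be understood as the coefficients of a logistic regression of the observed outcome on the logit of the predicted probability. The sketch below fits both coefficients jointly by Newton-Raphson; this is one common convention and an assumption here, since the source does not state whether the intercept was estimated jointly or with the slope fixed at 1. The function names and toy data are illustrative.

```python
import math

def logit(p, eps=1e-12):
    p = min(max(p, eps), 1 - eps)  # keep probabilities away from 0 and 1
    return math.log(p / (1 - p))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def calibration_intercept_slope(y_true, y_prob, n_iter=50, tol=1e-10):
    """Fit outcome ~ a + b * logit(predicted probability) by Newton-Raphson."""
    x = [logit(p) for p in y_prob]
    a, b = 0.0, 1.0  # start from perfect calibration
    for _ in range(n_iter):
        mu = [sigmoid(a + b * xi) for xi in x]
        # Gradient of the log-likelihood with respect to (a, b).
        g0 = sum(y - m for y, m in zip(y_true, mu))
        g1 = sum((y - m) * xi for y, m, xi in zip(y_true, mu, x))
        # Information matrix entries with weights mu * (1 - mu).
        w = [m * (1 - m) for m in mu]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b

# Toy example: predictions of 0.3 and 0.7, but observed event rates of 0.2 and 0.8.
y_prob = [0.3] * 10 + [0.7] * 10
y_true = [1, 1] + [0] * 8 + [1] * 8 + [0, 0]
intercept, slope = calibration_intercept_slope(y_true, y_prob)
print(round(intercept, 3), round(slope, 3))  # slope > 1: predictions not extreme enough
```

In this toy example the observed risks are more extreme than the predicted ones, so the fitted slope exceeds 1, the same pattern as the slope of 1.410 reported for XGBoost.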

Fig 4. Calibration curves of the prediction models.

https://doi.org/10.1371/journal.pone.0349110.g004

Overall, XGBoost demonstrated the best performance in terms of discrimination, calibration, and overall classification performance, and was therefore selected as the optimal final model.

Analysis of important predictors

To interpret the prediction mechanism of the best-performing XGBoost model, SHAP was further used to evaluate the contribution of each predictor to the model output in the independent test set. The SHAP summary plot (Fig 5) showed that the D-dimer rate, variables representing provoking factor type, age group, and anticoagulant therapy were the main contributors to model predictions. In our analysis, poor recanalization was coded as 1 and good recanalization as 0. Therefore, positive SHAP values indicate that a feature shifts the model output toward class 1 (poor recanalization), whereas negative SHAP values indicate a shift toward class 0 (good recanalization). Specifically, a lower D-dimer rate was associated with a greater tendency toward predicted poor recanalization; persistent provoking factors, multiple provoking factors, and older age groups tended to shift the model toward poor recanalization, whereas transient provoking factors and anticoagulant therapy tended to shift the model toward good recanalization. It should be noted that the variables representing provoking factor type were derived from one-hot encoding of the same multicategory variable. Therefore, their SHAP values reflect the relative predictive contribution of each category within the overall encoding framework and should not be directly interpreted as independent risk effects.
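The sign convention described above can be made concrete with a brute-force computation of Shapley values for a toy model. This is not the algorithm used by the shap package for tree models, which relies on tree-specific methods; it is an exact enumeration that is feasible only for a handful of features. The toy risk function, its coefficients, the background rows, and the patient values are all hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, background):
    """Exact Shapley values of predict at x; absent features are drawn from background rows."""
    n = len(x)

    def value(subset):
        # Average prediction when only features in `subset` take the patient's values.
        total = 0.0
        for row in background:
            z = [x[i] if i in subset else row[i] for i in range(n)]
            total += predict(z)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for k in range(n):
            for s in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                contribution += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(contribution)
    return phi

def toy_risk(z):
    """Hypothetical score: higher output means a shift toward class 1 (poor recanalization)."""
    d_dimer_rate, persistent_factor, anticoagulant = z
    return 0.2 - 0.3 * d_dimer_rate + 0.25 * persistent_factor - 0.1 * anticoagulant

background = [[0.4, 0, 1], [0.6, 1, 1], [0.5, 0, 0], [0.5, 1, 1]]
patient = [0.2, 1, 0]  # low D-dimer rate, persistent provoking factor, no anticoagulant
phi = shapley_values(toy_risk, patient, background)
print([round(v, 3) for v in phi])  # all positive: each feature pushes toward class 1
```

The values sum to the difference between the patient's prediction and the average background prediction, which is the additivity property that SHAP summary plots rely on.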

Fig 5. SHAP summary plot of predictor contributions in the independent test set for the final XGBoost model.

https://doi.org/10.1371/journal.pone.0349110.g005

Discussion

In this study, six models were developed and compared for predicting the risk of poor recanalization in patients with IDDVT, including logistic regression, support vector machine, random forest, multilayer perceptron, XGBoost, and a Voting Ensemble. Among the candidate models, XGBoost achieved the best overall performance in the independent test set. Specifically, it showed the highest discriminative ability, the lowest Brier score, and acceptable calibration, indicating a favorable balance between classification performance and probability estimation. In addition, SHAP-based interpretation enhanced the transparency of the final model by identifying the key variables driving the predictions. Taken together, these findings support the potential utility of machine learning approaches, particularly XGBoost, in predicting the risk of poor recanalization in patients with IDDVT.

SHAP analysis further enhanced the interpretability of the best-performing XGBoost model and identified the key predictors contributing most to the model output. The results showed that the D-dimer rate, variables representing provoking factor type, anticoagulant therapy, and age group were the main drivers of model predictions. These findings are clinically plausible. Dynamic changes in D-dimer may reflect persistent thrombus burden and fibrinolytic activity and may therefore provide more information relevant to recanalization than a single baseline measurement [15,16]. Variables representing provoking factor type may capture differences in the thrombotic background and the persistence of risk exposure, both of which may substantially affect prognosis [17,18]. Likewise, age and anticoagulant therapy are closely related to thrombus resolution and treatment response, which may explain their high importance in the model. In addition, the baseline statistical analysis showed that older age, recurrent VTE, provoking factor type, anticoagulant therapy, and D-dimer rate differed significantly between the good and poor recanalization groups. These findings are broadly consistent with the predictors highlighted by the final model and support the clinical relevance of these variables. In particular, a history of recurrent VTE may reflect an underlying prothrombotic tendency or a more complex disease background, which could be associated with less favorable thrombus resolution. Nevertheless, these associations should be interpreted cautiously, because the present study was designed primarily for prediction rather than for causal inference. In addition, it should be emphasized that SHAP results reflect the relative contribution of features to model predictions rather than evidence of independent causal effects. Because some overlap was observed in the SHAP distributions of these variables, their roles should be interpreted as joint contributions to model prediction behavior within the overall encoding framework, rather than as mutually independent causal effects. Accordingly, the principal value of SHAP analysis lies in facilitating interpretation of the model’s overall predictive logic, rather than directly establishing causal relationships between individual predictors and recanalization outcomes.

Another important issue is the feasibility of implementing the model in real-world clinical settings. In the final model, the D-dimer rate emerged as one of the most important predictors, suggesting that serial changes in coagulation-related biomarkers may carry additional prognostic information in this cohort. Previous studies have reported a positive linear association between dynamic changes in D-dimer levels and thrombus resolution rate and have suggested their potential value in predicting recanalization outcomes in patients with pulmonary embolism [19,20]. However, this variable depends on serial D-dimer measurements rather than a single baseline measurement, and serial D-dimer testing has not been uniformly incorporated into routine clinical practice. In addition, institutions may differ in sampling time points, testing frequency, and laboratory procedures, which could affect the availability and consistency of this predictor in real-world settings. Therefore, this study should be viewed as exploratory, aiming to evaluate the predictive value of the D-dimer rate in this cohort rather than to demonstrate its direct applicability across all clinical settings. Before wider implementation, multicenter external validation under different testing conditions is still required, and simplified models that do not rely on serial D-dimer-related indicators should be further explored.

From a clinical perspective, the main potential value of this model lies in enabling early risk stratification for poor recanalization in patients with IDDVT, rather than functioning as a standalone tool for treatment decision-making. By identifying patients at higher risk of poor recanalization, the model may assist in tailoring follow-up intensity and monitoring strategies. Given the clinical characteristics of IDDVT, many guidelines support anticoagulation in selected patients, although uncertainties remain regarding the optimal timing of treatment initiation and the most appropriate treatment strategy in specific clinical scenarios [21–23]. Accordingly, in clinical practice, the model developed in this study may support closer ultrasound surveillance, more careful evaluation of treatment response, or earlier specialist reassessment for high-risk patients, whereas a relatively routine follow-up strategy may be considered for those at lower risk.

Previous studies have primarily focused on proximal DVT and the occurrence of PE. As a distinct subtype of lower-extremity DVT, IDDVT has historically received less attention because of its relatively mild symptoms. However, with the growing body of research in this field and advances in diagnostic methods, IDDVT has attracted increasing attention owing to its high incidence and risk of adverse outcomes [5,24]. The present study focused on poor recanalization in IDDVT as a clinically relevant outcome, developed a risk prediction model using machine learning algorithms, and provided an interpretable modeling framework while incorporating dynamic laboratory indicators. In this respect, it may offer a useful complement to previous DVT-related research.

Several limitations of this study should be noted. First, this was a single-center retrospective study. Although an independent test set was used for validation, true external validation is still lacking. Therefore, the generalizability of the model to populations with different demographic and clinical characteristics remains to be established, including younger patients and those receiving different anticoagulation regimens. Second, as an observational study, this analysis could not completely eliminate the possibility of residual confounding. For example, although early ambulation in outpatients may facilitate thrombus recanalization [25,26], outpatient status and some treatment-related variables may also partly reflect milder disease severity, lower thrombus burden, or clinical decision-making processes, rather than direct biological effects on recanalization. Third, although SHAP analysis improved model interpretability, overlap in the SHAP distributions of some variables was observed, suggesting that feature effects at the individual level are not always fully separable. Accordingly, these findings should be interpreted as reflecting the relative contribution of features to model predictions rather than as evidence of causal relationships. In addition, the use of complete-case analysis may have introduced selection bias, because patients with missing data were excluded from model development and evaluation. Finally, although the model showed generally stable performance and a certain degree of robustness, further model simplification, multicenter external validation, and prospective evaluation are still required before wider clinical implementation. Future studies should focus on assessing model performance across heterogeneous clinical workflows and on exploring simplified or dynamically updated prediction models to improve real-world applicability.

Conclusion

Among six candidate models, XGBoost showed the best overall performance for predicting poor recanalization in patients with IDDVT, with strong discrimination (AUC 0.908), low prediction error (Brier score 0.077), acceptable calibration, and good interpretability through SHAP analysis. This study establishes an interpretable machine learning framework specifically for poor recanalization risk prediction in IDDVT and highlights the predictive contribution of dynamic laboratory information, particularly D-dimer rate. The model may support early risk stratification and individualized follow-up planning, although multicenter external validation is still required before routine clinical implementation.

Supporting information

S1 Fig. Confusion matrices of all candidate models in the development set.

https://doi.org/10.1371/journal.pone.0349110.s001

(PNG)

S2 Fig. Confusion matrices of all candidate models in the independent test set.

https://doi.org/10.1371/journal.pone.0349110.s002

(PNG)

S1 File. Method for calculating the venous recanalization rate.

https://doi.org/10.1371/journal.pone.0349110.s003

(PDF)

S2 File. Calculation methods for selected predictors.

https://doi.org/10.1371/journal.pone.0349110.s004

(PDF)

S3 File. Collinearity diagnostics for all candidate predictors and for predictors included in the logistic regression model.

https://doi.org/10.1371/journal.pone.0349110.s005

(PDF)

S1 Table. Comparison of baseline characteristics between the development and test sets.

https://doi.org/10.1371/journal.pone.0349110.s006

(PDF)

S2 Table. List of abbreviations used in the manuscript.

https://doi.org/10.1371/journal.pone.0349110.s007

(PDF)

References

1. Palareti G, Schellong S. Isolated distal deep vein thrombosis: what we know and what we are doing. J Thromb Haemost. 2012;10(1):11–9. pmid:22082302
2. Cosmi B, Legnani C, Cini M, Guazzaloca G, Palareti G. D-dimer and residual vein obstruction as risk factors for recurrence during and after anticoagulation withdrawal in patients with a first episode of provoked deep-vein thrombosis. Thromb Haemost. 2011;105(5):837–45. pmid:21359409
3. Schellong SM, Goldhaber SZ, Weitz JI, Ageno W, Bounameaux H, Turpie AGG, et al. Isolated distal deep vein thrombosis: perspectives from the GARFIELD-VTE registry. Thromb Haemost. 2019;119(10):1675–85. pmid:31370075
4. Galanaud J-P, Sevestre-Pietri M-A, Bosson J-L, Laroche J-P, Righini M, Brisot D, et al. Comparative study on risk factors and early outcome of symptomatic distal versus proximal deep vein thrombosis: results from the OPTIMEV study. Thromb Haemost. 2009;102(3):493–500. pmid:19718469
- View Article
- PubMed/NCBI
- Google Scholar
5. Turner BRH, Thapar A, Jasionowska S, Javed A, Machin M, Lawton R, et al. Systematic review and meta-analysis of the pooled rate of post-thrombotic syndrome after isolated distal deep venous thrombosis. Eur J Vasc Endovasc Surg. 2023;65(2):291–7. pmid:36257568
- View Article
- PubMed/NCBI
- Google Scholar
6. Jin S, Qin D, Liang B-S, Zhang L-C, Wei X-X, Wang Y-J, et al. Machine learning predicts cancer-associated deep vein thrombosis using clinically available variables. Int J Med Inform. 2022;161:104733. pmid:35299099
- View Article
- PubMed/NCBI
- Google Scholar
7. Hwang JH, Seo JW, Kim JH, Park S, Kim YJ, Kim KG. Comparison between deep learning and conventional machine learning in classifying iliofemoral deep venous thrombosis upon CT venography. Diagnostics (Basel). 2022;12(2):274. pmid:35204365
- View Article
- PubMed/NCBI
- Google Scholar
8. Yu T, Shen R, You G, Lv L, Kang S, Wang X, et al. Machine learning-based prediction of the post-thrombotic syndrome: model development and validation study. Front Cardiovasc Med. 2022;9:990788. pmid:36186967
- View Article
- PubMed/NCBI
- Google Scholar
9. Sartori M, Gabrielli F, Favaretto E, Filippini M, Migliaccio L, Cosmi B. Proximal and isolated distal deep vein thrombosis and Wells score accuracy in hospitalized patients. Intern Emerg Med. 2019;14(6):941–7. pmid:30864093
- View Article
- PubMed/NCBI
- Google Scholar
10. Xu Y, Xu M, Zheng X, Jin F, Meng B. Generation of a predictive clinical model for isolated distal deep vein thrombosis (ICMVT) detection. Med Sci Monit. 2023;29:e942840. pmid:38160251
- View Article
- PubMed/NCBI
- Google Scholar
11. Chen X, Li X, Yang Z, Feng Y, Guo J, Yi F, et al. Thrombus size matters: a risk assessment model for predicting pulmonary embolism in isolated distal deep vein thrombosis. Acad Radiol. 2025;32(9):5545–56. pmid:40393831
- View Article
- PubMed/NCBI
- Google Scholar
12. Cheng H-R, Huang G-Q, Wu Z-Q, Wu Y-M, Lin G-Q, Song J-Y, et al. Individualized predictions of early isolated distal deep vein thrombosis in patients with acute ischemic stroke: a retrospective study. BMC Geriatr. 2021;21(1):140. pmid:33632136
- View Article
- PubMed/NCBI
- Google Scholar
13. Riley RD, Snell KIE, Archer L, Ensor J, Debray TPA, van Calster B, et al. Evaluation of clinical prediction models (part 3): calculating the sample size required for an external validation study. BMJ. 2024;384:e074821. pmid:38253388
- View Article
- PubMed/NCBI
- Google Scholar
14. Pearce J, Ferrier S. Evaluating the predictive performance of habitat models developed using logistic regression. Ecol Model. 2000;133(3):225–45.
- View Article
- Google Scholar
15. Konstantinides SV, Meyer G. The 2019 ESC Guidelines on the diagnosis and management of acute pulmonary embolism. Eur Heart J. 2019;40(42):3453–5.
- View Article
- Google Scholar
16. Madoiwa S, Kitajima I, Ohmori T, Sakata Y, Mimuro J. Distinct reactivity of the commercially available monoclonal antibodies of D-dimer and plasma FDP testing to the molecular variants of fibrin degradation products. Thromb Res. 2013;132(4):457–64. pmid:24011388
- View Article
- PubMed/NCBI
- Google Scholar
17. Ageno W, Farjat A, Haas S, Weitz JI, Goldhaber SZ, Turpie AGG, et al. Provoked versus unprovoked venous thromboembolism: Findings from GARFIELD-VTE. Res Pract Thromb Haemost. 2021;5(2):326–41. pmid:33733032
- View Article
- PubMed/NCBI
- Google Scholar
18. Iding AFJ, Pallares Robles A, Ten Cate V, Ten Cate H, Wild PS, Ten Cate-Hoek AJ. Exploring phenotypes of deep vein thrombosis in relation to clinical outcomes beyond recurrence. J Thromb Haemost. 2023;21(5):1238–47. pmid:36736833
- View Article
- PubMed/NCBI
- Google Scholar
19. Wang J, Zheng Y, Yu Y, Fan X, Xu S. Plasma D-dimer changes and clinical value in acute lower extremity deep venous thrombosis treated with catheter-directed thrombolysis. J Vasc Surg Venous Lymphat Disord. 2025;13(3):102167. pmid:39818303
- View Article
- PubMed/NCBI
- Google Scholar
20. Aranda C, Peralta L, Gagliardi L, López A, Jiménez Á, Herreros B. A significant decrease in D-dimer concentration within one month of anticoagulation therapy as a predictor of both complete recanalization and risk of recurrence after initial pulmonary embolism. Thromb Res. 2021;202:31–5. pmid:33711756
- View Article
- PubMed/NCBI
- Google Scholar
21. Stevens SM, Woller SC, Kreuziger LB, Bounameaux H, Doerschug K, Geersing GJ, et al. Antithrombotic therapy for VTE disease: second update of the CHEST Guideline and Expert Panel Report. Chest. 2021;160(6):e545–608.
- View Article
- Google Scholar
22. Mazzolai L, Ageno W, Alatri A, Bauersachs R, Becattini C, Brodmann M, et al. Second consensus document on diagnosis and management of acute deep vein thrombosis: updated document elaborated by the ESC Working Group on aorta and peripheral vascular diseases and the ESC Working Group on pulmonary circulation and right ventricular function. Eur J Prev Cardiol. 2022;29(8):1248–63. pmid:34254133
- View Article
- PubMed/NCBI
- Google Scholar
23. Kakkos SK, Gohel M, Baekgaard N, Bauersachs R, Bellmunt-Montoya S, Black SA, et al. Editor’s choice - European Society for Vascular Surgery (ESVS) 2021 Clinical Practice Guidelines on the Management of Venous Thrombosis. Eur J Vasc Endovasc Surg. 2021;61(1):9–82. pmid:33334670
- View Article
- PubMed/NCBI
- Google Scholar
24. Qiu T, Zhang T, Liu L, Li W, Li Q, Zhang X, et al. The anatomic distribution and pulmonary embolism complications of hospital-acquired lower extremity deep venous thrombosis. J Vasc Surg Venous Lymphat Disord. 2021;9(6):1391–1398.e3. pmid:33753301
- View Article
- PubMed/NCBI
- Google Scholar
25. Egermayer P. The effects of heparin and oral anticoagulants on thrombus propagation and prevention of the postphlebitic syndrome: a critical review of the literature. Prog Cardiovasc Dis. 2001;44(1):69–80. pmid:11533928
- View Article
- PubMed/NCBI
- Google Scholar
26. Rook B, van Rijn MJE, Jansma EP, van Montfrans C. Effect of exercise after a deep venous thrombosis: a systematic review. J Eur Acad Dermatol Venereol. 2024;38(2):289–301. pmid:37731155
- View Article
- PubMed/NCBI
- Google Scholar
