
The potential of the transformer-based survival analysis model, SurvTrace, for predicting recurrent cardiovascular events and stratifying high-risk patients with ischemic heart disease

Abstract

Introduction

Ischemic heart disease is a leading cause of death worldwide, and its importance is increasing with the aging population. The aim of this study was to evaluate the accuracy of SurvTrace, a survival analysis model using the Transformer—a state-of-the-art deep learning method—for predicting recurrent cardiovascular events and stratifying high-risk patients. The model’s performance was compared to that of a conventional scoring system utilizing real-world data from cardiovascular patients.

Methods

This study consecutively enrolled patients who underwent percutaneous coronary intervention (PCI) at the Department of Cardiovascular Medicine, University of Tokyo Hospital, between 2005 and 2019. Each patient’s initial PCI at our hospital was designated as the index procedure, and a composite of major adverse cardiovascular events (MACE) was monitored for up to two years post-index event. Data regarding patient background, clinical presentation, medical history, medications, and perioperative complications were collected to predict MACE. The performance of two models—a conventional scoring system proposed by Wilson et al. and the Transformer-based model SurvTrace—was evaluated using Harrell’s c-index, Kaplan–Meier curves, and log-rank tests.

Results

A total of 3938 cases were included in the study, with 394 used as the test dataset and the remaining 3544 used for model training. SurvTrace exhibited a mean c-index of 0.72 (95% confidence interval (CI): 0.69–0.76), indicating higher prognostic accuracy than the conventional scoring system's 0.64 (95% CI: 0.64–0.64). Moreover, SurvTrace demonstrated superior risk stratification ability, effectively distinguishing the high-risk group from the other risk categories in terms of event occurrence. In contrast, the conventional system only showed a significant difference between the low-risk and high-risk groups.

Conclusion

This study based on real-world cardiovascular patient data underscores the potential of the Transformer-based survival analysis model, SurvTrace, for predicting recurrent cardiovascular events and stratifying high-risk patients.

Introduction

Ischemic heart disease remains the leading cause of death worldwide, despite advancements in treatment modalities and therapeutic technologies [1, 2]. As the population continues to age, improving the prognosis and treatment of ischemic heart disease has become increasingly important. Accurate patient risk stratification is crucial for optimizing treatment, and the effectiveness of scoring systems, such as the Suita score, has been well-documented [3]. Wilson et al. have also reported that scoring models incorporating age and history of catheterization are effective in predicting post-catheterization events [4].

In recent years, rapid advancements in machine learning have shown promise in surpassing conventional methods in patient risk assessment [5, 6]. Beyond standard machine learning survival analysis, new deep learning survival models have been proposed [7]. Specifically, Wang et al. found that a deep learning model known as the “Transformer”, which employs an attention mechanism rather than recurrent neural networks or convolutional neural networks, is effective for survival time analysis [8, 9]. The Transformer model has become pivotal in contemporary deep learning, serving as the foundation for systems like ChatGPT [10, 11]. However, no studies have yet assessed the effectiveness of using the Transformer for survival analysis in the cardiovascular field. Therefore, the aim of this study was to compare and validate the accuracy of the novel Transformer-based model against a conventional risk scoring model using real-world data from cardiovascular patients.

Methods

Study design and participants

This study involved consecutive enrollment of patients who underwent percutaneous coronary intervention (PCI) at the Department of Cardiovascular Medicine, University of Tokyo Hospital, between 2005 and 2019. Within this timeframe, the initial PCI performed at our hospital was designated as the index procedure for each individual patient and used for analysis. Data were accessed and collected for research purposes from October 20, 2022 to December 28, 2022. Information that could identify individual participants was anonymized. A correspondence table was created so that patient information could be accessed after collection, if necessary, while maintaining anonymity. The outcomes of these procedures were evaluated retrospectively. Data on patient background, clinical presentation, medical history, admission medications, perioperative complications, and discharge medications were extracted from the electronic health records (EHRs) of those who underwent the index PCI. Hypertension was defined as a systolic blood pressure of 140 mmHg or higher upon admission, a diastolic blood pressure of 90 mmHg or higher upon admission, or ongoing treatment with antihypertensive medications. Diabetes mellitus was defined by a hemoglobin A1c level ≥6.5% upon admission or ongoing treatment with either insulin or oral hypoglycemic agents. Dyslipidemia was defined as a low-density lipoprotein cholesterol level ≥140 mg/dL upon admission, a high-density lipoprotein cholesterol level <40 mg/dL upon admission, triglycerides ≥150 mg/dL upon admission, or ongoing use of lipid-lowering medications. Chronic kidney disease was defined as an estimated glomerular filtration rate (eGFR) <60 mL/min/1.73 m², calculated from serum creatinine levels upon admission using the Modification of Diet in Renal Disease (MDRD) equation [12] modified by the Japanese coefficient.

Missing data constituted 1.0% of all variables in the total dataset. These missing values were addressed using the multiple imputation method [13]. This technique substitutes missing data points with a set of plausible alternatives, thereby generating multiple complete datasets for analysis. Each dataset was individually analyzed, and the results were then aggregated to produce a single, comprehensive result. In this study, we used Python to generate five pseudo-complete datasets by applying multiple imputation with the Bayesian Ridge method (S1 File).
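The imputation step can be sketched as follows, assuming scikit-learn's IterativeImputer (a chained-equations implementation) paired with a BayesianRidge estimator; this is not the study's S1 File code, and the toy matrix is illustrative.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (activates IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge

# Toy matrix with one missing entry; in the study, 1.0% of values were missing.
X = np.array([[1.0, 2.0], [2.0, np.nan], [3.0, 6.0], [4.0, 8.0]])

# Generate five pseudo-complete datasets by re-running the imputer with
# different seeds; sample_posterior=True draws each imputation from the
# Bayesian Ridge posterior, so the five datasets differ slightly.
imputed_sets = [
    IterativeImputer(estimator=BayesianRidge(),
                     sample_posterior=True,
                     random_state=seed).fit_transform(X)
    for seed in range(5)
]
```

Each of the five arrays is then analyzed separately, and the results are pooled as described under Statistical analysis.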

To improve model interpretability and minimize multicollinearity, Pearson’s correlation coefficient was used to assess the correlation among explanatory variables. Any variable exhibiting a Pearson’s correlation coefficient exceeding 0.90 was omitted from the set of explanatory variables used for model training [14]. In cases where two features were highly correlated, the one with the greater overall correlation to all features was eliminated [14]. During the preprocessing phase, all continuous variables were standardized to have a mean value of 0 and a standard deviation of 1.
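A minimal pandas sketch of this filter, under the rule stated above (from each pair with |r| > 0.90, drop the feature with the larger mean absolute correlation to all features), followed by standardization; the function name and toy data are illustrative, not from the study code.

```python
import numpy as np
import pandas as pd

def drop_collinear(df: pd.DataFrame, threshold: float = 0.90) -> pd.DataFrame:
    """From each pair of features with |Pearson r| above the threshold,
    drop the one with the larger mean absolute correlation to all features."""
    corr = df.corr().abs()
    mean_corr = corr.mean()
    to_drop = set()
    cols = list(corr.columns)
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if corr.iloc[i, j] > threshold:
                a, b = cols[i], cols[j]
                if a in to_drop or b in to_drop:
                    continue
                to_drop.add(a if mean_corr[a] >= mean_corr[b] else b)
    return df.drop(columns=sorted(to_drop))

rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({
    "a": x,
    "b": x + rng.normal(scale=0.01, size=200),  # near-duplicate of "a", |r| > 0.99
    "c": rng.normal(size=200),
})
reduced = drop_collinear(df)
# Standardize the remaining continuous variables to mean 0 and SD 1.
standardized = (reduced - reduced.mean()) / reduced.std(ddof=0)
```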

The endpoint consisted of a composite of major adverse cardiovascular events (MACE), including cardiac death, acute coronary syndrome, cerebrovascular event, and hospitalization for heart failure [4]. EHRs were used to collect data on these outcomes, as well as the period until their occurrence, for up to two years following the index procedure. Cardiac death was defined as death from acute myocardial infarction, ventricular arrhythmia, or heart failure [15]. Acute coronary syndrome was defined as nonfatal myocardial infarction or unstable angina [15]. Nonfatal myocardial infarction was defined as persistent angina accompanied by new ECG abnormalities and elevated cardiac biomarkers [15]. Unstable angina pectoris was defined as an extended episode of resting ischemic symptoms (typically exceeding 10 minutes) or a lowering of the activity threshold that induced accelerated chest pain, necessitating an unscheduled medical visit and an overnight stay—usually within 24 hours of the most recent symptoms—while not fulfilling myocardial infarction cardiac biomarker criteria [16]. Cerebrovascular events were defined as either cerebral hemorrhage or cerebral infarction. Survival time analyses were conducted on these outcomes until the respective dates of event onset. To compare the prognostic accuracy of the novel Transformer-based model with that of the conventional risk scoring model, the c-index was employed [17]. Subsequently, the risk stratification capabilities of each model were assessed by computing risk scores for every patient using the trained models. Patients in the test set were classified into high-, intermediate-, and low-risk score groups [18] and evaluated through Kaplan–Meier survival curves [19] and log-rank tests [20].
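Harrell's c-index, the accuracy metric used above, can be computed directly: among all comparable pairs (the patient with the shorter observed time had an event), it is the fraction in which the higher predicted risk belongs to the patient who failed earlier. A pure-NumPy sketch with toy data (the study used standard survival tooling rather than this hand-rolled version):

```python
import numpy as np

def harrell_c_index(times, events, risk):
    """Harrell's c-index. A pair (i, j) is comparable when i had an
    observed event and times[i] < times[j]; the pair is concordant when
    the earlier-failing patient i has the higher risk score (ties 0.5)."""
    times, events, risk = map(np.asarray, (times, events, risk))
    concordant = comparable = 0.0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

# Perfectly concordant ordering: earlier events carry higher risk scores.
c_perfect = harrell_c_index([1, 2, 3, 4], [1, 1, 1, 1], [4, 3, 2, 1])
# Fully anti-concordant ordering.
c_worst = harrell_c_index([1, 2, 3, 4], [1, 1, 1, 1], [1, 2, 3, 4])
```

A c-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the two models are compared in the Results.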

The impact of explanatory variables on outcomes was assessed using Shapley additive explanations (SHAP) [21]. An algorithmic evaluation method rooted in game theory, SHAP uses Shapley scores to estimate the contribution of each explanatory variable to the model’s prediction.
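For intuition about the SHAP attributions reported later, note that SHAP values have a closed form for a linear model with independent features: each feature's contribution is its coefficient times its deviation from the feature mean, and per-sample contributions sum to the prediction's deviation from the mean prediction (local accuracy). A hypothetical numerical check, separate from the study's actual (non-linear) model:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
w = np.array([2.0, -0.5, 0.0])  # feature 2 has no effect on the outcome

def linear_shap(X, w):
    """Exact SHAP values for f(x) = w @ x with independent features:
    phi[i, j] = w[j] * (X[i, j] - mean_j)."""
    return w * (X - X.mean(axis=0))

phi = linear_shap(X, w)
# Global importance, as ordered on a SHAP summary plot: mean |phi| per feature.
importance = np.abs(phi).mean(axis=0)
# Local accuracy: SHAP values sum to f(x) minus the mean prediction.
residual = phi.sum(axis=1) - (X @ w - X.mean(axis=0) @ w)
```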

To assess the robustness of our findings, we performed three distinct sensitivity analyses: first, by omitting missing values; second, by adjusting the percentage of test sets; and third, by excluding patients with a history of PCI. This study was conducted in accordance with the revised Declaration of Helsinki and received approval from the institutional review board of the University of Tokyo Hospital (2021238NI-(2)). Informed consent was obtained in the form of an opt-out on a website.

Modeling

To evaluate the predictive accuracy of MACE, we used the scoring system proposed by Wilson et al. [4] and SurvTrace, a Transformer-based survival analysis model [8].

The scoring system proposed by Wilson et al. serves as a predictive model for recurrent cardiovascular disease and incorporates variables such as age, smoking history, history of diabetes or heart failure, body mass index, number of diseased vessels, and history of statin or aspirin therapy. For the purposes of this study, it was defined as the conventional scoring model. SurvTrace is a survival time analysis model that employs the Transformer, a deep learning architecture. Using an attention mechanism, this model enables efficient calculation of the effect of each variable on survival time. All computational models were implemented in Python and executed on an NVIDIA A100 80-GB graphics processing unit.

For data partitioning, 90% of the total dataset was randomly selected to constitute the training set. Subsequently, 25% of this training set was randomly allocated for validation during the model training process. The remaining 10% of the data, which was not included in the training set, served as a test set for assessing the accuracy of the trained models. Throughout the training process, Optuna, an advanced framework for hyperparameter optimization tailored for machine learning, was employed to fine-tune the model’s hyperparameters [22]. S2 File shows the SurvTrace execution code used.

Statistical analysis

Five pseudo-complete datasets were generated through the application of multiple imputation techniques to address missing values. The model’s accuracy was then calculated based on these five datasets. To synthesize the findings, the five accuracy estimates derived from each model were integrated using Rubin’s rules, facilitating a comparison of model performance [23].
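Rubin's rules combine the five per-dataset estimates as follows: the pooled estimate is their mean, and its variance adds the average within-imputation variance to an inflated between-imputation variance. A sketch with illustrative numbers (not the study's standard errors):

```python
import numpy as np

def rubin_pool(estimates, variances):
    """Pool m per-dataset estimates and their variances with Rubin's rules."""
    m = len(estimates)
    qbar = np.mean(estimates)          # pooled point estimate
    w = np.mean(variances)             # within-imputation variance
    b = np.var(estimates, ddof=1)      # between-imputation variance
    t = w + (1 + 1 / m) * b            # total variance of the pooled estimate
    return qbar, t

# Hypothetical c-indices from five pseudo-complete datasets, each with SE 0.02.
cindexes = [0.71, 0.73, 0.72, 0.74, 0.70]
variances = [0.02 ** 2] * 5
qbar, t = rubin_pool(cindexes, variances)
ci = (qbar - 1.96 * np.sqrt(t), qbar + 1.96 * np.sqrt(t))
```

A fully rigorous interval would use a t-distribution with Barnard–Rubin degrees of freedom rather than the normal 1.96 multiplier; the sketch keeps the simpler approximation.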

For continuous variables, measurements were expressed as either mean (± standard deviation) or median (first and third quartiles), while categorical variables were reported as counts and frequencies (%).

The models’ prognostic accuracy was assessed using Harrell’s c-index [17]. Additionally, the risk stratification capabilities of each model were assessed through Kaplan–Meier curves [19] and log-rank tests [20]. The p value threshold for significance was set at <0.05. All statistical analyses were performed using Python 3.7.

Results

Between January 1, 2005, and December 31, 2019, a total of 3938 first-time PCIs were performed in our hospital. Of these, 394 were designated as the test dataset, while the remaining 3544 cases were used for model training (Fig 1). Among the patient information data collected from the EHRs at the University of Tokyo Hospital, 171 explanatory variables were used. Table 1 outlines the baseline characteristics of the key explanatory variables. The training dataset contained a significantly higher number of patients with a history of previous PCI compared with the test dataset. During the observation period, 683 subjects (17.3%) were lost to follow-up, including 610 cases in the training dataset and 73 cases in the test dataset.

Table 1. Baseline characteristics of key explanatory variables.

https://doi.org/10.1371/journal.pone.0304423.t001

The c-index of SurvTrace outperformed that of the conventional scoring system, registering a mean c-index of 0.72 (95% confidence interval: 0.69–0.76), as opposed to a mean c-index of 0.64 (95% confidence interval: 0.64–0.64) for the conventional scoring system (Table 2, Fig 2). Fig 3 illustrates the learning curve of SurvTrace during its training process. The most accurate training model from among all trained risk prediction models, along with its dataset, was used to evaluate risk stratification capabilities. While the conventional scoring system showed that the low-risk group experienced significantly fewer events compared with the high-risk group, it did not show a significant difference between the intermediate-risk group and the other patient groups (Fig 4). In contrast, SurvTrace revealed that the high-risk group had a significantly higher number of events than the other groups (Fig 4).

Fig 2. C-indices of the models.

This figure shows the c-index for both the conventional scoring system and SurvTrace. The upper and lower black lines represent the upper and lower limits of the 95% confidence intervals, respectively. The orange line shows the mean c-index value calculated from five pseudo-complete datasets.

https://doi.org/10.1371/journal.pone.0304423.g002

Fig 3. Learning curve of SurvTrace during the training process.

This figure illustrates the variation in the loss function over the course of the training process. The left panel shows the fluctuations in loss values for the training dataset, while the right panel shows these changes for the validation dataset.

https://doi.org/10.1371/journal.pone.0304423.g003

Fig 4. Kaplan–Meier curves of the models.

This figure shows the Kaplan–Meier curves generated by both the conventional scoring model and SurvTrace. The blue lines represent the Kaplan–Meier curve for the low-risk group as stratified by risk scores from both models. Similarly, the orange and green lines represent the curves for the intermediate- and high-risk groups, respectively. The translucent segments of each line indicate the 95% confidence interval.

https://doi.org/10.1371/journal.pone.0304423.g004

Fig 5 presents the SHAP result, indicating that SurvTrace highlighted the influence of pre-existing conditions, such as a history of chronic heart failure.

Fig 5. Summary plot of SurvTrace.

This figure illustrates the Shapley additive explanations (SHAP) of SurvTrace. The horizontal axis indicates the impact on the model’s prediction, with points situated to the right representing a higher risk of future major adverse cardiovascular events (MACE) compared with points on the left. The vertical axis indicates the importance of the explanatory variables. In this model, a history of hospitalization for heart failure (HF) exerts the greatest impact on predicting the risk of future MACE events. The color of each dot indicates the high or low status within each variable; for example, in the “History of HF Hospitalization” column, red indicates that the patient has a history of HF hospitalization, while blue indicates no such history.

https://doi.org/10.1371/journal.pone.0304423.g005

In the first sensitivity analysis, cases with missing values were excluded from both training and test datasets. Post-exclusion, the training dataset comprised 2137 cases, and the test dataset contained 254 cases. The c-index for SurvTrace was 0.71, compared with 0.66 for the conventional scoring system. The second sensitivity analysis involved adjusting the proportion of the test dataset to 20%. Following this modification, the analysis was performed using one of the five pseudo-complete datasets generated by the multiple imputation method, including both training and test datasets. This adjustment yielded a c-index of 0.68 for SurvTrace and 0.66 for the conventional scoring system. In the final sensitivity analysis, after excluding patients with a history of PCI from one of the five pseudo-complete training and test datasets, the c-index for SurvTrace was 0.69, compared with 0.63 for the conventional scoring system.

Fig 1. Flowchart of the study.

This figure illustrates the flowchart of the study. Initially, all data were split into training and test datasets at a 9:1 ratio. To address missing values, multiple imputation was applied to both datasets, generating five pseudo-complete datasets for each. A separate 25% segment of the training dataset was reserved for validation. Subsequently, survival analysis was performed on each pseudo-complete dataset, and the c-index was calculated. Finally, Rubin’s rules were used to integrate the c-index values from each dataset to compute the final result. In the figure, yellow-green represents the data used for training the model, orange represents the validation data, and pink represents the data used for testing post-training.

Discussion

This study demonstrated that SurvTrace, a predictive model using the Transformer deep learning algorithm, was effective in predicting recurrent cardiovascular events in patients with ischemic heart disease based on real-world clinical data. Compared with the conventional scoring system, SurvTrace not only demonstrated superior accuracy in event prediction but also showed an improved ability to stratify high-risk patients.

The Transformer-based SurvTrace model demonstrated significantly higher prediction accuracy for recurrent cardiovascular events in patients with ischemic heart disease, using real-world clinical data, than did the conventional scoring system. SurvTrace also demonstrated a significantly greater capacity for high-risk patient stratification relative to the conventional scoring system. The model maintained its superior performance across a range of sensitivity analyses, which included the exclusion of missing values from the training and test datasets, modification of the test set percentages, and the exclusion of patients with a history of PCI. These results are consistent with previous studies that have underscored the superiority of machine learning and deep learning algorithms over conventional scoring systems [6, 18]. The high accuracy of these advanced models is likely attributable to their ability to identify complex patterns among explanatory variables, a feature not present in conventional methods. Typically, conventional scoring systems rely on linear models, selecting only statistically significant explanatory variables. Such models necessitate explicit definitions of relationships between explanatory variables to account for any interactions, thereby increasing model complexity and raising concerns about multicollinearity and overfitting as the number of variables grows. In contrast, the Transformer algorithm can directly incorporate multiple explanatory variables into its models, capturing nonlinear relationships and complex interactions among them without the need for explicit definitions. In this study, while the conventional scoring system incorporated only selected variables such as age, gender, and medical history, SurvTrace used all 171 explanatory variables. This comprehensive approach to feature inclusion may contribute to its higher predictive accuracy.

The Transformer model’s ability to stratify the high-risk group more accurately than the conventional scoring system has important implications for managing patients with ischemic heart disease in real-world clinical practice. Moreover, the alignment of our SHAP results with prior findings further underscores the robustness and validity of our study’s outcomes. The enhanced risk stratification capabilities of the Transformer model could potentially improve clinical decision making and assist physicians in tailoring treatment plans for individual patients [24]. Recent advancements have introduced large language models capable of automatically extracting structured data from electronic medical records [25, 26]. Using these language models enables automated survival time analysis and future risk stratification based on individual patient records, offering a more personalized treatment approach that may enhance intervention effectiveness and improve patient outcomes.

This study has several limitations that warrant consideration. First, our research relied on a dataset from a single institution, making it susceptible to potential selection bias. Future studies should address institution-specific biases by expanding and validating the diversity of the patient population through multicenter studies. Second, the sample size was relatively modest, comprising 3938 patients. In general, deep learning models require larger datasets to achieve high levels of accuracy; therefore, our sample size may have been insufficient. Third, although this study demonstrated the superiority of the Transformer model over the conventional scoring system, it should be noted that the model used was specific to this study. Other Transformer models not evaluated in this study may yield different results. Fourth, this study was retrospective in nature, with events tracked through the EHRs. Despite this thorough tracking, some events might have been overlooked as a result of patients relocating or transferring to other hospitals, potentially leading to selection bias. To mitigate this issue, future prospective studies employing survival analysis with the Transformer model are necessary. Lastly, missing values in the dataset were handled using multiple imputation methods to facilitate the Transformer model’s application. These imputed values could introduce bias, especially for the Transformer model, as deep learning models are known to be sensitive to data noise.

Conclusion

This study demonstrated that a survival analysis model using the Transformer, a state-of-the-art deep learning method, was significantly more accurate than the conventional scoring system in predicting recurrent cardiovascular events and stratifying high-risk patients using real-world clinical data. Additional research is warranted to further optimize the performance of deep learning models for more effective risk stratification and management of patients with ischemic heart disease.

Supporting information

S1 File. Multiple imputation method execution file.

This file contains the code to execute the multiple imputation method in Python.

https://doi.org/10.1371/journal.pone.0304423.s001

(DOCX)

S2 File. SurvTrace execution file.

This file contains the code to execute SurvTrace in Python.

https://doi.org/10.1371/journal.pone.0304423.s002

(DOCX)

Acknowledgments

We thank Phoebe Chi, MD, from Edanz (https://jp.edanz.com/ac) for editing a draft of this manuscript.

References

  1. Ritchie H, Roser M. Causes of Death. Our World in Data; 2018. https://ourworldindata.org/causes-of-death. [Online resource, accessed 2023-7-19].
  2. Benjamin EJ, Blaha MJ, Chiuve SE, Cushman M, Das SR, Deo R, et al. Heart Disease and Stroke Statistics—2017 Update: A Report From the American Heart Association. Circulation 2017;135. pmid:28122885
  3. Nishimura K, Okamura T, Watanabe M, Nakai M, Takegami M, Higashiyama A, et al. Predicting Coronary Heart Disease Using Risk Factor Categories for a Japanese Urban Population, and Comparison with the Framingham Risk Score: The Suita Study. J Atheroscler Thromb 2016;23:1138–1139. pmid:27582077
  4. Wilson PWF, D’Agostino R, Bhatt DL, Eagle K, Pencina MJ, Smith SC, et al. An International Model to Predict Recurrent Cardiovascular Disease. Am J Med 2012;125:695–703.e1. pmid:22727237
  5. Kwong JC, Khondker A, Kim JK, Chua M, Keefe DT, Dos Santos J, et al. Posterior Urethral Valves Outcomes Prediction (PUVOP): a machine learning tool to predict clinically relevant outcomes in boys with posterior urethral valves. Pediatr Nephrol 2022;37:1067–1074. pmid:34686914
  6. Sato M, Tateishi R, Moriyama M, Fukumoto T, Yamada T, Nakagomi R, et al. Machine Learning–Based Personalized Prediction of Hepatocellular Carcinoma Recurrence After Radiofrequency Ablation. Gastro Hep Adv 2022;1:29–37.
  7. Yu H, Huang T, Feng B, Lyu J. Deep-learning model for predicting the survival of rectal adenocarcinoma patients based on a surveillance, epidemiology, and end results analysis. BMC Cancer 2022;22:210. pmid:35216571
  8. Wang Z, Sun J. SurvTRACE: Transformers for Survival Analysis with Competing Events. Proc. 13th ACM Int. Conf. Bioinformatics, Comput. Biol. Heal. Informatics, New York, NY, USA: Association for Computing Machinery; 2022. https://doi.org/10.1145/3535508.3545521
  9. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is All you Need. In: Guyon I, Von Luxburg U, Bengio S, Wallach H, Fergus R, Vishwanathan S, et al., editors. Adv. Neural Inf. Process. Syst., vol. 30, Curran Associates, Inc.; 2017.
  10. Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I. Language Models are Unsupervised Multitask Learners. OpenAI blog [Internet]. 1.8 (2019): 9. [cited 2023 Oct 29]. Available from: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
  11. Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language Models are Few-Shot Learners. In: Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, editors. Adv. Neural Inf. Process. Syst., vol. 33, Curran Associates, Inc.; 2020, p. 1877–1901.
  12. Levey AS, Greene T, Kusek JW, Beck GJ. A simplified equation to predict glomerular filtration rate from serum creatinine. J Am Soc Nephrol 2000;11:155A.
  13. Austin PC, White IR, Lee DS, van Buuren S. Missing Data in Clinical Research: A Tutorial on Multiple Imputation. Can J Cardiol 2021;37:1322–1331. pmid:33276049
  14. Forrest IS, Petrazzini BO, Duffy Á, Park JK, Marquez-Luna C, Jordan DM, et al. Machine learning-based marker for coronary artery disease: derivation and validation in two longitudinal cohorts. Lancet 2023;401:215–225. pmid:36563696
  15. Thygesen K, Alpert JS, Jaffe AS, Chaitman BR, Bax JJ, Morrow DA, et al. Fourth Universal Definition of Myocardial Infarction (2018). J Am Coll Cardiol 2018;72:2231–2264. pmid:30153967
  16. Maron DJ, Hochman JS, Reynolds HR, Bangalore S, O’Brien SM, Boden WE, et al. Initial Invasive or Conservative Strategy for Stable Coronary Disease. N Engl J Med 2020;382:1395–1407. pmid:32227755
  17. Harrell FE, Califf RM, Pryor DB, Lee KL, Rosati RA. Evaluating the yield of medical tests. JAMA 1982;247:2543–2546. pmid:7069920
  18. Rousset A, Dellamonica D, Menuet R, Lira Pineda A, Sabatine MS, Giugliano RP, et al. Can machine learning bring cardiovascular risk assessment to the next level? A methodological study using FOURIER trial data. Eur Heart J Digit Health 2022;3:38–48. pmid:36713994
  19. Kaplan EL, Meier P. Nonparametric Estimation from Incomplete Observations. J Am Stat Assoc 1958;53:457–481.
  20. Mantel N. Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemother Rep 1966;50:163–170. pmid:5910392
  21. Lundberg SM, Lee S-I. A Unified Approach to Interpreting Model Predictions. In: Guyon I, Von Luxburg U, Bengio S, Wallach H, Fergus R, Vishwanathan S, et al., editors. Adv. Neural Inf. Process. Syst., vol. 30, Curran Associates, Inc.; 2017.
  22. Akiba T, Sano S, Yanase T, Ohta T, Koyama M. Optuna: A Next-generation Hyperparameter Optimization Framework. arXiv:1907.10902v1 [Preprint]. 2019 [cited 2024 Jan 23]. Available from: https://arxiv.org/abs/1907.10902
  23. Yuan YC. Multiple imputation for missing data: Concepts and new development (Version 9.0). SAS Inst Inc, Rockville, MD 2010;49:12.
  24. Sánchez-Puente A, Dorado-Díaz PI, Sampedro-Gómez J, Bermejo J, Martinez-Legazpi P, Fernández-Avilés F, et al. Machine Learning to Optimize the Echocardiographic Follow-Up of Aortic Stenosis. JACC Cardiovasc Imaging 2023;16:733–744. pmid:36881417
  25. Yang X, Chen A, PourNejatian N, Shin HC, Smith KE, Parisien C, et al. A large language model for electronic health records. Npj Digit Med 2022;5:194. pmid:36572766
  26. Bisercic A, Nikolic M, van der Schaar M, Delibasic B, Lio P, Petrovic A. Interpretable Medical Diagnostics with Structured Data Extraction by Large Language Models. arXiv:2306.05052v1 [Preprint]. 2023 [cited 2023 Oct 29]. Available from: https://arxiv.org/abs/2306.05052