
Predicting risk of ascending thoracic aortic aneurysm in asymptomatic adults using machine learning

  • Seung Jae Lee,

    Contributed equally to this work with: Seung Jae Lee, Jin Hee Ahn

    Roles Formal analysis, Writing – original draft

    Affiliation Division of Cardiology, Department of Internal Medicine, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea

  • Jin Hee Ahn,

    Contributed equally to this work with: Seung Jae Lee, Jin Hee Ahn

    Roles Writing – original draft

    Affiliation Department of Anesthesiology and Pain Medicine, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea

  • Eun-Ah Cho,

    Roles Data curation, Investigation

    Affiliation Department of Anesthesiology and Pain Medicine, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea

  • Suyong Jeon,

    Roles Investigation, Methodology

    Affiliation Department of Anesthesiology and Pain Medicine, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea

  • Eun Jung Oh,

    Roles Investigation, Methodology

    Affiliation Department of Anesthesiology and Pain Medicine, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea

  • Jae-Geum Shim

    Roles Conceptualization, Writing – review & editing

    jgshim77@naver.com

    Affiliations Department of Anesthesiology and Pain Medicine, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea, Healthcare Data Center, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea

Abstract

Most patients with ascending thoracic aortic aneurysms (ATAA) remain asymptomatic until they develop fatal complications, including aortic dissection and rupture. We aimed to develop and validate machine learning models for predicting ATAA risk. We developed a predictive model for ATAA risk using data from 18,382 participants in the Kangbuk Samsung Health Study who were examined between January 1, 2010, and December 31, 2018. In the screening context, an ATAA was defined as an ascending thoracic aorta with a diameter ≥ 3.7 cm. As model inputs, we used 16 variables from medical records, including basic patient information, physical indices, baseline medical conditions, and laboratory data collected at the health screening examination. A feature importance analysis was performed to identify the factors related to ATAA risk in healthy adults. A machine learning model for predicting ATAA risk was then developed using a 5-layer deep neural network (DNN) with the 15 key features. Model performance was evaluated in terms of accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC). Age was the most important factor in predicting ATAA risk, followed by hypertension, waist circumference, creatinine level, smoking, and body mass index. The 5-layer DNN with the 15 key features achieved an AUROC of 83.5% and an accuracy of 80.4%, with a sensitivity of 69.4% and a specificity of 81.1%. We developed and validated a machine learning model that can be used to assess ATAA risk, with potential applications in early-stage screening for ATAA.

Introduction

An early diagnosis of an ascending thoracic aortic aneurysm (ATAA) is challenging because the condition is so often asymptomatic that it has been termed a “silent killer” [1]. Previous studies show that ATAA progresses slowly, with the diameter reported to grow by approximately 1 mm per year [2]. Apart from patients with risk factors such as a bicuspid aortic valve, Marfan syndrome, or a family history of aortic dissection, ATAA is often found incidentally during routine screening [3]. An ATAA can cause life-threatening complications such as aortic dissection or rupture, but early detection creates opportunities for medical treatment or preventive surgical replacement. Currently, echocardiography and computed tomography (CT) are the main methods used to measure aortic diameter. Transthoracic echocardiography can only measure the aortic root, whereas transesophageal echocardiography can visualize the ascending and descending aorta but offers limited views of the aortic arch [4]. Therefore, CT is the most widely used technique for visualizing the ascending thoracic aorta and diagnosing an ATAA. Because its short scan times reduce respiratory and cardiac motion artifacts, CT can provide diagnostic images of the entire thoracic aorta [5]. However, because of radiation exposure, CT of the entire thoracic aorta is not suitable as a population-based screening test. Currently, there is no suitable tool for the routine screening of ATAA [6].

Machine learning has become essential in many sciences because of its ability to solve complex problems, and it plays an important role in disease prediction and decision-making in medicine [7]. Large amounts of medical data are now stored and used in real time, and machine learning techniques can outperform traditional statistical methods when handling data with strong complexity and nonlinearity. Machine learning can therefore help screen large populations to identify individuals at high risk of specific diseases. For instance, Yu et al. used a support vector machine classifier to estimate the diameter of the descending thoracic aorta with an error of less than 2 mm [8]. Similarly, Mori et al. presented a logistic regression-based predictive model for ATAA that achieved an AUROC of 0.72 using only demographic and comorbidity information [9]. In addition, a deep learning model based on the U-Net architecture was used for semantic segmentation of the ascending and descending thoracic aorta in cardiac magnetic resonance images to estimate their dimensions [10].

We aimed to develop and internally validate machine learning models that could serve as a preliminary screening tool to identify individuals who should proceed to CT for a definitive diagnosis of ATAA. In addition, we conducted a machine learning-based feature importance analysis of basic demographic and clinical information measured in healthy adults to identify risk factors for ATAA.

Methods

Data sets

This study was approved by the Institutional Review Board of Kangbuk Samsung Hospital (IRB 2023-05-034), which waived the requirement for informed consent due to the retrospective design of the study. This study, as part of the Kangbuk Samsung Health Study, included participants who underwent comprehensive health screening examinations at the Kangbuk Samsung Hospital Total Healthcare Centers in Seoul, South Korea, between January 1, 2010, and December 31, 2018. The data for this study were accessed on March 5, 2023.

We collected basic patient information, physical indices, initial examination results, comorbid diseases, and laboratory test results at the time of the comprehensive health screening examinations. All ATAA and non-ATAA participants were identified using chest CT. Sixteen predictor variables were obtained from both groups and used to develop a predictive model. Basic patient information included age, sex, and smoking status. Physical indices included waist circumference, body mass index (BMI), and body surface area (BSA). The initial examination findings included the mean systolic and diastolic blood pressure and the heart rate at the hospital visit. The comorbid diseases included diabetes mellitus, dyslipidemia, and hypertension. Laboratory tests included creatinine, glucose, high-sensitivity C-reactive protein (hs-CRP), and low-density lipoprotein (LDL).

In this study, an ATAA was defined for screening purposes as an ascending thoracic aorta with a diameter ≥ 3.7 cm. Previous studies have shown that the mean aortic diameter in Koreans is less than 3.5 cm in both men and women, so a threshold of 3.7 cm was chosen to maximize the sensitivity of detecting an enlarged aorta [11]. If a patient had undergone multiple chest CT scans, only the earliest scan was included in our study.
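The labeling rule above can be sketched in Python. The rule keeps the earliest scan per participant and labels ATAA when the ascending aortic diameter is at least 3.7 cm. The `CTRecord` class and its field names are hypothetical illustrations, not the study's actual data schema.

```python
from dataclasses import dataclass
from datetime import date

# Screening definition from the study: ascending aortic diameter >= 3.7 cm.
ATAA_THRESHOLD_CM = 3.7


@dataclass(frozen=True)
class CTRecord:
    # Hypothetical record layout; field names are illustrative only.
    patient_id: str
    scan_date: date
    ascending_diameter_cm: float


def earliest_scan_per_patient(records):
    """Keep only the earliest chest CT for each patient."""
    earliest = {}
    for rec in records:
        current = earliest.get(rec.patient_id)
        if current is None or rec.scan_date < current.scan_date:
            earliest[rec.patient_id] = rec
    return earliest


def label_ataa(records):
    """Return {patient_id: 1 if ATAA else 0}, based on each patient's earliest scan."""
    return {
        pid: int(rec.ascending_diameter_cm >= ATAA_THRESHOLD_CM)
        for pid, rec in earliest_scan_per_patient(records).items()
    }


if __name__ == "__main__":
    scans = [
        CTRecord("A", date(2012, 3, 1), 3.6),
        CTRecord("A", date(2016, 5, 9), 3.9),  # later scan is ignored
        CTRecord("B", date(2011, 7, 2), 3.7),  # exactly at the threshold -> ATAA
        CTRecord("C", date(2018, 1, 4), 3.2),
    ]
    print(label_ataa(scans))  # {'A': 0, 'B': 1, 'C': 0}
```

Patient A is labeled non-ATAA even though a later scan exceeds 3.7 cm, because only the earliest scan counts.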

Data preprocessing

Missing values were present in < 2% of the records. We imputed missing data by replacing the missing values of each variable with the mean of the available values of that variable. The total dataset was split into training, validation, and test sets, stratified by ATAA status. The sets were kept separate to prevent contamination of the validation and test sets with training data. We used 60% of the participants for training and 20% each for validation and testing. We then standardized each feature using the mean and standard deviation computed on the training set and applied these same parameters to transform the test set. In this binary classification problem, most participants do not have the disease, yet detecting the disease is of primary interest. To address this large class imbalance, the synthetic minority oversampling technique (SMOTE) was applied [12].
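A minimal, dependency-free Python sketch of these preprocessing steps follows. It assumes that rows are lists of numeric feature values, with `None` marking a missing value; all function names are hypothetical. The `smote` function is a simplified version of the basic algorithm: it interpolates between a minority sample and one of its k nearest minority neighbors. In this sketch it is intended for the training split only. That is a common practice, but the article does not state which split was oversampled. A production pipeline would more likely use scikit-learn and imbalanced-learn.

```python
import random
from statistics import mean, pstdev


def impute_mean(rows):
    """Replace each None with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    col_means = [mean(r[j] for r in rows if r[j] is not None) for j in range(n_cols)]
    return [[col_means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split row indices into train/validation/test, preserving the class ratio in each part."""
    rng = random.Random(seed)
    train, val, test = [], [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(fractions[0] * len(idx))
        n_val = round(fractions[1] * len(idx))
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:])  # remainder goes to the test set
    return train, val, test


def fit_standardizer(train_rows):
    """Per-feature mean and (population) SD, computed on training rows only."""
    cols = list(zip(*train_rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # avoid division by zero for constant features
    return mus, sds


def standardize(rows, mus, sds):
    """Apply the training-set parameters to any split (e.g. validation or test)."""
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]


def smote(minority, n_new, k=5, seed=0):
    """Basic SMOTE: interpolate between a random minority sample and one of its
    k nearest minority neighbors (squared Euclidean distance).
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic
```

On a dataset of 18,382 participants, a 60/20/20 stratified split yields a test set of roughly 3,677, consistent with the hold-out size reported below.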

Feature importance analysis and feature selection

We performed feature importance analysis using the random forest, eXtreme Gradient Boosting (XGBoost), and AdaBoost algorithms to identify the features that contribute most to predicting ATAA [13]. After computing feature importance with each algorithm, we determined the overall feature importance by averaging the three results. We then chose the optimal number of input features for our machine learning model by varying the number of features included according to their overall importance ranking.
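The averaging and feature-count search described above might look like the following sketch. It normalizes each model's importances to sum to 1 before averaging. That step is an assumption: it puts the three algorithms on a common scale, consistent with the normalized values in Fig 1. The `evaluate` argument is a hypothetical callback. It stands in for training the DNN on a feature subset and returning a validation metric such as AUROC.

```python
def normalize(importances):
    """Scale one model's importances so that they sum to 1."""
    total = sum(importances.values())
    return {f: v / total for f, v in importances.items()}


def average_importance(per_model):
    """Average normalized importances across models; return (feature, score) pairs ranked high to low."""
    normed = [normalize(m) for m in per_model]
    features = normed[0].keys()
    avg = {f: sum(m[f] for m in normed) / len(normed) for f in features}
    return sorted(avg.items(), key=lambda kv: kv[1], reverse=True)


def choose_num_features(ranked, evaluate):
    """Evaluate the top-k features for every k and keep the best-scoring k.

    `evaluate` is a hypothetical callback: it receives a list of feature names
    and returns a validation score (higher is better).
    """
    names = [f for f, _ in ranked]
    scores = {k: evaluate(names[:k]) for k in range(1, len(names) + 1)}
    best_k = max(scores, key=scores.get)  # ties resolve to the smallest k
    return best_k, names[:best_k]
```

With fitted scikit-learn or XGBoost classifiers, the per-model dictionaries could be built from each model's `feature_importances_` attribute.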

Machine learning model building

We developed a deep neural network (DNN) model to predict ATAA. The DNN consists of five layers: an input layer, three hidden layers, and an output layer. The input layer receives the features ranked according to their average feature importance. Based on the feature selection analysis, 15 features were used to train the DNN. We used TensorFlow to build the DNN with the Adam optimizer, a binary cross-entropy cost function, a learning rate of 0.001, and a batch size of 128 [14].
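The sketch below makes the reported configuration concrete. It shows the shape of a 5-layer network with 15 inputs, three hidden layers, and one sigmoid output, plus the binary cross-entropy cost and a single Adam update at the reported learning rate. It is written in plain Python rather than TensorFlow so that it runs without dependencies. The following are assumptions: the hidden-layer widths, ReLU activations, weight initialization, and Adam's beta and epsilon values. The article reports only the layer count, optimizer, cost function, learning rate, and batch size. Backpropagation is omitted.

```python
import math
import random

# Reported settings.
LEARNING_RATE = 0.001
BATCH_SIZE = 128
# 15 inputs, three hidden layers (widths are hypothetical), 1 output.
LAYER_SIZES = [15, 32, 16, 8, 1]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_layers(sizes, seed=0):
    """(weights, biases) per dense layer: small Gaussian weights, zero biases (assumed)."""
    rng = random.Random(seed)
    return [
        ([[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out)
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def forward(layers, x):
    """ReLU on the hidden layers, sigmoid on the output; returns P(ATAA)."""
    for depth, (weights, biases) in enumerate(layers):
        z = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
        is_output = depth == len(layers) - 1
        x = [sigmoid(v) for v in z] if is_output else [max(0.0, v) for v in z]
    return x[0]


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def adam_step(params, grads, m, v, t, lr=LEARNING_RATE, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update at step t (t >= 1); returns new params and moment estimates."""
    m = [b1 * mi + (1 - b1) * g for mi, g in zip(m, grads)]
    v = [b2 * vi + (1 - b2) * g * g for vi, g in zip(v, grads)]
    m_hat = [mi / (1 - b1 ** t) for mi in m]  # bias correction
    v_hat = [vi / (1 - b2 ** t) for vi in v]
    new_params = [p - lr * mh / (math.sqrt(vh) + eps) for p, mh, vh in zip(params, m_hat, v_hat)]
    return new_params, m, v


def minibatches(items, batch_size=BATCH_SIZE):
    """Yield consecutive mini-batches; the last one may be smaller."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

In TensorFlow, the same configuration would typically be expressed as a `tf.keras.Sequential` model of `Dense` layers. It would be compiled with `tf.keras.optimizers.Adam(learning_rate=0.001)` and `loss="binary_crossentropy"`, then trained with `model.fit(..., batch_size=128)`.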

Python 3.7.13 (Python Software Foundation, Wilmington, DE, USA) and R version 4.0.3 (R Foundation for Statistical Computing, Vienna, Austria) were used for the DNN model development and descriptive statistics.

Performance evaluation

We tested the prediction performance of our proposed 5-layer DNN model on a hold-out test dataset (n = 3,677) that was not used for training or validation, to provide an unbiased assessment. For comparison, we separately trained decision tree, random forest, XGBoost, and AdaBoost models and compared their prediction performance with that of the DNN. Model performance was assessed using the AUROC, accuracy, sensitivity, and specificity.
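The four metrics can be computed as in the following sketch. The 0.5 decision threshold is an assumption, because the article does not report the threshold used. The AUROC is computed through its rank interpretation: the probability that a randomly chosen ATAA case receives a higher score than a randomly chosen non-ATAA case, with ties counted as one half. This pairwise version is quadratic in the number of samples, which is acceptable for illustration. Library implementations such as scikit-learn's `roc_auc_score` are faster.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return tp, tn, fp, fn


def threshold_metrics(y_true, y_score, threshold=0.5):
    """Sensitivity, specificity, and accuracy at an (assumed) score threshold."""
    y_pred = [int(s >= threshold) for s in y_score]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
    }


def auroc(y_true, y_score):
    """P(score of a random positive > score of a random negative), ties counted as 1/2."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            wins += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return wins / (len(pos) * len(neg))
```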

Results

A total of 18,382 participants who visited the Kangbuk Samsung Hospital Total Healthcare Centers in Seoul, Korea, were analyzed in our study, of whom 1,147 (6.24%) had an ATAA according to our definition. Table 1 shows the characteristics of the participants with and without an ATAA. The mean ages of ATAA and non-ATAA participants were 49.9 ± 9.5 and 40.5 ± 6.7 years, respectively. The percentage of males was higher among participants with an ATAA (93.7%) than among those without (90.6%), but the difference in sex distribution between the groups was small.

Feature selection

Fig 1 shows the average feature importance values obtained from the random forest, XGBoost, and AdaBoost algorithms. Age was the most important factor in predicting ATAA risk, followed by hypertension, waist circumference, creatinine level, smoking, and body mass index. We also evaluated predictive performance on the validation dataset in terms of accuracy and AUROC as the number of features varied. Using the top 15 features, the DNN achieved an accuracy of 0.804 and an AUROC of 0.835, as shown in Fig 2.

Fig 1. Results of normalized feature importance analysis.

HTN, hypertension; Cr, creatinine; BMI, body mass index; CRP, C-reactive protein; SBP, systolic blood pressure; DL, dyslipidemia; DBP, diastolic blood pressure; BSA, body surface area; HR, heart rate; LDL, low-density lipoprotein; Glc, glucose; DM, diabetes mellitus.

https://doi.org/10.1371/journal.pone.0342482.g001

Fig 2. The influence of the number of features on validation accuracy and AUROC.

AUROC, area under the receiver operating characteristic curve.

https://doi.org/10.1371/journal.pone.0342482.g002

Performance of the AI prediction model

The prediction performance of the DNN model on the held-out test dataset (n = 3,677) is presented in Table 2 and Fig 3. Our proposed model with the 15 key features showed a sensitivity of 0.694, specificity of 0.811, accuracy of 0.804, and AUROC of 0.835. Compared with the other machine learning algorithms, our proposed 5-layer DNN model performed better in terms of sensitivity and AUROC.
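These figures can be checked arithmetically, because accuracy equals sensitivity weighted by prevalence plus specificity weighted by (1 - prevalence). The check assumes the stratified test set has roughly the overall prevalence of 1,147/18,382 (about 6.24%). Under that assumption, the reported sensitivity and specificity imply an accuracy of about 0.804. That matches the reported accuracy rather than the AUROC of 0.835.

```python
def accuracy_from_rates(sensitivity, specificity, prevalence):
    """Accuracy as the prevalence-weighted average of sensitivity and specificity."""
    return sensitivity * prevalence + specificity * (1 - prevalence)


# Overall prevalence, used as an approximation of the stratified test-set prevalence.
prevalence = 1147 / 18382
implied_accuracy = accuracy_from_rates(sensitivity=0.694, specificity=0.811, prevalence=prevalence)
print(f"prevalence = {prevalence:.4f}, implied accuracy = {implied_accuracy:.3f}")
# prints: prevalence = 0.0624, implied accuracy = 0.804
```

At this low prevalence, accuracy is dominated by specificity. This is why accuracy alone says little about how well ATAA cases are detected.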

Table 2. Summary of five machine learning models with testing data.

https://doi.org/10.1371/journal.pone.0342482.t002

Fig 3. The prediction performance of the DNN model (testing data set).

https://doi.org/10.1371/journal.pone.0342482.g003

Discussion

The thoracic aorta is a large artery consisting of three parts: the ascending aorta, the aortic arch, and the descending aorta. Mild enlargement of the aorta is termed ectasia, and an aneurysm may develop because of localized weakness of the arterial wall when ectasia exceeds tolerance limits [15,16]. Thoracic aortic aneurysms can occur in one or more parts of the aorta. According to a previous study, approximately 60% of thoracic aortic aneurysms occur in the aortic root and ascending aorta, and the remaining 40% involve the descending aorta [17]. Because ATAAs account for a substantial proportion of thoracic aortic aneurysms, it is essential to reduce their underdiagnosis by using highly sensitive tests, even if specificity is slightly compromised.

For a routine, population-level screening test, sensitivity should be prioritized because of the fatal complications of ATAA. Currently, computed tomography angiography (CTA) and magnetic resonance angiography (MRA) are the imaging tests of choice for diagnosing ATAA and assessing its severity [18]. However, CTA involves radiation exposure and contrast dye, which limits its use in patients with reduced renal function. MRA is time-consuming and also involves contrast dye, which limits its use in patients with claustrophobia or kidney problems. Therefore, we believe that an easily applied screening tool will help reduce underdiagnosis by first identifying high-risk individuals, who can then undergo additional imaging.
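The article does not describe whether the DNN's decision threshold was tuned. One common way to favor sensitivity is to choose, on validation data, the strictest score threshold that still reaches a target sensitivity, and then report the specificity that remains. The sketch below shows this approach as a hypothetical illustration, not the authors' procedure.

```python
def threshold_for_sensitivity(y_true, y_score, target=0.9):
    """Return (threshold, sensitivity, specificity) for the highest threshold
    whose sensitivity is at least `target`; scores >= threshold count as positive."""
    pos = sorted((s for y, s in zip(y_true, y_score) if y == 1), reverse=True)
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    for t in pos:  # candidate thresholds, strictest first
        sens = sum(1 for s in pos if s >= t) / len(pos)
        if sens >= target:
            spec = sum(1 for s in neg if s < t) / len(neg)
            return t, sens, spec
    raise ValueError("no positive examples")
```

Raising the target sensitivity moves the threshold down and lowers specificity. This is the trade-off described above.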

Because ATAA is indolent and asymptomatic, it may first present as a complication requiring emergency surgery. However, the mortality rate of emergency surgery is estimated at 20%, whereas elective surgery carries a low risk of mortality and morbidity [19]. Therefore, it is imperative to actively detect ATAA in advance, before an acute event occurs. At present, the benefit of surgical intervention compared with surveillance based on the size of the dilated ascending aorta is not fully understood [20]. Nevertheless, this uncertainty underscores the importance of identifying patients at high risk of mortality early in the management of ATAA.

In this study, we propose an ATAA prediction model for healthy adults based on a deep learning algorithm. The model requires 15 input values. We also developed a web application that allows anyone to use our model to assess ATAA risk (Fig 4). These inputs are readily available, because basic patient information, vital signs, comorbidities, and laboratory test results are typically obtained at a patient's first hospital visit. We anticipate that our ATAA risk assessment approach will help direct further testing to only those patients who need more intensive evaluation.

Limitations

Our study has some limitations. We evaluated the model's performance using an isolated test dataset split from the entire dataset before the training stage. However, generalizing the model requires external validation with new data from other hospitals or countries, and our web application may help in evaluating the model on such external data. In addition, the subjects of this study were only healthy adults who visited the hospital for medical checkups; therefore, selection bias is possible. Training the model on a wider range of data, including both health-screening participants and patients with other illnesses, may improve its performance. Furthermore, although a family history of aortic disease is a well-established risk factor for ATAA, we could not incorporate this variable into our machine learning model. Because of the retrospective nature of the study and the specific format of the health screening data, detailed information on family history of aortic disease was not available. We acknowledge that the lack of this genetic and familial background information may have limited the sensitivity of our model. Future prospective studies should include detailed family history data to potentially improve the predictive performance and clinical applicability of the model. Finally, because our study used only retrospective data, a prospective study is needed to confirm the model's actual performance and clinical usefulness.

Conclusions

In conclusion, we developed and validated a machine learning model using 15 selected features based on patient demographics and clinical information. We also created a web application that gives anyone wishing to estimate ATAA risk access to our model. This algorithm may be useful for screening for undiagnosed ATAA.

References

  1. Saeyeldin AA, Velasquez CA, Mahmood SUB, Brownstein AJ, Zafar MA, Ziganshin BA, et al. Thoracic aortic aneurysm: unlocking the “silent killer” secrets. Gen Thorac Cardiovasc Surg. 2019;67(1):1–11. pmid:29204794
  2. Oladokun D, Patterson BO, Sobocinski J, Karthikesalingam A, Loftus I, Thompson MM, et al. Systematic Review of the Growth Rates and Influencing Factors in Thoracic Aortic Aneurysms. Eur J Vasc Endovasc Surg. 2016;51(5):674–81. pmid:26947541
  3. Eswarsingh A, Bose A, Islam T, Venkataramanan SVA, Muthyala A, Shah SH, et al. Predictors and Rate of Progression of Aortic Root and Ascending Aorta Dilatation. Am J Cardiol. 2022;181:118–21. pmid:35987908
  4. Mao SS, Ahmadi N, Shah B, Beckmann D, Chen A, Ngo L, et al. Normal thoracic aorta diameter on cardiac computed tomography in healthy asymptomatic adults: impact of age and gender. Acad Radiol. 2008;15(7):827–34. pmid:18572117
  5. Takayanagi T, Suzuki S, Katada Y, Ishikawa T, Fukui R, Yamamoto Y, et al. Comparison of Motion Artifacts on CT Images Obtained in the Ultrafast Scan Mode and Conventional Scan Mode for Unconscious Patients in the Emergency Department. AJR Am J Roentgenol. 2019;213(4):W153–61. pmid:31166767
  6. Salameh MJ, Black JH 3rd, Ratchford EV. Thoracic aortic aneurysm. Vasc Med. 2018;23(6):573–8. pmid:30370834
  7. Obermeyer Z, Emanuel EJ. Predicting the Future - Big Data, Machine Learning, and Clinical Medicine. N Engl J Med. 2016;375(13):1216–9. pmid:27682033
  8. Yu R, Jin M, Wang Y, Cai X, Zhang K, Shi J, et al. A machine learning approach for predicting descending thoracic aortic diameter. Front Cardiovasc Med. 2023;10:1097116. pmid:36860275
  9. Mori M, Gan G, Deng Y, Yousef S, Weininger G, Daggula KR, et al. Development and Validation of a Predictive Model to Identify Patients With an Ascending Thoracic Aortic Aneurysm. J Am Heart Assoc. 2021;10(22):e022102. pmid:34743563
  10. Pirruccello JP, Chaffin MD, Chou EL, Fleming SJ, Lin H, Nekoui M, et al. Deep learning enables genetic analysis of the human thoracic aorta. Nat Genet. 2022;54(1):40–51. pmid:34837083
  11. Chang HW, Kim SH, Hakim AR, Chung S, Kim DJ, Lee JH, et al. Diameter and growth rate of the thoracic aorta-analysis based on serial computed tomography scans. J Thorac Dis. 2020;12(8):4002–13. pmid:32944312
  12. Johnson JM, Khoshgoftaar TM. Survey on deep learning with class imbalance. J Big Data. 2019;6(1).
  13. Chung H, Ko H, Kang WS, Kim KW, Lee H, Park C, et al. Prediction and Feature Importance Analysis for Severity of COVID-19 in South Korea Using Artificial Intelligence: Model Development and Validation. J Med Internet Res. 2021;23(4):e27060. pmid:33764883
  14. Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, et al. TensorFlow: a system for large-scale machine learning. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16); 2016.
  15. Erbel R, Eggebrecht H. Aortic dimensions and the risk of dissection. Heart. 2006;92(1):137–42. pmid:16365370
  16. Hager A, Kaemmerer H, Rapp-Bernhardt U, Blücher S, Rapp K, Bernhardt TM, et al. Diameters of the thoracic aorta throughout life as measured with helical computed tomography. J Thorac Cardiovasc Surg. 2002;123(6):1060–6. pmid:12063451
  17. Isselbacher EM. Thoracic and abdominal aortic aneurysms. Circulation. 2005;111(6):816–28. pmid:15710776
  18. Salameh MJ, Black JH 3rd, Ratchford EV. Thoracic aortic aneurysm. Vasc Med. 2018;23(6):573–8. pmid:30370834
  19. Mok SCM, Ma W-G, Mansour A, Charilaou P, Chou AS, Peterss S, et al. Twenty-five year outcomes following composite graft aortic root replacement. J Card Surg. 2017;32(2):99–109. pmid:27966257
  20. Guo MH, Appoo JJ, Saczkowski R, Smith HN, Ouzounian M, Gregory AJ, et al. Association of Mortality and Acute Aortic Events With Ascending Aortic Aneurysm: A Systematic Review and Meta-analysis. JAMA Netw Open. 2018;1(4):e181281. pmid:30646119