Use of machine learning for early prediction of short-term mortality in veterans with metabolic dysfunction-associated steatotic liver disease

  • Lewis J. Frey,

    Roles Conceptualization, Formal analysis, Software, Writing – original draft, Writing – review & editing

    lewis.frey@health.slu.edu

    Affiliations Ralph H. Johnson VA Medical Center, Charleston, South Carolina, United States of America, Saint Louis University, St Louis, Missouri, United States of America

  • Michael Fuchs,

    Roles Conceptualization, Writing – original draft, Writing – review & editing

    Affiliations Hunter Holmes McGuire VA Medical Center, Richmond, Virginia, United States of America, Virginia Commonwealth University, Richmond, Virginia, United States of America

  • Ralph C. Ward,

    Roles Conceptualization, Formal analysis, Writing – review & editing

    Affiliations Ralph H. Johnson VA Medical Center, Charleston, South Carolina, United States of America, Medical University of South Carolina, Charleston, South Carolina, United States of America

  • Mulugeta Gebregziabher,

    Roles Conceptualization, Writing – review & editing

    Affiliations Ralph H. Johnson VA Medical Center, Charleston, South Carolina, United States of America, Medical University of South Carolina, Charleston, South Carolina, United States of America

  • Ahmad Basil Nasir,

    Roles Conceptualization, Project administration, Writing – original draft, Writing – review & editing

    Affiliation Saint Louis University, St Louis, Missouri, United States of America

  • Yamini Natarajan,

    Roles Conceptualization, Writing – review & editing

    Affiliation Kelsey-Seybold Clinic, Houston, Texas, United States of America

  • Andrew Schreiner,

    Roles Conceptualization, Writing – review & editing

    Affiliation Medical University of South Carolina, Charleston, South Carolina, United States of America

  • Don C. Rockey,

    Roles Conceptualization, Writing – review & editing

    Affiliation Medical University of South Carolina, Charleston, South Carolina, United States of America

  • Wing-Kin Syn

    Roles Conceptualization, Formal analysis, Software, Writing – review & editing

    Affiliations Ralph H. Johnson VA Medical Center, Charleston, South Carolina, United States of America, Saint Louis University, St Louis, Missouri, United States of America, Medical University of South Carolina, Charleston, South Carolina, United States of America, Department of Physiology, Faculty of Medicine and Nursing, University of Basque Country UPV/EHU, Leioa, Vizcaya, Spain

Abstract

Background

Metabolic dysfunction associated steatotic liver disease (MASLD) is a leading cause of chronic liver disease worldwide and affects >25% of the United States population. We hypothesized that clinical features present in electronic health records (EHR) could be extracted early to characterize patients with MASLD who are at high risk of early mortality and that machine learning models would predict mortality better than noninvasive assessments of liver disease/fibrosis.

Methods

Using previously published criteria for MASLD, applied to data from the US Veterans Affairs EHR, we identified a cohort of 13,071 patients between 2000 and 2018 who had an initial diagnosis of MASLD without clinical evidence of cirrhosis. We subsequently used machine learning, analysis of variance, and logistic regression to identify clinical variables that characterize cirrhosis risk and predict mortality within the ensuing 5 years.

Results

The cohort had an average age of 60 years, an average BMI of 31, and a diabetes prevalence of 34%. Patients who progressed to cirrhosis were younger when first diagnosed with MASLD (56 years), had a higher BMI (33), and had significantly higher noninvasive fibrosis scores. Having diabetes at index MASLD diagnosis significantly increased the risk of developing cirrhosis and doubled the risk of cirrhosis plus HCC (2.09 CI:1.217–3.63). Our machine-learning model performed significantly better than FIB-4 at predicting mortality within 5 years of being diagnosed with MASLD (AUC 83% vs 68%).

Conclusion

Our data suggest that machine learning models based on data extracted from the EHR early during MASLD can identify patients likely to develop cirrhosis and predict short term mortality.

Introduction

Metabolic dysfunction associated steatotic liver disease (MASLD) is a significant and growing health problem that affects more than 25% of the general population and is strongly associated with the metabolic syndrome (obesity, type 2 diabetes mellitus (T2DM), hypertension, and dyslipidemia) [1]. Up to 20% of those with MASLD may progress to metabolic dysfunction associated steatohepatitis (MASH) [2], the more advanced stage of disease, which then puts them at risk of developing liver fibrosis, cirrhosis, and cirrhosis-associated complications such as hepatocellular carcinoma (HCC) and liver failure.

Liver enzymes, alanine aminotransferase (ALT) and aspartate aminotransferase (AST), are routinely used by primary care providers (PCP) to identify those likely to have MASH or MASH fibrosis/cirrhosis (or HCC), but those measures have low sensitivity (i.e., true positive rate) and specificity (i.e., true negative rate), and those with normal ALT levels can span the range of disease severity in MASLD/MASH (with or without fibrosis/cirrhosis) [3,4]. As a result, patients at high risk for progression to advanced disease stages (MASH, fibrosis, cirrhosis, and HCC) are often overlooked or misdiagnosed, while those with low risk for progression may, conversely, be referred for treatment [5]. In recent years, simple noninvasive scores such as the fibrosis-4 (FIB-4) score [6,7], aspartate aminotransferase to platelet ratio index (APRI) [8], and NAFLD Fibrosis Score (NFS) [9], as well as more complex and commercially available tests such as the Fibrotest®, Fibrosure®, and enhanced liver fibrosis (ELF®) panel, have been developed with the aim of improving the detection of fibrosis. While all these tests generally exhibit excellent negative predictive values at cutoff extremes, they remain limited overall by their low positive predictive values, sensitivities, and accuracies.

The aim of this study was to examine readily available clinical features present in electronic health records to identify patients with MASLD who are at risk of early mortality. Moreover, we aimed to develop machine learning models to predict mortality that perform better than established noninvasive measures.

Methods

Data sources

We utilized data from the VA Informatics and Computing infrastructure (VINCI) to obtain a cohort of US Veterans. This includes both VA and VA/CMS data sources. We analyzed data from VA National Data Systems (NDS) and from the VA Information Resource Center (VIReC) using VINCI between 2/27/2019 and 2/26/2024. The authors did not have access to information that could identify individual patients during or after data collection. The Medical University of South Carolina (MUSC) institutional review board and the Ralph H. Johnson VA Research and Development Committee approved this study.

Dataset

We examined a cohort of Veterans meeting previously published criteria for MASLD between January 1, 2000 and January 1, 2018 and having a liver biopsy with at least 5 years of follow up or death within 5 years after first meeting Husain criteria [10]. The group was assessed for eligibility through no concurrent liver disease (ICD-9/10 code for hepatitis B virus (HBV), hepatitis C virus (HCV), alcoholic liver disease, autoimmune hepatitis, biliary cirrhosis, primary sclerosing cholangitis, disorders of iron metabolism, Wilson’s disease), or history of significant alcohol consumption (by ICD-9/10). Using previously published methodology [10], we identified Veterans in the cohort with MASLD using the ICD-9/10 codes, 571.8 and K76.0. Those without an ICD-9 code for MASLD were classified as having MASLD if they had at least two elevated ALT values (≥40 IU/ml) more than 6 months apart, with no evidence of positive serologic testing for HBV (HBV surface antigen) or HCV (HCV RNA), and no alcohol related ICD-9/10 codes or positive AUDIT-C scores within one year of elevated ALT. The VINCI platform contains information from the annual AUDIT-C screen for alcohol use disorders, which has been used to screen over 90% of VA outpatients nationwide [11]. A score of ≥4 was used to identify active alcohol use in men and ≥3 was used in women [2]. Patients with cirrhosis or HCC at index MASLD diagnosis were excluded.
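The laboratory-based inclusion rule and the AUDIT-C cutoffs described above can be sketched in Python. This is an illustrative sketch only, not the authors' code: the function names, the `(date, value)` input layout, the sex coding, and the use of 182 days to stand in for "6 months" are all assumptions.

```python
from datetime import date

ALT_ELEVATED = 40      # elevated ALT threshold stated in the text
MIN_GAP_DAYS = 182     # assumption: "more than 6 months" approximated as 182 days


def has_persistent_alt_elevation(alt_results):
    """Return True if two elevated ALT values lie more than ~6 months apart.

    alt_results: list of (date, alt_value) tuples (hypothetical layout).
    """
    elevated_dates = sorted(d for d, value in alt_results if value >= ALT_ELEVATED)
    if len(elevated_dates) < 2:
        return False
    # The earliest and latest elevated results give the widest possible gap.
    return (elevated_dates[-1] - elevated_dates[0]).days > MIN_GAP_DAYS


def audit_c_positive(score, sex):
    """Apply the sex-specific AUDIT-C cutoffs from the text (>=4 men, >=3 women)."""
    cutoff = 3 if sex == "F" else 4   # assumption: sex coded as "F" or "M"
    return score >= cutoff
```

In a full pipeline these checks would be combined with the ICD-9/10 and serology exclusions listed in the paragraph, which are omitted here.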

Variable definitions

Outcome variables.

Cirrhosis: Cirrhosis was identified by the presence of a combination of ICD-9/10 codes for cirrhosis as previously described [12]. Veterans within the study cohort were considered to have cirrhosis if they were determined to meet these published criteria for cirrhosis using data from their medical records.

Hepatocellular carcinoma: The presence of hepatocellular carcinoma was assessed by ICD-9/10 codes 155 and C22.0, respectively.

Disease Outcome Grouping: Patients were assigned to one of four groups according to the presence of cirrhosis and/or HCC as follows: those with no evidence either of cirrhosis or HCC (No CIRR/No HCC (control group)); those with cirrhosis but no evidence of HCC (CIRR); cirrhosis and HCC (CIRR-HCC); HCC without evidence of cirrhosis (HCC).

Mortality: Death was captured from the VA Vital Status file, which applies an algorithm to a combination of the BIRLS Death File, the Social Security Administration Death Master File, the Medicare Vital Status File and the VHA Medical SAS Datasets.

Predictor variables.

Clinical data within 1 year of the diagnosis of MASLD were used for this analysis. We use the term MASLD throughout but cite older literature using Non-Alcoholic Fatty Liver Disease (NAFLD). Fibrosis was assessed using the following noninvasive fibrosis measures: FIB-4 [6,7], NAFLD Fibrosis Score (NFS) [9], aspartate aminotransferase and alanine aminotransferase ratio (AAR), age-platelet index (AP) [13], APRI [8], and a composite of body mass index (BMI), AAR and diabetes (BARD) score, which is the sum of binary values: 1 point for BMI ≥ 28, 2 points for AAR ≥ 0.8, and 1 point for diabetes [14]. Complete demographic information including age, gender, and race/ethnicity as well as clinical data were captured (comorbidities occurring prior to the diagnosis of MASLD such as diabetes, hypertension and dyslipidemia were collected using ICD-9/10 codes).
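Several of the scores named above are simple arithmetic on routine laboratory values. The sketch below shows the BARD rule exactly as stated in the text, together with the commonly published formulas for FIB-4, AAR, and APRI. It is illustrative rather than the authors' implementation: the function names, the platelet unit (10^9/L), and the default AST upper limit of normal of 40 are assumptions.

```python
import math


def fib4(age, ast, alt, platelets):
    """FIB-4 = (age * AST) / (platelets * sqrt(ALT)); platelets in 10^9/L."""
    return (age * ast) / (platelets * math.sqrt(alt))


def aar(ast, alt):
    """AST to ALT ratio."""
    return ast / alt


def apri(ast, platelets, ast_uln=40.0):
    """APRI = (AST / upper limit of normal) * 100 / platelets.

    ast_uln=40 is an assumed default; laboratories differ.
    """
    return (ast / ast_uln) * 100.0 / platelets


def bard(bmi, ast, alt, diabetes):
    """BARD as described in the text: BMI >= 28 (1), AAR >= 0.8 (2), diabetes (1)."""
    score = 0
    score += 1 if bmi >= 28 else 0
    score += 2 if aar(ast, alt) >= 0.8 else 0
    score += 1 if diabetes else 0
    return score
```

For a hypothetical 60-year-old with AST 40, ALT 25, and platelets of 200, `fib4(60, 40, 25, 200)` evaluates to 2.4, and with a BMI of 31 and diabetes the BARD score is 4.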

Analysis methods.

Descriptive analysis: For the four groups, mean, standard error (SE) and n are reported for the following quantitative variables: age, BMI, FIB-4, NFS, AAR, AP and APRI (Table 1). SAS was used to conduct analysis of variance (ANOVA) for each quantitative dependent variable with group as the independent variable. The significance level was set to p < 0.05. All six pairwise comparisons were examined with Bonferroni correction for multiple comparisons. Multinomial logistic regression was used for dichotomous variables of gender, race, diabetes, hypertension, dyslipidemia and cumulative logistic regression was used for BARD. A multinomial regression analysis was performed comparing the CIRR, CIRR with HCC and HCC groups against the control group (No CIRR/No HCC). We assessed for collinearity and examined correlation of variables > 0.8 and variance inflation (VIF) through checking if VIF > 10. We removed variables from the multinomial analysis if they had collinearity that would impact model fitting.
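The collinearity screen described above can be sketched as a pairwise correlation check. This is a simplified illustration, not the SAS procedure the authors ran: the function names and data layout are hypothetical, and the variance inflation helper covers only the two-predictor case, where VIF reduces to 1 / (1 - r^2).

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlated_pairs(columns, threshold=0.8):
    """Flag variable pairs whose absolute correlation exceeds the threshold.

    columns: dict mapping a variable name to its list of values (hypothetical layout).
    """
    flagged = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > threshold:
            flagged.append((a, b, r))
    return flagged


def vif_two_predictors(r):
    """VIF when a predictor is regressed on one other predictor with correlation r."""
    return 1.0 / (1.0 - r ** 2)
```

In the two-predictor case a correlation of 0.8 corresponds to a VIF of about 2.8, so the correlation cutoff is the stricter of the two screens; VIF exceeds 10 only when r is above roughly 0.95.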

Table 1. Demographics, clinical features, and noninvasive fibrosis assessment.

https://doi.org/10.1371/journal.pone.0334715.t001

Feature Selection: A feature ranking algorithm was run to order features in a list from most relevant to least at predicting mortality on the training data. We used a gradient tree boosting machine model for our feature ranking analysis through the application of extreme gradient boosting (XGBoost) to run the regression trees to predict mortality [15]. We used a unified framework for interpreting the impact of features on the predictions based on Shapley additive explanations (SHAP) that generated a ranking of features using SHAP values [16].
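To illustrate what a SHAP value measures, the sketch below computes exact Shapley values for a tiny hand-written risk function by enumerating feature orderings. It is a toy under stated assumptions, not the XGBoost and SHAP tooling used in the study: `toy_risk`, its coefficients, and the baseline patient are all hypothetical.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values of `predict` at `instance` relative to `baseline`.

    Features not yet added take their baseline value. All orderings are
    enumerated, so this is only practical for a handful of features.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]
            value = predict(current)
            totals[name] += value - previous   # marginal contribution of `name`
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_risk(x):
    """Hypothetical linear risk score: rises with age, falls with albumin."""
    return 0.02 * x["age"] - 0.5 * x["albumin"]


phi = shapley_values(
    toy_risk,
    instance={"age": 75, "albumin": 3.0},
    baseline={"age": 60, "albumin": 4.0},
)
# Rank features by the magnitude of their contribution, largest first.
ranking = sorted(phi, key=lambda name: abs(phi[name]), reverse=True)
```

Both contributions are positive here: older age and lower albumin each push the toy score upward, and the contributions sum to the difference between the patient's score and the baseline score.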

Starting with the highest ranked features, the features were incrementally added to a random forest classifier to determine if they improved classification performance. Random forests use a collection of tree-structured classifiers where each tree casts a vote for the most popular class given the input vector. If a feature improved performance, it was selected to be part of the model and the next highest ranked feature was added to the model and assessed for performance improvements. If a feature did not improve performance, it was excluded from the model.
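The incremental selection procedure described above amounts to a greedy forward pass over the ranked feature list. A minimal sketch follows, assuming a caller-supplied scoring function; `forward_select` and `score_fn` are hypothetical names, and in the study the score would correspond to random forest performance on the training data.

```python
def forward_select(ranked_features, score_fn):
    """Greedy forward selection over a pre-ranked feature list.

    ranked_features: feature names ordered from most to least relevant.
    score_fn: callable that takes a list of features and returns a
              performance score (hypothetical; e.g., cross-validated AUC).
    """
    selected = []
    best_score = float("-inf")
    for feature in ranked_features:
        candidate = selected + [feature]
        score = score_fn(candidate)
        if score > best_score:   # keep the feature only if performance improves
            selected = candidate
            best_score = score
    return selected, best_score
```

Because each feature is tested once in rank order, the cost is one model evaluation per feature, at the price of not revisiting features that were excluded earlier.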

Predictive Analysis: We compared random forest classifiers with feature selection to FIB-4 as this is widely used to estimate risks of advanced fibrosis/cirrhosis, HCC, and mortality [7,17]. We used the Python xgboost package to run XGBoost for feature ranking by SHAP values, the sklearn package to run the random forest classifier for predicting the classes of instances using default hyperparameters unless other values are specified, and lifelines for the Kaplan Meier analysis. The following binary predictions are compared: whether Veterans died within five years when criteria for MASLD were first present (survived n = 10,665, died n = 2,406). The receiver operating characteristic (ROC) curve was used to calculate the area under the curve (AUC) to compare performance across models. The random forest classifier was trained on a balanced training set composed of two thirds of the minority category and an equal number randomly selected from the majority category. The test set consisted of the remaining holdout data from both categories. We assessed predictions on the holdout test sets for both FIB-4 and random forest for each of the mortality predictions to generate the ROC curves. SAS System Version 9.3 (Cary, North Carolina) was used to conduct statistical analyses. Unless stated otherwise, alpha is set to 0.05.
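The balanced training split described above can be sketched with the standard library alone. This is an illustration of the sampling scheme, not the authors' code: `balanced_split`, its parameters, and the fixed seed are assumptions, and the majority class is assumed to be at least as large as the training share of the minority class.

```python
import random


def balanced_split(minority, majority, train_fraction=2 / 3, seed=0):
    """Build a class-balanced training set and an imbalanced holdout set.

    The training set takes `train_fraction` of the minority class plus an
    equal number of randomly chosen majority cases; everything else is holdout.
    """
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    minority = list(minority)
    majority = list(majority)
    rng.shuffle(minority)
    rng.shuffle(majority)
    n_train = round(len(minority) * train_fraction)
    train = minority[:n_train] + majority[:n_train]
    test = minority[n_train:] + majority[n_train:]
    return train, test
```

Note the consequence of this design: the holdout set contains a lower proportion of minority cases than the full cohort, which matters when interpreting prevalence-dependent metrics such as positive predictive value.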

Results

A total of 16,930 patients were identified as meeting criteria for MASLD. Patients with the following features were excluded: ALT or AST > 250 (n = 662), missing BMI, AST, ALT, platelet count, or albumin levels (n = 2,839), missing race/ethnicity (n = 122), and codes for cirrhosis or HCC at the time of study entry (n = 236) (Fig 1). The exclusion of AST or ALT greater than 250 is to exclude acute hepatitis flares or other liver diseases not representative of chronic MASLD. Patients with incomplete data for BMI, AST, ALT, platelet counts, race, or ethnicity were excluded from the final analysis, and no data imputation was conducted. Clinical data within 1 year of the diagnosis of MASLD were used for this analysis (and on average were within one month). This provided a snapshot of disease characteristics at the time of first MASLD diagnosis.

The overall cohort was primarily older white males (Table 1). The mean age associated with first diagnosis of MASLD was significantly older for the CIRR-HCC and HCC groups and significantly younger for the CIRR group compared to control (Table 1). All groups were obese (BMI > 30 kg/m2) with a higher BMI for those with cirrhosis compared with control. There were significantly fewer non-Hispanic black (NHB) with CIRR and more non-Hispanic white (NHW) with CIRR compared with control. Diabetes was significantly associated with CIRR, CIRR-HCC and HCC compared with control. Hypertension was significantly higher in CIRR-HCC and HCC compared with control. Dyslipidemia was higher in the control group compared with CIRR and CIRR-HCC. At the time of first MASLD diagnosis, FIB-4, NFS, AAR, AP, and BARD were significantly higher in CIRR, CIRR-HCC, and HCC groups compared with the control. APRI was significantly higher for CIRR and CIRR-HCC. Based on noninvasive tests of fibrosis, patients with CIRR-HCC had the most liver fibrosis.

Multinomial regression analysis

Multinomial regression demonstrated that CIRR patients were younger and HCC patients were older than control group patients when first meeting MASLD criteria (Table 2). NHB were half as likely as NHW to be in the CIRR group and one-third as likely to be in the CIRR-HCC group. Women were one-tenth as likely to be in the CIRR-HCC group. Having diabetes when criteria for MASLD were first present significantly increased the risk of CIRR, CIRR-HCC, and HCC, doubling the risk of CIRR-HCC. Dyslipidemia was more likely in the control group when first meeting MASLD diagnosis criteria compared with the other groups. BARD values were significantly higher than control for the CIRR and CIRR-HCC groups. The variable APRI had a correlation of 0.8 with FIB-4 and was excluded from the model. All other variables met the correlation and VIF cutoffs.

Table 2. Multinomial Regression Analysis of features associated with CIRR, CIRR-HCC or HCC.

https://doi.org/10.1371/journal.pone.0334715.t002

Clinical features important in prediction of 5-year mortality

Relevant demographic, clinical, and noninvasive fibrosis assessment features were ranked by their impact on the training performance of the predictive model (Fig 2). Higher values of age and lower values of albumin and BMI increased the likelihood of the model predicting death within five years. Diabetes increased the probability of predicting death through the composite variable BARD. The ranking of features was determined through an evaluation of performance based on the training set, which consisted of 3,224 cases balanced between the majority and minority classes (alive at 5 years n = 1,584 and dead within 5 years n = 1,640). In brief, features were sorted by SHAP relevance values and then incrementally added to a random forest classifier with 1000 trees using only the training data in a 5-fold cross-validation split. The features that increased AUC performance were added to the set of features used in the final model. The SHAP value plot in Fig 2 shows the ranked order of the selected features, with age at the top. The position on the x-axis indicates the degree of impact on the model, with red being high values and blue being low values for the feature. The plot shows that older age (in red) was associated with increased risk of death and that lower albumin values (in blue) were also associated with increased risk of death. The random forest used the selected features from Fig 2 in the final model, was trained on the same training data, and was evaluated on a holdout test set. The holdout test set consisted of one-third of the cases of those who died within 5 years (n = 766) and the remaining cases of those alive at five years (n = 9,081), for a total test set of 9,847 cases.

Fig 2. The SHAP values for the selected variables that improve training accuracy in the nested cross validation.

SHAP values higher than 0.0 move the classifier to predict death while points with lower SHAP values move the classifier to predict alive. The red color indicates values were high and blue indicates values were low. In the case of age, higher values (red) indicating older patients moved the classifier in the direction of predicting death, while lower values (blue) of albumin moved the classifier to predicting death. The variables considered included age, sex, BMI, race, ethnicity, NFS, AAR, AP, APRI, BARD, ALT, AST, diabetes, hypertension, and dyslipidemia.

https://doi.org/10.1371/journal.pone.0334715.g002

Predictive models of mortality

The trained model based on the relevant features from Fig 2 was used for the evaluation of the random forest model on the holdout test set. For predicting mortality within five years of first identification of MASLD, the random forest model (0.83 AUC) was significantly better than FIB-4 (0.68 AUC) due to the higher sensitivity of the random forest (See Fig 3, Table 3). The mortality assessed was all-cause mortality, but the results hold when filtering for only cirrhosis and HCC patients.
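The metrics used in this comparison can be computed directly from scores and outcomes. The sketch below is illustrative only: the function names are hypothetical, and treating a score strictly above the threshold as a positive prediction (for example FIB-4 above 2.67) is an assumption about how the cutoff is applied.

```python
def _ratio(numerator, denominator):
    """Safe division that returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(scores, died, threshold):
    """Sensitivity, specificity, PPV, and NPV when score > threshold predicts death."""
    tp = fp = tn = fn = 0
    for score, outcome in zip(scores, died):
        predicted = score > threshold
        if predicted and outcome:
            tp += 1
        elif predicted and not outcome:
            fp += 1
        elif not predicted and outcome:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


def auc(scores, died):
    """AUC as the probability that a random death outranks a random survivor."""
    pos = [s for s, d in zip(scores, died) if d]
    neg = [s for s, d in zip(scores, died) if not d]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of cases, which is fine for a sketch; a rank-based implementation would be preferable on a test set of several thousand patients.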

Table 3. The AUC performance of models along with sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) for the random forest classifier and FIB-4 on the test set. The FIB-4 threshold of 2.67 is used for FIB-4 classification.

https://doi.org/10.1371/journal.pone.0334715.t003

Fig 3. AUC comparison of random forest and FIB-4 at predicting mortality within five years of meeting MASLD criteria on holdout test set.

The models are significantly different at p < 0.05 (Chi-Square).

https://doi.org/10.1371/journal.pone.0334715.g003

In the Kaplan Meier plots for the low risk groups of random forest and FIB-4, the higher sensitivity of the random forest results in its low risk group having significantly fewer deaths over the five years than the FIB-4 low risk group (See Fig 4). The high risk groups for random forest and FIB-4 overlap and are not significantly different by the Log-Rank Test.
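For readers unfamiliar with the Kaplan Meier estimator behind these plots, a minimal product-limit sketch follows. The study used the lifelines package; this standard-library version is only an illustration, and the function name and input layout are assumptions.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each time with an observed death.

    durations: follow-up time per patient.
    events: 1 if death was observed at that time, 0 if the patient was censored.
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e)
        leaving = sum(1 for d in durations if d == t)   # deaths plus censored at t
        if deaths:
            survival *= 1.0 - deaths / at_risk          # product-limit update
            curve.append((t, survival))
        at_risk -= leaving
    return curve
```

Applying the function separately to the low risk and high risk groups of each model would give the four curves being compared; the log-rank test itself is not shown.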

Fig 4. Kaplan Meier plots of random forest (RF) and FIB-4 high and low risk categories on holdout test set.

The RF and FIB-4 Low Risk models are significantly different at p < 0.05 (Log-Rank Test), while the High Risk models are not significantly different.

https://doi.org/10.1371/journal.pone.0334715.g004

Discussion

The pathogenesis of MASLD is complex and, to date, no single clinical measure has been sensitive enough to predict which patients will die within five years of first meeting criteria for a MASLD diagnosis. Here we have demonstrated that machine learning using random forest with SHAP feature extraction is sensitive and performed better than FIB-4 at predicting mortality after meeting MASLD criteria. The random forest is consistent in predicting those who die early in the five years and those who die later, as shown in the Kaplan Meier plot. The FIB-4 low risk model is less consistent over the five years and misidentifies those who will die early in the five year window as low risk. The current study shows that the use of clinical and laboratory measurements in combination helps identify individuals with MASLD who are at risk of mortality. The results further indicate that there is information in the EHR that might be utilized to inform early clinical decision making – specifically, the signal in EHR recorded non-invasive measures might be used to prioritize patients for more aggressive screening and/or evaluation. The approach moves the predictions to early MASLD diagnosis and extends the use of some of the same measures and techniques as those used for predicting stages of MASLD and MASLD-related cirrhosis for patients undergoing liver biopsy [18]. Hence, machine learning can function along a continuum of disease progression starting early with first MASLD diagnosis. We were somewhat surprised by the finding that FIB-4 was relatively insensitive in predicting mortality.
A possible reason for the poor performance of FIB-4 compared to the random forest is that the SHAP feature selection utilized early identification of lower albumin and lower BMI values to predict mortality (See position of blue values for albumin and BMI compared with high red values for age in Fig 2), but the commonly used FIB-4 [7,17] does not incorporate these features to increase the risk factor score. We used XGBoost as a feature selection algorithm given its integration in the SHAP tooling for displaying feature relevance. The use of SHAP was to provide a way to graphically present the importance of variables and consider how such information could be used in a future dashboard for managing patients. Fig 2 shows SHAP values for the XGBoost model, where the features were selected for use in the RF model. Clinically, hypoalbuminemia is a marker of impaired hepatic synthetic function and poor nutritional status, both associated with adverse outcomes. Similarly, while obesity is a risk factor for MASLD progression, low BMI in this context may reflect sarcopenia or frailty, which are known predictors of poor prognosis in chronic liver disease. A benefit of the random forest method is improved sensitivity and the use of noninvasive scores along with other EHR data as a means of screening for patients with higher risk of mortality. This could provide an improved method to assess which patients would benefit from access to additional resources to track risk of mortality. Liver biopsy is invasive and costly and carries a potential risk of complications, so providers currently use various clinical parameters and/or noninvasive tools to help with disease stratification [19,20]. To date, however, these tools do not focus on the earliest point of MASLD identification to determine risk of mortality in a five year window. The random forest models with SHAP feature selection indicate which variables are relevant to identify progression to mortality.
Instead of getting arbitrary snapshots of disease severity at points in time, the method uses frequently tracked variables at the earliest point so that patients can be proactively treated.

Consistent with prior reports, we observed that those with HCC were significantly older than those without HCC, and that there were fewer NHB compared with NHW with CIRR [21]. Diabetes is a major risk factor for MASLD progression and is associated with nearly 4-fold increased risk of HCC in patients with MASH cirrhosis [22]. In a real-world study of 18 million European adults, the presence of diabetes was the strongest independent predictor for HCC [23]. In this study, we similarly noted that those with diabetes were more likely to have CIRR, CIRR-HCC, and HCC compared with controls, thus validating the clinical relevance of this study cohort.

The identification of a substantial proportion of patients with HCC but without cirrhosis was remarkable, but consistent with other data [24]. Notably, in an administrative dataset such as the one used here, it is difficult to ascertain whether the non-cirrhotic HCCs developed on a background of early fibrosis (i.e. F1) or of significant or advanced stage fibrosis (i.e. F2-3), and whether the majority had arisen from livers with advanced fibrosis (i.e. F3). Recent studies from both the VHA and the general population, however, have shown that up to 30–40% of those with MASLD-associated HCC may have arisen from a non-cirrhotic liver [24]. Mechanistic studies have also reported that obesity associated oxidative stress may directly modulate HCC development, and sarcopenia has been associated with HCC development [25,26]. Therefore, prospective studies with greater representation of the general population will be needed to determine the true prevalence of non-cirrhotic HCCs in the US.

One of the more remarkable findings of this study was that non-Hispanic black Veterans were half as likely to develop cirrhosis associated with MASLD (0.502 CI:0.391–0.644) and one-third as likely to develop cirrhosis and HCC together (0.294 CI:0.128–0.674) as were other populations (Table 2). Additionally, women were one-tenth as likely to develop both cirrhosis and HCC together (0.108 CI:0.015–0.784). These findings may reflect protective biological mechanisms, such as genetic polymorphisms in NHB individuals or the influence of estrogen in women, both of which have been implicated in slower fibrosis progression. Sociodemographic factors, including differences in comorbidity burden, healthcare access, and patterns of screening, may also contribute. However, the exact reasons for these disparities remain unclear and warrant further study in larger, more diverse populations.

Our findings suggest potential clinical applications for predictive models in early management of MASLD. By enabling risk stratification at the time of diagnosis, the model could help identify patients at higher risk of mortality who may benefit from closer follow-up. Our roadmap for future work includes assessment of decision-support dashboards that could support clinicians in real time. In our new VA Merit, we will examine the different outcomes of mortality based on cardiovascular and liver related events along with predicting fibrosis stage. The VA Merit includes an assessment of an augmented dashboard to provide longitudinal risk of MASLD progression, including survival analysis using Cox proportional hazards models. Hyperparameter tuning of the models will be done in future work to identify how performance can be improved by systematically assessing parameter settings of the models. The dashboard will be assessed by primary care physicians and hepatologists to examine if the information in the predictive models can improve referral of patients to specialty MASLD clinics. Based on a successful phase 3 clinical trial [27], the FDA has approved semaglutide, a glucagon-like peptide-1 (GLP-1) receptor agonist, for the treatment of patients with MASH and stage 2 or 3 fibrosis. The MASLD dashboard will be timely to help identify patients who could benefit from the treatment. The earlier patients can be identified, the more options they will have available to them to slow progression of the disease and reduce the risk of mortality associated with MASLD.

For our pilot study, we decided to use as much data as we could for the assessment of RF and FIB-4, which is commonly used to assess severity of MASLD. This resulted in a larger percentage of survivor cases in the test set than in the full cohort. The imbalance could affect the positive predictive values for both RF and FIB-4. In the future we will analyze the data with the test set having the same prevalence of death as the full cohort. The cohort was derived from U.S. Veterans, who are older, predominantly male, and more comorbid than the general population, which may limit generalizability. Due to dataset access restrictions and the specificity of our population, external validation could not be performed at this stage. The model is applicable to the VA population, having been trained on it; additional work will need to be done to generalize the model outside the VA and to obtain an external validation data set. The team is working on validating the approach on patients in the SSM Health system that treats over eight million patients across four states. While the VA has a 12.6% female population in large cohort studies [28], SSM Health has a 51% female population, which makes it ideal to test the generalizability of the approach. Like the VA, SSM Health has a virtual data warehouse for developing and validating predictive models. The system has over five million patients with demographics, labs, vitals, and comorbidities. The effort to demonstrate generalizability through external validation with a more representative population is ongoing work.
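The dependence of positive predictive value on prevalence noted above follows from Bayes' rule and can be illustrated with the death counts reported in this article. The sensitivity and specificity passed in below are hypothetical placeholders, not values from Table 3, and the function name is an assumption.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Prevalence of death within five years, from counts reported in the text.
test_prevalence = 766 / 9847       # holdout test set
cohort_prevalence = 2406 / 13071   # full cohort

# Hypothetical operating point used only to show the direction of the effect.
ppv_test = ppv(0.80, 0.70, test_prevalence)
ppv_cohort = ppv(0.80, 0.70, cohort_prevalence)
```

With any fixed sensitivity and specificity, the PPV at the lower test-set prevalence (about 8%) is smaller than it would be at the full-cohort prevalence (about 18%), which is the effect the paragraph describes.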

Our study is limited by the retrospective design and the use of administrative tools to identify those with MASLD and MASLD-associated complications. For example, the use of elevated liver enzymes likely under-estimated the true population with MASLD. It is also possible that we could have underestimated the number of patients who developed cirrhosis. However, we used previously published data to identify cirrhosis in administrative data [12]. The number of those with HCC could also have been higher had we included codes for HCC interventions such as transarterial chemoembolization or radiofrequency ablation. Another limitation of our study is that we have applied our approach to a specific population that was selected for MASLD. Nevertheless, this cohort was sufficiently large to demonstrate consistency with published reports [21,23,24], thus validating its utility and findings.

In summary, our data and models demonstrate that information available in the medical record at the time an initial diagnosis of MASLD can be made is associated with the development of cirrhosis and predicts mortality within five years. The random forest model is better than FIB-4 at predicting risk of mortality both early and late in the five years after first diagnosis of MASLD. Prospective studies will be needed to validate such noninvasive machine learning models to predict progression to mortality with MASLD.

References

  1. Kim D, Cholankeril G, Loomba R, Ahmed A. Prevalence of Nonalcoholic Fatty Liver Disease and Hepatic Fibrosis Among US Adults with Prediabetes and Diabetes, NHANES 2017-2018. J Gen Intern Med. 2022;37(1):261–3. pmid:33674915
  2. Kanwal F, Kramer JR, Duan Z, Yu X, White D, El-Serag HB. Trends in the Burden of Nonalcoholic Fatty Liver Disease in a United States Cohort of Veterans. Clin Gastroenterol Hepatol. 2016;14(2):301-8.e1-2. pmid:26291667
  3. Fracanzani AL, Valenti L, Bugianesi E, Andreoletti M, Colli A, Vanni E, et al. Risk of severe liver disease in nonalcoholic fatty liver disease with normal aminotransferase levels: a role for insulin resistance and diabetes. Hepatology. 2008;48(3):792–8. pmid:18752331
  4. Mofrad P, Contos MJ, Haque M, Sargeant C, Fisher RA, Luketic VA, et al. Clinical and histologic spectrum of nonalcoholic fatty liver disease associated with normal ALT values. Hepatology. 2003;37(6):1286–92. pmid:12774006
  5. Nielsen EM, Anderson KP, Marsden J, Zhang J, Schreiner AD. Nonalcoholic fatty liver disease underdiagnosis in primary care: what are we missing? J Gen Intern Med. 2022;37:2587–90.
  6. Sterling RK, Lissen E, Clumeck N, Sola R, Correa MC, Montaner J, et al. Development of a simple noninvasive index to predict significant fibrosis in patients with HIV/HCV coinfection. Hepatology. 2006;43(6):1317–25. pmid:16729309
  7. Shah AG, Lydecker A, Murray K, Tetri BN, Contos MJ, Sanyal AJ, et al. Comparison of noninvasive markers of fibrosis in patients with nonalcoholic fatty liver disease. Clin Gastroenterol Hepatol. 2009;7(10):1104–12. pmid:19523535
  8. Wai C-T, Greenson JK, Fontana RJ, Kalbfleisch JD, Marrero JA, Conjeevaram HS, et al. A simple noninvasive index can predict both significant fibrosis and cirrhosis in patients with chronic hepatitis C. Hepatology. 2003;38(2):518–26. pmid:12883497
  9. Angulo P, Hui JM, Marchesini G, Bugianesi E, George J, Farrell GC, et al. The NAFLD fibrosis score: a noninvasive system that identifies liver fibrosis in patients with NAFLD. Hepatology. 2007;45(4):846–54. pmid:17393509
  10. Husain N, Blais P, Kramer J, Kowalkowski M, Richardson P, El-Serag HB, et al. Nonalcoholic fatty liver disease (NAFLD) in the Veterans Administration population: development and validation of an algorithm for NAFLD using automated data. Aliment Pharmacol Ther. 2014;40(8):949–54. pmid:25155259
  11. Lapham GT, Achtmeyer CE, Williams EC, Hawkins EJ, Kivlahan DR, Bradley KA. Increased documented brief alcohol interventions with a performance measure and electronic decision support. Med Care. 2012;50(2):179–87. pmid:20881876
  12. Nehra MS, Ma Y, Clark C, Amarasingham R, Rockey DC, Singal AG. Use of administrative claims data for identifying patients with cirrhosis. J Clin Gastroenterol. 2013;47(5):e50-4. pmid:23090041
  13. Poynard T, Bedossa P. Age and platelet count: a simple index for predicting the presence of histological lesions in patients with antibodies to hepatitis C virus. METAVIR and CLINIVIR Cooperative Study Groups. J Viral Hepat. 1997;4(3):199–208. pmid:9181529
  14. Harrison SA, Oliver D, Arnold HL, Gogia S, Neuschwander-Tetri BA. Development and validation of a simple NAFLD clinical scoring system for identifying patients without advanced disease. Gut. 2008;57(10):1441–7. pmid:18390575
  15. Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016. 785–94.
  16. Lundberg SM, Lee S-I. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst [Internet]. 2017;30. https://proceedings.neurips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html
  17. Bertot LC, Jeffrey GP, de Boer B, MacQuillan G, Garas G, Chin J, et al. Diabetes impacts prediction of cirrhosis and prognosis by non-invasive fibrosis models in non-alcoholic fatty liver disease. Liver Int. 2018;38(10):1793–802. pmid:29575516
  18. Chang D, Truong E, Mena EA, Pacheco F, Wong M, Guindi M, et al. Machine learning models are superior to noninvasive tests in identifying clinically significant stages of NAFLD and NAFLD-related cirrhosis. Hepatology. 2023;77(2):546–57. pmid:35809234
  19. Puri P, Fuchs M. Population Management of Nonalcoholic Fatty Liver Disease. Fed Pract. 2019;36(2):72–82. pmid:30867627
  20. Shetty A, Syn W-K. Health and Economic Burden of Nonalcoholic Fatty Liver Disease in the United States and Its Impact on Veterans. Fed Pract. 2019;36(1):14–9. pmid:30766413
  21. Kim D, Kim W, Adejumo AC, Cholankeril G, Tighe SP, Wong RJ. Race/ethnicity-based temporal changes in prevalence of NAFLD-related advanced fibrosis in the United States, 2005–2016. Hepatol Int. 2019;13:205–13.
  22. Hossain N, Afendy A, Stepanova M, Nader F, Srishord M, Rafiq N, et al. Independent predictors of fibrosis in patients with nonalcoholic fatty liver disease. Clin Gastroenterol Hepatol. 2009;7(11):1224–9, 1229.e1-2. pmid:19559819
  23. Alexander M, Loomis AK, van der Lei J, Duarte-Salles T, Prieto-Alhambra D, Ansell D, et al. Risks and clinical predictors of cirrhosis and hepatocellular carcinoma diagnoses in adults with diagnosed NAFLD: real-world study of 18 million patients in four European cohorts. BMC Med. 2019;17(1):95. pmid:31104631
  24. Ertle J, Dechêne A, Sowa J-P, Penndorf V, Herzer K, Kaiser G, et al. Non-alcoholic fatty liver disease progresses to hepatocellular carcinoma in the absence of apparent cirrhosis. Int J Cancer. 2011;128(10):2436–43. pmid:21128245
  25. Grohmann M, Wiede F, Dodd GT, Gurzov EN, Ooi GJ, Butt T, et al. Obesity Drives STAT-1-Dependent NASH and STAT-3-Dependent HCC. Cell. 2018;175:1289–306.e20.
  26. Feng Z, Zhao H, Jiang Y, He Z, Sun X, Rong P, et al. Sarcopenia associates with increased risk of hepatocellular carcinoma among male patients with cirrhosis. Clin Nutr. 2020;39(10):3132–9. pmid:32057535
  27. Sanyal AJ, Newsome PN, Kliers I, Østergaard LH, Long MT, Kjær MS, et al. Phase 3 Trial of Semaglutide in Metabolic Dysfunction-Associated Steatohepatitis. N Engl J Med. 2025;392(21):2089–99. pmid:40305708
  28. Frey LJ, Gebregziabher M, Bishu KG, Youngblood B, Obeid JS, Shi J, et al. Multimorbidity Burden in Veterans with and Without Type 2 Diabetes Mellitus: A Comparative Retrospective Cohort Study. Diabetology. 2025;6(9):88.