Abstract
The global rise in prescription opioid use has contributed to an opioid epidemic, associated harms, and unintentional deaths in several western countries. Opioids, however, continue to be regularly prescribed for acute pain and, given limited treatment options, in the chronic pain context. Currently there are no accurate tools to help predict which patients prescribed opioids may be at risk of death, a risk that depends on the cultural context and varies across countries. Existing models do not account for statistical considerations such as censoring and competing risks. Using nationally representative data from the United Kingdom from 1,026,139 patients newly prescribed an opioid, we developed three competing risk time-to-event models: a regression model, a random forest, and a deep neural network to predict opioid-related deaths using UK primary care records. The models were then validated in an external cohort of 337,015 patients. The models exhibited good discrimination and positive predictive value during internal validation (C-statistic for the regression model, random forest, and neural network: 84.3%, 84.4% and 82.1%, respectively) and external validation (81.8%, 81.5% and 81.5%, respectively). Prior substance abuse, lung and liver comorbidities, morphine, fentanyl, or oxycodone at initiation, and co-prescription of gabapentinoids were among the candidate predictors associated with a higher risk of opioid-related mortality within the models. These results demonstrate how routinely collected data from a nationally representative dataset may be used to develop and validate opioid risk algorithms to help clinicians and patients better predict the risk of this serious adverse outcome.
Author summary
The rising use of prescription opioids has led to serious health concerns, including a growing number of preventable deaths. Yet opioids remain a common treatment for pain, leaving doctors and patients with a difficult balance between benefits and risks. There are no accurate tools to identify which patients prescribed opioids for non-cancer pain may face a higher risk of dying from them. Using electronic health records from over one million people in the United Kingdom who were newly prescribed an opioid, we created and tested three different prediction models, including different types of machine learning. We were able to accurately define opioid-associated deaths from information presented on death certificates. These models used information such as medical history, other medications taken at the same time, and the type and dose of opioid prescribed to estimate an individual's risk of opioid-related death. We found that all three models performed well, both when tested on the original data and when tested on a separate group of patients. For the first time using nationally representative data, we developed and validated risk prediction tools for the most serious adverse outcome of opioids. These could help guide safer prescribing decisions and support conversations between patients and healthcare professionals about opioid use.
Citation: Benitez-Aurioles J, Raul Ramirez Medina C, Jenkins D, Peek N, Jani M (2026) Development and evaluation of machine learning algorithms for the prediction of opioid-related deaths among UK patients with non-cancer pain. PLOS Digit Health 5(1): e0001190. https://doi.org/10.1371/journal.pdig.0001190
Editor: Sulaf Assi, Liverpool John Moores University, UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND
Received: August 11, 2025; Accepted: December 23, 2025; Published: January 27, 2026
Copyright: © 2026 Benitez-Aurioles et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data used for this paper are available through The Clinical Practice Research Datalink (CPRD) (https://www.cprd.com/, contact for data queries: enquires@cprd.com) for researchers who meet criteria for access to confidential data. The researchers received no special privileges in accessing data that others would not have had.
Funding: This work was funded by a National Institute for Health and Care Research (NIHR) Advanced Fellowship (NIHR301413), awarded to MJ, which funds MJ and CRRM's salaries as well as data costs for the project. The views expressed in this publication are those of the authors and not necessarily those of the NIHR, NHS or the UK Department of Health and Social Care. JBA is the recipient of a studentship award from the Health Data Research UK-The Alan Turing Institute Wellcome PhD Programme in Health Data Science (Grant Ref: 218529/Z/19/Z). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
For the past 20 years, there have been concerns about a considerable increase in the prescription of opioids for non-cancer pain in countries such as the US, Canada and the UK [1–6]. Clinical awareness of opioid-related adverse effects has grown alongside this surge in prescribing. The concurrent growth of opioid consumption, both legal and illegal, and opioid-related harms has been termed the opioid epidemic, and has been a leading cause of death for young adults in the US. Reducing opioid prescribing for non-cancer pain to reduce premature mortality and other associated harms is a National Health Service (NHS) England medicines optimisation priority for 2024/25 [7]. Despite national efforts to prevent harm, the UK continues to experience very high rates of drug-related deaths, with recent evidence suggesting that UK mortality is worse than in other high-income countries [8]. The age-standardised mortality rate for deaths related to drug poisoning in the UK has risen every year since 2012, with just under half of all drug-related deaths registered in 2024 confirmed to involve an opioid [9].
Previous studies have identified factors associated with opioid-related adverse events at the population level, including the type of opioid prescribed, concurrent medication, demographics, and several mental disorders [10–13]. Given limited treatment options for pain management, opioids continue to be prescribed for acute and chronic pain. However, it is not currently possible to predict which patients may develop the most serious adverse outcomes. Being able to do so could allow prescribers to tailor more effective care by determining the need for heightened monitoring, co-prescription of risk-mitigating medication such as naloxone, re-evaluation of treatment plans, or prioritisation of effective biopsychosocial interventions if the risk is too high to prescribe such drugs at all. There has been increasing interest in developing prognostic clinical prediction models for opioid-associated harms, yet nearly all have been developed in North America using administrative data and none have yet been widely implemented [14,15].
Supervised machine learning (ML) methods leverage large amounts of data to extract complex relationships between variables [16]. ML could be used to predict adverse outcomes for individual patients by harnessing the large databases of patient information generated in healthcare systems, such as electronic health records (EHRs). In the past decade, the application of ML in clinical risk prediction has expanded, in some cases achieving better performance than regression models in areas such as emergency admission or readmission risk [17–19]. Additionally, ML methods could potentially better estimate the risk of rarer outcomes by better modelling the risk profile of individuals with multiple risk factors. On the other hand, whether ML models outperform regression models for clinical prognosis in general has been called into question, with potentially biased performance estimates found in the reporting of many ML models [20].
Of particular interest are recent developments in time-to-event ML models [21]. When predicting long-term outcomes, failing to censor patients when they leave the study, e.g., due to loss to follow-up, can bias a model's predictions [22]. Time-to-event models, such as Cox regression, allow estimation of the probability of the event over time. Such considerations are important in opioid prognostic research: unlike traditional classification models that predict a binary outcome (e.g., event/no event), time-to-event models estimate the risk of an event occurring at different time points. In addition, competing risk models [23] account for the potential misestimation of the primary outcome caused by censoring competing events, such as deaths unrelated to opioids. If a patient dies from another cause (a competing risk), they can no longer experience opioid-associated death. Ignoring competing risks may therefore inflate the estimated risk, because the model treats individuals who die from other causes as if they were still at risk of opioid-related death. Compared to regression models, competing risk time-to-event ML models make it possible to study non-linear and complex relationships between variables, associating patterns in a patient's data with non-proportional survival curves [24–27]. A recent systematic review of machine learning applications in predicting opioid-associated adverse events reported that time-to-event and competing risk models had not been used to date, and identified them as an area for future development [15].
While there is existing research on the application of ML to predicting adverse opioid outcomes, ML time-to-event analysis has not yet been explored for this task. Additionally, previous ML work on opioid outcomes has mostly been carried out on US patient data, where the cultural context of opioid prescribing and opioid-related adverse events differs from that of other countries, such as the UK.
Our study aims to evaluate the potential of statistical modelling and ML algorithms in predicting opioid-related deaths among patients initiating opioid use in the UK, and to determine whether regression or ML models perform better when predicting mortality risk due to opioids. The results of this study could help prescribers identify those at the highest risk of opioid-related deaths and allow patients to make better informed decisions based on their individual risk.
Methods
Data source
In this retrospective cohort study, we used data from the Clinical Practice Research Datalink (CPRD) Gold (1 January 2006 to 31 December 2017) and Aurum (1 January 2015 to 31 October 2021) for model development and external validation, respectively. CPRD is a longitudinal database of anonymised primary care electronic health records (EHRs) from over 14 million patients registered with a GP in the UK [28,29]. CPRD Gold contains data contributed by practices using Vision software, whilst CPRD Aurum contains data from the EMIS Web electronic patient record system. The demographic composition of the CPRD population is representative of the UK population with regard to age, sex, and ethnicity. CPRD contains both diagnostic and electronic prescribing information for each patient, recorded through Read codes and prescribing codes. A Read code is a standardised clinical coding system used in UK primary care to represent diagnoses and symptoms, while a prescription code is a standardised identifier used to classify and record prescribed medications within EHRs. CPRD also allows individual linkage to death records from the Office for National Statistics (ONS). Cause-of-death information, captured from the patient's death certificate, was specifically requested and approved for this study. Additionally, the data were linked to the Townsend Deprivation Index, an area-level measure of deprivation derived from the patient's postcode [30].
Study population and design
Adults over the age of 18 who were new users of opioids and without cancer were included from CPRD Gold. New opioid users were defined as patients who had not been prescribed an opioid in at least the two years preceding the incident (first) opioid prescription. The index date was defined as the date the first opioid was prescribed. Individuals with a history of cancer in the ten years prior to opioid initiation were excluded from the analysis, due to distinct opioid prescribing mechanisms and a different baseline risk of death. To do this, we excluded patients with a Read code for any malignancy, with the exception of non-melanoma skin cancer, prior to their index opioid prescription, as previously described [3,5]. Additionally, patients who were prescribed an opioid within six months of a cancer diagnosis were excluded to reduce the risk of protopathic bias, whereby an opioid may be prescribed inadvertently for an early manifestation of cancer before the cancer has been detected diagnostically (e.g., pain or cough preceding a diagnosis of cancer). Patients who were prescribed methadone or oral buprenorphine either in the two years before or at the index date were also excluded, as patients in the UK are often prescribed these drugs to treat opioid use disorder secondary to recreational/illicit use, for whom the baseline risk is markedly different from that of the general population. Buprenorphine patches, however, are frequently used in the UK for pain management and were therefore included. We did not rely on opioid use disorder diagnosis codes for exclusion, as these are inconsistently coded in CPRD and medication records offer a more reliable indicator of active treatment. CPRD captures electronic prescribing data, and all other opioid prescriptions were considered in the analysis, to make the analysis and population of interest as inclusive as possible. Patients with data considered not to be up-to-standard according to the quality checks performed by CPRD were excluded. The study design is shown in Fig 1. The CPRD drug exposure data were transparently prepared using a previously published 'drug preparation algorithm' [31], with the decisions made for this work outlined in S1 Text.
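As an illustration, the new-user definition above reduces to a short query over a prescriptions table. The following is a minimal sketch, assuming a hypothetical data frame `rx` with columns `patid` and `rx_date` (one row per opioid prescription); it is not the study's extraction code.

```r
# Keep each patient's first opioid prescription that has no other opioid
# prescription in the preceding two years (730 days).
library(dplyr)

new_users <- rx %>%
  group_by(patid) %>%
  arrange(rx_date, .by_group = TRUE) %>%
  mutate(gap_days = as.numeric(rx_date - lag(rx_date))) %>%
  filter(is.na(gap_days) | gap_days >= 730) %>%  # first ever, or >= 2-year gap
  slice(1) %>%                                   # earliest qualifying prescription
  ungroup() %>%
  transmute(patid, index_date = rx_date)
```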
The cohort includes patients at opioid initiation. The candidate predictors are assessed at the time of the first prescription (type of opioid, demographics), and by using a two-year lookback period on their primary care records. The regression and machine learning models are trained to predict the cumulative incidence functions of the time-to-event outcomes, and are validated internally and in a second, external dataset.
Patients were censored at death, transfer out of the practice, two years after the index date, or when the practice ceased meeting CPRD data-quality standards, whichever came first. ONS linkage was used to identify both the patient's death date and cause of death. Opioid-related deaths were identified using ICD-10 codes and classified as such if any of these codes appeared as either the underlying cause or a contributory cause on the death certificate. Opioid-associated deaths are underrepresented in ONS data, as coroners' reports (where post-mortems are conducted at all) are not available through any means for such primary care data. Restricting classification to the underlying cause alone, or using a narrower ICD-10 definition, would substantially reduce the number of outcomes and limit statistical feasibility. Deaths not attributable to opioids were treated as competing events. Full codelists are provided in S1 Table.
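A minimal sketch of this censoring and event coding is shown below, assuming hypothetical per-patient date fields (`index_date`, `death_date`, `transfer_date`, `practice_end_date`) and an `opioid_death` flag derived from the ICD-10 codelist in S1 Table.

```r
# Status coding: 0 = censored, 1 = opioid-related death, 2 = competing death.
cohort$end_fu <- pmin(cohort$death_date,
                      cohort$transfer_date,
                      cohort$practice_end_date,
                      cohort$index_date + 730,  # administrative censoring at 2 years
                      na.rm = TRUE)
cohort$time   <- as.numeric(cohort$end_fu - cohort$index_date)
cohort$status <- ifelse(!is.na(cohort$death_date) & cohort$death_date <= cohort$end_fu,
                        ifelse(cohort$opioid_death, 1L, 2L),
                        0L)
```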
To reduce the likelihood of overfitting, we performed a sample size estimation for a time-to-event model as developed by Riley et al. [32], using as the performance baseline the survival model trained by Glanz et al. [33] (S1 Text). This calculation gives the maximum number of parameters that a regression model can use without risking overfitting to the training data. For our study, the maximum recommended number of parameters was 216, more than three times the number of candidate parameters considered by our models.
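The Riley et al. calculation is available in the pmsampsize R package; a hedged sketch follows. The numeric inputs below are illustrative placeholders only, not the values used in this study (those are given in S1 Text).

```r
# Minimum sample size / maximum number of parameters for a time-to-event
# prediction model (Riley et al. [32]). All inputs are placeholders.
library(pmsampsize)
pmsampsize(type = "s",        # survival (time-to-event) outcome
           rsquared = 0.001,  # anticipated Cox-Snell R^2 (e.g., from Glanz et al. [33])
           parameters = 60,   # number of candidate predictor parameters
           rate = 0.0005,     # overall event rate per person-year
           timepoint = 2,     # prediction horizon in years
           meanfup = 2)       # mean follow-up in years
```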
Predictor variables were chosen based on prior scientific literature and clinical relevance. They included patient demographics, comorbidities (included within the Charlson comorbidity score), other types of prescriptions such as benzodiazepines and gabapentinoids, healthcare utilisation history, type of opioid prescribed at initiation, and morphine milligram equivalents (MME) per day at initiation (S1 Text). MME/day was calculated as the cumulative daily dose of opioid multiplied by the potency of the opioid, according to the conversion ratios specified by the US Centers for Disease Control and Prevention [34]. Diagnoses were extracted from Read codes and additional medications from Product codes during a baseline assessment period of two years prior to the incident opioid prescription. Missing values (ethnicity and Townsend score in CPRD Gold, and ethnicity and region of the practice in CPRD Aurum) were handled using multiple imputation with chained equations (MICE) [35], using the mice package in R [36]. Ethnicity and deprivation scores are known to have a higher proportion of missingness in CPRD; multiple imputation was therefore performed to reduce the potential bias associated with a complete case analysis. Imputation was performed within each cross-validation fold. For each fold, a single imputed dataset was generated with 50 iterations per chain. To account for the possibility that missingness itself was informative, we additionally created a missingness indicator for each variable with missing data and included it as a predictor alongside the imputed value [37]. This allowed the models to retain all individuals while capturing any predictive signal associated with missingness.
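Two of these preprocessing steps lend themselves to short sketches: the MME/day conversion and the imputation with missingness indicators. Column names are hypothetical and only a subset of the CDC conversion factors is shown.

```r
# 1) MME/day at initiation: daily dose x CDC oral-morphine conversion factor [34].
mme_factor <- c(codeine = 0.15, tramadol = 0.1,
                morphine = 1, oxycodone = 1.5)   # illustrative subset
cohort$mme_day <- cohort$daily_dose * mme_factor[cohort$opioid_type]

# 2) Missingness indicators, then multiple imputation by chained equations
#    (one imputed dataset per fold, 50 iterations per chain).
library(mice)
cohort$ethnicity_missing <- is.na(cohort$ethnicity)
cohort$townsend_missing  <- is.na(cohort$townsend)
imp <- mice(cohort, m = 1, maxit = 50, seed = 1)
cohort_imputed <- complete(imp, 1)
```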
Statistical analysis
We trained three models of varying complexity to account for competing risks in time-to-event prediction:
- The Fine & Gray model [38] adapts Cox regression to account for competing risks by modelling the subdistribution hazard of each event. We introduced LASSO penalisation to prevent overfitting.
- The competing-risk Random Forest model [27] uses nonparametric random survival forests to estimate the cumulative incidence function of each event type. The number of trees, per-tree sample size, number of candidate variables per split, number of random splits, and minimum terminal node size were tuned to prevent overfitting.
- The DeepHit model [24] uses a neural network to estimate the cumulative incidence function of each event type. It is composed of a shared network and cause-specific networks that estimate the joint distribution of all competing events. The learning rate, number of training epochs, width and depth of the shared and cause-specific networks, class imbalance sampling rate, and dropout rate were tuned to prevent overfitting.
Further details on model hyperparameter tuning and model training can be found in S1 Text. The Fine & Gray model was implemented in R using the packages survival [39] and glmnet [40]. The Random Forest model was implemented with R and the package randomForestSRC [41]. The DeepHit model was implemented with Python and the libraries PyTorch [42], PyTorch Lightning [43] and PyCox [44].
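For the two R-based models, a hedged sketch of the fitting calls is shown below, reusing the `time`/`status` coding from the study design section. The predictor formula is illustrative, and the LASSO penalisation (applied in the study via glmnet) is omitted for brevity.

```r
library(survival)
library(randomForestSRC)

# Fine & Gray via survival::finegray(): expands the data with case weights so
# that a weighted Cox model targets the subdistribution hazard of event 1.
d$event <- factor(d$status, levels = 0:2,
                  labels = c("censor", "opioid", "other"))
fg <- finegray(Surv(time, event) ~ ., data = d, etype = "opioid")
fit_fg <- coxph(Surv(fgstart, fgstop, fgstatus) ~ age + sex + mme_day,
                data = fg, weights = fgwt)

# Competing-risk random survival forest; hyperparameters follow the final
# model reported in the Results (241 trees, mtry 5, minimum node size 724).
fit_rf <- rfsrc(Surv(time, status) ~ ., data = d,
                ntree = 241, mtry = 5, nodesize = 724, nsplit = 10)
cif <- predict(fit_rf, newdata = d_test)$cif  # [patients x times x events]
```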
The three models were validated using 5-fold cross-validation, as this provides more accurate estimates than split sampling [45]. As the DeepHit model oversamples positive outcomes to improve its performance, it is expected to be initially miscalibrated; all three models were therefore recalibrated to their training data (the inner fold of the cross-validation) before being validated on the outer fold. Performance was evaluated at three prediction horizons: 6, 12, and 24 months after the index date (i.e., when the patient was prescribed their first opioid). In each round of the cross-validation, the models only had access to the inner fold until the validation step, and since only first recorded incidences were included, no patient was in both the inner and outer fold.
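A minimal sketch of the fold structure, assuming one row per patient so that patient-level disjointness follows directly from row-level folds:

```r
set.seed(1)
folds <- sample(rep(1:5, length.out = nrow(d)))
for (k in 1:5) {
  inner <- d[folds != k, ]  # development data: fit and recalibrate here
  outer <- d[folds == k, ]  # held out until the validation step
  # fit the model on `inner`, recalibrate on `inner`, evaluate on `outer`
}
```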
Discriminative performance was estimated using the receiver operating characteristic (ROC) curve along with the precision-recall (PR) curve. The area under the ROC curve (AUROC) and the area under the PR curve (AUPRC) were also calculated and plotted for all prediction horizons between 3 and 24 months.
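The paper does not name its implementation of time-dependent discrimination; one compatible option in R is the timeROC package, which supports competing risks. A sketch for the three main horizons:

```r
library(timeROC)
# `risk` is a model's predicted risk score; `status` uses the 0/1/2 coding.
roc <- timeROC(T      = d_test$time,
               delta  = d_test$status,
               marker = risk,
               cause  = 1,                 # opioid-related death
               times  = c(183, 365, 730),  # 6-, 12-, 24-month horizons
               iid    = TRUE)              # enables confidence intervals
roc$AUC                                    # time-dependent AUROC per horizon
```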
Calibration performance was evaluated using quintile calibration plots, weighted with pseudo-observations to account for the competing risk of other-cause mortality [46]. The average cumulative incidence function of each model was compared to that observed in the population, reporting the model's average expected-to-observed risk ratio. In addition, the calibration slope was calculated for all prediction horizons by fitting a linear model to the five quintile points of the calibration plots.
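A sketch of the quintile calibration summaries: predicted risks are grouped into quintiles, the mean prediction per quintile is compared with the observed cumulative incidence (represented here by precomputed pseudo-observations, `pseudo_obs`), and a line is fitted through the five points.

```r
q <- cut(risk, quantile(risk, probs = seq(0, 1, 0.2)), include.lowest = TRUE)
expected <- tapply(risk, q, mean)        # mean predicted risk per quintile
observed <- tapply(pseudo_obs, q, mean)  # pseudo-observation estimate per quintile
slope    <- coef(lm(observed ~ expected))[2]  # calibration slope
eo_ratio <- mean(risk) / mean(pseudo_obs)     # expected-to-observed risk ratio
```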
Furthermore, for the three prediction horizons of interest, patients were stratified by risk level. Patients were classified as moderate risk (above a cut-off chosen by the Youden index) or high risk (in the top 5% of prediction scores). The recall, specificity, positive predictive value, and number of patients needed to screen to identify one positive case (defined as the inverse of the positive predictive value) were also calculated in both groups for the regression model and for the better performing ML model (according to the AUROC and AUPRC).
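Once the two cut-offs are chosen, these stratification metrics reduce to simple counts. A sketch follows, assuming a binary `event` indicator at the chosen horizon (the study's version additionally accounts for censoring and competing risks); the Youden cut-off is taken from the pROC package.

```r
library(pROC)
r <- roc(event, risk)                    # event: 0/1 at the prediction horizon
youden_cut <- coords(r, x = "best", best.method = "youden")[["threshold"]]
high_cut   <- quantile(risk, 0.95)       # top 5% of prediction scores

flag <- risk >= youden_cut               # moderate-risk stratum
ppv  <- mean(event[flag])                # positive predictive value
nns  <- 1 / ppv                          # number needed to screen
sens <- sum(event & flag) / sum(event)   # recall / sensitivity
spec <- sum(!event & !flag) / sum(!event)  # specificity
```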
Final models were trained and recalibrated on the entirety of the data. The coefficients of the Fine & Gray regression were converted into hazard ratios. Furthermore, Shapley Additive exPlanations (SHAP) values [47] of the better performing ML model were calculated for patients who had experienced opioid-related death. SHAP values use game theory principles to explain the output of a predictive model by attributing importance scores to each input variable, indicating how much each feature contributed to the final prediction. SHAP values are particularly relevant for 'black box' models, where the decision-making process may be opaque or difficult to interpret. For each individual, the SHAP value for a particular variable indicates by how much that patient's value of the variable increases or reduces their predicted risk of opioid-related death, compared to what the risk would be if they had the overall average of that variable. Although on different scales, a qualitative comparison of the ordering of the Fine & Gray hazard ratios and the SHAP values of the DeepHit model (for predicting risk 24 months after the first opioid prescription) was carried out to better understand the differences in decision-making between the two models.
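The paper does not state the SHAP tooling (DeepHit is implemented in Python, where the shap library is the usual choice). As a model-agnostic illustration in R, the sketch below applies the fastshap package to the random forest's 24-month cumulative incidence; all object names are illustrative.

```r
library(fastshap)

# Prediction wrapper: 24-month cumulative incidence of opioid-related death.
t24 <- which.min(abs(fit_rf$time.interest - 730))
pred_cif <- function(object, newdata) {
  predict(object, newdata = newdata)$cif[, t24, 1]
}

# Monte Carlo SHAP values for the patients who experienced the event.
sh <- explain(fit_rf, X = d_predictors, newdata = d_cases,
              pred_wrapper = pred_cif, nsim = 50)
colMeans(sh)  # average contribution of each predictor to predicted risk
```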
The use of the models is highlighted through example cases of a random patient in the bottom half, one in the top half, one in the top ten percent, and one in the top one percent of risk predictions for the DeepHit model. The predictions of the three models are given, as well as the top predictive variables for the patient as given by the SHAP values of the DeepHit model.
The checklist for the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) [48] and the checklist for the minimum information about clinical artificial intelligence modelling (MI-CLAIM) [49] are included in S2 Table (Equator checklists). Detailed guidance on the interpretation of clinical risk prediction models and their terminology is provided elsewhere [48,50].
Results
Baseline characteristics
Data from 1,026,139 and 337,015 patients, contributing 2,350,730 and 781,362 patient-years of follow-up, were included in the development and validation cohorts, respectively. The median follow-up in both cohorts was 730 days, and the frequency of the outcome was 0.12% (1,226 individuals) and 0.09% (293 individuals), respectively. Competing deaths occurred in 5.1% (52,665) of patients in the development cohort and 5.9% (19,849) in the validation cohort. Baseline characteristics of the patients can be found in Table 1.
Patients in both cohorts were more likely to be female (58% and 57% for CPRD Gold and Aurum, respectively) and white (85% and 72%), and were representative of all UK regions (although no South-Central England practices were available for CPRD Aurum) and socioeconomic deprivation levels. The median age of incident users was 50 and 58 years for CPRD Gold and Aurum, respectively. Codeine (72% and 86%), dihydrocodeine (17% and 9%) and tramadol (9% and 3%) were the most common initial opioid types prescribed.
In terms of differences, the validation cohort was more ethnically diverse (15% vs 28% non-white or unknown ethnicity), had higher smoking rates (15% in CPRD Gold vs 29% in CPRD Aurum), lower recorded depression rates (23% vs 8%), and a higher median number of healthcare utilisations (10 vs 19).
In the development dataset, missing data were found in two categorical variables: ethnicity (6.8% missing) and the Townsend score (<0.1% missing); these were replaced with a missing marker. There were no other missing data in the continuous variables for the included patients (age, number of GP visits and healthcare utilisations). In the validation data, missingness was found in ethnicity (16.5%), the region of the patient's GP practice (1.3%), and the quintile of the Townsend score (<0.1%). Missing data were present in two continuous variables: the total number of GP visits in the last year (5.4% missing) and the total number of healthcare utilisations (<0.1% missing, due to incomplete records). Ethnicity and Townsend score in CPRD Gold, and ethnicity and region of the practice in CPRD Aurum, were imputed as described in the Methods.
Model development
The development of the Fine & Gray model, the Random Forest, and the DeepHit model, including the five-fold cross-validation, optimisation, and training of the final model, took 120, 140, and 90 hours, respectively. The final Fine & Gray model had 34 non-zero parameters. The Random Forest had a total of 241 trees, with at most 5 candidate variables per split and a minimum node size of 724. The DeepHit network had a total of 48,500 parameters, with two shared layers for all risks and a single individual layer for each of the three outcomes (opioid-related death, competing death, and other censoring), all with a width of 32 nodes.
Discrimination performance
Discriminative performance, as measured by the C-statistic over time (Fig 2a), was comparable between the Fine & Gray model (average C-statistic over prediction horizons between months 3–24 of follow-up: 84.3%, 95% CI: 83.6%-85.1%) and the Random Forest model (84.4%, 95% CI: 83.6%-85.3%). The DeepHit model exhibited the lowest discriminative performance of the three (82.1%, 95% CI: 78.7%-85.6%). In terms of the average AUPRC (Fig 2b), the Fine & Gray model (1.1%, 95%CI: 0.9%-1.2%), Random Forest (1.1%, 95%CI: 0.9%-1.3%) and DeepHit (1.1%, 95%CI: 0.6%-1.6%) had similar performance. This was above the baseline AUPRC (i.e., the average precision over the same timeframe of a model no better than chance) of 0.1%.
Average C-statistic for the Fine & Gray model, random forest, and DeepHit across the first 24 months of prediction horizon (a). Average precision (positive predictive value) across different true positive rates for the Fine & Gray model, random forest, and DeepHit across the first 24 months of prediction horizon (b). Confidence intervals are shown as dotted lines.
Calibration performance
In terms of the average expected-to-observed risk ratio, the Fine & Gray model (average ratio between 3 and 24 months of follow-up: 1.00, 95%CI: 0.91-1.09) and the Random Forest (1.00, 95%CI: 0.91-1.09) were well-calibrated. After recalibration with the training data, DeepHit (1.00, 95%CI: 0.91-1.09) was also well-calibrated. Calibration plots over different prediction horizons are shown in Fig 3. At 6 months, the calibration slopes of the Fine & Gray model, random forest, and DeepHit model were 0.85 (95%CI: 0.77-0.94), 1.24 (95%CI: 1.02-1.46) and 1.42 (95%CI: 0.72-2.13), respectively. At 12 months, they were 0.86 (95%CI: 0.76-0.95), 1.30 (95%CI: 1.03-1.57) and 1.56 (95%CI: 0.78-2.34), and at 24 months, 0.82 (95%CI: 0.75-0.90), 1.22 (95%CI: 1.04-1.41) and 1.26 (95%CI: 0.83-1.69).
Calibration plots for the Fine & Gray model, random forest, and DeepHit when predicting 6-month (a), 12-month (b), and 24-month (c) risk of opioid-related death.
Risk stratifications
Table 2 shows the stratification of the patient population for different risk thresholds for the Fine & Gray and Random Forest models. For both thresholds, the performance of the two models was comparable, and neither performed significantly better than the other.
Model interpretation
For the Fine & Gray model coefficients (Fig 4a; complete list in Table 1 in S3 Table), the five categorical-variable coefficients associated with the largest increases in predicted risk were alcohol abuse (hazard ratio: 3.23), substance use disorder (3.21), a morphine prescription at initiation (2.45), attempted suicide/self-harm (2.12) and a gabapentinoid prescription in the last two years (2.11). The coefficients associated with the largest decreases were never having smoked (0.79), living in the South West of England (0.77), being Asian (0.75), being female (0.62) and a history of migraines (0.59).
SHAP plots of the DeepHit model, with the five largest (c) and five smallest (d) average differences between positive (green) and negative (red) individuals plotted. Full lists of the Fine & Gray coefficients and of the SHAP differences of the DeepHit model are available in S3 Table. Suicide/self-harm refers to attempted suicide or self-harm history.
For the SHAP values of the DeepHit neural network (Fig 4b; complete list in Table 2 in S3 Table, and SHAP plot for continuous variables in Fig 1 in S3 Table), the five categorical variables with the highest risk SHAP values (as determined by the difference between the average SHAP values of individuals with and without the positive value of the categorical variable) were fibromyalgia (SHAP difference: 0.018), an oxycodone prescription at initiation (0.012), a gabapentinoid prescription in the last two years (0.010), a buprenorphine prescription at initiation (0.008) and dementia (0.008). The five variables with the lowest risk SHAP values were being part of a GP practice in Yorkshire and the Humber (-0.00024), having ethnicity missing from the records (-0.00043), having a missing Townsend score (-0.00047), being part of a GP practice in the East Midlands (-0.00050) and having a benzodiazepine prescription in the last two years (-0.051). Diamorphine (0.005), fentanyl (0.004) and morphine (0.003) prescriptions at initiation were among the top ten features with the highest associated SHAP difference.
Clinical examples
We have included the predictions for four patients across the risk score spectrum, alongside the predicted values of the three models and the top predictors according to the DeepHit SHAP values, in Table 3 in S3 Table.
External validation
The Fine & Gray, random forest, and DeepHit models had an average AUROC (Fig 5a) over months 3–24 of 81.76% (95% CI from 500 bootstraps: 75.90%-87.62%), 81.49% (95% CI: 77.60%-85.38%) and 81.37% (95% CI: 76.49%-86.26%), respectively, and an average AUPRC (Fig 5b) of 0.52% (95% CI: 0.39%-0.66%), 0.39% (95% CI: 0.27%-0.52%) and 0.34% (95% CI: 0.24%-0.44%), where the baseline AUPRC is 0.04%. According to both metrics, the Fine & Gray model showed superior discrimination in the new dataset.
Average C-statistic for the Fine & Gray model, random forest, and DeepHit across months 3 to 24 of prediction horizon (a). Average precision (positive predictive value) across different true positive rates for the Fine & Gray model, random forest, and DeepHit across months 3 to 24 of prediction horizon (b). Expected/observed ratio for the Fine & Gray regression, random forest, and DeepHit across months 3 to 24 of prediction horizon, with a grey dotted line showing the ideal expected/observed ratio of 1 (c). Confidence intervals are shown as dotted lines.
In terms of calibration, the average expected-to-observed ratio at 6 months was 7.57 (95%CI: 5.67-9.48), 3.78 (95%CI: 2.78-4.78), and 2.92 (95%CI: 2.37-3.46) for the Fine & Gray, Random Forest and DeepHit models, respectively (Fig 5c). At 12 months, it was 5.71 (95%CI: 4.62-6.80), 3.10 (95%CI: 2.61-3.59), and 2.51 (95%CI: 2.26-2.75), and at 24 months, 4.49 (95%CI: 3.99-5.00), 2.79 (95%CI: 2.14-3.44), and 2.24 (95%CI: 1.87-2.61). At 6 months, the calibration slope was 0.32 (95%CI: 0.28-0.35), 1.06 (95%CI: 0.77-1.35), and 1.13 (95%CI: 0.88-1.38); at 12 months, 0.32 (95%CI: 0.29-0.36), 0.93 (95%CI: 0.72-1.14), and 0.94 (95%CI: 0.70-1.19); and at 24 months, 0.33 (95%CI: 0.30-0.35), 0.90 (95%CI: 0.77-1.02), and 0.57 (95%CI: 0.51-0.63). The random forest and DeepHit models were better calibrated in the new dataset than the Fine & Gray model, but all three still showed overall poor calibration. Additional performance metrics are presented in Tables 4–6 in S3 Table.
Discussion
We developed a series of time-to-event clinical prediction models, employing both regression and machine learning methodologies, to predict opioid-related deaths among a cohort of patients initiating opioids. We built three competing risk prediction models of varying complexity, which were trained and internally validated in one UK cohort and externally validated in a second. In the internal validation, the machine learning models had comparable performance to the Fine & Gray regression model. In the external validation, the Fine & Gray model showed the best discriminative performance, illustrating the stability of regression methods.
Strengths and limitations
This is the first study assessing the potential of ML in relation to opioid-related mortality in the United Kingdom, developing and externally validating these models in large, nationally representative cohorts of real-world primary care records. This supports the generalisability of the models and results to the entirety of the United Kingdom. This study is also one of the few to have explored the potential of opioid risk prediction outside of North America [15]. In addition, it harnesses variables that are widely available in primary care EHRs rather than from prospective observational studies, facilitating the potential implementation of these models in clinical practice. Through ONS linkage, this study ascertained cause-specific mortality related to opioids as recorded on the death certificate, rather than all-cause mortality, strengthening the analysis.
There are a limited number of studies comparing regression and ML models in a competing risk time-to-event framework [15,51]. This is the first study applying survival ML models to predict adverse opioid outcomes. By using interpretability techniques such as SHAP values, we were able to explore the prediction mechanisms used by the neural network. Overall, the variables considered important by DeepHit were the same as those used by Fine & Gray, with substance use disorder, alcohol abuse, morphine prescriptions and high levels of deprivation associated with increases in risk in the decisions of the models. These features are consistent with previous work done in smaller retrospective studies in the US [52]. We also performed an external validation on a separate group of patients in CPRD Aurum. However, differences between the training and external validation data can have consequences for model performance, even when validation is performed in the same setting, due to temporal changes in prescribing and coding practices.
The study needs to be interpreted in the context of its limitations. Large and representative EHR databases, such as CPRD Gold and Aurum, reflect some inherent limitations related to the mechanisms of routine data recording [53,54]. Partly as a result, and given the rarity of the outcome, the evaluation of clinical utility using decision curve analysis was inherently limited. In low-prevalence settings, decision curves tend to lie close to or overlap the "treat none" strategy across most of the threshold range, as differences in net benefit require much larger samples to estimate with acceptable precision. To address this limitation, we additionally reported complementary measures of clinical utility, such as sensitivity, specificity, PPV and number needed to screen at different prediction horizons, as recommended by reporting checklists for clinical AI models such as MI-CLAIM [49]. Due to the limited sample size and rarity of opioid-related deaths, more detailed temporal validation was also not statistically feasible. Access to linked UK addiction centre data was not available as part of this data source.
While CPRD captures electronic prescribing information, dispensing or administration information was not available. We therefore used a previously published drug preparation algorithm [31] to transparently prepare such data and communicate the decisions made during this process (S1 Text). Codeine is the most frequently prescribed opioid in the dataset, reflecting UK primary care prescribing practices compared to other countries [3,5]. Previous work has shown that the most common indications for opioids in the UK are musculoskeletal disorders such as osteoarthritis and low back pain, while respiratory indications such as cough are less common [55]. In addition, although the MME/day at initiation was included as a candidate predictor, the evolution of the dose after initiation was not considered. This was to prevent data leakage, where the model has access to information that would not be available at the time of prediction. Total MME/day at the time of the event has been associated with mortality in opioid-treated patients, with <50 MME/day associated with lower risk compared to higher doses [56]. This study focuses on prediction at the time of opioid initiation; the use of dynamic predictions with prescription history and cumulative dose as predictors could be explored in further work.
It is likely that opioid-related deaths in the UK are underreported in ONS records as the primary cause of death, reflecting under-recognition of such events in clinical practice. In the UK, post-mortems are conducted by the coroner for specific indications, such as following a sudden, violent or unexpected death [57], and reports are not linked with CPRD to protect the patient's identity. For this reason, we used a broader definition of what constitutes an opioid-related death, considering both underlying and contributory causes. However, even after applying this broader definition, the outcome of interest remained infrequent.
In this study, we focussed on the validation of three models of increasing complexity. We did not, however, explore other emerging methodologies such as transformer-based [58] or long short-term memory networks [59], as such models do not yet have the same level of maturity for time-to-event prediction as the three models that we used. We considered that the DeepHit model had enough complexity to leverage potentially useful high-level relationships between predictors, but further work could explore alternative models for opioid-related death risk prediction in the presence of competing events. In addition, we restricted our inclusion of predictors as outlined by our sample size calculation and therefore prioritised a knowledge-driven rather than data-driven approach to choosing candidate predictors. The advantages of doing so include reducing the likelihood of overfitting to a specific dataset and developing a clinically interpretable model that is less likely to be influenced by spurious associations. The final model uses a hybrid approach, where clinically relevant variables are retained in the final model following data-driven refinement. However, we acknowledge as a limitation that novel variables that could further improve model performance under a data-driven approach may have been overlooked.
The interpretation of the coefficients and SHAP values of the models in this work is limited, as this study was designed as a predictive investigation rather than an aetiological one, precluding any causal conclusions about the data. It is important that the associated features in the model are not interpreted as causal, as would be the case in any risk prediction model. We included variables that were correlated, and while their inclusion enhances predictive performance, it does not allow causal interpretation of the calculated associations. However, the investigation of coefficients can still be valuable to better understand the decision-making of models and address the 'black box' nature of ML models [60]. Finally, differences in missingness patterns between Gold and Aurum may influence changes in model performance and calibration, and the impact of different imputation techniques should be explored in future work.
Comparisons to other studies and practice implications
Researchers have developed models to predict a diverse range of opioid-related adverse outcomes, leveraging administrative claims data [61], EHRs [33] and prescription monitoring programmes [62], mainly sourced from North America [15]. Moreover, ML models for opioid-related adverse outcome prediction, such as random forests [63,64], boosted trees [14,63] and neural network architectures [63,65,66], have generally demonstrated superior performance compared to logistic regression. These have, so far, all been binary classification problems, where censoring and competing risks were not considered, something which is potentially important in the prediction of opioid-related outcomes.
In this study, well-defined structured variables were used as the input to all models, with the aim of evaluating whether more complex ML models could uncover complicated relationships between these variables to improve their predictive power. ML and deep learning techniques have sometimes been utilised instead to extract information directly from unstructured data sources, such as 'unprocessed' EHRs. These methods can leverage clinical free text using natural language processing techniques [67,68], or sequences of clinical codes as recorded in the system [65,69]. Although promising, the use of unstructured clinical data for risk assessment is particularly sensitive to generalisability issues, as models can more easily leverage patterns that are specific to a hospital, system, or a particular moment in time [70]. Practically, such unstructured data are not widely accessible through curated data sources such as CPRD, due to information governance regulations and the risk of patient reidentification through free text.
During internal validation, the models accurately estimated the overall risk of opioid-related death in the population, but predicted probabilities were too close to the overall average risk. In the external validation cohort, the models overestimated the risk of opioid-related death on average. This calibration deficiency is in part due to the rarity of the event of interest and is a consequence of evaluating cause-specific, rather than all-cause, mortality. The external calibration issues could also be attributed to the distributional shifts particularly salient in EHRs, caused by changes in clinical practice over time, population differences, and differences in how clinical information is recorded in each system [71]. The ML models were relatively more robust, calibration-wise, in the presence of dataset shift. However, it is questionable whether the calibration observed is good enough to consider the models appropriate for probabilistic statements. Whether this improvement compensates for the computational and implementation difficulties that might arise from deploying intensive ML models in practice was also out of scope for this work.
Calibration issues have consequences for how an opioid risk score might be used in practice, as policies based on whether patients exceed a certain threshold will behave unpredictably when used in different centres or over time [72], making recalibration necessary more frequently. Percentile-based policies, where individual practices identify the top percentage of patients at risk of dying due to opioids, can be an alternative to threshold-based policies, monitoring or offering preventive resources to the patients who, within that particular population, are at the highest risk of experiencing opioid-related adverse events. Given the high recall and specificity and lower precision, the models developed here could be useful in identifying which patients in a practice have the lowest risks, which is especially informative when there are limited treatment options for pain. This can support shared decision making to balance the benefits and risks of opioids for the individual. The consideration of calibration during model validation is important and has previously been raised as an issue for modern ML [73].
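As an illustration of the percentile-based alternative, flagging can be done within each practice rather than against a fixed probability threshold; a minimal sketch with hypothetical column names:

```r
library(dplyr)
flagged <- preds %>%
  group_by(practice_id) %>%
  mutate(review = risk_24m >= quantile(risk_24m, 0.99)) %>%  # top 1% per practice
  ungroup()
```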
Whilst many risk prediction models are being developed, very few are implemented in clinical practice. At present, it is recognised that there are limited pharmacological treatment options for non-cancer pain. Opioids are a class of drug that can be helpful in the short term (e.g., immediately post-surgery) but are associated with serious adverse events and premature deaths in a proportion of individuals. Currently, a 'one size fits all' approach is used, without the ability to incorporate a patient's individual characteristics and contextual factors. Developing prediction models using both ML and traditional statistical techniques for opioid-related deaths, the most serious adverse outcome, in the context of rising opioid-related deaths in the UK has the potential to influence future outcomes and patient safety. The results of risk prediction models in this field could also be considered by guideline committees for risk stratification of individuals with non-cancer pain, to improve the safety of prescribing these medications.
Conclusion
This study provides valuable insights into the performance of various predictive models for survival prediction in the context of opioid use within a competing risks framework. The models demonstrated success in predicting survival outcomes, with high discrimination when identifying which patients are at higher risk. The study also highlights the importance of designing robust validation pipelines and reporting both discriminative and calibration performance. The results show that, although ML methods perform well, their performance is not notably different from regression methods. Nevertheless, the results show how algorithms leveraging 'big data' could provide predictions valuable to clinicians and the healthcare system when managing the care of patients who have recently started opioids. Ultimately, this research contributes to the growing body of knowledge on ML and risk prediction in the field of opioid use and offers a foundation for future studies seeking to optimise these models for practical application.
Supporting information
S1 Text. Additional methods: Drug preparation steps, sample size calculations, hyperparameter tuning.
https://doi.org/10.1371/journal.pdig.0001190.s001
(DOCX)
S1 Table. Opioid-related death ICD-10 code list.
https://doi.org/10.1371/journal.pdig.0001190.s002
(DOCX)
S2 Table. Equator checklists for clinical prediction models.
https://doi.org/10.1371/journal.pdig.0001190.s003
(DOCX)
References
- 1. Gomes T, Mamdani MM, Paterson JM, Dhalla IA, Juurlink DN. Trends in high-dose opioid prescribing in Canada. Can Fam Physician. 2014;60(9):826–32. pmid:25217680
- 2. Smith BH, Fletcher EH, Colvin LA. Opioid prescribing is rising in many countries. BMJ. 2019;367:l5823. pmid:31624081
- 3. Jani M, Birlie Yimer B, Sheppard T, Lunt M, Dixon WG. Time trends and prescribing patterns of opioid drugs in UK primary care patients with non-cancer pain: A retrospective cohort study. PLoS Med. 2020;17(10):e1003270. pmid:33057368
- 4. Curtis HJ, Croker R, Walker AJ, Richards GC, Quinlan J, Goldacre B. Opioid prescribing trends and geographical variation in England, 1998-2018: a retrospective database study. Lancet Psychiatry. 2019;6(2):140–50. pmid:30580987
- 5. Jani M, Girard N, Bates DW, Buckeridge DL, Sheppard T, Li J, et al. Opioid prescribing among new users for non-cancer pain in the USA, Canada, UK, and Taiwan: A population-based cohort study. PLoS Med. 2021;18(11):e1003829. pmid:34723956
- 6. Jani M, Dixon WG. Opioids are not just an American problem. BMJ. 2017;359:j5514. pmid:29212773
- 7. NHS England. National medicines optimisation opportunities 2024/25. [cited 1 Jul 2025]. Available from: https://www.england.nhs.uk/long-read/national-medicines-optimisation-opportunities-2023-24/#15-chronic-non-cancer-pain-management-without-opioids
- 8. O’Dowd A. Drug deaths causing UK mortality to fall behind other high income countries, analysis finds. BMJ. 2025;389:r1046. pmid:40393742
- 9. Office for National Statistics. Deaths related to drug poisoning in England and Wales: 2024 registrations. [cited 17 Oct 2025]. Available from: https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/bulletins/deathsrelatedtodrugpoisoninginenglandandwales/2024registrations
- 10. Bohnert ASB, Valenstein M, Bair MJ, Ganoczy D, McCarthy JF, Ilgen MA, et al. Association between opioid prescribing patterns and opioid overdose-related deaths. JAMA. 2011;305(13):1315–21. pmid:21467284
- 11. Gomes T, Juurlink DN, Antoniou T, Mamdani MM, Paterson JM, van den Brink W. Gabapentin, opioids, and the risk of opioid-related death: A population-based nested case-control study. PLoS Med. 2017;14(10):e1002396. pmid:28972983
- 12. Dunn KM, Saunders KW, Rutter CM, Banta-Green CJ, Merrill JO, Sullivan MD, et al. Opioid prescriptions for chronic pain and overdose: a cohort study. Ann Intern Med. 2010;152(2):85–92. pmid:20083827
- 13. Yimer BB, Soomro M, McBeth J, Medina CRR, Lunt M, Dixon WG, et al. Comparative risk of severe constipation in patients treated with opioids for non-cancer pain: a retrospective cohort study in Northwest England. BMC Med. 2025;23(1):288. pmid:40518524
- 14. Lo-Ciganic W-H, Donohue JM, Yang Q, Huang JL, Chang C-Y, Weiss JC, et al. Developing and validating a machine-learning algorithm to predict opioid overdose in Medicaid beneficiaries in two US states: a prognostic modelling study. Lancet Digit Health. 2022;4(6):e455–65. pmid:35623798
- 15. Ramírez Medina CR, Benitez-Aurioles J, Jenkins DA, Jani M. A systematic review of machine learning applications in predicting opioid associated adverse events. NPJ Digit Med. 2025;8(1):30. pmid:39820131
- 16. Habehh H, Gohel S. Machine Learning in Healthcare. Curr Genomics. 2021;22(4):291–300. pmid:35273459
- 17. Rahimian F, Salimi-Khorshidi G, Payberah AH, Tran J, Ayala Solares R, Raimondi F, et al. Predicting the risk of emergency admission with machine learning: Development and validation using linked electronic health records. PLoS Med. 2018;15(11):e1002695. pmid:30458006
- 18. Barbieri S, Kemp J, Perez-Concha O, Kotwal S, Gallagher M, Ritchie A, et al. Benchmarking Deep Learning Architectures for Predicting Readmission to the ICU and Describing Patients-at-Risk. Sci Rep. 2020;10(1):1111. pmid:31980704
- 19. Mahmoudi E, Kamdar N, Kim N, Gonzales G, Singh K, Waljee AK. Use of electronic medical records in development and validation of risk prediction models of hospital readmission: systematic review. BMJ. 2020;369:m958. pmid:32269037
- 20. Christodoulou E, Ma J, Collins GS, Steyerberg EW, Verbakel JY, Van Calster B. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J Clin Epidemiol. 2019;110:12–22. pmid:30763612
- 21. Friedler SA, Scheidegger C, Venkatasubramanian S, Choudhary S, Hamilton EP, Roth D. A comparative study of fairness-enhancing interventions in machine learning. In: Proceedings of the Conference on Fairness, Accountability, and Transparency. 2019. p. 329–38.
- 22. Clark TG, Bradburn MJ, Love SB, Altman DG. Survival analysis part I: basic concepts and first analyses. Br J Cancer. 2003;89(2):232–8. pmid:12865907
- 23. Austin PC, Lee DS, Fine JP. Introduction to the Analysis of Survival Data in the Presence of Competing Risks. Circulation. 2016;133(6):601–9. pmid:26858290
- 24. Lee C, Zame W, Yoon J, Van der Schaar M. DeepHit: A Deep Learning Approach to Survival Analysis With Competing Risks. AAAI. 2018;32(1).
- 25. Katzman JL, Shaham U, Cloninger A, Bates J, Jiang T, Kluger Y. DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Med Res Methodol. 2018;18(1):24. pmid:29482517
- 26. Kvamme H, Borgan Ø. Continuous and discrete-time survival prediction with neural networks. Lifetime Data Anal. 2021;27(4):710–36. pmid:34618267
- 27. Ishwaran H, Gerds TA, Kogalur UB, Moore RD, Gange SJ, Lau BM. Random survival forests for competing risks. Biostatistics. 2014;15(4):757–73. pmid:24728979
- 28. Herrett E, Gallagher AM, Bhaskaran K, Forbes H, Mathur R, van Staa T, et al. Data Resource Profile: Clinical Practice Research Datalink (CPRD). Int J Epidemiol. 2015;44(3):827–36. pmid:26050254
- 29. Wolf A, Dedman D, Campbell J, Booth H, Lunn D, Chapman J, et al. Data resource profile: Clinical Practice Research Datalink (CPRD) Aurum. Int J Epidemiol. 2019;48(6):1740–1740g. pmid:30859197
- 30. Jarman B, Townsend P, Carstairs V. Deprivation indices. BMJ; 1991. 523 p.
- 31. Jani M, Yimer BB, Selby D, Lunt M, Nenadic G, Dixon WG. “Take up to eight tablets per day”: Incorporating free-text medication instructions into a transparent and reproducible process for preparing drug exposure data for pharmacoepidemiology. Pharmacoepidemiol Drug Saf. 2023;32(6):651–60. pmid:36718594
- 32. Riley RD, Snell KI, Ensor J, Burke DL, Harrell FE Jr, Moons KG, et al. Minimum sample size for developing a multivariable prediction model: PART II - binary and time-to-event outcomes. Stat Med. 2019;38(7):1276–96. pmid:30357870
- 33. Glanz JM, Narwaney KJ, Mueller SR, Gardner EM, Calcaterra SL, Xu S, et al. Prediction Model for Two-Year Risk of Opioid Overdose Among Patients Prescribed Chronic Opioid Therapy. J Gen Intern Med. 2018;33(10):1646–53. pmid:29380216
- 34. Dowell D, Ragan KR, Jones CM, Baldwin GT, Chou R. CDC Clinical Practice Guideline for Prescribing Opioids for Pain - United States, 2022. MMWR Recomm Rep. 2022;71(3):1–95. pmid:36327391
- 35. White IR, Royston P, Wood AM. Multiple imputation using chained equations: Issues and guidance for practice. Stat Med. 2011;30(4):377–99. pmid:21225900
- 36. van Buuren S, Groothuis-Oudshoorn K. mice: Multivariate Imputation by Chained Equations in R. J Stat Softw. 2011;45(3).
- 37. Sisk R, Sperrin M, Peek N, van Smeden M, Martin GP. Imputation and missing indicators for handling missing data in the development and deployment of clinical prediction models: A simulation study. Stat Methods Med Res. 2023;32(8):1461–77. pmid:37105540
- 38. Fine JP, Gray RJ. A Proportional Hazards Model for the Subdistribution of a Competing Risk. J Am Stat Assoc. 1999;94(446):496–509.
- 39. Therneau T. A Package for Survival Analysis in R. 2023. Available from: https://cran.r-project.org/package=survival
- 40. Simon N, Friedman J, Hastie T, Tibshirani R. Regularization Paths for Cox’s Proportional Hazards Model via Coordinate Descent. J Stat Softw. 2011;39(5):1–13. pmid:27065756
- 41. Ishwaran H, Kogalur UB. Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC). R package version 3.2.0. 2023.
- 42. Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. Adv Neural Inf Process Syst. 2019;32.
- 43. Falcon W, Borovec J, Wälchli A, Eggert N, Schock J, Jordan J, et al. PyTorchLightning/pytorch-lightning: 0.7.6 release.
- 44. havakv/pycox: Survival analysis with PyTorch. GitHub. [cited 10 Dec 2025]. Available from: https://github.com/havakv/pycox
- 45. Ellis RP, Mookim PG. K-fold cross-validation is superior to split sample validation for risk adjustment models. Boston University Department of Economics Working Papers Series. 2013 [cited 10 Dec 2025]. Available from: https://ideas.repec.org/p/bos/wpaper/wp2013-026.html
- 46. van Geloven N, Giardiello D, Bonneville EF, Teece L, Ramspek CL, van Smeden M, et al. Validation of prediction models in the presence of competing risks: a guide through modern methods. BMJ. 2022;377:e069249. pmid:35609902
- 47. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst. 2017;30.
- 48. Collins GS, Reitsma JB, Altman DG, Moons KGM. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): the TRIPOD statement. Ann Intern Med. 2015;162(1):55–63. pmid:25560714
- 49. Norgeot B, Quer G, Beaulieu-Jones BK, Torkamani A, Dias R, Gianfrancesco M, et al. Minimum information about clinical artificial intelligence modeling: the MI-CLAIM checklist. Nat Med. 2020;26(9):1320–4. pmid:32908275
- 50. Efthimiou O, Seo M, Chalkou K, Debray T, Egger M, Salanti G. Developing clinical prediction models: a step-by-step guide. BMJ. 2024;386:e078276. pmid:39227063
- 51. Kantidakis G, Putter H, Litière S, Fiocco M. Statistical models versus machine learning for competing risks: development and validation of prognostic models. BMC Med Res Methodol. 2023;23(1):51. pmid:36829145
- 52. Tseregounis IE, Henry SG. Assessing opioid overdose risk: a review of clinical prediction models utilizing patient-level data. Transl Res. 2021;234:74–87. pmid:33762186
- 53. Jani M, Curtis JR, Hyrich KL. Navigating real-world data sources in rheumatology: opportunities, pitfalls, and practical guidance. Ann Rheum Dis. 2025;S0003-4967(25)04550-9. pmid:41436312
- 54. Gianfrancesco MA, Tamang S, Yazdany J, Schmajuk G. Potential Biases in Machine Learning Algorithms Using Electronic Health Record Data. JAMA Intern Med. 2018;178(11):1544–7. pmid:30128552
- 55. Ramirez Medina CR, Lyon M, Davies E, McCarthy D, Reid V, Khanna A, et al. Clinical indications associated with new opioid use for pain management in the United Kingdom: using national primary care data. Pain. 2025;166(3):656–66. pmid:39446674
- 56. Jani M, Girard N, Bates DW, Buckeridge DL, Dixon WG, Tamblyn R. Comparative risk of mortality in new users of prescription opioids for noncancer pain: results from the International Pharmacosurveillance Study. Pain. 2025;166(5):1118–27. pmid:39503752
- 57. Post-mortem. NHS. [cited 17 Dec 2025]. Available from: https://www.nhs.uk/tests-and-treatments/post-mortem/
- 58. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. Adv Neural Inf Process Syst. 2017;30.
- 59. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–80. pmid:9377276
- 60. Hassija V, Chamola V, Mahapatra A, Singal A, Goel D, Huang K, et al. Interpreting Black-Box Models: A Review on Explainable Artificial Intelligence. Cogn Comput. 2023;16(1):45–74.
- 61. Zedler B, Xie L, Wang L, Joyce A, Vick C, Brigham J, et al. Development of a Risk Index for Serious Prescription Opioid-Induced Respiratory Depression or Overdose in Veterans’ Health Administration Patients. Pain Med. 2015;16(8):1566–79. pmid:26077738
- 62. Chang H-Y, Krawczyk N, Schneider KE, Ferris L, Eisenberg M, Richards TM, et al. A predictive risk model for nonfatal opioid overdose in a statewide population of buprenorphine patients. Drug Alcohol Depend. 2019;201:127–33. pmid:31207453
- 63. Lo-Ciganic W-H, Huang JL, Zhang HH, Weiss JC, Wu Y, Kwoh CK, et al. Evaluation of Machine-Learning Algorithms for Predicting Opioid Overdose Risk Among Medicare Beneficiaries With Opioid Prescriptions. JAMA Netw Open. 2019;2(3):e190968. pmid:30901048
- 64. Ellis RJ, Wang Z, Genes N, Ma’ayan A. Predicting opioid dependence from electronic health records with machine learning. BioData Min. 2019;12:3. pmid:30728857
- 65. Dong X, Deng J, Hou W, Rashidian S, Rosenthal RN, Saltz M, et al. Predicting opioid overdose risk of patients with opioid prescriptions using electronic health records based on temporal deep learning. J Biomed Inform. 2021;116:103725. pmid:33711546
- 66. Hastings JS, Howison M, Inman SE. Predicting high-risk opioid prescriptions before they are given. Proc Natl Acad Sci U S A. 2020;117(4):1917–23. pmid:31937665
- 67. Ashburner JM, Chang Y, Wang X, Khurshid S, Anderson CD, Dahal K, et al. Natural Language Processing to Improve Prediction of Incident Atrial Fibrillation Using Electronic Health Records. J Am Heart Assoc. 2022;11(15):e026014. pmid:35904194
- 68. Jani M, Alfattni G, Belousov M, Laidlaw L, Zhang Y, Cheng M, et al. Development and evaluation of a text analytics algorithm for automated application of national COVID-19 shielding criteria in rheumatology patients. Ann Rheum Dis. 2024;83(8):1082–91. pmid:38575324
- 69. Miotto R, Li L, Kidd BA, Dudley JT. Deep Patient: An Unsupervised Representation to Predict the Future of Patients from the Electronic Health Records. Sci Rep. 2016;6:26094. pmid:27185194
- 70. Sauer CM, Chen L-C, Hyland SL, Girbes A, Elbers P, Celi LA. Leveraging electronic health records for data science: common pitfalls and how to avoid them. Lancet Digit Health. 2022;4(12):e893–8. pmid:36154811
- 71. Avati A, Seneviratne M, Xue E, Xu Z, Lakshminarayanan B, Dai AM. BEDS-Bench: Behavior of EHR-models under Distributional Shift-A Benchmark. [cited 10 Dec 2025]. Available from: https://arxiv.org/abs/2107
- 72. Van Calster B, Vickers AJ. Calibration of risk prediction models: impact on decision-analytic performance. Med Decis Making. 2015;35(2):162–9. pmid:25155798
- 73. Guo C, Pleiss G, Sun Y, Weinberger KQ. On calibration of modern neural networks. 34th International Conference on Machine Learning. 2017. p. 2130–43.