Machine learning approach to predict postoperative opioid requirements in ambulatory surgery patients

Opioids play a critical role in acute postoperative pain management. Our objective was to develop machine learning models to predict postoperative opioid requirements in patients undergoing ambulatory surgery. To develop the models, we used a perioperative dataset of 13,700 patients (≥ 18 years) undergoing ambulatory surgery between 2016 and 2018. The data, comprising patient, procedure and provider factors that could influence postoperative pain and opioid requirements, were randomly split into training (80%) and validation (20%) datasets. Machine learning models of different classes were developed on the training dataset to predict categorized levels of postoperative opioid requirements and then evaluated on the validation dataset. Prediction accuracy was used to compare model performance. The five types of models returned the following accuracies at two stages of surgery: 1) prior to surgery (Multinomial Logistic Regression: 71%, Naïve Bayes: 67%, Neural Network: 30%, Random Forest: 72%, Extreme Gradient Boost: 71%) and 2) at the end of surgery (Multinomial Logistic Regression: 71%, Naïve Bayes: 63%, Neural Network: 32%, Random Forest: 72%, Extreme Gradient Boost: 70%). Analysis of the sensitivities of the best-performing Random Forest model showed that lower opioid requirements were predicted with better accuracy (89%) than higher opioid requirements (43%). Feature importance (% relative importance) showed that the type of procedure (15.4%), medical history (12.9%) and procedure duration (12.0%) were the top three features contributing to model predictions. Overall, patient and procedure features contributed 65% and 35% of model predictions, respectively. Machine learning models could be used to predict postoperative opioid requirements in ambulatory surgery patients and could potentially assist in better management of their postoperative acute pain.


Introduction
Pain is a commonly reported symptom among patients after surgery [1,2]. However, the management of acute postoperative pain continues to be difficult for both patients and health care providers. Unrelieved postoperative pain is associated with slower recovery, delayed ambulation and increased risks of infection and thromboembolism [3]. Further, patients with poorly controlled postoperative pain are at higher risk of developing chronic pain [4]. In addition to the impact on patients, inadequate pain management has deleterious consequences for hospitals, including extended length of stay, increased risk of readmission, and increased cost of care [3].
Opioids are often used to manage postoperative pain [5]. Despite their widespread use to mitigate pain, opioids are also associated with negative side effects including neurological effects, respiratory depression, gastrointestinal effects, and pruritus [6]. For these reasons, opioid-sparing multimodal analgesic options are increasingly being adopted for optimal pain control in the perioperative setting [7]. Nevertheless, opioids still have a critical role in acute postoperative pain management, especially for procedures where a primary neuraxial, regional or local infiltration technique is not possible.
Predicting postoperative pain levels and opioid requirements could facilitate proactive strategies that optimize pain control and avoid underuse or overuse of opioids. Towards this, previous studies have retrospectively looked for predictors of postoperative pain and analgesic consumption, identifying four significant predictors including age, type of surgery, anxiety levels, and psychological distress [8]. Some of these studies focused on specific types of surgeries and patient populations to determine factors associated with postoperative opioid usage [9,10]. To date, the published literature lacks rigorous research on perioperative predictive factors for acute postoperative pain and opioid requirements across a wide spectrum of surgeries. Furthermore, previous studies used traditional statistical methods rather than machine learning, potentially limiting their predictive abilities [11].
Recently, artificial intelligence methods such as machine learning have increasingly been used in the medical field to predict clinical events [12]. Machine learning is particularly suited to analyze large datasets, compute complex interactions, identify hidden patterns, and generate actionable predictions in clinical settings. In many cases, machine learning has been shown to be superior to traditional statistical techniques [13–18]. Machine learning models offer a promising method to predict pain levels and opioid requirements following surgery. However, the applications of machine learning in the context of opioids have been specific and limited in scope, with attempts to predict opioid overdose risk among Medicare beneficiaries with opioid prescriptions and inadequate pain management in patients suffering from depression as two examples [19,20].
In this study, we developed machine learning models to predict postoperative opioid requirements for a wide range of outpatient surgeries. The models were developed and validated using a large dataset comprised of patient, procedure and provider data. The models were built to predict postoperative opioid requirements prior to surgery using preoperative data and at the end of surgery using both preoperative and intraoperative data.

Study setting
This study was approved by the University of Washington Institutional Review Board (IRB# STUDY00002256). Requirement for patient consent was waived. Our academic medical center

Data sources
Patient and procedure information on ambulatory surgeries for 3 years (2016–2018) was extracted from our institution's perioperative information management system data warehouse. Only adult (≥ 18 years of age) outpatients who received general anesthesia were included. Patients who were on patient-controlled analgesics or who received non-opioid analgesics in the post anesthesia care unit (PACU) were excluded. Additionally, patients who remained intubated or had a peripheral nerve block placed for postoperative pain management were excluded. The inclusion and exclusion criteria as well as the patient counts are presented in Fig 1. The choice of data variables for model development was largely based on prior literature that identified factors influencing postoperative pain levels or opioid consumption [8,21–24]. Patient and procedure specific parameters prior to and during surgery were considered, in keeping with the goal of developing models to predict postoperative opioid requirements both prior to and at the end of surgery. The summarized list of patient and procedure specific variables used for model development is outlined in Table 1.

Data preparation
Several data preparation steps were performed prior to model development. Records with outlier data values and key missing data points were identified. They comprised only a small fraction of the total number of records (<1%, N = 179). Hence, they were simply excluded from model development.
Data elements relating to medical, social and psychiatric histories were embedded in free text fields in the electronic medical record (EMR). Standard natural language processing techniques were used to generate modelling features from these free text data. Pain levels recorded in the EMR were a combination of patient-reported numeric ratings (0: no pain, to 10: most severe pain) and nurse-assessed pain levels (none, mild, moderate and severe pain). For modeling purposes, numeric rating pain scores were normalized to pain levels (0: no pain, 1-3: mild pain, 4-6: moderate pain and 7-10: severe pain) [25].
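The normalization of pain records to a common four-level scale can be illustrated with a short sketch. This is a minimal Python illustration, not the study's code (the analysis was performed in R); the function name `normalize_pain` and the choice to accept either a numeric rating or a nurse-assessed label are assumptions made here for illustration.

```python
PAIN_LEVELS = ("none", "mild", "moderate", "severe")


def normalize_pain(value):
    """Map one pain record to one of the four pain levels.

    Accepts either a nurse-assessed label (string) or a patient-reported
    numeric rating from 0 to 10. The cut points (0, 1-3, 4-6, 7-10) follow
    the text; everything else in this helper is a hypothetical convenience.
    """
    if isinstance(value, str):
        level = value.strip().lower()
        if level not in PAIN_LEVELS:
            raise ValueError(f"unknown pain level: {value!r}")
        return level
    if not 0 <= value <= 10:
        raise ValueError(f"pain score out of range: {value!r}")
    if value == 0:
        return "none"
    if value <= 3:
        return "mild"
    if value <= 6:
        return "moderate"
    return "severe"
```

With this mapping, a numeric score of 5 and a nurse-assessed entry of "Moderate" both resolve to the same level, which is what allows the two recording styles to be pooled.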
Administered opioids could be of different types and potencies. Therefore, the Morphine Milligram Equivalent (MME) representation was used to consolidate the opioid doses into a single normalized value [26–28]. The MME conversion ratios used for the study are tabulated in S1 Table. Numeric MME values are less practical to interpret and act on in a clinical setting than categories of opioid requirements that correspond to pain levels. Hence, we categorized the MME values into four categories of opioid requirement: none/very low, low, medium and high. This categorization was based on the average MME of opioids used when the pain levels were none, mild, moderate and severe. The mean MME requirements for no, mild, moderate and severe pain levels are shown in Table 2. The averages of the mean MME requirements for adjacent pain levels were used to determine the MME ranges of the opioid requirement categories. The MME ranges for the opioid requirement categories were: none/very low (0-3 MME), low (3-11 MME), medium (11-25 MME), and high (≥25 MME). The categorized postoperative opioid MME requirement served as the dependent variable for the predictive models.
Procedure types proved to be useful in defining decision boundaries, but were too numerous for practical use without further processing. Hence, they were aggregated into a smaller list based on the location of the surgical site and whether the procedure was open or closed. The recategorized procedure types and counts are in S2 Table.

Table 1. Parameters used in prediction models. The parameters marked in green were used for postoperative opioid requirements prediction prior to surgery and parameters marked in blue were added to the prediction models to perform a postoperative opioid requirements prediction at the end of surgery. (Table columns: Patient specific parameters; Procedure specific parameters.)

Model development
The master dataset was randomly split into two groups: a training dataset (80% of the records) and a "holdout" dataset (20% of the records) for unbiased validation. Model development and parameter tuning were each performed using the training dataset. Five models of different classes were developed to predict postoperative opioid requirements: Multinomial Regression, Naïve Bayes, Neural Network, Random Forest and Extreme Gradient Boosting Trees. Models were developed to predict probabilities of postoperative opioid requirements for each procedure among the four categories (none/very low, low, medium and high). Models were trained to predict opioid requirements prior to surgery using only preoperative data and then a second time at the end of surgery using both preoperative and intraoperative data. This two-phase model development matched the expected user requirements: an initial estimate prior to surgery and a more informed estimate at the end of surgery, both having utility in the clinical setting.
Model development was performed in R programming environment (R Foundation for Statistical Computing, Vienna, Austria) [29].
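As an illustration of how the dependent variable and the 80/20 split could be produced, the following is a minimal sketch in Python rather than R, and is not the study's code. The cut points (3, 11 and 25 MME) come from the Data preparation section; treating each range as half-open (lower bound included, upper bound excluded) is an assumption, since the published ranges share their boundary values. The function names and the seed are hypothetical.

```python
import random

# Exclusive upper bounds of the first three categories, from the text.
# Assumption: a value exactly on a boundary falls in the higher category.
MME_CUTS = ((3.0, "none/very low"), (11.0, "low"), (25.0, "medium"))


def opioid_category(mme):
    """Return the opioid requirement category for a total MME value."""
    if mme < 0:
        raise ValueError(f"MME cannot be negative: {mme!r}")
    for upper, label in MME_CUTS:
        if mme < upper:
            return label
    return "high"


def train_holdout_split(records, holdout_fraction=0.2, seed=0):
    """Randomly split records into (training, holdout) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]
```

Applied to 13,700 records with a holdout fraction of 0.2, the split yields 10,960 training and 2,740 holdout records, matching the counts reported for the study datasets.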

Model validation
Model validation was performed on the "holdout" dataset that was not used for training. Prediction accuracy over the holdout set was used to compare models. The model outputs were probabilities for each of the four established categories: none/very low, low, medium, or high opioid requirement. Prediction accuracy could be computed by taking the single category with the highest predicted probability and comparing it against the category in which the actual opioid requirement falls. However, this approach becomes overly strict, especially when the predicted probabilities of adjacent categories are similar. In consultation with our clinical partners, we adopted an alternate, more balanced approach to compute accuracy. Instead of simply selecting the category with the highest predicted probability, the models were evaluated on their ability to make an aggregate prediction within two adjacent opioid requirement categories: None + Low, Low + Medium and Medium + High. The aggregate prediction bucket with the highest combined predicted probability was taken as the model prediction. The predicted aggregate bucket was compared with the bucket corresponding to the actual opioid requirement to estimate model accuracy. The concept is shown in Fig 2. The model with the highest accuracy was chosen and further evaluated in terms of precision and recall. Prediction accuracies for different surgical specialties were also determined.
To explain the model predictions in terms of feature importance, we used the permutation method. The method works by randomly shuffling the values of one feature at a time across the entire dataset and calculating how much the prediction accuracy decreases once that feature's information is scrambled. A larger decrease in prediction accuracy represents a greater importance of that feature.
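The aggregate-bucket accuracy described above can be sketched as follows. This is an illustrative Python sketch, not the study's R code. One point is an assumption: the text does not spell out how a single actual category is matched to a bucket, so here a prediction is counted as correct when the actual category is one of the two categories in the predicted bucket. Ties between buckets resolve to the first bucket in the listed order, which is also an arbitrary choice.

```python
BUCKETS = (("none", "low"), ("low", "medium"), ("medium", "high"))


def predicted_bucket(probs):
    """Return the adjacent-category pair with the highest combined probability.

    probs maps each of the four categories to its predicted probability.
    """
    return max(BUCKETS, key=lambda pair: probs[pair[0]] + probs[pair[1]])


def aggregate_accuracy(prob_rows, actual_categories):
    """Fraction of cases whose actual category lies in the predicted bucket."""
    hits = sum(
        actual in predicted_bucket(probs)
        for probs, actual in zip(prob_rows, actual_categories)
    )
    return hits / len(actual_categories)


# Made-up probabilities for two hypothetical cases, for illustration only.
example_probs = [
    {"none": 0.40, "low": 0.35, "medium": 0.20, "high": 0.05},
    {"none": 0.10, "low": 0.20, "medium": 0.40, "high": 0.30},
]
example_actual = ["low", "low"]
example_accuracy = aggregate_accuracy(example_probs, example_actual)
```

In the first case the None + Low bucket has the largest combined probability and contains the actual category, while in the second case the Medium + High bucket is predicted and misses, so the example accuracy is 0.5.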

Results
A total of 13,700 patients were included in the study. The patient and procedure characteristics are presented in Table 3. The mean ± SD age of the patients was 51 ± 17 years, with geriatric patients (>65 years of age) comprising 22% of the population. The mean ± SD BMI was 28.4 ± 7.2, with 35% of the patients obese (BMI > 30). Female patients made up the larger fraction (58%), and the cohort was predominantly white (83%). A sizable portion of the patients suffered from depression (24%) or anxiety (26%). Among home medications, 52% of the patient cohort were on non-opioid pain medications and 23% were on opioids. Chronic pain was diagnosed in 3.8% of the patients. The mean ± SD procedure duration was 75 ± 56 minutes, with the main surgical specialties being General (22%), ENT (18%) and Urology (15%). Table 3 also presents the characteristics of the training and testing data subsets. The patient and procedure factors were well matched between the two data subsets, with none of the factors significantly different between them.

Exploratory data analysis
Exploratory data analysis was performed to understand relationships between variables and to inform modeling steps. To gain a basic understanding of the factors affecting opioid requirements, bivariate relationships between patient or procedure factors and postoperative opioid requirements were examined. The statistically significant preoperative factors are shown in Fig 3. Longer procedure duration, preoperative opioid use and plastic surgery were the top factors related to higher opioid requirements. On the other hand, absence of preoperative pain, older age, and urological surgery were the top factors related to lower opioid requirements.

Model predictions
Models were validated on a hold-out dataset (N = 2,740, 20% of total data). The prediction accuracies for all five models, when including only the preoperative features and when adding intraoperative features, are shown in Table 4. Random Forest and Multinomial Regression models had the best accuracy. Adding the intraoperative features did not appreciably improve the prediction accuracy of the models. Table 5 presents detailed results for the best-performing model, the Random Forest, including model accuracies for different surgical specialties. Model accuracies varied across surgical specialties. Oral and thoracic surgeries had the highest accuracies, though counts of these surgeries were comparatively small. General and plastic surgeries had lower accuracies. Since the model accuracies when predicting opioid requirements prior to surgery and at the end of surgery were similar, further analyses focused on the prior-to-surgery stage. Table 6 presents the recall (sensitivity) and precision (positive predictive value) of the Random Forest model predictions. Recall was highest when the model predicted the "none+low" aggregate category. For the "low+medium" category, recall was very poor. Precision was highest for the "none+low" and "medium+high" categories and lowest for "low+medium". Table 7 shows the confusion matrices for each category of opioid requirement with true positive, true negative, false positive and false negative counts and rates. Overall, the model performance was poorer when predicting higher opioid requirements. The model had the most difficulty predicting the "low+medium" opioid requirement as compared with the other categories.

Fig 2. The models predicted probabilities of postoperative opioid requirements in four buckets: None/very low, Low, Medium and High. For model validation, the ability to predict within two adjacent buckets (None + Low, Low + Medium, Medium + High) was considered. The aggregate prediction bucket that had the highest combined prediction probability was considered as the model prediction. The predicted aggregate bucket was compared against the bucket corresponding to the actual opioid requirement for estimating the model accuracy. https://doi.org/10.1371/journal.pone.0236833.g002

Table 3. Primary patient and procedure characteristics observed in the overall (N = 13,700), training (N = 10,960) and testing (N = 2,740) datasets. Proportions are presented for categorical variables while mean ± standard deviation (SD) are shown for continuous variables. The comparison of characteristics between the training and testing datasets is also presented.
Model performance metrics for predicting single categories are also presented in S3 and S4 Tables.
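For readers who want to reproduce this kind of per-bucket evaluation, the following minimal Python sketch (illustrative only; the study used R) computes one-vs-rest confusion counts, precision and recall for a given aggregate bucket. It assumes that both the predicted and the actual outcome have already been expressed as bucket labels; how the actual requirement was assigned to a bucket is not detailed in the text. The labels and the example data are made up.

```python
def one_vs_rest_counts(predicted, actual, label):
    """Return (TP, FP, FN, TN) for one label against all other labels."""
    tp = fp = fn = tn = 0
    for p, a in zip(predicted, actual):
        if p == label and a == label:
            tp += 1
        elif p == label:
            fp += 1
        elif a == label:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(predicted, actual, label):
    """Precision (PPV) and recall (sensitivity); None when undefined."""
    tp, fp, fn, _ = one_vs_rest_counts(predicted, actual, label)
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return precision, recall


# Made-up labels for five hypothetical cases, for illustration only.
example_predicted = ["none+low", "none+low", "low+medium", "medium+high", "none+low"]
example_actual = ["none+low", "low+medium", "low+medium", "medium+high", "medium+high"]
```

In this toy example the "none+low" bucket is over-predicted: its recall is 1.0 but its precision is only 1/3, the same kind of asymmetry that misclassifying higher requirements into the lowest bucket would produce.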

Feature importance
Feature importance of the Random Forest model, determined through the permutation method, is presented in Table 8, which lists the average relative importance of the features contributing to model predictions. The type of procedure, the patient's medical history and procedure duration were the top three features. Overall, patient features contributed 65% and procedure features contributed 35% towards model predictions.
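The permutation method can be sketched in a few lines. The following Python example is a minimal, generic illustration and not the study's R implementation: `predict` stands for any fitted model's single-row prediction function, rows are plain lists of feature values, and the number of repeats and the seed are hypothetical parameters. Converting accuracy drops into percentages by dividing by their sum is an assumption about how "% relative importance" might be obtained; the text does not state the normalization used.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which predict(row) equals the true label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(predict, rows, labels, n_repeats=5, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for j in range(len(rows[0])):
        total_drop = 0.0
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the label
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            total_drop += baseline - accuracy(predict, permuted, labels)
        drops.append(total_drop / n_repeats)
    return drops


def relative_importance(drops):
    """Express non-negative accuracy drops as percentages of their sum."""
    clipped = [max(d, 0.0) for d in drops]
    total = sum(clipped)
    return [100.0 * (d / total) if total else 0.0 for d in clipped]


# Toy check: the label equals feature 0, and feature 1 is irrelevant.
toy_rows = [[i % 2, i % 3] for i in range(60)]
toy_labels = [row[0] for row in toy_rows]
toy_drops = permutation_importance(lambda row: row[0], toy_rows, toy_labels)
```

In the toy check, shuffling feature 0 degrades accuracy while shuffling feature 1 leaves it unchanged, so all of the relative importance is attributed to feature 0.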

Discussion
Management of acute postoperative pain with opioids needs to be optimal to avoid the adverse effects of overdosing and underdosing. Towards this, we applied an artificial intelligence method, specifically machine learning, to predict postoperative opioid requirements so that proactive planning could be enabled. We used a comprehensive and large perioperative dataset to develop and validate the models, which were trained to make predictions prior to surgery and at the end of surgery. Our study showed that machine learning models can predict postoperative opioid requirements with an accuracy of around 70% when adjacent opioid requirement categories are aggregated. Among the models tried, the Random Forest, Multinomial Regression and Extreme Gradient Boost models performed better than Naïve Bayes and Neural Network. The differences in performance among these higher-performing models were only marginal.
Several key findings emerged from the model predictions. Surprisingly, the model accuracies were very similar prior to surgery and at the end of surgery, suggesting that intraoperative data (intraoperative opioids, other types of analgesics, inhalation agents, fluids, patient position, etc.) did not contribute to improving model accuracy. This may prove to be advantageous because a model that can predict postoperative opioid requirements preoperatively without compromising accuracy could potentially enable proactive pain management strategies prior to surgery. Model sensitivity (recall) and precision were modest at best, and even then only for the "none+low" category. The model had particular difficulty predicting whether a patient's opioid requirement would fall in the "low+medium" category, with a tendency to misclassify the requirement as "none+low". The model performance was better when predicting "none+low" as compared with other categories. This may explain why model accuracies varied for different surgical specialties: the specialties that had higher opioid requirements tended to have lower model accuracies.
Feature importance for the best-performing Random Forest model predictions made prior to surgery reveals interesting observations (Table 8). The scheduled procedure type proved to be the most important feature in model predictions. Yet patient specific factors (demographics, medical history, social history, and psychiatric issues) together played a predominant role in determining the postoperative opioid requirements.
Despite including a comprehensive dataset of preoperative and intraoperative parameters, model accuracies in predicting postoperative opioid requirements did not exceed 72%. This suggests that additional factors influencing opioid requirements were potentially not included in the data used for training the models. A potential factor that we considered was provider practice pattern in ordering opioids for pain management. However, adding surgeon and anesthesiologist data into the model led to no notable improvement. Accuracy of machine learning models can, in principle, be improved with more data. Additional data for model training could be obtained either by extending the time range of the dataset or by obtaining data from more institutions. However, there are downsides to each approach. Extending the time range increases the risk of encountering changes in practice and documentation patterns over time, potentially compromising data consistency. Similarly, institutional variations in case mix and practices can negatively affect the consistency of multi-institutional data. In this particular project, we noted that adding two additional years of data yielded no improvement in accuracy.
The single-center nature of the data is a limitation of the study, and whether the model performance can be replicated in other centers is unknown at this time. As a future step, training and validating the model against standardized multicenter data, such as those hosted by the Multicenter Perioperative Outcomes Group (www.mpog.org), could be a way to validate the model across institutions. The second limitation of this project is that we focused only on outpatients. This was a deliberate choice for this first modeling effort, made to avoid the confounding factors of patient-controlled analgesia, regional blocks for postoperative pain management and variable length of stay, which are difficult to incorporate into the model.
In summary, machine learning models were able to predict postoperative opioid requirements in ambulatory surgery patients. Prediction accuracies remained unchanged even after adding intraoperative information to preoperative data. In general, model prediction sensitivities were greater in patients requiring lower amounts of opioids as compared with those requiring higher amounts. Translating such models into point of care tools could provide assistive intelligence to the perioperative care provider leading to improved management of postoperative acute pain.
Supporting information
S1 Table. Oral Morphine Milligram Equivalents (MME) conversion ratios used for the study.