Applications of machine learning in decision analysis for dose management for dofetilide

Background: Initiation of the antiarrhythmic medication dofetilide requires an FDA-mandated 3 days of telemetry monitoring due to the heightened risk of toxicity within this period. Although a recommended dose management algorithm for dofetilide exists, there is a range of real-world approaches to dosing the medication.

Methods and results: In this multicenter investigation, clinical data from the Antiarrhythmic Drug Genetic (AADGEN) study were examined for 354 patients undergoing dofetilide initiation. Univariate logistic regression identified a starting dofetilide dose of 500 mcg (OR 5.0, 95% CI 2.5–10.0, p<0.001) and sinus rhythm at the start of dofetilide loading (OR 2.8, 95% CI 1.8–4.2, p<0.001) as strong positive predictors of successful loading. Any dose adjustment during loading (OR 0.19, 95% CI 0.12–0.31, p<0.001) and a history of coronary artery disease (OR 0.33, 95% CI 0.19–0.59, p<0.001) were strong negative predictors of successful dofetilide loading. Based on the observation that any dose adjustment was a significant negative predictor of successful initiation, we applied multiple supervised approaches to attempt to predict the dose adjustment decision, but none of these approaches identified dose adjustments better than a probabilistic guess. Principal component analysis and cluster analysis identified 8 clusters as a reasonable data reduction method. These 8 clusters were then used to define patient states in a tabular reinforcement learning model trained on 80% of dosing decisions. Testing of this model on the remaining 20% of dosing decisions revealed good accuracy of the reinforcement learning model, with only 16/410 (3.9%) instances of disagreement.

Conclusions: Dose adjustments are a strong determinant of whether patients are able to successfully initiate dofetilide. A reinforcement learning algorithm informed by unsupervised learning was able to predict dosing decisions with 96.1% accuracy. Future studies will apply this algorithm prospectively as a data-driven decision aid.


Background
Decision analysis is an emerging field that uses outcomes from different decision approaches to guide future decision-making [1]. In many cases, medical decisions can be formulated as Markov-decision processes (MDPs), in which a given state of conditions can predict future states based on a model for decision-making [2]. Reinforcement learning, a subset of machine learning (ML), expands on MDPs by embedding reward-based feedback into decision outcomes so that an optimal decision approach, termed the policy, can be identified [3].
In recent years, this approach has achieved supra-human success rates in video and board games, among other applications [4,5]. Reinforcement learning is one of three main categories of ML gaining popularity in medical applications, the other two being supervised and unsupervised learning [6]. Supervised applications use an example dataset to learn general rules (an algorithm) about the relationship of predictor variables (termed "features") to an outcome of interest (termed a "label"). These general rules can then be applied to a new dataset to predict outcomes. Unsupervised learning, in contrast, does not use labelled outcomes and, instead, discovers relationships between different features on its own. The discovery process often restructures data into new classes, "shrinking" and consolidating features for more nimble use in supervised applications. In many applications, these methods complement each other, but whereas supervised and unsupervised methods lead to descriptive analyses, feedback from outcomes allows reinforcement learning to produce prescriptive analyses [7]. For this reason, reinforcement learning holds great promise as a tool to enrich clinical decisions. Currently, however, there are relatively few published applications in healthcare [8,9].
Dofetilide is a common antiarrhythmic medication primarily used to treat atrial fibrillation. It is one of the few antiarrhythmic medications other than amiodarone that has been approved for use in patients with coronary artery disease or cardiomyopathy. Like many other Vaughan Williams class III agents, dofetilide blocks the rapid delayed rectifier current (IKr) and thus can cause QT prolongation. Due to the risk of resultant fatal arrhythmias, the FDA has mandated a 3-day monitoring period for drug initiation [10]. There is a recommended algorithm for making dose adjustments during initiation, but these adjustments are still made at the treating provider's discretion [10,11]. In this investigation, we examine the patterns of dofetilide dose adjustment and the role of machine learning in developing algorithms aimed at successful initiation of the medication.

Methods
This study was approved by the University of Colorado Institutional Review Board (COMIRB Protocol #16-2675) and the Partners Human Research Committee (#2013-P002623). All subjects provided written informed consent.

Study population
The Antiarrhythmic Drug Genetic (AADGEN) study is a multi-center collaboration that includes investigators from the Massachusetts General Hospital (MGH, Boston, MA), Beth Israel Deaconess Medical Center (Boston, MA), the Boston-area Veterans Affairs Medical Center (West Roxbury, MA), the Cleveland Clinic (Cleveland, OH), the Mayo Clinic (Rochester, MN), and the University of Colorado Hospital (Aurora, CO). Patients were enrolled from July 7, 2014 to September 19, 2018. The inclusion criterion was admission to in-patient telemetry for monitoring of dofetilide initiation. The exclusion criteria were failure to provide written informed consent and failure to obtain a pre-dofetilide ECG. Massachusetts General Hospital served as the coordinating center for this investigation. Institutional Review Board approval was obtained at all enrolling centers. This study is a sub-study of a larger investigation into the genetic predictors of cardiac repolarization and drug toxicity of antiarrhythmic medications (ClinicalTrials.gov identifier: NCT02439658).
Demographic and clinical information was obtained for all study participants. This included age, height, weight, body mass index (BMI), medications, and past medical and cardiac history, including history of pacemaker/defibrillator, atrial fibrillation, and ventricular fibrillation. We also recorded left ventricular function from transthoracic echocardiogram, recent lab values (creatinine, potassium, and magnesium), and electrocardiogram findings (underlying rhythm, rate, and the PR, QRS, and QT intervals). The QT interval was corrected for heart rate using Fridericia's formula [12]. The timing of electrical cardioversion was also recorded.
The outcome of interest was successful loading of dofetilide, defined as discharge on dofetilide at any dose after at least 5 administrations. Data for all participants were collected retrospectively, after completion of the hospitalization; no clinical adjustments or changes were made by treating physicians as part of this investigation. Data were maintained in a centralized REDCap database managed by the study coordinating center at MGH.

Data processing
Prior to analysis, quality control was performed by study investigators, with manual review of outlier values for ECG parameters (i.e., QTc > 600 ms) and of discordant data values (e.g., a PR interval on an ECG with rhythm listed as 'atrial fibrillation'). When resolution or validation was not possible, values were set to missing. Summary and descriptive statistics are based on analysis of non-missing data; only 4.2% of the total dataset was missing. Because the machine-learning algorithms used require complete datasets, missing values were imputed with the median for numerical and integer variables and with the most common value for categorical variables. Categorical variables were coded using 'one-hot' encoding, and numerical variables were rescaled using min-max rescaling. Dose adjustments were included only if they were a decrease from a higher dose, because FDA guidelines for dofetilide initiation suggest starting at the highest dose based on kidney function and adjusting downward based on QT changes on ECG; as such, any dose increase during the hospitalization was off-label. Based on this criterion, 14 patients who underwent dose increases were excluded. For all model evaluations, data were split into training (80% of total data) and testing (20% of total data) sets at the patient level.
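The preprocessing steps described above (median and most-common-value imputation, one-hot encoding, min-max rescaling, and a patient-level split) can be sketched in plain Python. This is only an illustration under stated assumptions, not the study's code: the field names (`patient_id`, `qtc`, `rhythm`), the toy records, and all helper functions are hypothetical.

```python
import random
from collections import Counter
from statistics import median

# Hypothetical toy records: one row per dose; None marks a missing value.
rows = [
    {"patient_id": 1, "qtc": 430.0, "rhythm": "sinus"},
    {"patient_id": 1, "qtc": None, "rhythm": "sinus"},
    {"patient_id": 2, "qtc": 470.0, "rhythm": None},
    {"patient_id": 3, "qtc": 450.0, "rhythm": "afib"},
    {"patient_id": 4, "qtc": 490.0, "rhythm": "sinus"},
    {"patient_id": 5, "qtc": 410.0, "rhythm": "afib"},
]

def impute(rows, numeric, categorical):
    """Return copies with missing numerics set to the median and
    missing categoricals set to the most common value."""
    out = [dict(r) for r in rows]
    for col in numeric:
        fill = median(r[col] for r in rows if r[col] is not None)
        for r in out:
            if r[col] is None:
                r[col] = fill
    for col in categorical:
        counts = Counter(r[col] for r in rows if r[col] is not None)
        fill = counts.most_common(1)[0][0]
        for r in out:
            if r[col] is None:
                r[col] = fill
    return out

def min_max(rows, col):
    """Rescale a numeric column to the [0, 1] range in place."""
    lo = min(r[col] for r in rows)
    hi = max(r[col] for r in rows)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    for r in rows:
        r[col] = (r[col] - lo) / span

def one_hot(rows, col):
    """Replace a categorical column with one 0/1 column per level, in place."""
    levels = sorted({r[col] for r in rows})
    for r in rows:
        value = r.pop(col)
        for level in levels:
            r[f"{col}_{level}"] = 1 if value == level else 0

def split_by_patient(rows, test_fraction=0.2, seed=0):
    """Split so that all doses from one patient land in the same set."""
    ids = sorted({r["patient_id"] for r in rows})
    random.Random(seed).shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    test_ids = set(ids[:n_test])
    train = [r for r in rows if r["patient_id"] not in test_ids]
    test = [r for r in rows if r["patient_id"] in test_ids]
    return train, test

clean = impute(rows, numeric=["qtc"], categorical=["rhythm"])
min_max(clean, "qtc")
one_hot(clean, "rhythm")
train, test = split_by_patient(clean)
```

Splitting on patient identifiers rather than on individual doses keeps correlated doses from the same admission out of both sets at once. The text does not state whether imputation statistics were computed before or after the split; the sketch computes them on all rows purely for brevity.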

Supervised analysis
Basic stepwise logistic regression was performed for successful initiation of dofetilide using a p value for exclusion of greater than 0.05. Based on the observation that dose adjustments were a significant predictor of successful initiation, we used several supervised classification methods, including ensemble methods, to develop predictive models of the dose adjustment process. These models included L1-regularized logistic regression, random forest classification, a boosted decision tree classifier, support vector classification (radial basis function kernel), and K-nearest neighbors classification with a maximum of 10 neighbors. Comparison measures included accuracy, precision and recall scores, F1 score [13,14], and area under the ROC curve.
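To illustrate why accuracy alone can mislead when dose adjustments are rare, the following sketch computes accuracy, precision, recall, and F1 score from a confusion matrix. The labels and both sets of predictions are hypothetical toy values, not study data, and the helper function is an illustrative stand-in for the equivalent scikit-learn metrics.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical labels: 1 = dose decreased, 0 = dose continued.
y_true = [0] * 18 + [1] * 2

# Naive comparator: always predict "no dose adjustment".
always_continue = [0] * 20

# Hypothetical classifier: one false positive, one true positive, one false negative.
some_model = [0] * 17 + [1] + [1, 0]

baseline = classification_metrics(y_true, always_continue)
model = classification_metrics(y_true, some_model)
```

In this toy case both predictors reach the same accuracy, but only the second identifies any adjustment at all, which is why precision, recall, and F1 score are reported alongside accuracy.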

Unsupervised analysis
For unsupervised analysis, we first performed principal component analysis. By plotting explained variance against the number of principal components (PCs), we sought to identify the number of PCs that would account for greater than 90% of the variability in the data. We then performed a cluster analysis based on within-cluster variation (sum-of-squares), and used the 'elbow' method to determine cluster numbers with sufficiently low within-cluster variability. We then used a K-means approach to create these clusters for use in subsequent reinforcement learning analyses.
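A minimal sketch of this workflow with scikit-learn is shown below. It assumes a rescaled feature matrix with one row per dose; the random matrix `X`, the candidate range of k, and the variable names are hypothetical stand-ins rather than the study's data or settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical stand-in for the rescaled feature matrix (rows = doses).
X = rng.random((200, 10))

# Cumulative explained variance: how many PCs are needed to reach 90%?
pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_components_90 = int(np.searchsorted(cumulative, 0.90) + 1)

# Elbow method: within-cluster sum of squares (inertia) for each candidate k.
inertias = {}
for k in range(2, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_

# Assign each dose to one of 8 clusters, used later as discrete states.
states = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(X)
```

Plotting `cumulative` and `inertias` against their indices gives the two curves inspected for the 90% threshold and the elbow. The choice of k remains a judgment call, which is why the analysis is also repeated with other cluster numbers.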

Reinforcement learning
We next applied reinforcement learning using the SARSA algorithm (state-action-reward-state-action) for selecting dose adjustments based on a negative reward for unsuccessful initiation [15]. We applied two broad approaches to creation of action-value estimates (i.e., Q values) [16]. First, we defined 8 states created using K-means clustering from all clinical features, and performed tabular updates to a Q table based on dynamic programming (step-by-step updates). Alternatively, we performed linear function approximation for the Q values using linear weights (termed 'Q learning' [17]), with updates using stochastic gradient descent based on experience [15]. The available actions in the Q value estimates included 'continue the same dose' or 'decrease the dose'. The reward was selected to be -10 for doses leading to stopping of the medication (last dose before stopping) and 0 for all other doses, in order to penalize decisions resulting in a negative outcome.
The SARSA algorithm [15] updates a Q table of expected reward values for each state and selected action using the following variation of the Bellman equation [15], shown here in its standard form:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]$$

where s_t and a_t are the state and action at dosing decision t, r_{t+1} is the resulting reward, alpha is the learning rate, and gamma is the discount factor. The Q table was initialized at 0 for all values, with gamma values ranging from 0.1 to 1.0 and an alpha of 0.1. Of note, a gamma close to 1 puts more weight on future states and rewards, while a gamma close to 0 puts more weight on immediate rewards. The learning rate is the extent to which Q values are updated with new iterations of data; we also experimented with a range of learning rates (0.05 to 0.3). Reinforcement learning algorithms were fitted with the training set (per above, 80% of doses) and compared with actual decisions on the held-out test set (per above, 20% of doses). Additional analyses were performed using k = 4 and k = 6 (number of clusters).
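The tabular SARSA procedure can be sketched as follows. This is a minimal illustration of the textbook on-policy update applied to logged dosing sequences, not the study's implementation: the episode format, the number of sweeps, the terminal-step handling, and the toy cluster indices are all assumptions.

```python
from collections import defaultdict

ACTIONS = ("continue", "decrease")

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9, terminal=False):
    """One SARSA step: move Q(s, a) toward r + gamma * Q(s', a')."""
    target = r if terminal else r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def fit_q_table(episodes, alpha=0.1, gamma=0.9, sweeps=50):
    """Fit a Q table from logged (state, action, reward) sequences.

    Each episode is one hypothetical admission; entries start at 0.
    """
    Q = defaultdict(float)
    for _ in range(sweeps):
        for episode in episodes:
            for i, (s, a, r) in enumerate(episode):
                last = i == len(episode) - 1
                s_next, a_next = (None, None) if last else episode[i + 1][:2]
                sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma, terminal=last)
    return Q

def greedy_action(Q, state):
    """Pick the action with the highest Q value (ties go to 'continue')."""
    return max(ACTIONS, key=lambda a: Q[(state, a)])

# Hypothetical episodes: (cluster index, action taken, reward).
# The reward is -10 on the last dose before stopping, 0 otherwise.
episodes = [
    [(0, "continue", 0), (1, "continue", 0), (2, "continue", 0)],   # discharged on drug
    [(0, "continue", 0), (3, "decrease", 0), (5, "decrease", -10)],  # drug stopped
]

Q = fit_q_table(episodes)
policy = {s: greedy_action(Q, s) for s in (2, 3, 5)}
```

Agreement with clinicians can then be estimated by comparing `greedy_action` with the logged action for each held-out dose. State-action pairs never seen in training keep their initial value of 0, which can dominate the greedy choice when every observed alternative carries a negative value.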

Analysis
Descriptive statistical analysis, including chi-square for categorical and t-test for continuous comparison, as well as univariate logistic regression, was performed using Stata IC, Version 15.1 (StataCorp, LLC, College Station, TX). Machine learning, including unsupervised, supervised, and reinforcement learning algorithms, was performed using Python 3, with scripts run in Jupyter Notebook (v5.0.0) deployed via Anaconda Navigator on a MacBook Pro laptop computer (macOS High Sierra, v10.13.6). The primary source of machine learning packages was scikit-learn (see Supplemental Methods for details).

Results
The baseline characteristics of the cohort are shown in Table 1. A total of 354 subjects were enrolled, with successful initiation (discharged on dofetilide) in 310 patients (87.1%) and unsuccessful initiation in 44. Use of calcium channel blockers and initial dose of dofetilide differed between patients with successful vs. unsuccessful initiation of dofetilide, although neither p value reached statistical significance after Bonferroni adjustment for multiple comparisons (significance threshold = 0.05/(number of rows in Table 1) = 0.05/24 = 0.002). There were no other differences in baseline parameters between patients.
Stepwise univariate regression was performed for successful initiation across the course of dofetilide initiation, which revealed that dose number, dose amount, dose adjustment, ejection fraction, history of heart failure, sinus rhythm, QRS, QTc, presence of a pacemaker, and coronary artery disease were predictors of successful discharge on dofetilide at p < 0.05 (Table 2). The strongest predictors for successful initiation of dofetilide were a starting dose of 500 mcg (OR 5.0, 95% CI 2.5-10.0, p < 0.001) and dose adjustment during initiation (OR 0.19, 95% CI 0.12-0.31, p < 0.001), which was a negative predictor. Because it had such a strong effect, we selected dose adjustment as the target for machine learning techniques.

Table 2. Association with successful loading of dofetilide. Univariate logistic regression results for associations with successful loading of dofetilide (discharged on medication). Dose position refers to an integer from 1 to 6, in which 1 would have been the first dose and 5 or 6 would have been the final dose. Dose adjustment is any decrease in dose from the prior dose. Sinus rhythm refers to patients in sinus rhythm at the time of the dosing decision.

The 354 subjects in our analysis collectively received a total of 2037 doses of dofetilide.
Out of a possible 2037 opportunities to adjust the dose of dofetilide, dose adjustments were made in 144 instances. This corresponds to a dose change probability of 7.1%, indicating that a naïve approach that predicted only no dose adjustment would be accurate 92.9% of the time, which was used as the comparison for machine-learning approaches developed to predict whether a dose adjustment would be made. However, none of the supervised analyses resulted in improvement in identification of a medication adjustment by providers over a naïve approach (based on accuracy, or any of the other classification metrics applied) as shown in Table 3.
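The comparator figures follow directly from the counts reported above, as the short calculation below shows. The last quantity is an added illustration that rests on an assumption: it treats a 'probabilistic guess' as predicting an adjustment at random with the observed probability, independently of the true decision.

```python
total_doses = 2037   # dosing opportunities reported in the text
adjustments = 144    # dose adjustments reported in the text

# Observed probability of a dose adjustment at any given dose.
p_adjust = adjustments / total_doses

# Accuracy of always predicting "no dose adjustment".
baseline_accuracy = 1 - p_adjust

# Assumption: expected accuracy of a random guess that predicts an
# adjustment with probability p_adjust, independent of the truth.
random_guess_accuracy = p_adjust ** 2 + (1 - p_adjust) ** 2

print(f"P(adjust) = {p_adjust:.1%}")
print(f"Always-continue accuracy = {baseline_accuracy:.1%}")
print(f"Random-guess accuracy = {random_guess_accuracy:.1%}")
```

Any classifier therefore has to exceed roughly 93% accuracy before it adds information beyond the class frequencies.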

As described above, unsupervised principal component analysis was performed across 25 patient and dosing characteristics. We noted that the first two principal components (PCs) accounted for 65.0% of the total variance and 90% of the total variance could be explained by the first 8 PCs (Fig 2A). Cluster analysis using within-cluster sum-of-squares identified cluster numbers of k = 4 or greater as providing sufficiently low within-cluster variability, and validated use of k = 8 clusters (Fig 2B). Qualitative assessment of each PC revealed that there was apparent clustering along the first PC into 6 groups, which likely represent the dose number (S1 Fig). Characteristics of each PC cluster are described in Table 4.
After training the model on the training set (80% of data, 1627 doses), the accuracy of a tabular reinforcement-learning model for predicting actual decisions on the testing set (20%, 410 doses) was good, with only 3.9% disagreement (16/410) noted. Sensitivity analysis using a range of learning rates (alpha) and discount rates (gamma) had no impact on the accuracy of prediction; only the absolute Q values changed, not their relative values. The least disagreement was observed in the Q table cluster with the smallest (most negative) values for rewards (Table 5). The analysis was repeated with k = 4 (S1 Table) and k = 6 clusters (S2 Table), both of which predicted actual decisions with less accuracy than the model with k = 8 clusters (98/410, 23%, correct for k = 4 clusters and 336/410, 82%, correct for k = 6 clusters).
A linear reinforcement-learning policy function was able to achieve accuracy equal to that of tabular learning for certain hyper-parameter choices (alpha and gamma). Unlike the tabular learning model, however, the linear model was highly sensitive to hyper-parameter choices (S2 Fig). These models also had unstable weight estimates across parameters (see S3 Table).

Discussion
In this investigation of decision-making surrounding dofetilide initiation, we examined several approaches for evaluating dose adjustment decisions. It is important to note that while dofetilide initiation is performed in the hospital primarily for safety reasons (adverse event monitoring), the goal of these admissions is successful initiation of the drug (discharge on dofetilide) while minimizing the risk of subsequent torsade de pointes (TdP) or potentially fatal ventricular arrhythmias [11]. With this in mind, there are important insights to be drawn from this novel application of advanced analytics and machine learning to decision-making surrounding dofetilide initiation. First, it was evident from several models that making dose adjustments, particularly at later time points, was associated with a lower probability of successful initiation of the medication. This association was evident both in simple logistic regression models and in reinforcement-learning models, in which the cluster with the most negative reward (#5) was composed of doses at a later stage in the hospitalization (dose 4-5 vs. 1-2), and of smaller size. This finding suggests that making a decision to lower the dose of dofetilide in a patient who has already received 3-4 doses and is already on a lower dose (250 or 125 mcg) is very unlikely to result in successful initiation. While further work is needed to validate these models prospectively, this finding could have an important impact on reducing healthcare costs. It would save time and money to stop the initiation process early in a patient in whom successful initiation is unlikely, rather than having the patient stay another day or night in the hospital, or perhaps to start at a lower dose in patients at higher risk of an unsuccessful initiation.

Table 3. Supervised learning approaches to decision-making. A naïve approach to dose adjustment classification, in which dose adjustments were predicted purely on the basis of a dose change probability of 7.1%, was used as a comparator for supervised approaches to predict dose adjustments.
Second, we found that none of the supervised learning algorithms were able to improve prediction of providers' dose decisions based on the clinical information available. In other words, we were unable to 'mimic' the decisions of providers using a statistical model when it came to making dose adjustments of dofetilide. This finding suggests that future efforts based on a gold standard of human decision-making may not lead to the desired outcome of creating a computer algorithm to replace humans in the process, and that focusing efforts on approaches using reinforcement learning may be a better option.
The key difference of reinforcement learning is that it allows the computer to 'learn' its own approach to obtain a given reward, rather than relying on human behavior as the gold standard. This finding has already been noted in creation of algorithms to win at the board game Go [4,18], in which the AlphaGo algorithm based on supervised learning of human decisions [18] was bested by the AlphaGoZero algorithm, which learned entirely on its own, without attempting to replicate human decisions [4]. Reinforcement learning has been studied for many years [19,20], although the medical applications of reinforcement learning are only in their infancy, and there is clearly an opportunity for this approach to greatly improve on clinical decision-making. A number of investigators have recently used this approach to enhance decision-making in clinical care [21], including in the intensive care unit [22].
Interestingly, while use of 8 clusters provided reasonable accuracy (96.1%) with regard to the actual decision made by clinicians, use of smaller numbers of clusters (k = 4 and k = 6) resulted in less accuracy. This was despite the fact that both of the methods with fewer clusters had a more complete Q table (fewer values of 0.0) and that examination of the first two PCs appeared to suggest that 6 clusters may be a reasonable grouping for the data (S1 Fig). Examination of the characteristics of the clusters for k = 6 (S2 Table) reveals that dose number itself was not the only determinant of cluster composition, as several clusters were composed of mixed dose numbers, although all clusters were composed of sequential dose numbers (for example, no clusters were composed of dose numbers that were out of order, e.g., dose 1 and dose 5). This finding raises a critical issue regarding examination of reinforcement learning for guiding clinical decisions, which is that surrogate outcomes, such as consistency with actual decisions, may not be the ideal approach for identification of the 'optimal' model for guiding decisions to achieve a goal, which in this case was the probability of a successful loading of dofetilide. In that regard, our study highlights a key limitation in applications of machine learning in healthcare data, in which the practical process of data and technology integration limits the ability to build better learning systems. This study was entirely observational, which is in great contrast with most other reinforcement learning applications, in which the learning agent is able to practice and improve its policy based on interaction with the environment. A key principle in reinforcement learning is exploration [15], in which better policies can be found by randomly attempting a new action rather than the action that has already been found to provide the best reward.
Without the ability to act on behalf of the policies learned, we were unable to determine if these actions are truly the optimal ones, or if there are conditions in which a decision to change the dose (perhaps at an earlier time in the loading course) could result in a greater likelihood of successful initiation. Whether this limitation was also responsible for the difference in accuracy with use of different cluster numbers, or the lack of convergence we observed using linear function approximation, which has been described in other circumstances [23,24], remains to be determined. Only through future prospective applications can we verify that the approach applied in this study is the best method to maximize likelihood of successful dofetilide initiation.

Limitations
There were a number of key limitations in this study. First, we did not examine long-term outcomes, including recurrence of atrial fibrillation or drug toxicity such as torsade de pointes. This latter limitation is of obvious importance, as the ultimate goal of the 3-day monitoring period is to prevent toxicity [11]; however, there are benefits to identifying factors and approaches that maximize safe initiation of dofetilide, as we did here, which can lead to improved patient satisfaction and cost savings. A second limitation was that our investigation was limited to the modest number of covariates collected on patients undergoing dofetilide initiation. To truly capture the benefits of many methods of machine learning, particularly deep learning, we would need to have a much larger number of patients and variables to include in the model. In the future, through more efficient data collection and storage, especially of high-density data such as telemetry information, we will be able to further leverage these 'big data' methods to improve healthcare decision-making [25,26]. Finally, as discussed above, we were unable to prospectively apply and further improve the policy models developed from the observations in these data. Future implementations of these models within a reinforcement learning framework will be needed to determine if this approach is optimal, or if there are better algorithms for ensuring safe and efficient initiation of dofetilide and other medications.
In conclusion, we found that although most patients admitted for initiation of dofetilide are able to successfully complete the loading protocol (i.e., discharged on dofetilide), reinforcement learning approaches to model dose adjustments offer promise to optimize decision making. Future investigations are needed to explore this emerging approach to machine learning and automated clinical decision support.