Initiation of the antiarrhythmic medication dofetilide requires an FDA-mandated 3 days of telemetry monitoring due to heightened risk of toxicity within this time period. Although a recommended dose management algorithm for dofetilide exists, there is a range of real-world approaches to dosing the medication.
Methods and results
In this multicenter investigation, clinical data from the Antiarrhythmic Drug Genetic (AADGEN) study were examined for 354 patients undergoing dofetilide initiation. Univariate logistic regression identified a starting dofetilide dose of 500 mcg (OR 5.0, 95% CI 2.5–10.0, p<0.001) and sinus rhythm at the start of dofetilide loading (OR 2.8, 95% CI 1.8–4.2, p<0.001) as strong positive predictors of successful loading. Any dose adjustment during loading (OR 0.19, 95% CI 0.12–0.31, p<0.001) and a history of coronary artery disease (OR 0.33, 95% CI 0.19–0.59, p<0.001) were strong negative predictors of successful dofetilide loading. Based on the observation that any dose adjustment was a significant negative predictor of successful initiation, we applied multiple supervised approaches to attempt to predict the dose adjustment decision, but none of these approaches identified dose adjustments better than a probabilistic guess. Principal component analysis and cluster analysis identified 8 clusters as a reasonable data reduction method. These 8 clusters were then used to define patient states in a tabular reinforcement learning model trained on 80% of dosing decisions. Testing of this model on the remaining 20% of dosing decisions revealed good accuracy of the reinforcement learning model, with only 16/410 (3.9%) instances of disagreement.
Dose adjustments are a strong determinant of whether patients are able to successfully initiate dofetilide. A reinforcement learning algorithm informed by unsupervised learning was able to predict dosing decisions with 96.1% accuracy. Future studies will apply this algorithm prospectively as a data-driven decision aid.
Citation: Levy AE, Biswas M, Weber R, Tarakji K, Chung M, Noseworthy PA, et al. (2019) Applications of machine learning in decision analysis for dose management for dofetilide. PLoS ONE 14(12): e0227324. https://doi.org/10.1371/journal.pone.0227324
Editor: Randall Lee Rasmusson, University at Buffalo - The State University of New York, UNITED STATES
Received: June 25, 2019; Accepted: December 17, 2019; Published: December 31, 2019
Copyright: © 2019 Levy et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Data cannot be shared publicly because of privacy concerns and violation of agreement of the informed consent process. Data will be made available upon request from the Partners Healthcare Research Committee (Contact via firstname.lastname@example.org), as well as the PI for AADGEN, Dr. Newton-Cheh (Contact via email at email@example.com), and corresponding author Dr. Rosenberg (Contact via email at firstname.lastname@example.org).
Funding: This work is supported by grants from the NIH T32 program (AEL: 5T32 HL007822) and the NIH NHLBI (MAR: 5K23 HL127296, CNC: R01 HL 143070). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Decision analysis is an emerging field that uses outcomes from different decision approaches to guide future decision-making. In many cases, medical decisions can be formulated as Markov-decision processes (MDPs), in which a given state of conditions can predict future states based on a model for decision-making. Reinforcement learning, a subset of machine learning (ML), expands on MDPs by embedding reward-based feedback into decision outcomes so that an optimal decision approach, termed the policy, can be identified. In recent years, this approach has achieved supra-human success rates in video and board games, among other applications[4, 5].
Reinforcement learning is one of three main categories of ML gaining popularity in medical applications, the other two being supervised and unsupervised learning. Supervised applications use an example dataset to learn general rules (an algorithm) about the relationship of predictor variables (termed “features”) to an outcome of interest (termed a “label”). These general rules can then be applied to a new dataset to predict outcomes. Unsupervised learning, in contrast, does not use labelled outcomes and, instead, discovers relationships between different features on its own. The discovery process often restructures data into new classes, “shrinking” and consolidating features for more nimble use in supervised applications. In many applications, these methods complement each other, but whereas supervised and unsupervised methods lead to descriptive analyses, feedback from outcomes allows reinforcement learning to produce prescriptive analyses. For this reason, reinforcement learning holds great promise as a tool to enrich clinical decisions. Currently, however, there are relatively few published applications in healthcare[8, 9].
Dofetilide is a common antiarrhythmic medication primarily used to treat atrial fibrillation. It is one of the few antiarrhythmic medications other than amiodarone that has been approved for use in patients with coronary artery disease or cardiomyopathy. Like many other Vaughan Williams class III agents, dofetilide blocks the rapid delayed rectifier potassium current (IKr) and thus can cause QT prolongation. Due to the risk of resultant fatal arrhythmias, the FDA has mandated a 3-day monitoring period for drug initiation. There is a recommended algorithm for making dose adjustments during initiation, but these adjustments are still made at the treating provider’s discretion[10, 11]. In this investigation, we examine the patterns of dofetilide dose adjustment and the role of machine learning to develop algorithms aimed at successful initiation of the medication.
This study has been approved by the University of Colorado Institutional Review Board (COMIRB Protocol #16–2675), and the Partners Human Research Committee (#2013-P002623). All subjects provided written informed consent.
The Antiarrhythmic Drug Genetic (AADGEN) study is a multi-center collaboration that includes investigators from the Massachusetts General Hospital (MGH, Boston, MA), Beth Israel Deaconess Medical Center (Boston, MA), the Boston-area Veterans Affairs Medical Center (West Roxbury, MA), the Cleveland Clinic (Cleveland, OH), the Mayo Clinic (Rochester, MN), and the University of Colorado Hospital (Aurora, CO). Patients were enrolled from July 7, 2014 to September 19, 2018, with the inclusion criterion being admission to in-patient telemetry for monitoring of initiation of dofetilide. The exclusion criteria were failure to provide written informed consent and failure to obtain a pre-dofetilide ECG. Massachusetts General Hospital served as the coordinating center for this investigation. Institutional Review Board approval was obtained at all enrolling centers. This study is a sub-study of a larger investigation into the genetic predictors of cardiac repolarization and drug toxicity of antiarrhythmic medications (Clinicaltrials.gov identifier: NCT02439658).
Demographic and clinical information was obtained for all study participants, including age, height, weight, body mass index (BMI), medications, past medical and cardiac history (including history of pacemaker/defibrillator, atrial fibrillation, and ventricular fibrillation), left ventricular function from transthoracic echocardiogram, recent lab values (creatinine, potassium, and magnesium), and electrocardiograms with underlying rhythm, rate, and relevant intervals (PR, QRS, QT). QT interval was corrected for heart rate using Fridericia’s formula. The timing of electrical cardioversion was also recorded.
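As a brief illustration of the heart-rate correction mentioned above, Fridericia’s formula divides the measured QT interval by the cube root of the RR interval expressed in seconds. The following is a minimal sketch rather than the study's code; the function name `qtc_fridericia` and the example values are hypothetical.

```python
def qtc_fridericia(qt_ms: float, heart_rate_bpm: float) -> float:
    """Return the Fridericia-corrected QT interval in milliseconds.

    QTc = QT / RR^(1/3), where RR is the beat-to-beat interval in seconds.
    """
    if qt_ms <= 0 or heart_rate_bpm <= 0:
        raise ValueError("QT interval and heart rate must be positive")
    rr_seconds = 60.0 / heart_rate_bpm  # RR interval derived from heart rate
    return qt_ms / rr_seconds ** (1.0 / 3.0)


# Hypothetical ECG: QT of 400 ms measured at 60 and at 90 beats per minute.
for rate in (60.0, 90.0):
    print(rate, round(qtc_fridericia(400.0, rate), 1))
```

At 60 beats per minute the RR interval is exactly 1 s, so the corrected and uncorrected values coincide; at faster rates the correction lengthens the reported interval.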
The outcome of interest was successful loading of dofetilide, defined as discharge on dofetilide at any dose after at least 5 administrations. Data for all participants were collected retrospectively, after completion of the hospitalization; no clinical adjustments or changes were made by treating physicians as part of this investigation. Data were maintained in a centralized REDCap database managed by the study coordinating center at MGH.
Prior to analysis, quality control was performed by study investigators, with manual review of outlier values for ECG parameters (e.g., QTc > 600 ms) and of discordant data values (e.g., PR interval on an ECG with rhythm listed as ‘atrial fibrillation’). When resolution or validation was not possible, values were set to missing. Summary and descriptive statistics are based on analysis of non-missing data; only 4.2% of the total dataset was missing. Because the machine-learning algorithms require complete datasets, missing values were imputed with the median for numerical and integer variables and with the most common value for categorical variables. Categorical variables were also coded using ‘one-hot’ encoding and numerical variables were rescaled using min-max rescaling. Dose adjustments were only included if they were a decrease in dose from a higher dose, as FDA guidelines for dofetilide initiation suggest starting at the highest dose based on kidney function, and adjusting downward based on the QT changes on ECG; as such, any dose increase during the hospitalization was off-label. Based on this criterion, 14 patients who underwent dose increases were excluded. For all model evaluations, data were split into training (80% of total data) and testing sets (20% of total data) at the patient level.
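The preprocessing steps described above (median or most-common-value imputation, one-hot encoding, and min-max rescaling) can be sketched in a few lines. This is an illustrative example using only the Python standard library, not the study's pipeline; the helper names and the toy values for creatinine and rhythm are assumptions made for the example.

```python
from collections import Counter
from statistics import median


def impute(values, kind):
    """Fill None with the median (numeric) or the most common value (categorical)."""
    observed = [v for v in values if v is not None]
    if kind == "numeric":
        fill = median(observed)
    else:
        fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


def min_max(values):
    """Rescale numeric values linearly onto the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Return the sorted category levels and one indicator row per value."""
    levels = sorted(set(values))
    return levels, [[1 if v == level else 0 for level in levels] for v in values]


# Hypothetical columns with one missing entry each.
creatinine = [0.8, None, 1.2, 1.0]
rhythm = ["sinus", "afib", None, "afib"]

creatinine_scaled = min_max(impute(creatinine, "numeric"))
rhythm_levels, rhythm_encoded = one_hot(impute(rhythm, "categorical"))
print(creatinine_scaled)
print(rhythm_levels, rhythm_encoded)
```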
Basic stepwise logistic regression was performed for successful initiation of dofetilide, excluding variables with a p value greater than 0.05. Based on the observation that dose adjustments were a significant predictor of successful initiation, we used several supervised learning methods to develop predictive models of the dose adjustment process. These models included L1 regularized logistic regression, random forest classification, a boosted decision tree classifier, support vector classification (radial basis function kernel), and K-nearest neighbors classification with a maximum of 10 neighbors. Comparison measures included accuracy, precision and recall scores, F1-score[13, 14], and area under the ROC curve.
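To make the comparison measures concrete, the sketch below computes accuracy, precision, recall, and F1-score from binary labels, where 1 denotes a dose adjustment. It is a standard-library illustration with made-up labels, not the evaluation code used in the study; the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up example: one adjustment in ten doses, classifier never predicts one.
print(classification_metrics([0] * 9 + [1], [0] * 10))
```

With a rare positive class, a classifier that never predicts an adjustment scores high accuracy but zero recall and F1, which is why several measures are worth reporting together.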
For unsupervised analysis, we first performed principal component analysis. By plotting the number of principal components (PCs) against the variance explained, we sought to identify the number of PCs that would account for greater than 90% of the variability in the data. We then performed a cluster analysis based on within-cluster variation (sum-of-squares), and used the ‘elbow’ method to determine cluster numbers with sufficiently low within-cluster variability. We then used a K-means approach to create these clusters for use in subsequent reinforcement learning analyses.
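The elbow method relies on the within-cluster sum-of-squares (WCSS) obtained for each candidate number of clusters. The sketch below implements a plain Lloyd-style K-means in standard-library Python and reports the WCSS for several values of k. It is only an illustration on made-up two-dimensional points; the function names, the random initialization, and the data are assumptions, and a library implementation would normally be preferred in practice.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, n_iter=50, seed=0):
    """Run Lloyd's algorithm; return the centroids and the WCSS."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        new_centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments have stabilised
            break
        centroids = new_centroids
    wcss = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, wcss


# Made-up points forming three loose groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
for k in range(1, 6):
    print(k, round(k_means(points, k)[1], 2))
```

Plotting WCSS against k and looking for the point where further increases in k yield only small reductions gives the 'elbow'.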
We next applied reinforcement learning using the SARSA algorithm (state–action–reward–state–action) for selecting dose adjustments based on a negative reward for unsuccessful initiation. We applied two broad approaches to creation of action-value estimates (i.e., Q values). First, we defined 8 states created using K-means clustering from all clinical features, and performed tabular updates to a Q table based on dynamic programming (step-by-step updates). Alternatively, we performed linear function approximation for the Q values using linear weights (termed ‘Q learning’), with updates using stochastic gradient descent based on experience. The available actions in the Q value estimates included ‘continue the same dose’ or ‘decrease the dose’. The reward was selected to be -10 for doses leading to stopping of the medication (last dose before stopping) and 0 for all other doses, in order to penalize decisions resulting in a negative outcome.
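The tabular SARSA update can be illustrated with a small sketch. It uses 8 states (one per cluster), the two actions named above, and a reward of -10 on the last dose before stopping. Treating that last dose as a terminal step, the specific episode, and the function names are assumptions made for illustration and are not taken from the study's code.

```python
N_STATES = 8                          # one state per K-means cluster
ACTIONS = ("continue", "decrease")    # indices 0 and 1


def new_q_table():
    """Q table initialised at 0 for every state-action pair."""
    return [[0.0] * len(ACTIONS) for _ in range(N_STATES)]


def sarsa_update(q, s, a, reward, s_next=None, a_next=None, alpha=0.1, gamma=0.9):
    """Apply Q(s,a) <- Q(s,a) + alpha * (r + gamma * Q(s',a') - Q(s,a)).

    A step with s_next=None is treated as terminal (assumption), so the
    bootstrapped future value is 0.
    """
    future = 0.0 if s_next is None else q[s_next][a_next]
    q[s][a] += alpha * (reward + gamma * future - q[s][a])
    return q[s][a]


# Hypothetical episode of (state, action index, reward): the dose is continued
# in state 3, then decreased in state 5, after which the drug is stopped.
episode = [(3, 0, 0.0), (5, 1, -10.0)]

q = new_q_table()
for _ in range(200):  # replay the same observed episode many times
    for i, (s, a, r) in enumerate(episode):
        if i + 1 < len(episode):
            s_next, a_next = episode[i + 1][0], episode[i + 1][1]
        else:
            s_next, a_next = None, None
        sarsa_update(q, s, a, r, s_next, a_next)

print(q[5][1], q[3][0])
```

Because the penalty is only attached to the final decision, earlier state-action pairs acquire negative values solely through the discounted bootstrapped term.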
The Q table was initialized at 0 for all values, with gamma (discount factor) values ranging from 0.1 to 1.0, and alpha (learning rate) of 0.1. Of note, a gamma close to 1 puts more weight on future states and rewards, while a gamma close to 0 puts more weight on immediate rewards. We also experimented with a range of learning rates (0.05 to 0.3). The learning rate is the extent to which Q values are updated with new iterations of data. Reinforcement learning algorithms were fitted with the training set (per above, 80% of doses) and compared with actual decisions on the held-out test set (per above, 20% of doses). Additional analyses were performed using k = 4 and k = 6 (number of clusters).
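The effect of the discount factor can be seen by computing the discounted return of a reward sequence in which only the final step is penalized. The sketch below is illustrative; the four-dose reward sequence is hypothetical and `discounted_return` is not a function from the study.

```python
def discounted_return(rewards, gamma):
    """Return r_0 + gamma * r_1 + gamma^2 * r_2 + ... for a reward sequence."""
    total = 0.0
    for r in reversed(rewards):  # accumulate from the last step backwards
        total = r + gamma * total
    return total


# Hypothetical course: three uneventful doses, then the dose before stopping.
rewards = [0.0, 0.0, 0.0, -10.0]
for gamma in (0.1, 0.5, 1.0):
    print(gamma, discounted_return(rewards, gamma))
```

With gamma near 1 the penalty at the end of the course is felt almost fully at the first dose, whereas with gamma near 0 it is nearly invisible to early decisions.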
Descriptive statistical analysis, including chi-square for categorical and t-test for continuous comparisons, as well as univariate logistic regression, was performed using Stata IC, Version 15.1 (StataCorp, LLC, College Station, TX). Machine learning, including unsupervised, supervised, and reinforcement learning algorithms, was performed using Python 3, running scripts in Jupyter Notebook (v5.0.0) deployed via Anaconda Navigator, on a MacBook Pro laptop computer (macOS High Sierra, v10.13.6). The primary source of machine learning packages was scikit-learn (see Supplemental Methods for details).
The baseline characteristics of the cohort are shown in Table 1. A total of 354 subjects were enrolled, with successful initiation (discharged on dofetilide) in 310 patients (87.6%) and unsuccessful in 44. Use of calcium channel blockers and initial dose of dofetilide were different between patients with successful vs. unsuccessful initiation of dofetilide, although neither of these p values reached statistical significance after Bonferroni adjustment for multiple comparisons (adjusted significance threshold = 0.05/(number of rows in Table 1) = 0.05/24 = 0.002). There were no other differences in baseline parameters between patients.
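The Bonferroni adjustment described above amounts to dividing the nominal significance level by the number of comparisons. A minimal sketch follows; the function names are hypothetical and the p values passed in the example are placeholders, not values from Table 1.

```python
def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance threshold under Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("at least one comparison is required")
    return alpha / n_comparisons


def significant_after_correction(p_value, alpha=0.05, n_comparisons=24):
    """True if a p value falls below the Bonferroni-adjusted threshold."""
    return p_value < bonferroni_threshold(alpha, n_comparisons)


# 24 rows in Table 1, nominal alpha of 0.05.
print(round(bonferroni_threshold(0.05, 24), 4))
# Placeholder p values for illustration only.
print(significant_after_correction(0.01), significant_after_correction(0.001))
```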
A total of 354 subjects were enrolled in the Antiarrhythmic Drug Genetic (AADGEN) study, with successful initiation (discharged on dofetilide) in 310 patients (87.6%) and unsuccessful in 44. Note: Dose excludes 4 patients with a different starting dose than listed.
Fig 1 shows representative dosing approaches for dofetilide, as well as timing of cardioversions. The most common dose regimen involved no adjustments throughout the 5–6 dose course used to reach steady state of the medication (n = 204, 57.6%). Stepwise univariate regression was performed for successful initiation across the course of dofetilide initiation, which revealed that dose number, dose amount, dose adjustment, ejection fraction, history of heart failure, sinus rhythm, QRS, QTc, presence of a pacemaker, and coronary artery disease were predictors of successful discharge on dofetilide at p < 0.05 (Table 2). The strongest predictors for successful initiation of dofetilide were starting dose of 500 mcg (OR 5.0, 95% CI 2.5–10.0, p < 0.001) and dose adjustment during initiation (OR 0.19, 95% CI 0.12–0.31, p < 0.001), which was a negative predictor. Because it had such a strong effect, we selected dose adjustment as the target for machine learning techniques.
A schematic of the most common dosing approaches for dofetilide (color-coded rows) among patients who were successfully initiated (discharged on medication). The numbers in each individual cell correspond to the number of electrical cardioversion procedures performed after that specific dose within that specific dosing scheme. Twenty-nine patients with atypical dosing regimens (i.e., increases in dose) are excluded. The bottom row represents patients who were not successfully initiated on dofetilide (n = 44).
Univariate logistic regression results for associations with successful loading of dofetilide (discharged on medication). Dose position refers to an integer from 1 to 6, in which 1 would have been the first dose and 5 or 6 would have been the final dose. Dose adjustment is any decrease in dose from the prior dose. Sinus rhythm refers to patients in sinus rhythm at the time of the dosing decision.
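Odds ratios from logistic regression are the exponentiated coefficients, and their confidence intervals are typically symmetric on the log scale. The sketch below assumes Wald-type 95% intervals (an assumption; the source does not state the interval type) and recovers the implied standard error from the interval reported for a starting dose of 500 mcg (OR 5.0, 2.5–10.0). The function names are hypothetical.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Return (OR, lower, upper) from a log-odds coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower, upper, z=1.96):
    """Back out the standard error of the log-odds from a Wald confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


# Reported interval for a starting dose of 500 mcg: OR 5.0 (2.5 to 10.0).
se = se_from_ci(2.5, 10.0)
odds_ratio, lower, upper = odds_ratio_ci(math.log(5.0), se)
print(round(se, 3), round(odds_ratio, 2), round(lower, 2), round(upper, 2))
```

A reported interval whose bounds are not roughly equidistant from the odds ratio on the log scale is a useful cue to recheck the transcription.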
The 354 subjects in our analysis collectively received a total of 2037 doses of dofetilide. Out of a possible 2037 opportunities to adjust the dose of dofetilide, dose adjustments were made in 144 instances. This corresponds to a dose change probability of 7.1%, indicating that a naïve approach that predicted only no dose adjustment would be accurate 92.9% of the time, which was used as the comparison for machine-learning approaches developed to predict whether a dose adjustment would be made. However, none of the supervised analyses resulted in improvement in identification of a medication adjustment by providers over a naïve approach (based on accuracy, or any of the other classification metrics applied) as shown in Table 3.
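The baseline figures above follow directly from the reported counts. The sketch below reproduces them and, as an additional assumption for illustration, also computes the expected accuracy of a purely probabilistic guess that predicts an adjustment at random with the observed adjustment probability.

```python
N_DOSES = 2037        # dosing opportunities reported in the text
N_ADJUSTMENTS = 144   # dose adjustments reported in the text

p_adjust = N_ADJUSTMENTS / N_DOSES

# Baseline 1: always predict "no adjustment".
majority_accuracy = 1.0 - p_adjust

# Baseline 2 (assumption): guess "adjustment" with probability p_adjust,
# independently of the truth; expected accuracy is p^2 + (1 - p)^2.
random_guess_accuracy = p_adjust ** 2 + (1.0 - p_adjust) ** 2

print(f"{p_adjust:.1%} {majority_accuracy:.1%} {random_guess_accuracy:.1%}")
```

Either baseline is hard to beat on accuracy alone when the event of interest occurs in only about 7% of decisions.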
A naïve approach to dose adjustment classification, in which dose adjustments were predicted purely on the basis of a dose change probability of 7.1%, was used as a comparator for supervised approaches to predict dose adjustments.
As described above, unsupervised principal component analysis was performed across 25 patient and dosing characteristics. We noted that the first two principal components (PCs) accounted for 65.0% of the total variance and 90% of the total variance could be explained by the first 8 PCs (Fig 2A). Cluster analysis using within-cluster sum-of-squares identified cluster numbers of k = 4 or greater as providing sufficiently low within-cluster variability, and validated use of k = 8 clusters (Fig 2B). Qualitative assessment of each PC revealed that there was apparent clustering along the first PC into 6 groups, which likely represent the dose number (S1 Fig). Characteristics of each PC cluster are described in Table 4.
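Selecting the number of principal components that reach a variance threshold is a matter of accumulating the per-component explained-variance ratios. In the sketch below the ratios are invented for illustration, chosen only so that the first two sum to 65% and the first eight reach 90% as described above; they are not the study's values, and `components_needed` is a hypothetical helper.

```python
def components_needed(explained_variance_ratio, threshold=0.90):
    """Return the smallest number of leading components whose cumulative
    explained-variance ratio reaches the threshold."""
    cumulative = 0.0
    for count, ratio in enumerate(explained_variance_ratio, start=1):
        cumulative += ratio
        if cumulative >= threshold - 1e-9:  # tolerance for floating-point sums
            return count
    raise ValueError("threshold is not reached by the supplied components")


# Invented per-component ratios (illustrative only), sorted in decreasing order.
ratios = [0.40, 0.25, 0.08, 0.06, 0.04, 0.03, 0.02, 0.02, 0.02, 0.02]
print(components_needed(ratios, 0.65), components_needed(ratios, 0.90))
```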
A. Cumulative and per-component variance explained for each sequential principal component (PC). B. Cluster analysis based on within-cluster sum-of-squares.
Unsupervised principal component analysis was performed across 25 patient and dosing characteristics.
After training the model on the training set (80% of data, 1627 doses), the accuracy of a tabular reinforcement-learning model for predicting actual decisions on the testing set (20%, 410 doses) was good, with only 3.9% disagreement (16/410) noted. Sensitivity analysis using a range of learning rates (alpha) and discount factors (gamma) had no impact on the accuracy of prediction; only the absolute Q values changed (not relative values). The least disagreement was observed in the Q table cluster with the smallest (most negative) values for rewards (Table 5). The analysis was repeated with use of k = 4 (S1 Table) and k = 6 clusters (S2 Table), which predicted actual decisions with less accuracy than the model with k = 8 clusters (98/410, 24%, correct for k = 4 clusters and 336/410, 82%, correct for k = 6 clusters).
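Agreement between a learned Q table and observed decisions can be measured by taking the greedy action in each state and comparing it with what the clinician actually did. The sketch below is a hypothetical illustration with a two-state Q table and made-up observations; breaking ties in favour of the first action ('continue') is an assumption of the example.

```python
def greedy_action(q_row):
    """Index of the action with the highest Q value; ties go to the first action."""
    return max(range(len(q_row)), key=lambda a: q_row[a])


def agreement_rate(q_table, observations):
    """Fraction of (state, observed action) pairs matching the greedy policy."""
    matches = sum(1 for state, action in observations
                  if greedy_action(q_table[state]) == action)
    return matches / len(observations)


# Made-up Q table (rows: states; columns: continue, decrease) and observations.
q_table = [[0.0, -1.0], [-2.0, -0.5]]
observations = [(0, 0), (0, 1), (1, 1), (1, 1)]
print(agreement_rate(q_table, observations))

# Agreement implied by the reported 16 disagreements in 410 test decisions.
print(round(1 - 16 / 410, 3))
```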
Expected reward for each action for each cluster. Based on alpha (learning rate) = 0.05 and gamma (discount factor) = 0.2. Both alpha and gamma range from 0 to 1.
A linear reinforcement-learning policy function was able to achieve equal accuracy to tabular learning for certain hyper-parameter choices (alpha and gamma). Unlike the tabular learning model, however, the linear model was highly labile depending on hyper-parameter choices (S2 Fig). These models also had unstable weight estimates (See S3 Table) across parameters.
In this investigation of decision-making surrounding dofetilide initiation, we examined several approaches for evaluating dose adjustment decisions. It is important to note that while dofetilide initiation is performed in the hospital primarily for safety reasons (adverse event monitoring), the goal of these admissions is successful initiation of the drug (discharge on dofetilide) while minimizing the risk of subsequent torsade de pointes (TdP) or other potentially fatal ventricular arrhythmias. With this in mind, there are important insights to be drawn from this novel application of advanced analytics and machine learning to decision-making surrounding dofetilide initiation.
First, it was evident from several models that making dose adjustments, particularly at later time points, was associated with a lower probability of successful initiation of the medication. This association was evident in both simple logistic regression models and reinforcement-learning models, in which the cluster with the most negative reward (#5) was composed of doses at a later stage in the hospitalization (dose 4–5 vs. 1–2), and of lower dose amounts. This finding suggests that making a decision to lower the dose of dofetilide in a patient who has already received 3–4 doses and is already on a lower dose (250 or 125 mcg) is very unlikely to result in successful initiation. While further work is needed to validate these models prospectively, this finding could have an important impact on reducing healthcare costs. It would save time and money to stop the initiation process early in a patient in whom successful initiation is unlikely, rather than keeping the patient another day or night in the hospital, or perhaps to start at a lower dose in patients at higher risk of an unsuccessful initiation.
Second, we found that none of the supervised learning algorithms improved on a naïve approach in predicting providers’ dose decisions from the clinical information available. In other words, we were unable to ‘mimic’ the decisions of providers using a statistical model when it came to making dose adjustments of dofetilide. This finding suggests that efforts that treat human decision-making as the gold standard may not achieve the goal of creating a computer algorithm to replace humans in the process, and that focusing on approaches using reinforcement learning may be a better option.
The key difference of reinforcement learning is that it allows the computer to ‘learn’ its own approach to obtain a given reward, rather than relying on human behavior as the gold standard. This finding has already been noted in creation of algorithms to win at the board game Go[4, 18], in which the AlphaGo algorithm based on supervised learning of human decisions was bested by the AlphaGoZero algorithm, which learned entirely on its own, without attempting to replicate human decisions. Reinforcement learning has been studied for many years[19, 20], although the medical applications of reinforcement learning are only in their infancy, and there is clearly an opportunity for this approach to greatly improve on clinical decision-making. A number of investigators have recently used this approach to enhance decision-making in clinical care, including in the intensive care unit.
Interestingly, while use of 8 clusters provided reasonable accuracy (96.1%) with regard to the actual decision made by clinicians, use of smaller numbers of clusters (k = 4 and k = 6) resulted in less accuracy, despite the fact that both of the models with fewer clusters had more complete Q tables (fewer values of 0.0) and that examination of the first two PCs appeared to suggest that 6 clusters may be a reasonable grouping for the data (S1 Fig). Examination of the characteristics of the clusters for k = 6 (S2 Table) reveals that dose number itself was not the only determinant of cluster composition, as several clusters were composed of mixed dose numbers, although all clusters were composed of sequential dose numbers (for example, no cluster combined non-adjacent dose numbers such as dose 1 and dose 5). This finding raises a critical issue regarding examination of reinforcement learning for guiding clinical decisions, which is that surrogate outcomes, such as consistency with actual decisions, may not be the ideal criterion for identifying the ‘optimal’ model for guiding decisions to achieve a goal, which in this case was successful loading of dofetilide. In that regard, our study highlights a key limitation in applications of machine learning to healthcare data, in which the practical process of data and technology integration limits the ability to build better learning systems. This study was entirely observational, in contrast with most other reinforcement learning applications, in which the learning agent is able to practice and improve its policy through interaction with the environment. A key principle in reinforcement learning is exploration, in which better policies can be found by randomly attempting an action other than the one already found to provide the best reward.
Without the ability to act on the policies learned, we were unable to determine if these actions are truly the optimal ones, or if there are conditions in which a decision to change the dose (perhaps at an earlier time in the loading course) could result in a greater likelihood of successful initiation. Whether this limitation was also responsible for the difference in accuracy with use of different cluster numbers, or for the lack of convergence we observed using linear function approximation, which has been described in other circumstances[23, 24], remains to be determined. Only through future prospective applications can we verify that the approach applied in this study is the best method to maximize likelihood of successful dofetilide initiation.
There were a number of key limitations in this study. First, we did not examine long-term outcomes, including recurrence of AF or drug toxicity, including torsade de pointes. This latter limitation is of obvious importance, as the ultimate goal of the 3-day monitoring period is to prevent toxicity; however, there are benefits to identification of factors and approaches to maximize safe initiation of dofetilide as we identified, which can lead to improved patient satisfaction and cost savings. A second limitation was that our investigation was limited to the modest number of covariates collected on patients undergoing dofetilide initiation. To truly capture the benefits of many methods of machine learning, particularly deep learning, we would need to have a much larger number of patients and variables to include in the model. In the future, through more efficient data collection and storage, especially of high-density data such as telemetry information, we will be able to further leverage these ‘big data’ methods to improve healthcare decision-making[25, 26]. Finally, as discussed above, we were unable to prospectively apply and further improve the policy models developed from the observations in this data. Future implementations of these models within a reinforcement learning framework will be needed to determine if this approach is optimal, or if there are better algorithms for ensuring safe and efficient initiation of dofetilide and other medications.
In conclusion, we found that although most patients admitted for initiation of dofetilide are able to successfully complete the loading protocol (i.e., discharged on dofetilide), reinforcement learning approaches to model dose adjustments offer promise to optimize decision making. Future investigations are needed to explore this emerging approach to machine learning and automated clinical decision support.
- 1. Hogendoorn W, Moll FL, Sumpio BE, Hunink MG. Clinical Decision Analysis and Markov Modeling for Surgeons: An Introductory Overview. Annals of surgery. 2016;264(2):268–74. Epub 2016/01/13. pmid:26756750.
- 2. Alagoz O, Hsu H, Schaefer AJ, Roberts MS. Markov decision processes: a tool for sequential decision making under uncertainty. Medical decision making: an international journal of the Society for Medical Decision Making. 2010;30(4):474–83. Epub 2010/01/02. pmid:20044582; PubMed Central PMCID: PMC3060044.
- 3. Kaelbling LP, Littman ML, Moore AW. Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research. 1996;4:237–85.
- 4. Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, et al. Mastering the game of Go without human knowledge. Nature. 2017;550(7676):354–9. Epub 2017/10/21. pmid:29052630.
- 5. Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, et al. Human-level control through deep reinforcement learning. Nature. 2015;518(7540):529–33. Epub 2015/02/27. pmid:25719670.
- 6. Krittanawong C, Zhang H, Wang Z, Aydar M, Kitai T. Artificial Intelligence in Precision Cardiovascular Medicine. J Am Coll Cardiol. 2017;69(21):2657–64. Epub 2017/05/27. pmid:28545640.
- 7. Johnson KW, Torres Soto J, Glicksberg BS, Shameer K, Miotto R, Ali M, et al. Artificial Intelligence in Cardiology. J Am Coll Cardiol. 2018;71(23):2668–79. Epub 2018/06/09. pmid:29880128.
- 8. Shortreed SM, Laber E, Lizotte DJ, Stroup TS, Pineau J, Murphy SA. Informing sequential clinical decision-making through reinforcement learning: an empirical study. Machine learning. 2011;84(1–2):109–36. Epub 2011/07/30. pmid:21799585; PubMed Central PMCID: PMC3143507.
- 9. Prasad N, Cheng LF, Chivers C, Draugelis M, Engelhardt BE. A reinforcement learning approach to weaning of mechanical ventilation in intensive care units. arXiv. 2017. https://arxiv.org/abs/1704.06300.
- 10. Pfizer Inc. Tikosyn Label Information [Prescribing information]. http://labeling.pfizer.com/showlabeling.aspx?id=639. 2011.
- 11. Naksuk N, Sugrue AM, Padmanabhan D, Kella D, DeSimone CV, Kapa S, et al. Potentially modifiable factors of dofetilide-associated risk of torsades de pointes among hospitalized patients with atrial fibrillation. J Interv Card Electrophysiol. 2019;54(2):189–96. Epub 2018/10/26. pmid:30353374.
- 12. Funck-Brentano C, Jaillon P. Rate-corrected QT interval: techniques and limitations. Am J Cardiol. 1993;72(6):17b–22b. Epub 1993/08/26. pmid:8256750.
- 13. Chai KE, Anthony S, Coiera E, Magrabi F. Using statistical text classification to identify health information technology incidents. Journal of the American Medical Informatics Association: JAMIA. 2013;20(5):980–5. Epub 2013/05/15. pmid:23666777; PubMed Central PMCID: PMC3756261.
- 14. Davis J, Goadrich M. The relationship between Precision-Recall and ROC curves. Proceedings of the 23rd International Conference on Machine Learning (ICML); 2006; Pittsburgh, PA, USA.
- 15. Sutton RS, Barto AG. Reinforcement Learning: An Introduction. 2nd ed. Cambridge, MA: MIT Press; 2018.
- 16. Vassiliades V, Cleanthous A, Christodoulou C. Multiagent reinforcement learning: spiking and nonspiking agents in the iterated Prisoner's Dilemma. IEEE transactions on neural networks. 2011;22(4):639–53. Epub 2011/03/23. pmid:21421435.
- 17. Qiao J, Wang G, Li W, Chen M. An adaptive deep Q-learning strategy for handwritten digit recognition. Neural networks: the official journal of the International Neural Network Society. 2018;107:61–71. Epub 2018/05/08. pmid:29735249.
- 18. Silver D, Huang A, Maddison CJ, Guez A, Sifre L, van den Driessche G, et al. Mastering the game of Go with deep neural networks and tree search. Nature. 2016;529(7587):484–9. Epub 2016/01/29. pmid:26819042.
- 19. Millán JDR, Torras C. A reinforcement connectionist approach to robot path finding in non-maze-like environments. Machine learning. 1992;8(3):363–95.
- 20. Gullapalli V. A stochastic reinforcement learning algorithm for learning real-valued functions. Neural Networks. 1990;3(6):671–92.
- 21. Yom-Tov E, Feraru G, Kozdoba M, Mannor S, Tennenholtz M, Hochberg I. Encouraging Physical Activity in Patients With Diabetes: Intervention Using a Reinforcement Learning System. Journal of medical Internet research. 2017;19(10):e338. Epub 2017/10/12. pmid:29017988; PubMed Central PMCID: PMC5654735.
- 22. Komorowski M, Celi LA, Badawi O, Gordon AC, Faisal AA. The Artificial Intelligence Clinician learns optimal treatment strategies for sepsis in intensive care. Nat Med. 2018;24(11):1716–20. Epub 2018/10/24. pmid:30349085.
- 23. Boyan JA, Moore AW. Generalization in reinforcement learning: Safely approximating the value function. NIPS. 1995:369–76.
- 24. Tsitsiklis JN, Van Roy B. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control. 1997;42:674–90.
- 25. Attia ZI, Sugrue A, Asirvatham SJ, Ackerman MJ, Kapa S, Friedman PA, et al. Noninvasive assessment of dofetilide plasma concentration using a deep learning (neural network) analysis of the surface electrocardiogram: A proof of concept study. PLoS One. 2018;13(8):e0201059. Epub 2018/08/23. pmid:30133452; PubMed Central PMCID: PMC6104915.
- 26. Sugrue A, Kremen V, Qiang B, Sheldon SH, DeSimone CV, Sapir Y, et al. Electrocardiographic Predictors of Torsadogenic Risk During Dofetilide or Sotalol Initiation: Utility of a Novel T Wave Analysis Program. Cardiovascular drugs and therapy / sponsored by the International Society of Cardiovascular Pharmacotherapy. 2015;29(5):433–41. Epub 2015/09/29. pmid:26411977; PubMed Central PMCID: PMC4731047.