Using time series analysis approaches for improved prediction of pain outcomes in subgroups of patients with painful diabetic peripheral neuropathy

Prior work applied hierarchical clustering, coarsened exact matching (CEM), time series regressions with lagged variables as inputs, and microsimulation to data from three randomized clinical trials (RCTs) and a large German observational study (OS) to predict pregabalin pain reduction outcomes for patients with painful diabetic peripheral neuropathy. Here, data were added from six RCTs to reduce covariate bias of the same OS and improve accuracy and/or increase the variety of patients for pain response prediction. Using hierarchical cluster analysis and CEM, a matched dataset was created from the OS (N = 2642) and nine total RCTs (N = 1320). Using a maximum likelihood method, we estimated weekly pain scores for pregabalin-treated patients for each cluster (matched dataset); the models were validated with RCT data that did not match with OS data. We predicted novel ‘virtual’ patient pain scores over time using simulations including instance-based machine learning techniques to assign novel patients to a cluster, then applying cluster-specific regressions to predict pain response trajectories. Six clusters were identified according to baseline variables (gender, age, insulin use, body mass index, depression history, pregabalin monotherapy, prior gabapentin, pain score, and pain-related sleep interference score). CEM yielded 1766 patients (matched dataset) having lower covariate imbalances. Regression models for pain performed well (adjusted R-squared 0.90–0.93; root mean square errors 0.41–0.48). Simulations showed positive predictive values for achieving >50% and >30% change-from-baseline pain score improvements (range 68.6–83.8% and 86.5–93.9%, respectively). Using more RCTs (nine vs. the earlier three) enabled matching of 46.7% more patients in the OS dataset, with substantially reduced global imbalance vs. not matching. This larger RCT pool covered 66.8% of possible patient characteristic combinations (vs. 
25.0% with three original RCTs) and made prediction possible for a broader spectrum of patients. Trial Registration: www.clinicaltrials.gov (as applicable): NCT00156078, NCT00159679, NCT00143156, NCT00553475.


Introduction
Diabetes affects approximately 8.8% of adults worldwide [1], and around 50% of them develop polyneuropathy [2]. Moreover, 25% of those with diabetic neuropathy also develop neuropathic pain [3]. Reviews of current approaches to the treatment of painful diabetic peripheral neuropathy (pDPN) suggest that although progress is being made, there is still a pressing need to identify optimal ways to treat this condition [4]. Many efforts are under way to optimize therapy by segmenting patients on the basis of phenotypes, etiologies, and risk factors [5][6][7][8][9].
Promising approaches to treatment optimization may be identified more efficiently through integration of randomized controlled trial (RCT) data with non-randomized data [10]. Such integrated data hold the potential for delivering more useful evidence-based treatment-related information for clinicians [11], because randomized data focus on internal validity providing confidence about cause-effect relationships, and evidence from observational data focuses on external validity and provides confidence about relevance of a specific treatment choice. Integrating them quantitatively offers a way to reduce the covariate bias (one of the notable shortfalls of observational data) while still incorporating one of its core strengths related to external validity. Utilization of all available data in this manner holds the promise for optimizing therapy and reducing the impact of diabetic neuropathic pain globally.
Previously, we used data from three existing international RCTs and a large observational study (OS) in Germany [11] to assess treatment outcomes in pDPN patients treated with the α2δ ligand, pregabalin. Pregabalin is approved in Europe for the treatment of neuropathic pain [12], and in the United States for neuropathic pain associated with pDPN and spinal cord injury as well as for postherpetic neuralgia [13]. We implemented hierarchical clustering and applied coarsened exact matching (CEM) at the cluster level to match RCT patients with OS patients. Using these techniques, we were able to create patient subgroups whose pain outcomes could be effectively predicted with time series regressions with lagged variables as inputs at the subgroup level; however, the prior work had not optimized the regressions. The focus of the current analysis was to assess the clinical implications of applying such methods in a microsimulation platform with regressions optimized for prediction to deliver better care based on improved prediction of pregabalin treatment response for subgroups of patients. Our approach was based on applying methods that address the inherent time-varying nature of patient care and of the characteristics associated with their treatments for pDPN. Our goal was to determine whether the inclusion of data from six additional RCTs could further reduce, through CEM, the global imbalance (covariate bias) in the OS data. In addition, we evaluated whether the increase in available patient data could enable a more accurate prediction of treatment responses: first, in virtual patients who could be simulated, but ultimately, in a broader variety of actual patients. Finally, by utilizing machine learning techniques, we also sought to broaden the range of patients with pDPN for whom we could predict pain reduction outcomes with pregabalin.

Methods
We utilized data from nine placebo-controlled RCTs designed to evaluate the efficacy of pregabalin for reducing pain scores in patients with pDPN (data on file for study A0081071; www.clinicaltrials.gov registration numbers: NCT00156078, NCT00159679, NCT00143156, and NCT00553475; the other trials were not registered on www.clinicaltrials.gov) [14][15][16][17][18][19][20][21]. The trials were conducted between March 1998 and March 2009 in Asia, Australia, Canada, Europe, Latin America, the Middle East, South Africa, and the United States. Patients received flexible- or fixed-dose pregabalin (75, 150, 300, or 600 mg/day) or placebo for 5-13 weeks (Table 1). Each of the nine studies shared fundamental inclusion criteria, including the requirement for patients to be aged ≥18 years; have a primary diagnosis of pDPN (type 1 or 2 diabetes mellitus with glycated hemoglobin (HbA1c) ≤11% and painful, distal, symmetrical, sensorimotor polyneuropathy for ≥6 months); have an average pain score ≥4 [on an 11-point numeric rating scale (NRS), where 0 = no pain and 10 = worst possible pain] over a 7-day baseline period; and have a score ≥40 mm on the 0-100 mm visual analog scale of the Short-Form McGill Pain Questionnaire at screening and randomization [22]. Patients with creatinine clearance rates from ≥30 to ≤60 mL/min were excluded, as were patients with any conditions that could jeopardize their health or confound assessment of pain due to pDPN.
All studies were conducted in compliance with the ethics principles originating in or derived from the Declaration of Helsinki, internal review board requirements, or Good Clinical Practice guidelines, and all participants provided written informed consent before participation.
The primary efficacy measure in each study was change in pain score (on the 0-10 NRS) derived from entries in patients' daily pain diaries (S1 Table). Pain responders at the 50% threshold were defined as those with [pain score at baseline − pain score(t)]/pain score at baseline ≥50% (where t = time at the end of the study). Secondary efficacy measures included change from baseline to end of study in pain-related sleep interference (PRSI) scores derived from daily sleep diaries in which patients rated how much their pain had interfered with their sleep using an 11-point NRS, where 0 = pain does not interfere with sleep and 10 = pain completely interferes with sleep.
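As a minimal sketch (not the study's own SAS code), the change-from-baseline responder definitions at the 50% and 30% thresholds can be expressed as:

```python
def percent_improvement(baseline: float, final: float) -> float:
    """Percent reduction from baseline pain score (0-10 NRS)."""
    return (baseline - final) / baseline * 100.0

def is_responder(baseline: float, final: float, threshold: float = 50.0) -> bool:
    """True if the change-from-baseline improvement meets the threshold."""
    return percent_improvement(baseline, final) >= threshold

# Example: baseline 8, end-of-study 3 -> 62.5% improvement, a 50% responder;
# baseline 8, end-of-study 5 -> 37.5% improvement, a 30% (but not 50%) responder.
```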
The RCTs contained the following data for patients receiving active treatment: age, gender, body mass index (BMI), baseline pain score (0-10 NRS), baseline PRSI score (0-10 NRS), HbA1c normal or elevated, concomitant insulin use, pregabalin monotherapy, duration of diabetes, allodynia at baseline, average weekly pain (based on daily pain scores), average weekly PRSI (based on daily scores), prior gabapentin use, and past or current medical history of depression.
The OS data were from a 6-week, open-label study in standard outpatient settings in Germany [23]. The physicians were free to prescribe pregabalin 150-600 mg/day as either monotherapy or add-on therapy in accordance with the European Summary of Product Characteristics dosing schedule [12]. The OS collected the following data, which overlapped with data from the RCTs: age, gender, BMI, baseline pain score, baseline PRSI score, HbA1c (normal or elevated), ongoing insulin use (yes or no), prior gabapentin use, and past or current medical history of depression. The OS differed from the RCTs in the following aspects: it did not collect data on duration of diabetes or allodynia at baseline; it utilized flexible dosing of pregabalin; and it recorded additional information on duration of pDPN, history of sleep disorder, and history of anxiety. The OS recorded pain and PRSI scores at baseline and at weeks 1, 3, and 6 (in contrast to the daily diary scores in the RCTs). Furthermore, only the OS dataset included 'general feeling' responses to three questions (calm and relaxed, full of energy, sad and discouraged) on a 6-point always-to-never scale, recorded at baseline and at weeks 1, 3, and 6. To impute missing data at weeks 2, 4, and 5, we used PROC EXPAND (SAS Institute, Cary, North Carolina, USA), which performs second-order interpolation.
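The imputation itself was done in SAS PROC EXPAND; a rough Python equivalent of second-order (quadratic) interpolation for the missing weeks, using illustrative pain scores, might look like:

```python
import numpy as np

def quadratic_interpolate(known_weeks, known_scores, target_weeks):
    """For each missing week, fit a quadratic (second-order polynomial)
    through the three nearest observed points and evaluate it there."""
    known_weeks = np.asarray(known_weeks, dtype=float)
    known_scores = np.asarray(known_scores, dtype=float)
    out = []
    for t in target_weeks:
        idx = np.argsort(np.abs(known_weeks - t))[:3]   # three nearest observations
        coeffs = np.polyfit(known_weeks[idx], known_scores[idx], 2)
        out.append(float(np.polyval(coeffs, t)))
    return out

# Pain observed at weeks 0, 1, 3, and 6 (as in the OS); impute weeks 2, 4, 5
weeks = [0, 1, 3, 6]
pain = [8.0, 7.0, 5.5, 4.0]
imputed = quadratic_interpolate(weeks, pain, [2, 4, 5])
```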
Another difference is that the six newly added RCTs were 12 or 13 weeks in duration (S1 Table), whereas the original three RCTs were 6 weeks. For these six new RCTs, we utilized the 6-week outcomes for the initial clustering, CEM, and time series regressions optimized using the least absolute shrinkage and selection operator (LASSO) method (described below). Then, during the simulation phase (described below), we incorporated data from the remaining weeks to take into account 50% pain responders at 6 weeks who were not responders at 12 or 13 weeks, as well as those who were not responders at 6 weeks but became responders by 12 or 13 weeks.
We sought to use the RCT data to reduce the level of bias in the distributions of covariates in the OS data. We used CEM to match the RCT data to the OS data [24] as we did in prior work [11].
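To illustrate the CEM idea only (the variable names and cutpoints below are hypothetical, not those used in the study), matching coarsens covariates into bins and keeps only OS patients whose coarsened stratum also contains RCT patients:

```python
from collections import defaultdict

def coarsen(patient, cuts):
    """Build a stratum key: continuous covariates are binned at the given
    cutpoints; categorical covariates require an exact match.
    All names and cutpoints here are illustrative."""
    key = []
    for var, cutpoints in cuts.items():
        v = patient[var]
        if cutpoints is None:                      # categorical
            key.append(v)
        else:                                      # continuous -> bin index
            key.append(sum(v >= c for c in cutpoints))
    return tuple(key)

def cem(os_patients, rct_patients, cuts):
    """Keep only OS patients whose coarsened stratum contains at least
    one RCT patient (matching on strata, not individuals)."""
    rct_strata = defaultdict(int)
    for p in rct_patients:
        rct_strata[coarsen(p, cuts)] += 1
    return [p for p in os_patients if rct_strata[coarsen(p, cuts)] > 0]

cuts = {"gender": None, "age": [40, 65], "baseline_pain": [4, 7]}
os_data = [{"gender": "F", "age": 55, "baseline_pain": 6},
           {"gender": "M", "age": 30, "baseline_pain": 9}]
rct_data = [{"gender": "F", "age": 60, "baseline_pain": 5}]
matched = cem(os_data, rct_data, cuts)   # only the first OS patient matches
```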
We implemented hierarchical cluster analysis in the OS and then used CEM to match RCT patients to the patients in the OS clusters as described elsewhere [11], with one change: we used the Gower distance rather than the Euclidean distance because it better represents mixed data types (eg, dichotomous, categorical, continuous) [25]. For each of the variables, we also analyzed whether the clusters in the matched dataset differed from one another in a statistically significant manner, as follows. For each pairwise comparison of variables with respect to the proportions of patients within a cluster, we applied Fisher's exact test in three ways: (1) within the matched dataset used to derive the regressions with lagged variables; (2) within the validation dataset used for the initial evaluation of the regressions with lagged variables; and (3) between the clusters in the matched and validation datasets.
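The Gower distance used for clustering handles mixed variable types by averaging per-variable dissimilarities; a minimal sketch (illustrative variables, not the study's clustering software) is:

```python
def gower_distance(a, b, ranges, categorical):
    """Mean per-variable dissimilarity: 0/1 mismatch for categorical
    variables, range-normalised absolute difference for continuous ones.
    `ranges` holds the observed range of each continuous variable."""
    total = 0.0
    for var in a:
        if var in categorical:
            total += 0.0 if a[var] == b[var] else 1.0
        else:
            total += abs(a[var] - b[var]) / ranges[var]
    return total / len(a)

p1 = {"gender": "F", "age": 55, "bmi": 30.0}
p2 = {"gender": "M", "age": 65, "bmi": 25.0}
ranges = {"age": 60.0, "bmi": 25.0}   # illustrative dataset-wide ranges
d = gower_distance(p1, p2, ranges, categorical={"gender"})
```

Because every per-variable term lies in [0, 1], the overall distance also lies in [0, 1], which is what makes it usable across dichotomous, categorical, and continuous variables at once.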
Next, we used the regression models to predict the behavior of a time series from its past values, taking into account the existence of cross-correlations in the time series data, to best represent multivariate analysis of the pain score at a given time lag in relation to:
1. Pain score at antecedent time lags;
2. PRSI score and other relevant time-dependent variables (eg, general feeling variables and prior treatment dose) at different time lags;
3. Specific patient demographic and/or medical history data likely to influence pain score.
The matched dataset was used to derive and calibrate the regressions for each of the clusters (the calibration or training dataset). Initially, we used a maximum likelihood method to estimate the parameters of the regressions for each of the matched dataset clusters. We used forward and backward selection techniques to explore the time lags and other variables to be included in each model for each cluster [26]. An initial validation of the regressions was implemented using patients in the OS who did not match with RCT patients and thus were not included in the calibration dataset. We plotted observed vs. predicted pain scores, as well as the residuals, to demonstrate confidence in the predictive validity of the regressions. A t-test of the time series of the observed vs. predicted pain scores was also performed to confirm that the two series did not differ significantly (ie, that the data were consistent with the null hypothesis of no difference). Once we had achieved the best possible results with traditional techniques, we employed a penalized regression method to improve the predictive capability of our regressions. We tested three penalized regression methods: LASSO, adaptive LASSO, and elastic net. For LASSO selection, the penalty is placed on the L1 norm of the regression coefficients; adaptive LASSO modifies the LASSO penalty by applying weights to each parameter that forms the LASSO constraint; and for elastic net, the penalty is on a combination of the L1 and L2 norms of the regression coefficients [27]. Elastic net includes two tuning parameters, whereas LASSO and adaptive LASSO include only one. With our data, the three models performed very similarly for each of the six clusters (see S2 Table).
Given the similarity of the results of the three methods in this context, we implemented the simpler LASSO method to reduce the overall prediction error of the model via the bias-variance tradeoff: shrinking the regression coefficients toward zero (setting some exactly to zero) decreases the variance of the coefficient estimates at the cost of a slight increase in bias [27,28].
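The study fitted its LASSO regressions in SAS; as an illustrative sketch only, a plain coordinate-descent LASSO applied to a lagged time series design (synthetic data, not the study's models) looks like:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=500):
    """Coordinate-descent LASSO via soft-thresholding for the objective
    (1/2n)||y - Xb||^2 + lam * ||b||_1 (no intercept, for brevity)."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]        # partial residual
            rho = X[:, j] @ r / n
            z = X[:, j] @ X[:, j] / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z
    return beta

def lagged_design(series, n_lags):
    """Design matrix whose columns are the series at lags 1..n_lags."""
    X = np.column_stack([series[n_lags - k:-k] for k in range(1, n_lags + 1)])
    y = series[n_lags:]
    return X, y

rng = np.random.default_rng(0)
# Synthetic weekly pain trajectory decaying toward a floor of ~2
pain = 8.0 * 0.8 ** np.arange(50) + 2.0 + rng.normal(0, 0.05, 50)
X, y = lagged_design(pain, n_lags=3)
beta = lasso_cd(X, y, lam=0.01)
```

The L1 penalty is what drives weak lagged predictors exactly to zero, which is why the post-LASSO models reported below contain fewer variables than the unregularized ones.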
To analyze how we could predict pain scores over time, including the final score, for a novel patient, we used an agent-based modeling and simulation (ABMS) platform to create 'virtual' patients having different combinations of characteristics [29]. To define the domain of possible pain scores that a novel patient could face over the 6-week period, we adapted the Monte Carlo technique to perform 1000 virtual instances of a single novel patient. These virtual instances included fixed characteristics that were the same for each of the 1000 instances, along with time-varying characteristics that differed across instances. We derived the probability densities for the time-varying components (eg, pain scores) in a three-step process: assigning a novel patient to a cluster, thereby triggering cluster-specific regression models to derive variations in patient responses, and then simulating a range of likely response trajectories (ie, the pain scores that a patient would most likely experience over time, starting from certain specific characteristics at baseline).
For the first step in the process, assigning a novel patient to a cluster, we utilized two instance-based machine learning techniques (k-Nearest Neighbor and Supervised Fuzzy C-Means). This cluster assignment determined which regression model would be applied to the simulation for the novel patient. Next, 1000 instances of the possible trajectories for the novel patient were created to reflect the possible trajectories of pain scores: effectively, 1000 virtual patients derived from the novel patient based on combinations of the fixed variables and the time-varying variables (described below). The regressions were applied weekly over the 6-week period. Dose was also assigned week by week, considering the treatment given to the responder patients of the cluster achieving at least a 50% reduction in pain score.
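The kNN cluster-assignment step can be sketched as follows (toy feature vectors and labels; the Supervised Fuzzy C-Means alternative is not shown):

```python
import math
from collections import Counter

def knn_assign_cluster(novel, labeled, k=5):
    """Assign a novel patient (numeric feature vector) to the majority
    cluster among its k nearest labeled patients (Euclidean distance)."""
    dists = sorted((math.dist(novel, vec), cluster) for vec, cluster in labeled)
    votes = Counter(cluster for _, cluster in dists[:k])
    return votes.most_common(1)[0][0]

# Toy (baseline pain, baseline PRSI) vectors with cluster labels
labeled = [((8.0, 7.0), 1), ((7.5, 6.5), 1), ((8.5, 7.5), 1),
           ((4.0, 3.0), 2), ((3.5, 2.5), 2)]
cluster = knn_assign_cluster((7.8, 6.9), labeled, k=3)   # majority vote
```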
The cluster-specific regressions included some parameters that were fixed (gender, age group, BMI, pDPN duration, taking insulin, pregabalin monotherapy, previous gabapentin, and past or current medical history of depression), general feeling variables at baseline (full of energy, calm and relaxed, sad and discouraged), and some that varied over time (PRSI scores in current and prior weeks, pain scores in prior weeks). For the variables with parameters that change during the 6 weeks' treatment simulation, the ABMS platform relied on a method that would incorporate patients who were similar in how they changed over time. We utilized the k-Nearest Neighbor (kNN) approach for this task. The kNN approach defines 'closest' based on considering patients as vectors and calculating the Euclidean distance between the novel patient and the patients in the cluster, according to the considered variables. We adopted the convention of using the square root for 'k' [30]. In this situation, k becomes the square root of the cluster size. We then used the values in the trajectories from the 'k' closest patients to generate the Probability Density Functions (PDFs). These PDFs are then used as the source of the time-varying values (eg, PRSI scores and general feelings with a distinct PDF for each) for the 1000 instances of the novel patient. The variables were only used if they were significant for the regression model for a specific cluster to which the novel patient was assigned at the beginning. Once we obtained the k closest patients, we collected (1 week later: time t+1) their values for PRSI scores and general feeling variables, and plotted the probability densities for these k patients. For the subsequent week (time t+2), we performed the same steps, such that we were only considering the values of the time-varying variables at t+1 (and prior) for the 'k' patients who generated trajectories. 
This means that only those as close as possible to the novel patient would be simulated (ie, although all patients in the cluster were sorted by Euclidean distance, it was only the first 'k' patients who were considered for the PDF). The simulation returned the range of response trajectories that a patient would most likely experience over time, starting from specific characteristics at baseline. Distributions for pain score and responder status were displayed at the end of the 6-week simulation. The trajectories over the 6-week period were displayed in the Virtual Lab output in order to show how each virtual patient arrived at the final pain score at the conclusion of the simulation. We examined with simulation whether we could correctly predict the responder status of patients in the validation dataset using positive predictive value (PPV) and accuracy.
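The square-root-of-cluster-size convention for k, and the drawing of time-varying values for the 1000 instances from the empirical distribution of the k closest patients, can be sketched as (illustrative values only):

```python
import math
import random

def square_root_k(cluster_size):
    """Convention described in the text: k is the square root of the
    cluster size (rounded, and at least 1)."""
    return max(1, round(math.sqrt(cluster_size)))

def sample_time_varying(neighbor_values, n_instances=1000, seed=0):
    """Draw the time-varying value (eg, next week's PRSI score) for each
    virtual instance from the empirical distribution formed by the k
    closest patients' observed values."""
    rng = random.Random(seed)
    return [rng.choice(neighbor_values) for _ in range(n_instances)]

k = square_root_k(256)                         # a cluster of 256 -> k = 16
prsi_next_week = sample_time_varying([5.0, 4.5, 6.0, 5.5], n_instances=1000)
```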
We also explored how we could extend analyses when we only used some of the available data [eg, no OS data beyond 6 weeks, no RCT data beyond 6 weeks (N = 3 studies), RCT data for 12-13 weeks (N = 6 studies)]. Hence, the Virtual Lab included a capability for estimating whether a candidate novel patient would maintain or change, beyond the end of week 6, the responder status achieved at that point. We accomplished extrapolation of responder status by focusing on the monotonicity of pain trajectories. Monotonicity is a measure of the extent to which a trajectory monotonically increases or decreases (S1 Fig) [31,32]. In particular, we implemented the following steps:
1. Identification of two main outcome indicators: the median pain score of the 1000 virtual patient trajectories at week 6 and the overall monotonicity of the cloud of those trajectories from week 0 to week 6;
2. Identification, through a kNN approach, of a subset of week 12-13 RCT patients who were closest to the simulated novel patient with respect to the two indicators noted in the prior step; and
3. Calculation of the monotonicity for patients in the six RCTs with 12-13 weeks of data, based on the monotonicity of the kNN subset, and use of that information to predict patients who fell into three categories: responders at 6 weeks who became non-responders at 12-13 weeks; non-responders at 6 weeks who became responders at 12-13 weeks; and patients who did not change their responder status.
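The exact monotonicity measure follows the cited references; the sketch below uses a simple illustrative index (the mean sign of the week-to-week changes, which lies in [-1, 1]) together with the ±0.2 thresholds reported in the Results, and may differ from the study's precise definition:

```python
def monotonicity(trajectory):
    """Illustrative monotonicity index in [-1, 1]: the mean sign of the
    week-to-week changes (+1 for a strictly increasing trajectory, -1 for
    a strictly decreasing one, ~0 when there is no net direction)."""
    diffs = [b - a for a, b in zip(trajectory, trajectory[1:])]
    signs = [(d > 0) - (d < 0) for d in diffs]
    return sum(signs) / len(signs)

def monotonicity_category(x, threshold=0.2):
    """The +/-0.2 thresholds used in the Results to split trajectories."""
    if x > threshold:
        return "increasing"
    if x < -threshold:
        return "decreasing"
    return "flat"

# A steadily improving (decreasing) weekly pain trajectory
m = monotonicity([8.0, 7.0, 6.5, 6.0, 5.0, 4.5, 4.0])
```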

Results
The hierarchical cluster analysis of the OS (2642 patients) yielded six clusters (195-626 patients each), based on the semipartial R² that measures the homogeneity of merged clusters [11], with the following clustering variables: gender, age, insulin use, BMI, past or current medical history of depression, pregabalin monotherapy, prior use of gabapentin, baseline pain score, and baseline PRSI score. The dendrogram from the cluster analysis, along with the dendrogram from the prior work, can be found in S2 Fig. Table 2 summarizes the characteristics of the clusters, with pain scores categorized as moderate (4-6) or severe (7-10). The nine RCTs and the combined dataset covered 66.1% and 76.6%, respectively, of the 192 possible combinations of these patient characteristics.
The Fisher's exact test results (Table 3) for the proportion of patients within a cluster for each pairwise comparison of the variables showed that at least two thirds of the pairwise comparisons were statistically different for:
1. The matched dataset used for regression model calibration: gender, insulin use, past or current medical history of depression, prior gabapentin use, and pregabalin monotherapy;
2. The validation dataset used for initial regression model evaluation: insulin use, prior gabapentin use, and pregabalin monotherapy; and
3. The matched vs. unmatched datasets: gender, age group, BMI group, insulin use, PRSI score at baseline, and pregabalin monotherapy.
All of the final regression models that estimated weekly pain scores for the matched data performed well after implementing the regularization with LASSO, with adjusted R² ranging from 0.97 to 0.98 and root mean square errors ranging from 0.47 to 0.51 (Table 4). The most influential variables were those associated with time-lagged relationships. The regressions before regularization with LASSO also performed well, but with many more variables (see S4 Table).
The following were influential in several cluster-specific regressions: pain in prior weeks, PRSI, treatment dose, feeling calm and relaxed, and feeling full of energy. Age group was influential in one cluster-specific regression.
We also evaluated how well these regression models predicted responders among the OS patients who did not match with RCT patients (validation dataset, N = 876). When comparing observed pain scores in the validation dataset with those predicted using the regression models derived from the calibration dataset, two-sample t-tests found no statistically significant differences for any cluster (all P > 0.05). The regressions of the scatterplots of observed vs. predicted pain scores showed R² values of 0.97, 0.98, 0.98, 0.98, 0.98, and 0.97 for clusters 1-6, respectively (see Fig 2 for plots of observed vs. predicted pain scores and the residuals for these plots for each cluster). Table 5 shows the PPV and accuracy results for virtual patients achieving a 50% reduction in pain scores compared with actual patients. Overall, the PPV was 77.8% (ranging from 68.6% to 83.8% across individual clusters). With the 30% pain responder definition, the PPV and accuracy both improved to 91.6% (each ranging from 86.5% to 93.9%, depending on the cluster).
Results for the monotonicity analysis were based on the following patient groups: (a) increased pain: 7.6% of patients were responders at week 6 and became non-responders at week 12 or 13 (N = 71); (b) decreased pain: 11.3% of patients were not responders at week 6 but became responders by week 12 or 13 (N = 106); and (c) maintained pain response from weeks 6 to 12-13: 81.2% of patients did not change their responder status between week 6 and week 12 or 13 (N = 358 for responders and N = 404 for non-responders) (S5 Table). We examined the monotonicity in each of these three groups and identified three monotonicity categories: monotonicity above a positive threshold (x > 0.2), monotonicity between the positive and negative thresholds (-0.2 ≤ x ≤ 0.2), and monotonicity below a negative threshold (x < -0.2). As shown in Fig 3A, the thresholds of -0.2 and +0.2 produced the most balanced combination of the three patient groups [ie, decreased pain (n = 106), maintained pain response from weeks 6 to 12-13 (n = 358 + 404 = 762), increased pain (n = 71)] with the lowest standard deviation of the mean monotonicity for each group. The accuracy with which we could correctly predict whether a patient would further decrease, maintain, or increase his/her pain score from 6 to 12-13 weeks was 82.4%, 71.9%, and 89.5%, respectively. Fig 3B shows the receiver operating characteristic (ROC) curves for these three groups.

Discussion
These results demonstrate the potential value of integrating patient data from RCT and OS sources and developing subgroup-specific regressions for prediction of therapeutic response (discussed further below). The inclusion of additional RCTs in this analysis enabled a 46.7% increase in matched OS patients (from 1204 to 1766) compared with earlier work. A high degree of imbalance often occurs in OSs because their assignments to treatments are not random. A reduction of this imbalance can be achieved, however, through a matching of OS patients with those from relevant RCTs in which the covariates are, in principle, more highly balanced owing to the randomized design. Matching is thus intended to identify a better balance in the multidimensional distribution of covariates, resulting in the lower covariate bias desired to establish better explanatory models of potential causal relationships among measured variables [33]. Global imbalance after matching was also substantially reduced in all six clusters (with the nine RCTs) instead of in only five of them (with three RCTs), with all of the resulting clusters having less covariate bias and being able to support robust prediction. Although there were only small gains in terms of prediction, the larger RCT pool now allowed prediction for a broader spectrum of patients because the nine RCTs covered 66.8% of possible combinations of patient characteristics compared with 25.0% with the three original RCTs. That capability for predicting the response to treatment in a wider range of patient types was further extended owing to the ensemble machine learning techniques that relaxed the requirements for matching a novel patient to a cluster; response trajectories of patients matched with a broader combination of characteristics could be modeled, simulated, and ultimately predicted.
Consistent with prior work, the heterogeneity of patients with pDPN was confirmed by the existence, in the OS, of six clusters of patients defined by different patterns of patient variables ( Table 2). Most baseline variables did not sort into a single cluster (age group, BMI group, duration of pDPN, pain score, PRSI score, feeling full of energy, calm and relaxed, sad and discouraged), but rather varied in their combinations. A few were present in isolation (eg, all females in one cluster and all males in one cluster, no current use of insulin in one cluster, pregabalin monotherapy in two clusters but completely absent in one cluster, prior use of gabapentin in one cluster but completely absent in two clusters, past or current medical history of depression in one cluster but completely absent in three clusters), but they were found in various proportions in the remaining clusters. These findings demonstrate the substantial level of complexity and interaction among variables and underscore the challenge facing clinicians who are selecting appropriate treatment and dose for individual patients. The need to process enormous amounts of disparate clinical information (in the absence of effective modeling tools) may explain why clinicians often experience a great degree of uncertainty, but might also depend upon intuition rather than on a clearly defined rationale about which patients are likely to respond better to certain therapeutic approaches. For example, one could assume that the existence of females in one cluster suggests that a subgroup of females might be similar; however, most of the females are spread across five subgroups. Thus, the trait of being female combined with other characteristics is what defines their placement within a specific cluster.

There were 50% pain responders and non-responders in every cluster, and all treatment doses were present in all clusters. The clusters were quite similar to the clusters from the prior work; however, there were fewer instances of clusters with only one state for the dichotomous variables. This finding might be due to the change from using Euclidean distance to Gower distance, with its ability to better identify distances with different types of variables.
The pairwise comparison of variables in the six clusters (Table 3) supported prior work related to the distinctive patterns of variables in each cluster. Some overlap was found in pairwise comparisons for some clusters within the matched OS patient dataset, within the unmatched OS patient dataset (validation dataset), and between the two datasets. As with the prior work, dose was not influential in defining a cluster; however, based on its presence in the majority of the cluster-specific regression equations, dose plays an important role in predicting whether a patient is likely to respond to pregabalin therapy. Since some patients require titration above a 300 mg dose and others do not [12,13], it is not surprising that titration was not predictive in all of the cluster-specific regressions. Fixed patient variables combined with 'on-treatment' variables in different ways to predict responses in the different clusters, as seen in the regression results in Table 4. The parameters in the regression models before regularization (see S4 Table) reinforced the reciprocal influences between pain and PRSI [34] and dose in prior weeks [35,36]. They also showed the relevance of selected psychosocial variables for certain subgroups of patients but not others, as has been shown in other studies [37,38]. Other variables, such as age and insulin use, were the only significant predictors in one of the cluster-specific regression models before we optimized them using the LASSO method, although demographic and other medical care characteristics are often used as a basis for subgroup analyses in clinical studies of pain [7,37,[39][40][41][42]. The optimized regressions for prediction, however, had fewer variables, suggesting that when prediction is the goal, specific regression regularization techniques are warranted.
The robust adjusted R² and root mean square error results for the regressions (Table 4) support the strong predictive capability afforded by using regression models with lagged variables as inputs to predict the magnitude of response to pregabalin when useful subgroups (clusters) of patients are created first, as was found in prior work [11]. By using regressions that take past values into account, we were better able to evaluate the possible importance of other variables for the purposes of prediction (as opposed to description). The regressions predicted pain responses even in OS patients who had not matched, thus providing significant validation for these regressions, which were generated from the entirely different matched dataset. Plots of observed vs. predicted pain scores by cluster shown in Fig 2 validate the good predictive performance of the regressions. Two-sample t-test results corroborate these findings (S6 Table).
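For concreteness, here is a small sketch of how lagged variables become regression inputs; the weekly pain scores below are invented for illustration. Each week's score is predicted from the scores of the preceding weeks, which is what lets the model track within-patient trajectories rather than only baseline characteristics.

```python
def lagged_features(series, lags=(1, 2)):
    """Build (X, y) pairs where each row of X holds the values at
    t-1, t-2, ... and y holds the value at t."""
    X, y = [], []
    for t in range(max(lags), len(series)):
        X.append([series[t - k] for k in lags])
        y.append(series[t])
    return X, y

# Hypothetical weekly 0-10 pain scores for one patient
weekly_pain = [7.0, 6.5, 6.1, 5.6, 5.4, 5.1, 4.9]
X, y = lagged_features(weekly_pain)
print(X[0], y[0])  # [6.5, 7.0] 6.1
```

The same construction extends to multiple lagged series (e.g., pain and PRSI together) by concatenating their lag columns into one design matrix.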
The cluster-specific regressions are the basis for the Virtual Lab's performance in predicting a novel patient's treatment outcomes because they provide not only weekly pain scores but also weekly variation in patient response, through incorporation of 'on-treatment' variables as well as fixed ones. Fig 1 provides an overview of the steps in the prediction process, and Fig 4 shows an example of the output from the simulation steps described in Fig 1. Hence, given the central role of the regressions in the platform, proper assignment of a novel patient to a specific cluster is very important.
Consequently, in this next generation of the platform, we focused substantial effort on broadening the characteristics of patients who could be assigned to a cluster, as well as on representing the uncertainty in the predicted outcomes accordingly, while maintaining or enhancing the accuracy of the prediction.
The PPV results (Table 5) indicated that 68.6-83.8% of the time (depending on the cluster), and 77.8% overall, we correctly predicted which patients would attain a 50% reduction from baseline in pain scores. Correct predictions (PPV and accuracy) improved to 91.6% overall, with a range of 86.5-93.9%, at the 30% threshold for pain score reduction. From a clinical decision-making point of view, many clinicians and patients would agree that a 30% reduction in pain is a clinically meaningful therapeutic response [43].
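As a reminder of how the reported PPVs are computed, a small illustration follows; the responder flags below are invented, not trial data. PPV is the fraction of patients predicted to reach a threshold who actually reached it.

```python
def positive_predictive_value(predicted, observed):
    """PPV = true positives / all predicted positives."""
    tp = sum(1 for p, o in zip(predicted, observed) if p and o)
    pp = sum(predicted)
    return tp / pp if pp else float('nan')

# Hypothetical flags: 1 = achieved the pain-reduction threshold
pred = [1, 1, 1, 0, 1, 0, 1]
obs = [1, 1, 0, 0, 1, 1, 1]
print(positive_predictive_value(pred, obs))  # → 0.8
```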
We also explored a way to introduce information about outcomes that could occur after 6 weeks of treatment by utilizing data from longer RCTs (12 or 13 weeks), even though we lacked OS data for this time period. Overall, 81.2% of the patients who had or had not achieved a 50% pain reduction at 6 weeks maintained that outcome at 12 or 13 weeks (S6 Table). Thus, using our monotonicity thresholds, we estimated the likelihood that a novel patient, after 6 weeks of treatment, would improve his/her response to reach the 50% threshold in pain score reduction (11.3%) or fall below that threshold (7.6%). We discovered that individuals with similar monotonicity showed similar evolutions in pain scores after week 6 (remaining at the same pain score, increasing, or decreasing). This estimation was made possible by identifying similar patients in the cluster rather than by relying upon global average rates for these phenomena. We incorporated this extension of prediction capability into the Virtual Lab by calculating the overall monotonicity of the aggregation of 1000 possible pain trajectories for the simulated novel patient. We also calculated the final pain score of the simulated novel patient as the median of the 1000 simulated final pain scores. We then identified the patients in the expanded RCT dataset who were closest to the simulated novel patient (using the kNN approach) on the basis of monotonicity at week 6 and pain at week 6. We used these nearest neighbors to calculate the monotonicity from week 6 to 12/13 in order to identify the category to which the novel patient could be assigned (decreased, maintained, or increased). The number of RCT nearest-neighbor patients falling into each category is plotted in a histogram.
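The nearest-neighbor step can be sketched as follows. The field names, example values, and the plain Euclidean distance over the two features are illustrative assumptions and may differ from the platform's exact implementation.

```python
import math
from collections import Counter

def knn_categories(novel, rct_patients, k=5):
    """Tally the week-6-to-12/13 trajectory categories of the k RCT
    patients nearest to the simulated novel patient in
    (week-6 monotonicity, week-6 pain) space."""
    ranked = sorted(
        rct_patients,
        key=lambda p: math.hypot(p["monotonicity"] - novel["monotonicity"],
                                 p["pain_wk6"] - novel["pain_wk6"]),
    )
    return Counter(p["category"] for p in ranked[:k])

novel = {"monotonicity": -0.8, "pain_wk6": 3.5}
rct_patients = [
    {"monotonicity": -0.9, "pain_wk6": 3.4, "category": "maintained"},
    {"monotonicity": -0.7, "pain_wk6": 3.6, "category": "maintained"},
    {"monotonicity": -0.8, "pain_wk6": 4.0, "category": "decreased"},
    {"monotonicity": -0.5, "pain_wk6": 3.0, "category": "maintained"},
    {"monotonicity": 0.2, "pain_wk6": 6.0, "category": "increased"},
    {"monotonicity": 0.4, "pain_wk6": 7.0, "category": "increased"},
]
counts = knn_categories(novel, rct_patients)
print(counts)  # per-category counts among the 5 nearest neighbors
```

The resulting counts are exactly the input for the histogram described above: the modal category suggests whether the novel patient is likely to decrease, maintain, or increase after week 6.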
S1 Fig shows an example of the histogram and how the monotonicity analyses could be incorporated into the Virtual Lab output to determine whether or not a novel patient would maintain his/her pain score in the subsequent 6-7 weeks.
These results demonstrate the potential utility of integrating patient data from RCT and OS sources to identify patient subpopulations that will manifest different degrees of therapeutic response to a particular treatment. We have shown how a combination of carefully selected modeling and microsimulation approaches can be used to predict which patients with pDPN may have a higher probability of responding to pregabalin, and at which dose. These findings also highlight the importance of tracking changes over time in certain variables (e.g., pain and PRSI) after the initiation of treatment in order to adjust baseline predictions accordingly. The use of ABMS provided a mechanism for introducing patient variability into the analyses by integrating the on-treatment variables that changed over time with fixed patient characteristics and simulating the resulting on-treatment pain response. Future research can explore the range of potential applications for this approach of clustering, matching, and time series regressions with lagged variables as inputs.
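The trajectory-aggregation step of the microsimulation can be sketched as follows. The per-week stochastic update is an invented stand-in for the platform's cluster-specific regressions; the sketch shows only the mechanics of simulating 1000 possible trajectories for one virtual patient and taking the median final pain score.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

def simulate_trajectory(baseline, weeks=6):
    """One hypothetical 6-week pain trajectory on the 0-10 scale.
    The weekly update is an illustrative stochastic improvement model,
    NOT the study's cluster-specific regression."""
    pain = baseline
    scores = []
    for _ in range(weeks):
        # Random weekly change, clipped to the 0-10 pain scale
        pain = min(10.0, max(0.0, pain + random.gauss(-0.4, 0.5)))
        scores.append(pain)
    return scores

# Aggregate 1000 possible trajectories for one simulated novel patient
runs = [simulate_trajectory(7.0) for _ in range(1000)]
median_final = statistics.median(r[-1] for r in runs)
print(round(median_final, 1))
```

The spread of the 1000 final scores, not just their median, is what allows the Virtual Lab output to represent uncertainty in the predicted outcome.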

Limitations and future work
A limitation of our analyses is that, for all studies included, we used data only from subjects who completed the studies, so more work would be needed to determine whether the results generalize to other patient populations (e.g., patients who discontinued the studies owing to adverse events). However, prior studies spanning over a decade have found that treatment-emergent adverse events in pregabalin-treated patients infrequently lead to discontinuation [39] and that the most common side effects of pregabalin resolve over time [44]. Given that we are including lagged variables as inputs, events not reflected in the included variables could affect the time series data. Traditional regression techniques can also perform poorly when added variables are correlated (e.g., high variability of estimates from correlated predictor variables), a limitation that motivated us to add a penalized regression method (LASSO) to regularize the cluster-specific regressions and improve their predictive capability. Subsequent Virtual Lab work will focus on predicting discontinuations and adverse events using these same approaches, along with refining how we predict outcomes beyond 6 weeks. Also, we have not yet tried prospectively to predict actual new-patient outcomes using our platform. Future work will need to examine how such efforts could work in actual practice. We also need to explore how the proposed approach, including a capability for dynamic real-time updates, could be used in providing care to patients. Finally, these findings are specific to patients with pDPN, and not all patient variables associated with pDPN have yet been studied. Other clinical circumstances may require less or more complex approaches to enable prediction.

Conclusions
The inclusion of data from six additional RCTs expanded the combinations of patient characteristics and outcomes that could be predicted using our methodological approach (consisting primarily of hierarchical cluster analysis, CEM, regressions optimized based on use of shrinking and penalty search algorithms, and microsimulation). In addition, matching of substantially more RCT patients to OS patients strengthened the representativeness of OS patients while maintaining strong overall predictive power. These analyses reinforced the predictive value of utilizing patient subgroups that reflect more complex patterns of fixed patient characteristics and 'on-treatment' variables that change over time.
Supporting information S1