
Use of a machine learning framework to predict substance use disorder treatment success

  • Laura Acion ,

    Affiliations Instituto de Cálculo, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, CONICET, Buenos Aires, Argentina, Iowa Consortium for Substance Abuse Research and Evaluation, University of Iowa, Iowa City, Iowa, United States of America

  • Diana Kelmansky,

    Affiliation Instituto de Cálculo, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, CONICET, Buenos Aires, Argentina

  • Mark van der Laan,

    Affiliation Division of Biostatistics, University of California, Berkeley, California, United States of America

  • Ethan Sahker,

    Affiliations Iowa Consortium for Substance Abuse Research and Evaluation, University of Iowa, Iowa City, Iowa, United States of America, Counseling Psychology Program, Department of Psychological and Quantitative Foundations, College of Education, University of Iowa, Iowa City, Iowa, United States of America

  • DeShauna Jones,

    Affiliation Iowa Consortium for Substance Abuse Research and Evaluation, University of Iowa, Iowa City, Iowa, United States of America

  • Stephan Arndt

    Affiliations Iowa Consortium for Substance Abuse Research and Evaluation, University of Iowa, Iowa City, Iowa, United States of America, Department of Psychiatry, Roy J and Lucille A Carver College of Medicine, University of Iowa, Iowa City, Iowa, United States of America, Department of Biostatistics, College of Public Health, University of Iowa, Iowa City, Iowa, United States of America


There are several methods for building prediction models. The wealth of currently available modeling techniques usually forces the researcher to judge, a priori, which will likely be the best method. Super learning (SL) is a methodology that facilitates this decision by combining all identified prediction algorithms pertinent to a particular prediction problem. SL generates a final model that is at least as good as any of the other models considered for predicting the outcome. The overarching aim of this work is to introduce SL to analysts and practitioners. This work compares the performance of logistic regression, penalized regression, random forests, deep learning neural networks, and SL to predict successful substance use disorder (SUD) treatment. A nationwide database including 99,013 SUD treatment patients was used. All algorithms were evaluated using the area under the receiver operating characteristic curve (AUC) in a test sample that was not included in the training sample used to fit the prediction models. AUC for the models ranged between 0.793 and 0.820. SL was superior to all but one of the algorithms compared. An explanation of the SL steps is provided. SL is the first step in targeted learning, an analytic framework that yields double robust effect estimation and inference with fewer assumptions than the usual parametric methods. We discuss different uses of SL depending on the context, its function within the targeted learning framework, and the benefits of this methodology for the addiction field.


There are several methods for building prediction models. Prediction models are often generated using some form of linear or logistic regression, e.g., [1–4]. More recently, other learning algorithms such as random forests (RF) and artificial neural networks (ANN) are being used for prediction in the health sciences, e.g., [5–7]. These newer techniques may be able to enhance prediction, thus improving the chances of matching patients to the most effective treatments.

The wealth of currently available modeling techniques usually forces the researcher to judge, a priori, what will likely be the best prediction method. Super learning (SL) [8] is a methodology that facilitates this decision by combining all identified prediction algorithms pertinent for a particular prediction problem. SL generates a final model that is at least as good as any of the other models considered for predicting the outcome. This property of SL is both theoretically [8] and empirically supported [9]. The goal of this paper is to introduce various prediction methods, some of which are novel to the field of substance use disorders (SUD) treatment.

Accounting for individual patient characteristics to maximize positive outcomes is at the heart of precision medicine. SUD treatment is one of the many areas that can greatly benefit from optimizing patients’ pathways to the best possible treatment outcome. For example, United States estimates indicate only 19.8% of cases in need of alcohol use disorder treatment were ever treated [10]. Identifying key predictors of successful treatment can serve to discover disparities, strengths, and weaknesses in service delivery, eventually increasing treatment success and reducing unmet treatment needs [11–13].

The first step in identifying key characteristics for matching patients to the most effective treatments is to predict who will succeed at a given treatment. The importance of this topic is reflected by the rich literature aiming to identify patients’ characteristics that increase the efficacy of SUD treatment (e.g., [4, 14–16]). The methodological focus of this paper leaves a substantive literature review regarding SUD treatment success out of its scope.

Some of the literature predicting successful SUD treatment takes advantage of large publicly available datasets such as the Substance Abuse and Mental Health Services Administration Treatment Episode Data Set—Discharges (TEDS-D) [17]. The main advantage of datasets such as TEDS-D is the large number of records available, including important patient characteristics and treatment features. Because of the computing power required to analyze large datasets with numerous predictors, this advantage used to be the greatest obstacle to applying sophisticated prediction algorithms such as SL and other machine learning methods. Computing power is currently less of a problem thanks to technological advances and an active open source software developer community. For instance, the open source community has optimized SL and other machine learning algorithms for the analysis of big data using open source software [18].

Most of the methods featured in this paper have not previously been used to analyze TEDS-D. The aim is to evaluate which method works best at predicting successful treatment using a real-life large database. Rather than a dataset tailored to showcase the methods’ properties, we use a dataset commonly analyzed in the SUD field. We hypothesized that SL would generate the best prediction model as measured by the area under the receiver operating characteristic curve (AUC) evaluated in a test sample not included in the training sample used to fit all prediction models.

The use of SL as a tool for predicting success in SUD treatment may contribute to the literature by improving the ability to identify treatment outcome disparities that, when addressed, may lead to better patient treatment outcomes.

Materials and methods


To illustrate different prediction analytic approaches, we focused on outpatient SUD treatment for Hispanic adults using publicly available TEDS-D 2006–2011 data [17]. TEDS-D is an excellent example of a large administrative dataset that may interest addiction researchers in practice. This dataset allows us to illustrate the use of the methodologies introduced in this work within a realistic setting.


Treatment completion was considered a successful treatment discharge status; all other treatment discharge reasons (e.g., left against professional advice, incarcerated, other) were considered indicators of non-successful treatment. Treatment completion is a standard process outcome measure because it predicts longer-term outcomes such as less future criminal involvement, fewer readmissions, and employment and income one year after treatment [19–22].


Twenty-eight predictors recorded by TEDS-D were included in the analysis. Predictors include 10 patient characteristics (i.e., age, gender, race (e.g., White, Black), ethnicity (indicating the patient’s specific Hispanic origin), marital status, education, employment status, pregnancy at time of admission, veteran status, and living arrangement), 3 treatment characteristics (i.e., intensity, medication-assisted opioid therapy, and length of stay), principal source of referral, summary type of problematic substance (with categories “alcohol only”, “other drugs only”, or “alcohol and drugs”), and mental health problem. TEDS-D records thorough information about substances of misuse, including the following 12 predictors: primary, secondary, and tertiary substance problem and, for each, the usual route of administration, frequency of use, and age at first use. The substances include alcohol, cocaine/crack, marijuana/hashish, prescription opiates/synthetics, and methamphetamine. Several other drug use categories were collapsed for analysis because of low percentages.

Inclusion criteria.

TEDS-D includes admissions/discharges rather than individuals. Consequently, only records indicating that the individual had no prior SUD treatment were included in the analyses. Part of our previous work focuses on racial and ethnic minorities [11, 12, 23, 24]. It is known that racial and ethnic minorities vary in their treatment access and success levels [25–27]. Thus, we restricted our analysis to cases indicating Hispanic/Latino ethnicity, age 18 years or older, and treatment in outpatient service settings. We focused on outpatient settings because the criteria for treatment completion/success (i.e., the outcome of interest) and treatment duration often differ markedly between outpatient services and other types of services (e.g., 24-hour inpatient, detoxification-only). This sample was arbitrarily chosen to exemplify the different analytical approaches; the choice is based on our previous knowledge of this field and not on the results obtained after using the methods described in the following paragraphs.

Exclusion criteria.

Records with missing data in any of the predictors, the outcome, or the characteristics used to determine inclusion in the study were excluded from the analysis. Because not all states collect the same information for their patients, of a total of 9,829,536 records, only 4,385,825 (45%) had all required data complete.

These inclusion and exclusion criteria did not affect the performance of SL relative to the rest of the analytical strategies used. A total of 99,013 records representing unique individuals were selected according to the inclusion and exclusion criteria and used in the analysis. This sample was split into a training set with 80% of the sample (n = 79,210) and a test set with the remaining 20% (n = 19,802).
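The 80/20 split described above can be sketched in a few lines. The authors' analysis was done in R with H2O; the following is a minimal, illustrative pure-Python stand-in (the seed value is an arbitrary assumption, not one from the paper):

```python
import random

def train_test_split(records, test_frac=0.2, seed=42):
    """Randomly reserve a test set that is never used for model fitting."""
    rng = random.Random(seed)
    idx = list(range(len(records)))
    rng.shuffle(idx)
    n_test = int(len(records) * test_frac)
    test_idx = set(idx[:n_test])
    train = [r for i, r in enumerate(records) if i not in test_idx]
    test = [r for i, r in enumerate(records) if i in test_idx]
    return train, test

train, test = train_test_split(list(range(100)))
print(len(train), len(test))  # 80 20
```

Holding the test set out entirely, as here, is what allows the test AUCs reported later to serve as honest estimates of out-of-sample performance.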

Because these data represent public information and there is no subject identification, the University of Iowa Human Subjects Office Institutional Review Board exempted this study from review.

Statistical analysis

The goal of the analysis was to compare results of classical analytical strategies commonly used to predict treatment success (e.g., logistic regression) alongside results of newer prediction methods (e.g., ANN). We also compared SL results. SL is an ensemble machine learning methodology that encompasses all other methodologies in its library and has demonstrated both theoretical and empirical superiority for prediction [8, 9].

Formally, the analysis is as follows. Let W = (W1, …, Wk) denote the predictors of interest and Y represent the binary outcome treatment success (yes/no). Let O = (W, Y) be a random variable such that O ~ P0 (i.e., the true probability distribution of O is P0). Since only records indicating that the individual had no prior treatment in a drug or alcohol program were included in the analyses, we assumed that the individuals observed can be represented as independent and identically distributed observations of the random variable O. For each individual i, Yi and Wi are observed. We sought to estimate P0(Y = 1 | W), i.e., the probability of succeeding in treatment given the predictors of interest. P0(Y = 1 | W) is unknown; hence, we aimed to find its best estimator. This was achieved by maximizing the AUC. We did this using logistic regression, 3 types of penalized regression, RF, ANN, and the corresponding SL that includes these algorithms in its library. SL also maximized the AUC.

We used 2 mechanisms to avoid overfitting. The analysis used 2-fold cross-validation (CV) on 80% of the data set (n = 79,210), and the final model was validated using 19,802 randomly selected individuals that were included only in the test set (Fig 1). All analytic approaches explored were compared using the AUC in the test set. The best analytic strategy for predicting treatment success was identified as the model maximizing AUC. The AUC is a convenient measure of prediction success that can be interpreted as the probability that an algorithm ranks a randomly chosen successful patient higher than a randomly chosen unsuccessful patient [28]. AUC ranges from 0 to 1: AUC = 1 means perfect prediction and AUC = 0.5 suggests chance-level prediction.
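The rank interpretation of the AUC can be computed directly: over all (successful, unsuccessful) patient pairs, count how often the model scores the successful patient higher, with ties counting one half. A minimal pure-Python sketch (illustrative only; the paper's AUCs were computed in R):

```python
def auc_rank(scores_pos, scores_neg):
    """AUC as the probability that a randomly chosen successful patient
    is ranked above a randomly chosen unsuccessful one (ties = 0.5)."""
    wins = 0.0
    for p in scores_pos:          # predicted probabilities, successful patients
        for n in scores_neg:      # predicted probabilities, unsuccessful patients
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Example: two successful and two unsuccessful patients
print(auc_rank([0.9, 0.4], [0.1, 0.7]))  # 0.75
```

Three of the four pairs are ranked correctly, giving AUC = 0.75; an AUC of 0.5 would mean the ranking is no better than chance.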

We used at least 2 different configurations for each analytic strategy compared. The first approach included all 28 predictors available. Since it can be useful to reduce the number of predictors considered and simplify the final prediction formula, we also selected a subset of the 10 predictors with highest variable importance in a RF model. In decreasing order of importance, these predictors were: length of stay, age, principal source of referral, primary problematic substance, its age of first use, and its frequency of use, employment level, SUD type (i.e., only alcohol, only drugs, or both alcohol and drugs), education level, and patient’s specific Hispanic origin (e.g., Puerto Rican, Mexican, Cuban).

For each set of predictors, the following algorithms were compared: logistic regression, least absolute shrinkage and selection operator (lasso), ridge, and elastic net penalized regressions [29], RF [30], ANN [29, 31], and SL [8]. Since none of the aforementioned regression models considered interaction effects in the predictive model, an additional model including terms for all predictors and selected 2-way interactions was also added to the SL library for each type of regression. Two-way interactions were initially screened using all possible logistic regression models of 2 predictors at a time and their 2-way interaction. All interactions with p-values<0.0001 were included along with all predictors in the regression models of the SL library. We compared 17 algorithms/algorithm configurations.

There is a multitude of other analytic strategies that could be chosen [29]; however, to illustrate the use of SL for predicting treatment success relative to other methods, we consider the chosen algorithms adequate. In this context, where the data generating model is unknown, SL will perform at least as well as the best of the other methodologies, regardless of the set of algorithms initially chosen. Some characteristics of the chosen algorithms are presented subsequently.

Logistic regression.

Logistic parametric regression is the most commonly used algorithm for prediction in the SUD treatment outcomes field. Logistic regression is easily implemented and interpreted. However, logistic regression assumptions are strong and, since the true data generating model is unknown, these assumptions are usually violated. For example, including many predictors, their interactions, and/or other higher order terms in a logistic regression model does not guarantee that it is the best model, due, for example, to collinearity between the predictors included (which increases variability) or model misspecification (which introduces bias).

Penalized regression.

Penalized regression (such as lasso, ridge regression, or elastic nets) offers an alternative to parametric regression models. The 3 types of penalized regression applied in this work vary in their variance/bias tradeoffs depending on the characteristics of the predictor set. For example, lasso will select only one term from a set of correlated predictors, which may not be appropriate. In fact, when the number of predictors is small compared to the number of independent observations, ridge regression outperforms lasso when variables are highly correlated. Additionally, if the true data generating model has only a few predictors but the candidate models have a large number of predictors, lasso may eliminate predictors that were not in the data generating model. On the other hand, ridge regression will keep all terms in the final model. Elastic net penalized regression provides a balance between lasso and ridge regression.
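The contrast between the two penalties is clearest in the special case of an orthonormal design, where each method reduces to a simple transformation of the ordinary least-squares coefficient b: lasso soft-thresholds (setting small coefficients exactly to zero), while ridge shrinks every coefficient proportionally and keeps all of them. A hedged sketch of these two well-known closed forms (the penalty value lam below is an arbitrary illustration, not a value used in the paper):

```python
def lasso_orthonormal(b, lam):
    """Soft-thresholding: shrinks toward zero and zeroes out |b| <= lam."""
    sign = 1.0 if b >= 0 else -1.0
    return sign * max(abs(b) - lam, 0.0)

def ridge_orthonormal(b, lam):
    """Proportional shrinkage: every coefficient is kept but downweighted."""
    return b / (1.0 + lam)

print(lasso_orthonormal(2.0, 0.5))  # 1.5  (shrunk)
print(lasso_orthonormal(0.3, 0.5))  # 0.0  (eliminated from the model)
print(ridge_orthonormal(2.0, 0.5))  # ~1.33 (kept, downweighted)
```

This is why lasso yields sparse models while ridge retains all terms, the behavior described in the following paragraph for the 393-term models.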

In the example presented here, there are 28 categorical predictors that correspond to 135 terms when the model is parametrized using dummy variables. The screening step for the most relevant 2-way interactions of these 135 terms preselected 257 2-way interaction terms. Thus, the regression models including all predictors and selected interactions had 393 terms including the intercept. While logistic regression estimated 393 parameters for this model, lasso estimated nonzero parameters for only a subset of the terms and zeroed out the rest; ridge regression kept all 393 terms but downweighted each one. In this way, penalized regression allows tuning large models, adapting them to the information provided by the data.

Random forests.

RF is a recursive partitioning method popular in many fields with high-dimensional data (e.g., genomics). RF can evaluate a large number of predictor variables even in the presence of complex interactions, including those that are not possible to model using regression. RF is an ensemble of classification and regression trees constructed on bootstrap samples, and it is more resistant to overfitting than individual trees.
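The bootstrap-plus-averaging idea behind RF can be illustrated with the simplest possible trees: one-split stumps, each fit on a bootstrap sample and averaged. This is a toy bagging sketch, not the actual RF algorithm used in the paper (which also randomizes predictor subsets at each split and grows deep trees):

```python
import random

def fit_stump(train, split):
    """A one-split 'tree': predict the mean outcome on each side of the split."""
    overall = sum(y for _, y in train) / len(train)
    left = [y for x, y in train if x <= split]
    right = [y for x, y in train if x > split]
    p_l = sum(left) / len(left) if left else overall
    p_r = sum(right) / len(right) if right else overall
    return lambda x: p_l if x <= split else p_r

def fit_forest(train, n_trees=25, seed=0):
    """Bagging: fit each stump on a bootstrap sample at a randomly chosen
    split point, then average the stumps' predictions."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        boot = [rng.choice(train) for _ in train]  # bootstrap sample
        split = rng.choice([x for x, _ in boot])   # random split point
        trees.append(fit_stump(boot, split))
    return lambda x: sum(t(x) for t in trees) / n_trees

# Toy data: one predictor x, binary outcome y
train = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
forest = fit_forest(train)
print(forest(1.5) >= forest(-1.5))  # True
```

Averaging over bootstrap samples is what smooths out the instability of any single tree, which is the sense in which the ensemble resists overfitting.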

Artificial neural networks.

An ANN uses interconnected nodes within various layers to explain an outcome given a set of predictors. The relationships between the nodes are defined by weights calculated using a given rule. The initial weights are preassigned by the analyst, and the ANN algorithm iterates, adjusting the weights; at the end of each iteration, outcome prediction performance is evaluated. ANNs efficiently generate non-linear classification rules but can be prone to overfitting. More recently, some types of ANN are referred to as deep learning [31]. Deep learning allows modeling multiple levels of non-linearity in the data and is scalable to large datasets and big data in general. We used deep learning ANN with hidden layers of size 200.
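The iterative weight adjustment described above can be shown with the smallest possible network: a single sigmoid node trained by gradient descent on a toy dataset. A deep learning ANN such as the one used in the paper stacks many such nodes in layers; everything below (data, learning rate, iteration count) is an illustrative assumption:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy data: one predictor x, binary outcome y
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]

w, b, rate = 0.0, 0.0, 0.5  # initial weights preassigned by the analyst
for _ in range(200):        # each iteration adjusts the weights
    for x, y in data:
        p = sigmoid(w * x + b)
        # gradient of the cross-entropy loss with respect to w and b
        w -= rate * (p - y) * x
        b -= rate * (p - y)

print(sigmoid(w * 2.0 + b) > 0.5)  # True: positive x predicted as success
```

Replacing the single node with layers of interconnected nodes, each with its own weights, yields the non-linear classification rules that make ANNs flexible and, without regularization, prone to overfitting.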

Super learning.

SL is a generalization of the stacking algorithm [32], an ensemble machine learning method that takes a weighted average of all other algorithms considered for prediction and produces a single prediction function (PF) with an optimal tradeoff between variance and bias. SL is very flexible for learning from the data, as it combines the strengths of all methodologies considered (including different configurations of the same algorithm) while minimizing modeling flaws. Another advantage of SL is that it eliminates the need to select a single methodology, or a few, a priori: SL allows analyzing the data with all the methodologies the researcher considers suitable simultaneously.

Fig 1 depicts the workflow used to analyze the data and details all the steps necessary for running SL. The input of the analysis consists of the training and test data sets together with the algorithmic library. Since we used 2-fold CV to obtain each algorithm’s PF as well as the SL PF, the training set was initially partitioned into 2 blocks. Each algorithm in the library was fitted using each block independently. We used the data block excluded from the model fitting to calculate the CV AUC for each algorithm in the library. We averaged both CV AUCs, resulting in a single training set CV AUC for each algorithm. Up to this step, model fitting follows a regular 2-fold CV modeling path. The PF of discrete SL, a simpler version of SL, is the PF of the algorithm with the maximum CV AUC.

However, SL performs better when a weighted combination of the algorithms’ PFs is used. Thus, the next step for obtaining the SL PF is to calculate a weight for each algorithm’s PF. This is done by regressing Y on the values of Y predicted by each algorithm in the library. Next, each algorithm is fitted using the whole training set, and the SL PF is obtained by applying the estimated weights to the algorithm predictions for each observation.
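The CV-then-weight steps above can be sketched end to end with two toy base learners. Here the out-of-fold (CV) predictions are combined by a single convex weight chosen by least squares, a deliberate simplification of the regression-of-Y-on-predictions step; the paper's analysis used the h2oEnsemble R implementation, and everything below (learners, data, weighting scheme) is an illustrative stand-in:

```python
# Two toy base learners: a constant (overall success rate) and a one-split stump.
def fit_constant(train):
    rate = sum(y for _, y in train) / len(train)
    return lambda x: rate

def fit_stump(train, split=0.0):
    left = [y for x, y in train if x <= split]
    right = [y for x, y in train if x > split]
    p_l = sum(left) / len(left) if left else 0.5
    p_r = sum(right) / len(right) if right else 0.5
    return lambda x: p_l if x <= split else p_r

train = [(-2.0, 0), (-1.5, 0), (-1.0, 1), (1.0, 1), (1.5, 1), (2.0, 0)]

# 2-fold CV: fit on one block, predict the held-out block.
fold1, fold2 = train[:3], train[3:]
cv_pred = {"constant": [], "stump": []}
ys = []
for fit_block, pred_block in [(fold1, fold2), (fold2, fold1)]:
    c, s = fit_constant(fit_block), fit_stump(fit_block)
    for x, y in pred_block:
        cv_pred["constant"].append(c(x))
        cv_pred["stump"].append(s(x))
        ys.append(y)

# Closed-form convex weight for (w * stump + (1 - w) * constant) minimizing
# squared error on the out-of-fold predictions, clipped to [0, 1].
p1, p2 = cv_pred["stump"], cv_pred["constant"]
num = sum((y - b) * (a - b) for y, a, b in zip(ys, p1, p2))
den = sum((a - b) ** 2 for a, b in zip(p1, p2))
w = min(max(num / den, 0.0), 1.0) if den else 0.5

# Final SL prediction function: refit both learners on the full training set.
c_full, s_full = fit_constant(train), fit_stump(train)

def sl_pf(x):
    return w * s_full(x) + (1 - w) * c_full(x)

print(0.0 <= sl_pf(1.2) <= 1.0)  # True: a valid probability
```

Because the weight is chosen on held-out predictions, a base learner that overfits its training block is penalized rather than rewarded, which is what lets SL combine many learners without overfitting.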

It can be demonstrated that, using 2-fold CV, the procedure could end here: the AUCs obtained from fitting each algorithm and SL to the whole training dataset would suffice for SL to outperform the rest of the algorithms without overfitting. However, we included an additional validation step: all the PFs obtained with the training set were used to predict SUD treatment success in the test data set, and we calculated and compared the test set AUCs of all fitted models.

A thorough description of all the algorithms used in this work is out of the scope of this manuscript. The interested reader will find further details about SL in van der Laan and Rose [33] and about the rest of the aforementioned methodologies in Friedman et al [29] and Bengio [31].

Models were fitted using the open source R programming language [34] and the H2O R interface [35], which optimizes all the analytical methods used for large datasets. The h2oEnsemble package version 0.1.8 [36] was used to fit the SL model. We set all tuning parameters for each algorithm in the SL library (e.g., the ANN implementation used in this manuscript has over 20 parameters) to their default values. The analysis took about 2.5 hours to run on a Windows 7 Professional desktop computer with an 8-core i7-3770 3.40 GHz CPU and 8 GB RAM. Most of the analytic time was used to fit the four regression models including 2-way interaction terms; the other 13 algorithms, including SL, required only about 6 minutes of the 2.5 hours. AUC confidence intervals and variances were estimated using the methodology of DeLong and colleagues [37] as implemented in the R package pROC [38]. Briefly, DeLong et al. [37] used the equality between the AUC and the Mann-Whitney U statistic, together with asymptotic normality, to analytically derive variance and confidence interval estimators for the AUC.
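The DeLong estimator has a compact form: for each successful patient compute its average "win" against all unsuccessful patients (and vice versa), then combine the empirical variances of those two sets of per-subject averages. A minimal pure-Python sketch of the single-AUC variance under that reading of [37] (the paper itself used pROC in R; this reimplementation is illustrative):

```python
def delong_auc_variance(scores_pos, scores_neg):
    """AUC and its DeLong variance estimate for a single ROC curve."""
    m, n = len(scores_pos), len(scores_neg)
    psi = lambda x, y: 1.0 if x > y else (0.5 if x == y else 0.0)
    # Structural components: per-subject average wins against the other group.
    v10 = [sum(psi(x, y) for y in scores_neg) / n for x in scores_pos]
    v01 = [sum(psi(x, y) for x in scores_pos) / m for y in scores_neg]
    auc = sum(v10) / m
    s10 = sum((v - auc) ** 2 for v in v10) / (m - 1)  # variance over positives
    s01 = sum((v - auc) ** 2 for v in v01) / (n - 1)  # variance over negatives
    return auc, s10 / m + s01 / n

auc, var = delong_auc_variance([0.9, 0.4], [0.1, 0.7])
print(auc, var)  # 0.75 0.125
```

With the large test set used in this study (N = 19,802), the resulting variance estimates are small, which is what makes the narrow AUC differences in Table 2 distinguishable.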


Treatment success rate and characteristics for the patients included in the analyses are shown in Table 1. For brevity, only gender and the top 10 most important predictors as identified by RF were included in the table.

Table 2 shows the AUC in the test set (N = 19,802) for the 17 algorithms/algorithm configurations compared. All AUCs are between 0.793 and 0.820. This indicates that, for this set of models, the probability that any of the algorithms will rank a randomly chosen successful patient higher than a randomly chosen unsuccessful patient is between 0.793 and 0.820, which is usually considered good performance for prediction models.

Table 2. AUC in the test set (N = 19,802) for each algorithm and algorithm parametrization used.

As hypothesized, SL shows the largest AUC. SL performance is very closely followed by RF including all predictors. For this particular example, the algorithm with the worst performance is logistic regression including all predictors and selected 2-way interactions. The relative improvement in AUC of SL is 3.3% when compared to the worst prediction method. Also, the AUC for SL has the smallest estimated variance. The rest of the models considered have estimated AUC variances up to 19% higher than SL. As revealed by Fig 2, SL AUC is higher than AUC for all other models, with the exception of the RF model including all predictors.

Also, in most cases, the models including all predictors performed slightly better (but not statistically significantly so) than the models including only the top 10 predictors selected in the initial screening step to simplify the prediction model. All parametric regression models, both penalized and non-penalized, performed almost identically with respect to AUC for models not including interactions. This behavior could have been anticipated, since a model with only 28 predictors can be considered small for a data set of this size. However, for models including all predictors and selected 2-way interactions, lasso outperformed the other 3 regression models: AUC for lasso was higher than for ridge and logistic regressions.


This work compared 17 models for predicting successful completion of SUD treatment. Both traditional and newer analytic strategies were evaluated head to head on a large nationwide dataset. In particular, super learning, an ensemble machine learning algorithm, was introduced for the first time to the study of SUD treatment success using large datasets. As expected, SL showed the best predictive performance.

In this particular example, SL’s superiority was meager when compared with more traditional models such as logistic regression. This result was not at all evident before analyzing these data, and we understand it may dampen interest in this work. However, we think the finding that there are no major differences among the solutions of all methods is important in itself and worth discussing. The lack of substantive difference between SL and the rest of the models used means that (a) any of these methods could be used for these data and (b) there are no major problems with the assumptions of the different models used. These findings are never evident, before or after analyzing a dataset, when the analysis is performed using a single traditional approach; sensitivity analysis is one way to address this limitation of traditional methods. In this sense, SL serves as a tool for streamlining and improving sensitivity analysis for prediction. SL does not require that a model be chosen a priori, it yields the best model at the end of the analysis, and, if there are no substantial differences with the rest of the models, SL provides a solid justification for using any of them. SL is a valuable, but underutilized, tool for obtaining robust prediction results as required, for example, by the National Institutes of Health [39].

In addition, even small improvements in prediction can have a high impact depending on each particular problem. In this case, a small prediction improvement could significantly impact patients’ outcomes and/or treatment costs. In other cases, small prediction improvements could save lives. Since SL is theoretically proven to perform as well or better than the best model included in its library, the results presented here are relevant to illustrate the use of SL in the SUD treatment outcome prediction field.

In other applications, SL has shown 44% and 12% increase in performance when compared to ANN and RF, respectively [40]. Also, SL has shown as good or significantly better predictive performance than linear regression and other methods when used in 13 real datasets—see Fig 3.4 in [40]. In fact, stacking algorithms similar to SL are usually among the type of algorithms that win prediction contests such as the Heritage Provider Network Health Prize [41]. The Heritage Provider Network Health Prize was a competition to develop a predictive algorithm helping identify patients most likely to be admitted to healthcare providers.

The results of this work should be interpreted as an illustration of the presented statistical methods in a realistic setting. The practical conclusions should be seen in light of the data limitations. Analysis was restricted to Hispanics and very few features describing the type of treatment and how treatment was administered were included. There may be considerable reservation with using treatment completion as an indicator of treatment success. While post-treatment follow-up outcome measures of abstinence or reduced use would be best, such nation-wide data collection efforts were out of the scope of this project. While these are important limitations to inform treatment success prediction, these issues do not affect the main goal of this work.

Traditional analytical approaches to prediction may consider not only the prediction performance of a model but also its ecological validity for a given research area. That is one reason logistic regression could be preferred over SL for predicting SUD treatment success. While the model proposed by SL is designed to excel at prediction, it is not meant to be interpreted or used for effect estimation. If the true, or close to true, data generating model is included in the SL library, SL will likely show little gain in predictive performance, and its lack of interpretability could discourage its use. However, most of the time researchers are uncertain about the model that generated the data. When ecological validity is important in a prediction context, SL can be used to validate a simpler model if an interpretable model in its library is close to SL in predictive performance. In the example presented here, SL has significantly better predictive performance than logistic regression; thus, it would be difficult to defend any of the logistic regression models used in this application in lieu of SL. On the other hand, these data show that, if the ecological validity of RF models could be easily assessed, the RF model with all the predictors could be used instead of the SL model.

When the goal of the research is to interpret the effect of different predictors on the outcome, SL can be used within the targeted learning framework [33]. SL is the first stage in obtaining doubly robust effect estimates using targeted maximum loss-based estimation (TMLE) [42]. TMLE is an estimation technique that allows double robust estimation of effects with fewer untestable assumptions than the usual parametric methods used for estimation. TMLE can be applied either in a causality framework or in a merely associative framework (additional details about TMLE are out of the scope of this work; we recommend [33] for further information). Using the prediction model suggested by SL and TMLE, one could answer questions such as “What is the treatment success rate difference between Hispanics with comorbid psychiatric disorders and those without comorbid disorders?” and “Is this difference different from zero?” In contrast, statistical inference is not possible with machine learning methodologies such as RF or deep learning. This important limitation of machine learning algorithms is overcome by targeted learning, the first analytical framework that provides estimates and hypothesis testing while using machine learning [43]. Future directions of this work include determining the benefits of applying targeted learning for different effect estimations and inference in the addiction field.


This work was partially funded by postdoctoral fellowship 2651/2014 awarded to Dr. Acion by CONICET. The authors thank all open source software developers that made the analysis possible.

Author Contributions

  1. Conceptualization: LA DK MvdL ES DSJ SA.
  2. Data curation: LA ES SA.
  3. Formal analysis: LA DK MvdL SA.
  4. Funding acquisition: LA DK.
  5. Investigation: LA ES DSJ SA.
  6. Methodology: LA DK MvdL ES DSJ SA.
  7. Software: LA.
  8. Supervision: DK SA.
  9. Validation: DK.
  10. Visualization: LA SA.
  11. Writing – original draft: LA.
  12. Writing – review & editing: LA DK MvdL ES DSJ SA.


  1. Acion L, Ramirez MR, Jorge RE, Arndt S. Increased risk of alcohol and drug use among children from deployed military families. Addiction. 2013;108(8):1418–25. pmid:23441867
  2. Alang SM. Sociodemographic Disparities Associated With Perceived Causes of Unmet Need for Mental Health Care. Psychiatr Rehabil J. 2015;38(4):293. pmid:25664758
  3. Glasheen C, Pemberton MR, Lipari R, Copello EA, Mattson ME. Binge drinking and the risk of suicidal thoughts, plans, and attempts. Addictive Behaviors. 2015;43:42–9. pmid:25553510
  4. Sahker E, Acion L, Arndt S. National Analysis of Differences Among Substance Abuse Treatment Outcomes: College Student and Nonstudent Emerging Adults. Journal of American College Health. 2015;63(2):118–24. pmid:25470217
  5. Gowin JL, Ball TM, Wittmann M, Tapert SF, Paulus MP. Individualized relapse prediction: Personality measures and striatal and insular activity during reward-processing robustly predict relapse. Drug and Alcohol Dependence. 2015;152:93–101. pmid:25977206
  6. Launay C, Rivière H, Kabeshova A, Beauchet O. Predicting prolonged length of hospital stay in older emergency department users: use of a novel analysis method, the Artificial Neural Network. European Journal of Internal Medicine. 2015;26(7):478–82. pmid:26142183
  7. Pflueger MO, Franke I, Graf M, Hachtel H. Predicting general criminal recidivism in mentally disordered offenders using a random forest approach. BMC Psychiatry. 2015;15(1):1.
  8. van der Laan MJ, Polley EC, Hubbard AE. Super learner. Statistical Applications in Genetics and Molecular Biology. 2007;6(1).
  9. Polley EC, Rose S, van der Laan MJ. Super learning. In: Targeted Learning. Springer; 2011. p. 43–66.
  10. Grant BF, Goldstein RB, Saha TD, Chou SP, Jung J, Zhang H, et al. Epidemiology of DSM-5 alcohol use disorder: results from the National Epidemiologic Survey on Alcohol and Related Conditions III. JAMA Psychiatry. 2015;72(8):757–66. pmid:26039070
  11. Arndt S, Acion L, White K. How the states stack up: Disparities in substance abuse outpatient treatment completion rates for minorities. Drug and Alcohol Dependence. 2013;132(3):547–54. pmid:23664124
  12. Sahker E, Toussaint MN, Ramirez M, Ali SR, Arndt S. Evaluating racial disparity in referral source and successful completion of substance abuse treatment. Addictive Behaviors. 2015;48:25–9. pmid:25935719
  13. Wells K, Klap R, Koike A, Sherbourne C. Ethnic disparities in unmet need for alcoholism, drug abuse, and mental health care. American Journal of Psychiatry. 2001;158(12):2027–32. pmid:11729020
  14. Compton WM III, Cottler LB, Jacobs JL, Ben-Abdallah A, Spitznagel EL. The Role of Psychiatric Disorders in Predicting Drug Dependence Treatment Outcomes. American Journal of Psychiatry. 2003;160(5):890–5. pmid:12727692
  15. Sahker E, McCabe JE, Arndt S. Differences in successful treatment completion among pregnant and non-pregnant American women. Archives of Women's Mental Health. 2015;19(1):79–86. pmid:25824855
  16. Simpson DD, Joe GW. Motivation as a predictor of early dropout from drug abuse treatment. Psychotherapy: Theory, Research, Practice, Training. 1993;30(2):357.
  17. United States Department of Health and Human Services. Substance Abuse and Mental Health Services Administration. Office of Applied Statistics. Treatment Episode Data Set—Discharges (TEDS-D)—Concatenated, 2006 to 2011. Inter-university Consortium for Political and Social Research (ICPSR) [distributor]; 2014.
  18. LeDell EE. Scalable Ensemble Learning and Computationally Efficient Variance Estimation. University of California, Berkeley; 2015.
  19. Zarkin GA, Dunlap LJ, Bray JW, Wechsberg WM. The effect of treatment completion and length of stay on employment and crime in outpatient drug-free treatment. Journal of Substance Abuse Treatment. 2002;23(4):261–71. pmid:12495788
  20. Garnick DW, Lee MT, Horgan CM, Acevedo A, Washington Circle Public Sector Workgroup. Adapting Washington Circle performance measures for public sector substance abuse treatment systems. Journal of Substance Abuse Treatment. 2009;36(3):265–77. pmid:18722075
  21. Evans E, Li L, Hser Y-I. Client and program factors associated with dropout from court mandated drug treatment. Evaluation and Program Planning. 2009;32(3):204–12. pmid:19150133
  22. TOPPS-II Interstate Cooperative Study Group. Drug treatment completion and post-discharge employment in the TOPPS-II Interstate Cooperative Study. Journal of Substance Abuse Treatment. 2003;25(1):9–18. pmid:14512103
  23. Sahker E, Yeung C, Loh Y, Park S, Arndt S. Asian American and Pacific Islander Substance Use Treatment Admission Trends. Drug and Alcohol Dependence. 2017;171:1–8. pmid:27988403
  24. Marzell M, Sahker E, Pro G, Arndt S. A brief report on Hispanic youth marijuana use: Trends in substance abuse treatment admissions in the United States. Journal of Ethnicity and Substance Abuse. 2016:1–10.
  25. Substance Abuse and Mental Health Services Administration. The NSDUH Report: Need for and Receipt of Substance Use Treatment among Hispanics. Rockville, MD; 2012.
  26. Substance Abuse and Mental Health Services Administration. The NSDUH Report: Need for and Receipt of Substance Use Treatment among American Indians or Alaska Natives. Rockville, MD; 2012.
  27. Substance Abuse and Mental Health Services Administration. Need for and Receipt of Substance Use Treatment among Blacks. 2013.
  28. Acion L, Peterson JJ, Temple S, Arndt S. Probabilistic index: An intuitive non-parametric approach to measuring the size of treatment effects. Statistics in Medicine. 2006;25(4):591–602. pmid:16143965
  29. Friedman J, Hastie T, Tibshirani R. The Elements of Statistical Learning. Springer Series in Statistics. Berlin: Springer; 2001.
  30. Breiman L. Random Forests. Machine Learning. 2001;45(1):5–32.
  31. Bengio Y. Learning deep architectures for AI. Foundations and Trends in Machine Learning. 2009;2(1):1–127.
  32. Wolpert DH. Stacked generalization. Neural Networks. 1992;5(2):241–59.
  33. van der Laan MJ, Rose S. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer Science & Business Media; 2011.
  34. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 2016.
  35. Aiello S, Kraljevic T, Maj P, with contributions from the H2O.ai team. h2o: R Interface for H2O. R package; 2015.
  36. LeDell E. h2oEnsemble: H2O Ensemble Learning. R package version 0.1.8. 2016.
  37. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988:837–45. pmid:3203132
  38. Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez J-C, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;12(1):1.
  39. National Institutes of Health. Research instructions for NIH and other PHS agencies. 2016 [cited 2017 Jan 24]. Available from:
  40. Rose S. Mortality risk score prediction in an elderly population using machine learning. American Journal of Epidemiology. 2013;177(5):443–52. pmid:23364879
  41. Brierley P, Vogel D, Axelrod R. Heritage Provider Network Health Prize Round 1 Milestone Prize: How we did it—Team ‘Market Makers’. 2011.
  42. Rose S, van der Laan MJ. Why TMLE? In: Targeted Learning. Springer; 2011. p. 101–18.
  43. Rose S. Targeted Learning for Variable Importance. Handbook of Big Data. 2016:411.