Propensity Scoring after Multiple Imputation in a Retrospective Study on Adjuvant Radiation Therapy in Lymph-Node Positive Vulvar Cancer

  • Christine Eulenburg,

    c.h.zu.eulenburg@umcg.nl

    Affiliation Medical Statistics and Decision Making, Department for Epidemiology, University Medical Center Groningen, Groningen, The Netherlands

    ORCID http://orcid.org/0000-0001-9365-5548

  • Anna Suling,

    Affiliation Department of Medical Biometry and Epidemiology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany

  • Petra Neuser,

    Affiliation KKS Philipps University Marburg, Marburg, Germany

  • Alexander Reuss,

    Affiliation KKS Philipps University Marburg, Marburg, Germany

  • Ulrich Canzler,

    Affiliation Dept. of Gynecology and Obstetrics, University of Dresden, Dresden, Germany

  • Tanja Fehm,

    Affiliations Dept. of Gynecology University Medical Center Duesseldorf, Duesseldorf, Germany, Dept. of Gynecology and Obstetrics, University Hospital Tuebingen, Tuebingen, Germany

  • Alexander Luyten,

    Affiliation Dept. of Gynecology, Obstetrics and Gynecologic Oncology, Klinikum Wolfsburg, Wolfsburg, Germany

  • Martin Hellriegel,

    Affiliation Dept. of Gynecology, Georg-August-University Goettingen, Goettingen, Germany

  • Linn Woelber,

    Affiliation Department of Gynecology and Gynecologic Oncology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany

  • Sven Mahner

    Affiliation Department of Gynecology and Obstetrics, Ludwig-Maximilians-University, Munich, Germany

Abstract

Propensity scoring (PS) is an established tool to account for measured confounding in non-randomized studies. These methods are sensitive to missing values, which are a common problem in observational data. The combination of multiple imputation of missing values and different propensity scoring techniques is addressed in this work. For a sample of lymph node-positive vulvar cancer patients, we re-analyze associations between the application of radiotherapy and disease-related and non-related survival. Inverse-probability-of-treatment-weighting (IPTW) and PS stratification are applied after multiple imputation by chained equation (MICE). Methodological issues are described in detail. Interpretation of the results and methodological limitations are discussed.

Introduction

One of the pertinent challenges in estimating causal treatment effects from observational data is to control for confounding bias. The lack of randomization can lead to systematic differences between treated and untreated subjects. In this case, observed differences in outcome cannot reliably be attributed to treatment exposure. Propensity scoring (PS) is the established statistical approach to reduce bias resulting from imbalanced measured covariate distributions across treatment groups [1–5]. The propensity score e(xi) for a subject i is the probability that the subject receives the treatment, given its individual vector of covariates xi: e(xi) = P(Zi = 1|xi), where Zi = 1 if subject i receives the treatment and Zi = 0 otherwise. Various PS methods exist, including PS matching[2], PS stratification[6], PS covariate adjustment[7] and inverse-probability-of-treatment-weighting (IPTW)[8]. All PS models are very sensitive to missing values, which are regularly encountered in retrospective studies: without further measures, patients or covariates with missing data have to be excluded from the analysis. Different approaches to solve the problem of missing values in PS analyses have been studied[9–14]. Multiple imputation by chained equations (MICE) has been demonstrated to be an appropriate method to deal with missing values if they are missing at random[13–16]. With this method, missing values are replaced by values drawn repeatedly from conditional probability distributions.

The results of the primary analysis and of one propensity score approach using available data of the AGO-CaRE 1 (Arbeitsgemeinschaft Gynäkologische Onkologie – Chemo- and Radiotherapy in Epithelial Vulvar Cancer) study were reported in a medical companion paper[17]. We re-analyzed the data on lymph-node positive vulvar cancer patients, a subgroup of whom was treated with adjuvant radio(chemo)therapy. Associations with mortality from vulvar cancer (disease-related death (DRD)) and death from other / unknown causes (DOC) were analyzed as competing risks. The specific focus of the present work is the detailed description and discussion of the applied statistical methodology: how multiple imputation and propensity scoring are used to estimate causal effects from observational data, and which methodological issues arise. The applied techniques are contrasted with other potential techniques, and their advantages and disadvantages are discussed.

Patients

In the AGO-CaRE 1 study, data on 1618 patients with advanced vulvar cancer (FIGO stage ≥ IB [UICC staging 2006]) treated between 1998 and 2008 were retrospectively collected[13]. The present analysis included a subgroup of 346 patients with lymph-node involvement, age ≤90 years and documented follow-up status. Of these patients, 182 (52.6%) were treated with adjuvant radiotherapy, whereas 164 (47.4%) did not receive adjuvant radiotherapy.

Ethical Approval and Informed Consent

The study protocol was approved by local ethics committees at each center [leading vote: Hamburg (reference number PV3658)] and registered with clinicaltrials.gov (NCT01304667). Patients provided general written informed consent to access their medical records for scientific analysis at first contact with the respective study center. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Statistical Methods

Multiple imputation

The MICE approach is an established imputation method creating multiple complete data sets in which the missing values are replaced by estimates from a specified regression model using the observed data[13;15;16]. The procedure assumes the missing data to be missing at random, which means that the probability that a value is missing only depends on the measured data.

With these multiply-imputed data sets, estimation is possible without omitting covariates or individuals with missing values. Let x1,…,xk be the k variables to be considered, with some or all of them having missing values. In the first step, all missing values are replaced at random. Then, the first variable with missing values, e.g. x1, is regressed on the variables x2,…,xk. From this model, fitted on the observed values of x1, a predictive distribution of x1 is generated, and the missing values of x1 are replaced by random draws from it. The next variable with missing values, x2, is regressed on all other variables x1,x3,…,xk, including the imputed values. Again, missing values of x2 are replaced by draws from the posterior predictive distribution of x2. The procedure is repeated for all variables with missing values. This completes one cycle; the procedure is run for ten cycles to create one complete data set with stabilized imputations, Xcomplete. It is recommended to generate m = 3–20 data sets [10;13;18]. In this analysis, m = 10 complete data sets were generated. Considered variables were those listed in Table 1, except resection margin and lymph node metastasis diameter, as these variables contained too many missing values (62% and 70%, respectively). The different types of variables (continuous, dichotomous, categorical) were accounted for, and implausible values (negative count data, non-existing categories) were avoided[10;13]. To account for possible imbalances of the covariates amongst the treated and untreated patients, MI was conducted for both treatment groups separately[10].
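The chained-equations cycle described above can be illustrated with a deliberately small sketch. It is not the Stata "ice" implementation used in this study: it assumes only two continuous variables, uses simple linear regression with normally distributed noise, and does not draw the regression parameters from their posterior as a full MICE implementation would. All function names are hypothetical.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual SD)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / max(len(xs) - 2, 1)) ** 0.5
    return a, b, sd

def chained_imputation(x1, x2, cycles=10, rng=None):
    """Return one completed copy of (x1, x2); None marks a missing value."""
    rng = rng or random.Random()
    cols = [list(x1), list(x2)]
    missing = [[v is None for v in col] for col in cols]
    # Step 1: fill every gap with a random draw from the observed values.
    for col, miss in zip(cols, missing):
        observed = [v for v in col if v is not None]
        for i, is_missing in enumerate(miss):
            if is_missing:
                col[i] = rng.choice(observed)
    # Step 2: cycle through the variables, re-imputing each from the other.
    for _ in range(cycles):
        for target in (0, 1):
            other = 1 - target
            # Fit only on rows where the target variable was observed.
            rows = [i for i, is_missing in enumerate(missing[target]) if not is_missing]
            a, b, sd = fit_line([cols[other][i] for i in rows],
                                [cols[target][i] for i in rows])
            # Replace missing targets by noisy predictions (simplified draw).
            for i, is_missing in enumerate(missing[target]):
                if is_missing:
                    cols[target][i] = rng.gauss(a + b * cols[other][i], sd)
    return cols[0], cols[1]

def multiply_impute(x1, x2, m=10, seed=1):
    """Generate m completed data sets, each from its own run of ten cycles."""
    rng = random.Random(seed)
    return [chained_imputation(x1, x2, rng=rng) for _ in range(m)]
```

Calling `multiply_impute(x1, x2, m=10)` returns ten completed copies of the two variables. Observed values are never altered; only the entries that were `None` differ between copies.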

Table 1. Patient characteristics by treatment group and standardized differences.

https://doi.org/10.1371/journal.pone.0165705.t001

Estimating treatment effects on disease-related and unrelated death

The effect of adjuvant therapy on the competing causes of death was computed separately in the 10 imputed data sets and then averaged over data sets using Rubin's combination rules[19]. The cause-specific hazards model was applied to consider the competition of the investigated causes of death. Using this approach, the specific events are analysed separately, treating the competing events as censored. Tests were performed two-sided with a 5% level of significance.
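As an illustration of the pooling step, the following sketch combines per-data-set estimates with Rubin's rules. It assumes that each imputed data set yields a log hazard ratio and its variance, and it uses a normal quantile for the confidence interval instead of the t reference distribution with Rubin's degrees of freedom; the function names are illustrative and not taken from the Stata "mim" package.

```python
import math

def pool_rubin(estimates, variances):
    """Pool m point estimates and their variances using Rubin's rules."""
    m = len(estimates)
    q_bar = sum(estimates) / m                      # pooled point estimate
    within = sum(variances) / m                     # mean within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between          # total variance
    return q_bar, total

def pooled_hazard_ratio(log_hrs, variances, z=1.96):
    """Back-transform the pooled log HR; normal-approximation 95% CI (assumption)."""
    q_bar, total = pool_rubin(log_hrs, variances)
    se = math.sqrt(total)
    return math.exp(q_bar), math.exp(q_bar - z * se), math.exp(q_bar + z * se)
```

Pooling is done on the log scale because the log hazard ratio is closer to normally distributed than the hazard ratio itself.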

Propensity Scoring

Identifying confounders and estimation of the PS.

Confounding exists if a baseline variable correlates with the outcome and is furthermore imbalanced between the treatment groups[20]. Identification of relevant variables to be included in the PS model is a key factor for confounding control. Simulations showed that variables that are related to the outcome should be included in the model, even if they are not associated with the exposure [21]. Including them decreases the variance of the estimated exposure effect without increasing bias [21]. In contrast, variables that are imbalanced with respect to the exposure can produce bias only if they are related to the outcome. Including variables associated with the exposure but not with the outcome would increase the variance without decreasing bias [21,22]. However, the ultimate aim of propensity scoring is to balance covariates. Therefore, an iterative procedure was described by Austin (2011) [8]. He proposed to start with an initially specified propensity score model and to evaluate the resulting balance. If important systematic differences between exposure groups remain, the PS model should be modified. This procedure can be repeated until the group differences have been “reduced to an acceptable level” [8]. In the present investigation, we follow these two approaches. In an initial step, all potential confounders associated with either one of the competing endpoints were taken into account. Associations with outcome were tested using univariate cause-specific hazards models stratified across the 10 imputed data sets. After applying the PS and evaluating the achieved balance, the selection of confounders was adjusted iteratively until acceptable balance for all covariates was achieved.

The PS as defined by Rosenbaum and Rubin[1] represents the conditional probability of receiving the treatment of interest, given the variables observed at baseline. It was estimated using multivariate logistic regression of the treatment status on the confounding baseline covariates selected in the previous step. The resulting logit of the PS was then used to predict the probability of being treated[5;14].
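A minimal sketch of this estimation step is given below. It is not the Stata "pscore" routine used in the study: it fits the logistic model by plain gradient ascent on the log-likelihood, assumes numeric covariates on a roughly unit scale, and uses hypothetical function names and default settings.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def linear_predictor(beta, row):
    """Logit of the PS: intercept plus weighted covariates."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))

def fit_logistic(X, z, lr=0.1, steps=5000):
    """Fit P(Z = 1 | x) by gradient ascent; returns [intercept, slopes...]."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for row, zi in zip(X, z):
            err = zi - sigmoid(linear_predictor(beta, row))
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity_scores(X, z):
    """Predicted probability of treatment for every subject."""
    beta = fit_logistic(X, z)
    return [sigmoid(linear_predictor(beta, row)) for row in X]
```

With an intercept in the model, the fitted scores average to the observed treatment rate once the fit has converged, which is a quick sanity check on the estimation.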

Application of the propensity score.

The IPTW method[7;23–25] was applied in each of the imputed data sets before averaging the results. The idea behind this method is to reweight the individuals in the data set by the inverse probability of receiving the treatment they actually received, calculated from the PS. This creates a sample in which the treatment assignment is independent of the distribution of measured covariates[8].

Stabilized weights wi for individuals i have been defined as

$$w_i = \frac{Z_i\,P}{PS_i} + \frac{(1 - Z_i)\,(1 - P)}{1 - PS_i}$$

[23]. The variable Zi indicates the treatment status for each subject i. If subject i was treated, then Zi equals 1, and 0 otherwise. PSi defines the individual propensity score for patient i and P is the proportion of patients receiving the treatment. A robust variance estimator was used.
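The weights are straightforward to compute once the PS is available. The sketch below assumes the usual stabilized form from marginal structural models, P/PSi for treated and (1 − P)/(1 − PSi) for untreated subjects; the function name is illustrative.

```python
def stabilized_weights(z, ps):
    """Stabilized IPTW weights.

    z  : list of treatment indicators (1 = treated, 0 = untreated)
    ps : list of propensity scores, each strictly between 0 and 1
    """
    p = sum(z) / len(z)  # marginal proportion of treated subjects
    return [p / e if zi == 1 else (1 - p) / (1 - e) for zi, e in zip(z, ps)]
```

Compared with unstabilized weights (1/PSi and 1/(1 − PSi)), the stabilized version keeps the weighted sample size close to the original sample size and reduces the influence of subjects with extreme scores.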

For comparison, PS stratification was applied. The study sample was split up into five strata according to quintiles of the PS. Stratified Cox regressions for comparing treated and untreated groups were performed for each imputed data set and then averaged. A robust variance estimator was used. Rosenbaum and Rubin stated that five strata according to quintiles of the PS can remove 90 percent of the bias in the considered covariates[1]. If the PS was correctly specified, the treated and untreated subjects within each stratum would have similar distributions of baseline covariates and could be compared directly without bias[26].
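Assigning subjects to PS quintiles can be sketched as follows. The rank-based rule is one possible convention and an assumption of this example (ties are broken by input order); the stratified Cox regression itself is not shown.

```python
from collections import Counter

def quintile_strata(ps, n_strata=5):
    """Assign each subject a stratum 0..n_strata-1 by the rank of its PS."""
    n = len(ps)
    order = sorted(range(n), key=lambda i: ps[i])   # indices from lowest to highest PS
    strata = [0] * n
    for rank, i in enumerate(order):
        strata[i] = rank * n_strata // n
    return strata

def stratum_sizes(strata):
    """Number of subjects per stratum, ordered by stratum index."""
    counts = Counter(strata)
    return [counts[s] for s in sorted(counts)]
```

For a sample of 346 subjects, this rule produces one stratum of 70 and four strata of 69 subjects.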

Balance check.

If all prognostically relevant covariates were balanced between the treatment groups the result of a univariate group comparison could be interpreted as a causal effect. The recommended way to examine if continuous variables are balanced is to compute standardized differences between treatment groups, defined as the difference between treated and untreated means of each factor, divided by the pooled standard deviation[5]. A method for factor variables is also described in Crowson et al.[5]. In this work, balance was tested in the originally measured data and in the data sets after applying the individual propensity score techniques. In the multiply imputed data, results of the balance checks were averaged across data sets using Rubin’s rules[19]. Absolute values of standardized differences <0.1 indicated sufficient balance[26].

Achieving balance across treatment groups is the goal of PS. Therefore, balance was checked after PS application. Depending on the resulting balance, the set of confounding variables was adapted and a new PS was calculated and applied.
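The standardized difference for a continuous covariate can be computed as in the following sketch. It assumes unweighted data and the pooled standard deviation sqrt((s1² + s0²)/2); after IPTW, weighted means and variances would be needed instead. Function names are illustrative.

```python
import statistics

def standardized_difference(treated, untreated):
    """Difference in means divided by the pooled standard deviation."""
    mean_diff = statistics.fmean(treated) - statistics.fmean(untreated)
    pooled_sd = ((statistics.variance(treated)
                  + statistics.variance(untreated)) / 2) ** 0.5
    return mean_diff / pooled_sd

def is_balanced(treated, untreated, threshold=0.1):
    """Balance criterion used in the text: absolute value below 0.1."""
    return abs(standardized_difference(treated, untreated)) < threshold
```

Unlike a significance test, the standardized difference does not depend on the sample size, which makes it comparable before and after weighting or stratification.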

Software

For MI, the Stata packages "ice" and "mim" were utilized[10;27;28]. R 2.15 (The R Foundation for Statistical Computing, Vienna, Austria) and Stata (StataCorp. 2015. Stata Statistical Software: Release 14. College Station, TX: StataCorp LP.) with the packages "pscore"[29] and "pbalchk"[30] were applied for PS.

Results

Out of 346 included patients with lymph-node positive vulvar cancer and documented follow-up, 182 (53%) received adjuvant radiation therapy. Median follow-up was 16.4 months (range 0.3–163.6 months). During follow-up, 78 disease-related deaths, 17 disease-unrelated deaths and 40 deaths due to unknown reasons were observed. Median disease-free and overall survival were 15.3 months and 42.7 months, respectively. The patient characteristics as well as their association with treatment assignment are summarized in Table 1. Several differences between the treated and untreated patients were observed. Treated patients had considerably better Eastern Cooperative Oncology Group (ECOG) performance status than untreated patients, but at the same time treated patients were older and had more affected lymph-nodes with larger lymph-node metastases. Additionally, distribution of the type of groin surgery and groin dissection differed amongst the groups.

Missing values

Of the 346 patients, only 24 (7%) were completely documented regarding the 15 considered covariates (Table 1), whereas 79 (23%) had more than three missing values. Of the 15 considered variables, only four were fully documented. Lymph-node metastasis diameter (70% missing) and minimum resection margin (62.4% missing) could not be considered as covariates due to their high missing rates.

Naïve group comparison

Naïve univariate comparisons of the treated and untreated patients showed no associations between therapy and disease-related or unrelated mortality (hazard ratio (HR) 0.83, 95% confidence interval (CI): 0.53–1.29; p = 0.403 and HR 0.70, 95% CI 0.42–1.18; p = 0.177, respectively) (Table 2).

Table 2. Univariate associations between baseline characteristics and competing causes of death.

https://doi.org/10.1371/journal.pone.0165705.t002

Selection of confounders for computing the PS

Associations with disease-related death were found for the variables tumor stage, ECOG, number of affected nodes, type of groin surgery and age in the original data set as well as in the imputed data. Tumor stage, resection status, ECOG, number of affected nodes, type of groin dissection (uni- / bilateral) and age were related to death from other / unknown causes in both the original and the imputed data (Table 2). These variables also show imbalances with regard to the standardized differences (Table 3). With respect to the achieved balance, the best results were obtained by using all these potential confounders except tumor stage to compute the PS.

Table 3. Standardized differences to identify imbalances between treatment groups before and after imputing and inverse-probability-of-treatment-weighting.

https://doi.org/10.1371/journal.pone.0165705.t003

Inverse-probability-of-treatment-weighting (IPTW)

Weighting the data according to the inverse probability of treatment resulted in predominantly balanced confounding variables (Table 3). Estimated hazard ratios after MI and IPTW for DRD were HR 0.69; 95% CI: 0.43–1.12; p = 0.135 and for DOC HR 0.73; 95% CI: 0.42–1.27; p = 0.269, respectively (Table 4).

Table 4. Results after propensity scoring using the potential confounders age, resection status, ECOG, number of affected nodes, type of vulva surgery and groin dissection.

https://doi.org/10.1371/journal.pone.0165705.t004

PS stratification

Based on the quintiles of the PS, the data set was stratified into four groups with 69 patients and one group with 70 patients. Effect estimates pooled across strata and combined from the multiply imputed data sets were HR: 0.66; 95% CI: 0.40–1.09; p = 0.103 for DRD and HR: 0.75; 95% CI: 0.41–1.36; p = 0.337 for DOC, respectively (Table 4).

Discussion

In this study, MI followed by PS was applied to estimate the causal effect of radiation therapy in lymph-node positive vulvar cancer on competing causes of death using data from the AGO-CaRE 1 study[17].

In detail, ten complete data sets were generated using MI by chained equation (MICE), stratified by treatment allocation[10;15;16]. Then, confounders to include in the PS calculation were identified by testing univariate associations between baseline covariates and outcomes, stratified across the multiple complete data sets. Thirdly, the PS was computed for each subject. In a fourth step, PS was applied using IPTW and PS stratification[6;8;19;26;31;32]. With IPTW, each patient was weighted according to her PS value. Stratification entailed splitting each data set according to quintiles of the PS and performing analyses stratified over groups. The achieved balance of baseline covariates before and after MI and IPTW was evaluated by standardized differences. The cause-specific hazards model was used to evaluate associations between treatment allocation and the competing causes of death. Results were estimated within each of the imputed data sets and then averaged. This approach is comparable to the 'Within approach' from Mitra and Reiter (2012), who applied PS matching after MI[33]. In contrast, other approaches to overcome the problem of missing values in PS estimation have been studied[9;11;12]. For example, Qu and Lipkovich (2009) proposed an adaptation including indicators of missing data patterns in the PS model. This technique may reduce bias when data are not missing at random[11].

The results from both applied PS methods after MI were very similar and also comparable to those from the naïve group comparison without MI and PS. All approaches agree in showing no associations, but slight tendencies towards improved disease-related survival in patients receiving radiation therapy (Table 4).

The two other established PS methods, PS matching and PS covariate adjustment, were not appropriate in this example. PS matching entails forming matched sets of treated and untreated patients who share a similar PS value. Various techniques are available to select one or more untreated subjects to match each treated subject [2;8;26;31;32;34–38]. However, all PS matching approaches require the group of untreated patients to be large enough (two- to threefold larger than the group of treated subjects) to provide acceptable matching partners[32]. In the AGO-CaRE 1 data, the number of treated patients exceeded the number of controls. In such situations, matching would result either in heterogeneous matched pairs or in a small number of matched pairs, omitting a considerable number of treated or untreated subjects for whom no appropriate matching partner could be found. Therefore, PS matching was not implemented in this work. With PS covariate adjustment, the PS is included as an adjusting covariate in a Cox proportional hazards model, in which the outcome is regressed on the treatment variable. There is currently no consensus on whether this method offers a benefit compared to a multivariate regression model adjusted for the confounding variables[39]. Furthermore, differences in covariate variances between treated and untreated patients can cause difficulties. In such cases, D'Agostino (1998)[32] advises against this method, which was therefore not applied in the present work.

The general purpose of the PS method is to reduce imbalances in outcome-related variables. Most imbalances that were present in the originally observed and imputed data were removed by IPTW. The tumor stage, the type of vulva surgery, tumor diameter and the number of dissected groin lymph-nodes were still imbalanced in the multiply imputed data. However, these variables (except tumor stage) had no association with the outcome (Table 2) and therefore do not bias the results.

The validity of the results is limited by the assumptions inherent to the methods used. MI requires that the missing values are missing at random. In our data, MI led to a similar distribution of baseline variables (Table 3) and similar univariate associations between baseline variables and outcome (Table 2) in the originally observed and the multiply imputed data. A general assumption in all PS methods is that there are no unmeasured confounders. Confounders that are not accounted for, because they are not measured, imperfectly measured or not measurable, can still introduce bias. In the present example, psychological factors and quality-of-life aspects were not considered and may therefore be a source of unmeasured confounding.

In conclusion, the points to consider in our PS application were:

  1. Missing values can be a problem in propensity score analysis. Different methods are available, such as MI, as applied here, or the use of a missing-value pattern indicator[9;11;12]. In our example, results from a complete case analysis did not differ much from PS after MI.
  2. Different propensity score methods are established, like matching, stratification or IPTW, each providing even more options to choose from. The IPTW method yields an averaged treatment effect of all subjects, in contrast to most matching methods, which calculate the averaged treatment effect of the treated patients. Further, if the groups have similar size, the IPTW method performs well[9].
  3. The set of confounders to include in the PS has to be chosen carefully. However, the main goal of all PS methods is to obtain balance in the variables considered to be “important” in the analysis.
  4. For computing the PS, a logistic regression model is the established method. However, there are also other ways including boosting or CART models[40].

Acknowledgments

We wish to thank Dr. Amit Gulati, Department of Medical Biometry and Epidemiology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany for going through the manuscript critically and for his valuable suggestions.

Author Contributions

  1. Conceptualization: CE AS PN AR SM LW.
  2. Data curation: PN AR UC TF AL MH LW SM.
  3. Formal analysis: CE AS PN AR.
  4. Investigation: CE AS PN AR UC TF AL MH LW SM.
  5. Methodology: CE AS PN AR.
  6. Software: CE AS PN AR.
  7. Validation: CE AS PN AR.
  8. Writing – original draft: CE AS PN AR SM LW.
  9. Writing – review & editing: CE AS PN AR UC TF AL MH LW SM.

References

  1. Rosenbaum P, Rubin D. The central role of the propensity score in observational studies for causal effects. Biometrika 1983;70:41–55.
  2. Rosenbaum P, Schenck LA. Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. American Statistician 39, 33–38. 1985.
  3. Austin PC. Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensity-score matched samples. Stat Med 2009 Nov 10;28(25):3083–107. pmid:19757444
  4. Brookhart MA, Wyss R, Layton JB, Sturmer T. Propensity score methods for confounding control in nonexperimental research. Circ Cardiovasc Qual Outcomes 2013 Sep 1;6(5):604–11. pmid:24021692
  5. Crowson CS, Schenck LA, Green AB, Atkinson EJ, Therneau TM. The Basics of Propensity Scoring and Marginal Structural Models. Department of Health Sciences Research, Mayo Clinic Rochester, Minnesota; 2013 Aug 1.
  6. Rosenbaum P, Schenck LA. Reducing bias in observational studies using subclassification on the propensity score. Journal of the American Statistical Association 79, 516–524. 2014.
  7. Rosenbaum P. Model-based direct adjustment. The Journal of the American Statistician 82, 387–394. 1987.
  8. Austin PC. An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies. Multivariate Behav Res 2011 May;46(3):399–424. pmid:21818162
  9. Stuart E. Matching Methods for Causal Inference: A Review and a Look Forward. Statistical Science 2010;25(1):1–21. pmid:20871802
  10. Lunt M. A Guide to Imputing Missing Data with Stata - Revision: 1.4. 2011.
  11. Qu Y, Lipkovich I. Propensity score estimation with missing values using a multiple imputation missingness pattern (MIMP) approach. Stat Med 2009 Apr 30;28(9):1402–14. pmid:19222021
  12. Seaman SWI. Inverse Probability Weighting with Missing Predictors of Treatment Assignment or Missingness. Communication in Statistics - Theory and Methods 2014;43(16):3499–515.
  13. Royston P, White IR. Multiple Imputation by Chained Equations (MICE): Implementation in Stata. Journal of Statistical Software 2011;45(4):1–20.
  14. Rubin DB. Multiple Imputation for Nonresponse in Surveys. New York: Wiley; 1987.
  15. Van Buuren S., Boshuizen HC, Knook DL. Multiple imputation of missing blood pressure covariates in survival analysis. Stat Med 1999 Mar 30;18(6):681–94. pmid:10204197
  16. Van Buuren S. Multiple imputation of discrete and continuous data by fully conditional specification. Stat Methods Med Res 2007;16:219–42. pmid:17621469
  17. Mahner S, Jueckstock J, Hilpert F, Neuser P, Harter P, de Gregorio N, et al. Impact of adjuvant therapy in lymph-node positive vulvar cancer - the AGO-CaRE 1 (Chemo- and Radiotherapy in Epithelial Vulvar Cancer) study. Journal of the National Cancer Institute 107 [3]. 2015.
  18. White IR, Royston P, Wood AM. Multiple imputation using chained equations: Issues and guidance for practice. Stat Med 2011 Feb 20;30(4):377–99. pmid:21225900
  19. Rubin DB. Propensity score methods. Am J Ophthalmol 2010 Jan;149(1):7–9. pmid:20103037
  20. Mertens BJ, Datta S, Brand R, Peul W. Causal effect estimation strategies in a longitudinal study with complex time-varying confounders: A tutorial. Stat Methods Med Res 2014 Aug 20.
  21. Brookhart MA, Schneeweis S, Rothman KJ, Glynn RJ, Avorn J, Sturmer T. Variable selection for propensity score models. Am J Epidemiol. 2006 Jun 15;163(12):1149–56. pmid:16624967
  22. Greenland S, Pearl J. Adjustments and their Consequences - Collapsibility Analysis using Graphical Models. International Statistical Review 2011, 79, 3, 401–426.
  23. Robins JM, Hernan MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology 2000 Sep;11(5):550–60. pmid:10955408
  24. Hernan MA, Brumback B, Robins JM. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology 2000 Sep;11(5):561–70. pmid:10955409
  25. Lunceford JK, Davidian M. Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Stat Med 2004 Oct 15;23(19):2937–60. pmid:15351954
  26. Austin PC. A Tutorial and Case Study in Propensity Score Analysis: An Application to Estimating the Effect of In-Hospital Smoking Cessation Counseling on Mortality. Multivariate Behav Res 2011;46(1):119–51. pmid:22287812
  27. Royston P, Carlin JB, White IR. Multiple imputation of missing values: New features for mim. The Stata Journal 2009;9(2):252–64.
  28. Royston P. Multiple imputation of missing values: update. The Stata Journal 2005;5(2):1–14.
  29. Becker SO, Ichino A. Estimation of average treatment effects based on propensity scores. The Stata Journal 2[4], 358–377. 2002.
  30. PBALCHK: Checking Covariate Balance [computer program]. 2015.
  31. Austin PC. The use of propensity score methods with survival or time-to-event outcomes: reporting measures of effect similar to those used in randomized experiments. Stat Med 2014 Mar 30;33(7):1242–58. pmid:24122911
  32. D'Agostino RB Jr. Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Stat Med 1998 Oct 15;17(19):2265–81. pmid:9802183
  33. Mitra R, Reiter JP. A comparison of two methods of estimating propensity scores after multiple imputation. Stat Methods Med Res 2012 Jun 11.
  34. PSMATCH2: Stata module to perform full Mahalanobis and propensity score matching, common support graphing, and covariate imbalance testing [computer program]. Version 4.0.10, 10feb2014. 2014.
  35. Dehejia RH, Wahba S. Propensity score matching methods for nonexperimental causal studies. Rev Econ Stat 84, 151. 2012.
  36. Dehejia RH, Wahba S. Causal effects in nonexperimental studies: re-evaluation of the evaluation of training programs. Journal of the American Statistical Association 94, 1043–1062. 1999.
  37. Baser O. Too much ado about propensity score models? Comparing methods of propensity score matching. Value Health 2006 Nov;9(6):377–85. pmid:17076868
  38. Rubin DB. Bias Reduction Using Mahalanobis-Metric Matching. Biometrics 36, 293–298. 1980.
  39. Rubin DB. Using multivariate matched sampling and regression adjustment to control bias in observational studies. Journal of the American Statistical Association 74, 318–324. 1979.
  40. Westreich D, Lessler J, Funk MJ. Propensity score estimation: neural networks, support vector machines, decision trees (CART), and meta-classifiers as alternatives to logistic regression. J Clin Epidemiol 2010 Aug;63(8):826–33. pmid:20630332