^{1}

^{*}

^{2}

^{3}

^{4}

^{1}

^{1}

The authors have declared that no competing interests exist.

Conceived and designed the experiments: GS DA LA LD MS LS. Performed the experiments: GS MS. Analyzed the data: GS DA LA LD MS LS. Contributed reagents/materials/analysis tools: GS MS. Wrote the paper: GS DA LA LD MS LS.

Individual participant data (IPD) meta-analyses that obtain “raw” data from studies rather than summary data typically adopt a “two-stage” approach to analysis whereby IPD within trials generate summary measures, which are combined using standard meta-analytical methods. Recently, a range of “one-stage” approaches which combine all individual participant data in a single meta-analysis have been suggested as providing a more powerful and flexible approach. However, they are more complex to implement and require statistical support. This study uses a dataset to compare “two-stage” and “one-stage” models of varying complexity, to ascertain whether results obtained from the approaches differ in a clinically meaningful way.

We included data from 24 randomised controlled trials, evaluating antiplatelet agents, for the prevention of pre-eclampsia in pregnancy. We performed two-stage and one-stage IPD meta-analyses to estimate overall treatment effect and to explore potential treatment interactions whereby particular types of women and their babies might benefit differentially from receiving antiplatelets. Two-stage and one-stage approaches gave similar results, showing a benefit of using anti-platelets (Relative risk 0.90, 95% CI 0.84 to 0.97). Neither approach suggested that any particular type of women benefited more or less from antiplatelets. There were no material differences in results between different types of one-stage model.

For these data, two-stage and one-stage approaches to analysis produce similar results. Although one-stage models offer a flexible environment for exploring model structure and are useful where across study patterns relating to types of participant, intervention and outcome mask similar relationships within trials, the additional insights provided by their usage may not outweigh the costs of statistical support for routine application in syntheses of randomised controlled trials. Researchers considering undertaking an IPD meta-analysis should not necessarily be deterred by a perceived need for sophisticated statistical methods when combining information from large randomised trials.

Individual participant data (IPD) systematic review and meta-analysis in which the original “raw” data from each participant in the relevant trials are centrally collected, checked, re-analysed and combined

To date, most IPD analyses have taken a two-stage approach to analysis. In the first stage individual participant data within a trial are analysed to generate trial-level summary statistics (e.g. relative risks). In the second stage these results from each trial are combined across trials using conventional meta-analytical methods

A “one-stage” approach, by contrast combines all individual participant data in a single meta-analysis based on a regression model stratified by trial (e.g. a logistic regression). In order to incorporate random-effects to allow for heterogeneity, hierarchical or mixed-effect regression models are used

In the two-stage framework associations between treatment and participant characteristics may be investigated by subgroup analysis commonly accompanied by tests of interaction

The one-stage approach is flexible, allowing incorporation of both random treatment effects

The flexibility of the one-stage approach offers multiple approaches to model specification and so increases the potential for data dredging

There is currently no consensus or guidance on the appropriateness of the different approaches to analysis of IPD

Here we present an empirical comparison of one and two-stage methods for dichotomous outcome data, based on a large individual participant dataset which includes both large and small randomised controlled trials (range 22 to 8016 participants). We compared fixed-effect and random-effects estimates of overall treatment effectiveness and treatment-covariate interactions using one and two-stage approaches to analysis. We use these findings, together with those of others and theoretical underpinnings, to explicitly consider the tradeoffs between computational and statistical complexity with the ability to minimise potential bias and provide insights into treatment effectiveness. Our aim is to provide pragmatic guidance on choice of methods.

Treatment effect | Treatment-covariate interaction | Within and across trial coefficients | |

Model 1 | fixed | NA | NA |

Model 2 | random | NA | NA |

Model 3 | random | fixed | NA |

Model 4 | random | fixed | NA |

Model 5 | random | fixed | yes |

Model 6 | random | random | NA |

The dataset comprises IPD collected as part of an international collaborative IPD meta-analysis evaluating antiplatelet agents for the prevention of pre-eclampsia

Q(df = 23) = 31.19, p = 0.12, I^{2} = 26.3.

Relative risk | 95% CI | Amount ofHeterogeneity | |

Two-stage fixed | 0.90 | 0.83 to 0.96 | NA |

One-stage fixed | 0.90 | 0.83 to 0.97 | NA |

Two-stage random | 0.87 | 0.78 to 0.97 | 0.011 (se 0.016) |

One-stage random | 0.90 | 0.83 to 0.97 | 0 (se 0.000) |

The previously published two-stage meta-analysis

Two-stage | One-stage | ||||

Subgroup | Category | Relative risk(95% CI) | Interactionp value | Relative risk(95% CI) | Interaction coefficient (standard error) p value |

First pregnancy with/withouthigh risk factor | with | 0.90 (0.76 to 1.08) | 0.71 | 0.88 (0.66 to 1.09) | 0.03 (0.13) p = 0.81 |

without | 0.87 (0.75 to 1.02) | 1.16 (1.00 to 1.31) | |||

Second pregnancy with/withouthigh risk factor | with | 0.89 (0.81 to 0.99) | 0.56 | 0.88 (0.78 to 0.98) | −0.08 (0.17) p = 0.62 |

without | 0.98 (0.73 to 1.33) | 0.95 (0.63 to 1.27) | |||

Second pregnancy with/withoutHistory of hypertension | Yes | 0.86 (0.77 to 0.97) | 0.25 | 0.88 (0.49 to 1.25) | −0.07 (0.10) p = 0.46 |

No | 0.96 (0.82 to 1.12) | 0.94 (0.53 to 1.35) | |||

Renal disease | Yes | 0.63 (0.38 to 1.06) | 0.23 | 0.60 (0.35 to 1.04) | −0.43 (0.31) p = 0.17 |

No | 0.90 (0.82 to 0.96) | 0.90 (0.82 to 0.98) | |||

Diabetes | Yes | 0.63 (0.38 to 1.06) | 0.26 | 0.71 (0.35 to 1.06) | −0.21 (0.19) p = 0.27 |

No | 0.90 (0.82 to 0.96) | 090 (0.81 to 0.98) | |||

Hypertension | Yes | 0.97 (0.84 to 1.12) | 0.28 | 0.97 (0.82 to 1.15) | 0.10 (0.10) p = 0.32 |

No | 0.88 (0.81 to 0.96) | 0.89 (0.82 to 0.96) | |||

Previous small for gestationalage infant | Yes | 1.05 (0.86 to 1.28) | 1.05 (0.80 to 1.36) | ||

No | 0.85 (0.73 to 0.98) | 0.27 | 0.85 (0.69 to 1.05) | 0.25 (0.14) p = 0.07 | |

No previous infant | 0.89 (0.79 to 0.99) | 0.85 (0.75 to 1.32) |

Two-stage analyses to investigate the association between treatment effect and covariates, that is, treatment-covariate interactions, were conducted by subgroup analysis for each covariates of interest. It was not possible to build a one-stage multivariate model incorporating all covariates, because different subgroups were reported in different trials, thus each covariate was considered in turn. One-stage fixed and random-effects analyses were conducted by extending regression models 1 (fixed) and 2 (random) to include the covariate and an interaction term between treatment and covariate as proposed by

Subgroup | Category | Two-stage | Continuous covariate coefficients | One-stage | ||

Relative risk(95% CI) | Interaction test p- value | Relative risk, |
Interaction coefficient p- value | |||

Maternal age (years) | <20 | 0.97 (0.78 to 1.20) | Treatment | 0.88 (0.87 to 0.89) |
||

20–35 | 0.87 (0.80 to 0.95) | |||||

>35 | 1.02 (0.83 to 1.26) | 0.35 | Interaction | 1.00 (0.98 to 1.01) |
0.85 | |

Gestational age at randomisation (weeks) | <20 | 0.87 (0.79 to 0.96) | Treatment | 0.90 (0.84 to 0.95) | ||

≥20 | 0.95 (0.85 to 1.06) | 0.2 | Interaction | 1.00 (0.99 to 1.01) | 0.53 |

We used presence of a high risk factor (as defined by

Model | Treatment coefficient | Interaction coefficient | AIC | ||

Log odds ratio (se) | p | Log odds ratio (se) | p | ||

3 | −0.13 (0.07) | 0.08 | 0.006 (0.09) | 0.94 | 16189 |

4 | −0.12 (0.07) | 0.09 | 0.007 (0.09) | 0.93 | 16137 |

5 | −0.16 (0.06) | 0.01 | 0.001 (0.006) | 0.85 | 16200 |

6 | −0.13 (0.07) | 0.08 | 0.006 (0.09) | 0.94 | 16199 |

Model | Treatment coefficient | Interaction coefficient | AIC | ||

Log odds ratio (se) | p | Log odds ratio (se) | p | ||

3 | −0.12 (0.004) | 0.005 | 0.001 (0.006) | 0.85 | 16199 |

4 | −0.12 (0.004) | 0.005 | 0.001 (0.006) | 0.79 | 16190 |

5 | −0.10 (0.05) | 0.04 | 0.0004 (0.007) | 0.94 | 13998 |

6 | −0.15 (0.04) | 0.0007 | −0.001 (0.008) | 0.86 | 16200 |

A range of models making different assumptions were used to analyse interactions between treatment and high risk factor and between treatment and maternal age. Model 3 assumes that the effect of the covariate and the treatment-covariate interaction are common to all trials. Model 4, however, allows for independent effects of the covariate across trials. Model 5 separates the within-trial information on the treatment-covariate interaction from the across-trials information. A final novel one-stage-model incorporates random-effects for both the treatment effect and the treatment-covariate interaction (model 6). We also explored the ease with which one-stage models could be extended to include multiple covariates or treatment-covariate interactions.

Method | Computational and statistical complexity | Potential problems |

Two-stage subgroup analysis | ||

Two-stage, combining within-trial regression coefficients |
||

Simple one-stage regression |
||

Complex one-stage regression (e.g. separating within- and across-trial information |

1. | Estimate overall intervention effects and generate forest plots using conventional two-stage methods. |

2. | Fit a two-stage analysis combining within-trial regression coefficients, to eliminate aggregation bias. Forest plots of interaction coefficients from such analyses are particularly useful for graphical display. |

3. | If statistical support is available fit simple one-stage models with single treatment-covariate interactions (model 3). Compare with two-stage results. |

4. | If possible and statistical support is available, fit one-stage models separating within and across trials information (model 5 or 6). Is there evidence of aggregation bias? Do within- and across-trials results differ? |

○ If there is evidence of aggregation bias: Report results from the within-trials association from model 5 or 6; or within-trial regressions where one- stage analysis was not possible. | |

○ If there is no evidence of aggregation bias: Report results from model 3, if similar to model 5 or 6. These results are likely to have greater precision than two-stage analysis results. | |

5. | If statistical support is available, consider extending model 3 (in the absence of aggregation bias) or models 5 or 6 (with aggregation bias) to include multiple covariates and interactions. Compare multiple models, select a best fitting model and report its results, with a summary of all models considered. |

Two-stage analyses were undertaken using the Metafor package

The overall estimates of effect of anti-platelets in preventing pre-eclampsia obtained by one and two-stage approaches were compared (^{2}) indicating no variation between studies. The standard (method-of-moments) estimate of heterogeneity in the two-stage approach was 0.011 (Q = 28.98, p = 0.18, I^{2} = 21%) indicating minimal variation, but both restricted maximum likelihood and maximum likelihood approaches estimated heterogeneity as zero (I^{2} = 0.01%). Heterogeneity therefore appears sensitive to method of computation, but not to choice of one or two-stage model directly. However, none of these differences were of material importance or would lead to different clinical interpretation of findings. The forest plot (

Neither one-stage nor two-stage methods identified any statistically significant interactions between the effect of anti-platelet administration and any of the types of women considered. There are important differences in the way results are presented between the approaches. Two-stage analyses generally present a p-value for an interaction test, whilst one-stage models provide interaction coefficients. Generally, in the two-stage approach effect sizes are only presented for subgroup categories (e.g. separately for men and for women) when there is clear evidence of a differential effect of the intervention, as indicated by statistically significant test for interaction. In which case the clinically utility of the intervention for each type of participant group is likely to be best judged with respect to the particular effect estimate obtained for that group (e.g. the effect estimate obtained for men will be used to make decisions about the use of interventions in men). Where there is no indication that particular types of participant benefit disproportionally from the intervention, as indicated by a non significant test for interaction, the overall result generally remains the best estimate to use when making clinical judgements of utility (e.g. the same overall effect will be used to make decisions about the treatment of both men and women). One-stage models generally present regression and interaction coefficients. When there is evidence of a differential effect of the intervention (e.g. women benefit more than men) the coefficients can be converted to effects to aid clinical interpretation. Here we present effects for all subgroups to facilitate comparison of one and two-stage model output along with p-values from associated tests of interaction. In actuality, neither approach would generally present results separately by subgroup category as there were no indications of differential effectiveness. There were no consistent differences between one and two-stage methods in terms of the size, precision or differences between subgroups (

There were some numerical differences in results for continuous covariates, although the differences were not clinically significant (

One-stage models have the advantage of avoiding potentially arbitrary dichotomisation of continuous covariates and allow exploration of non-linear relationships. The inclusion of quadratic terms in models with maternal age did not substantively alter the treatment or treatment-age interaction coefficients. Model comparison indicated that simpler models, treating maternal age as linear, represented a better trade-off between the amount of variation explained and complexity than did models with quadratic terms. One-stage models did not consistently converge (models crashed) when estimating relative risks, but did converge when outcomes were measured as odds ratios. This reflects the additional complexity of modelling relative risk (which requires inclusion of a link function: see

The full range of one-stage models described in the introduction (

The additional complexity of model 5 was warranted in terms of the extra variability the model explained in comparison to other models where maternal age was concerned. Model fit, measured in terms of AIC was similar across models except for model 5, which had a substantially lower AIC than the other models in the analysis of maternal age, suggesting a better model fit, and that accounting for aggregation bias was important in that analysis (

The one-stage models performed similarly in terms of robustness and speed of convergence (measured in seconds rather than minutes), although convergence of the models was not always possible when calculating relative risks rather than odds ratios. Exploratory analyses based on inclusion of multiple covariates simultaneously suggest that extending the models (particularly model 5) beyond a single treatment-covariate interaction may not always be possible as multivariate models were prone to crash.

The selection of analytical method for IPD may not be straightforward. Advocates of a one-stage approach point to increased power to detect treatment-covariate interactions, ability to control for aggregation bias and also suggest that one-stage approaches may provide deeper insights into the data by allowing testing of different assumptions about model structure and adjustment for multiple covariates

In this example, the increased power of one-stage methods was not manifest in tighter confidence intervals for overall treatment effects or for treatment-covariate interactions of the seven categorical covariates investigated. However, when using a one-stage approach, the treatment-covariate interaction terms were more precisely estimated for continuous covariates. The lack of interaction may be more apparent in one-stage models which display interaction coefficients and standard errors than two-stage models where interactions are assessed using p- values from subgroup analysis. This may be of importance where p-values are close to statistically significant boundaries.

One-stage models have the advantages of not requiring the potentially arbitrary dissection of covariates, and ability to test for non-linear relationships for continuous covariates. However, interpretation of the resulting coefficients may be difficult. Centering treatment effects on median values is appropriate where there are no interactions as this reflects the population for whom the estimate is most applicable. Where interactions are identified, it may be appropriate to express coefficients in terms of treatment effects for categorical groups identified in protocols, as is standard in presentation of results from two-stage analysis.

Aggregation bias is a potential problem in any meta-analysis. Avoiding this bias by using IPD and distinguishing between within-trials and across-trials information is therefore important. To eliminate such bias, only within-trial information on the association between treatment effects and covariates should be considered. This is possible in both one-stage (see model 5) and two-stage approaches. In a two–stage analysis aggregation bias can be avoided by only including trials that report effects for all subgroup categories. Alternatively, within-trial treatment-covariate interactions can be identified by undertaking regression analyses within each trial and combining regression coefficients in a meta-analysis across trials. One-stage models which separate within and across trial treatment-covariate interactions provide direct measures of effect and precision thereby allowing quantification of the effects of aggregation bias. Model specification, can allow statements to be made directly about the magnitude and significance of aggregation bias (

A key advantage of a one-stage approach is the flexibility in terms of the models that may be fitted. One-stage models allow for the inclusion of multiple covariates in a single model, multiple random-effects on different parameters and the separation of within and across-trials information. The different models may not necessarily lead to different results, as was found in this analysis. One-stage models may also be compared, in terms of both goodness of fit and parsimony of model, by using, for example, the AIC statistic. This allows the selection of a “best fitting” model to be identified across a range of possible models. Use of AIC reduces the risks of over-fitting and data-dredging by including too many, irrelevant covariates or specifying multiple implausible models. The rationale for choice of model should be transparently reported and justified to ensure that the flexibility of the one-stage approach does not result in selective reporting of results. As with any meta-analysis,

Whilst two-stage approaches may sometimes be unrealistically simple, one-stage approaches may be intractably complex. In this analysis models expressed in terms of relative risk could not always be applied, and problems arose as multiple covariates were included in any model. Simpler models may be preferable, with fewer covariates, fewer random-effects and expressing outcomes as log odds rather than log relative risk. Alternatively, use of software such as WinBUGS

One- stage models have greater statistical complexity and are therefore harder to interpret. Statistical support is an important pre-requisite for the implementation of these models. Careful interpretation and explanation of the coefficients is required. For dichotomous outcomes expression of results as odds ratios or risk ratios with 95% confidence intervals is preferable to display of raw coefficients as this is more meaningful to most non statisticians. Reporting guidelines have not yet been developed specifically for IPD analyses but should clearly consider the reporting of one-stage models with emphasis on the explicit value judgements regarding model structure, sensitivity of results to model choice, and interpretation of regression coefficients.

Here we suggest a pragmatic approach to analysis in IPD reviews based on existing empirical comparisons of one and two-stage methods (the current work,

Irrespective of the final approach, performing a two-stage analysis as an initial step is generally advisable. This generates forest plots enabling results across trials to be compared visually, heterogeneity investigated and differences across subgroups visualised, all of which are essential in understanding the dataset underlying the review. We suggest that for reviews of large randomised trials that are homogeneous in populations and design, a two-stage analysis will often be sufficient. Large numbers of participants mean that lack of statistical power is unlikely to be an issue and clinical homogeneity of trials reduces the risk of aggregation bias. In such situations two-stage and one-stage methods are likely to give similar results. However, a one-stage analysis may still be preferred for evaluating treatment-covariate interactions of continuous covariates, to avoid arbitrary categorisation and to incorporate non-linear relationships. They may also be of use for fitting single models including multiple covariates, particularly where covariates are expected to be highly correlated.

One-stage methods may be most appropriate when trials are small, participant numbers are few and where there is clinical heterogeneity across trials. In this case two-stage methods may lack statistical power and subgroup analyses may be affected by aggregation bias, particularly if some trials did not include participants in some specified subgroup categories.

To avoid aggregation bias in two-stage analyses, models should be fitted to estimate treatment-covariate interactions within each trial, and these estimates pooled across trials, rather than using conventional subgroup analysis. Where possible, one-stage models should be parameterised to separate within and across trial treatment-covariate interaction, at least as a sensitivity analysis. Where one-stage models are used a range of plausible models should be fitted (ranging from too simple to too complex) and these models compared.

Trade-offs between computational and statistical complexity and potential for problems such as aggregation bias and lack of statistical power are explicitly considered in

Major benefits of obtaining IPD are accrued prior to analysis and where an IPD review evaluates effectiveness based on sufficient data from randomised controlled trials, one-stage statistical analyses may not add much value to simpler two-stage approaches. Researchers should therefore not be discouraged from undertaking IPD synthesis through lack of advanced statistical support.

(TIF)

(DOCX)

(DOCX)

Thanks to CRD colleagues and co-authors who have been supportive of the first author’s transition to medical research and especially to Chris Schmid, Kerrie Mengersen, Issa Dahabreh and Wolfgang Viechtbauer who have been valued mentors in the development of statistical and programming skills. We would also like to thank two anonymous reviewers for their helpful suggestions.

The manuscript is dedicated to Rosa Stewart in the hope that decisions impacting on her generation are informed by robust individualised evidence.