Systematic reviews (SRs) can provide accurate and reliable evidence, typically about the effectiveness of health interventions. Evidence is dynamic, and if SRs are out-of-date this information may not be useful; it may even be harmful. This study aimed to compare five statistical methods to identify out-of-date SRs.
A retrospective cohort of SRs registered in the Cochrane Pregnancy and Childbirth Group (CPCG) and published between 2008 and 2010 was considered for inclusion. For each eligible CPCG review, data were extracted, and “3-years-previous” meta-analyses were assessed for the need to update, given the data from the most recent 3 years. Each of the five statistical methods was applied, using random-effects analyses throughout the study.
Eighty reviews were included in this study; the most common topic was induction of labour. The numbers of reviews identified as being out-of-date using the Ottawa, recursive cumulative meta-analysis (CMA), and Barrowman methods were 34, 7, and 7, respectively. No reviews were identified as being out-of-date using the simulation-based power method or the CMA for sufficiency and stability method. The overall agreement among the three discriminating statistical methods was slight (Kappa = 0.14; 95% CI 0.05 to 0.23). The recursive cumulative meta-analysis, Ottawa, and Barrowman methods were practical according to the study criteria.
Citation: Pattanittum P, Laopaiboon M, Moher D, Lumbiganon P, Ngamjarus C (2012) A Comparison of Statistical Methods for Identifying Out-of-Date Systematic Reviews. PLoS ONE 7(11): e48894. https://doi.org/10.1371/journal.pone.0048894
Editor: Joel Joseph Gagnier, University of Michigan, United States of America
Received: June 14, 2012; Accepted: October 3, 2012; Published: November 20, 2012
Copyright: © 2012 Pattanittum et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: Financial support from the Office of the Higher Education Commission and the Thailand Research Fund through the Royal Golden Jubilee Ph.D. Program (Grant Number PHD/0270/2550) for PP and ML is greatly appreciated. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Systematic reviews (SRs) are an important scientific tool that can provide accurate and reliable evidence, typically about the effectiveness of health interventions. SRs are a useful starting point for practice guideline developers, health policy analysts, and health care providers. A few granting agencies are starting to require SR evidence when making decisions about funding new research, particularly randomized controlled trials.
Recent data suggest that 11 new SRs are published daily, and about 5,000 SRs are indexed in Medline annually. SRs are most useful when they are up-to-date. Evidence is dynamic, and if SRs are out-of-date their information may be not only unhelpful but harmful. In 1995, Jadad et al reported that only 3% of 39 non-Cochrane SRs and 50% of 36 Cochrane SRs had been updated 2 years after publication. In 2002, 70% of 362 Cochrane SRs published in 1998 had been updated. Only 2% (2/88) of non-Cochrane SRs and over one-third (47/125) of Cochrane SRs (focusing on therapeutic effectiveness) were updated in 2004. Garritty et al conducted an Internet-based international survey of healthcare organizations involved in SRs and found that only 33% (35/105) of respondents updated their SRs regularly, although most agreed that updating SRs is important.
We built on a recent systematic review of strategies, techniques, and statistical approaches for deciding when and how to update SRs, searching for further information about statistical methods for updating SRs. At present, five statistical methods have been proposed to determine whether a given SR is out-of-date (see Table 1 for more details):
- recursive cumulative meta-analysis (CMA);
- CMA for sufficiency and stability;
- a test for identifying null meta-analyses that are ripe for updating (Barrowman method);
- quantitative signal of changes in evidence (Ottawa method); and
- the power of an updated meta-analysis using simulation (simulation-based power method).
With no standard approach, it is unclear when and how to update SRs. The Cochrane Collaboration advocates periodic updating every 2 years. This strategy may not be appropriate for all SRs: in one study of 100 SRs, 23% became out-of-date within 2 years of publication, 15% within 1 year, and 7% were already out-of-date at the time of publication. One possible reason is that publication trajectories differ across treatments and conditions. Also of concern, simulation studies suggest that frequent updating of SRs can inflate the type I error rate, and updating might introduce publication bias, because studies with significant results tend to be published faster than those with non-significant results.
Regarding use of the available statistical methods, among 99 respondents to the aforementioned Internet-based survey, 11% used a CMA method (which of the two was not specified), while 4% favored the Barrowman method. One recent study found that the Barrowman and simulation-based power methods produced closely agreeing results when the study sample was homogeneous. No comparative information is available for the other statistical methods.
Since five statistical methods are available to identify whether a SR is out-of-date, and there is no evidence comparing these methods in terms of their agreement with one another, consistency, or practicality, we aimed to compare these methods for identifying out-of-date SRs.
Materials and Methods
A sample of SRs registered in the Cochrane Pregnancy and Childbirth Group (CPCG) published between 2008 and 2010 in the Cochrane Database of Systematic Reviews (CDSR) was examined. We selected the CPCG because it was the first Cochrane Review Group to be established, and it has published the largest number of Cochrane SRs.
Primary outcome identification
For the purposes of this study, a single primary outcome was identified for each review as the outcome of interest.
- When the CPCG review authors defined a single primary outcome, that one was used.
- If the review authors pre-defined more than one, the outcome with the largest number of included studies and/or participants was chosen as the primary outcome of interest. If more than one primary outcome satisfied these criteria, an obstetrician (Pisake Lumbiganon; PL) selected the single most clinically important one.
- If no primary outcome had been identified by the review authors, the reported outcomes were ranked by PL, based on clinical importance and the CPCG review's objective.
A CPCG review was included if it met all of the following criteria:
- it reported a meta-analysis of at least 3 included studies for the primary outcome;
- the analysis did not include results of cluster randomized trials;
- the primary outcome measure was either dichotomous or continuous, and:
  - for a dichotomous outcome, the numbers of events and sample sizes in the treatment and control groups were reported;
  - for a continuous outcome, the means, standard deviations, and sample sizes in the treatment and control groups were reported;
- the publication date of the most recent included study was at least 3 years later than the dates of the first two included studies; and
- the included studies that were published at least 3 years before the most recent included study yielded a non-significant meta-analysis at the 5% significance level.
Searching and selection
A list of all active CPCG review titles was obtained through Archie, the Cochrane Collaboration's central server (accessed April 2011). The full texts of the reviews were retrieved from the Cochrane Library and screened against the inclusion criteria defined above.
A data collection form was used to extract data from the eligible CPCG reviews (e.g., obstetric topic, study objective, primary outcome, and comparisons). For individual studies included in each CPCG review, the year of publication and the summary statistics specified in the inclusion criteria above were also extracted.
One author (Porjai Pattanittum; PP) extracted the data from all eligible CPCG reviews. To check the accuracy of data extraction, data from a random 10% sample of the reviews were extracted independently by a second reviewer (Chetta Ngamjarus; CN). Discrepancies were found in 1 of the 8 re-extracted reviews (12.5%), corresponding to only 1 of 80 extracted items. Discrepancies and errors were resolved by rechecking the full text of the CPCG reviews.
Statistical methods for identifying an out-of-date SR
To examine how each method detected out-of-date SRs, we compared the results of a previous meta-analysis with those of an updated meta-analysis. The previous meta-analysis was defined as a meta-analysis of the studies published more than 3 years before the most recent study (a hypothetical review), while the updated meta-analysis included all studies and corresponded to the actual CPCG review.
We selected a 3-year period between the previous and updated meta-analyses partly because this period is one of the criteria for one of the statistical methods examined in this work (the simulation-based power method). In addition, Jaidee et al reported that the median time to the first update of CPCG reviews was 3.3 years (95% CI 2.7 to 3.8 years).
The cohort of eligible CPCG reviews was assessed for the need to update using the five statistical methods. The methods, with their strengths and limitations, are briefly summarized in Table 1; more details are available elsewhere.
A random-effects model was used throughout, as a conservative approach to meta-analysis.
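The split into "previous" and "updated" meta-analyses can be sketched in Python. This is a minimal illustration, not the authors' code: the `Study` record and the functions `split_previous_updated` and `meets_structural_criteria` are hypothetical names, the 3-year gap is treated as inclusive (the text describes it both as "at least" and "more than" 3 years), and the non-significance criterion for the previous meta-analysis is not checked here.

```python
from dataclasses import dataclass

@dataclass
class Study:
    """Hypothetical record for one included study."""
    name: str
    year: int

def split_previous_updated(studies, gap_years=3):
    """Return (previous, updated) study lists for one review.

    'previous' keeps studies published at least `gap_years` before the most
    recent study (the hypothetical earlier review); 'updated' keeps all studies.
    """
    latest = max(s.year for s in studies)
    previous = [s for s in studies if s.year <= latest - gap_years]
    return previous, list(studies)

def meets_structural_criteria(studies, gap_years=3):
    """Year-based criteria only (assumed reading): at least 3 studies overall
    and at least 2 studies in the previous meta-analysis."""
    previous, updated = split_previous_updated(studies, gap_years)
    return len(updated) >= 3 and len(previous) >= 2

studies = [Study("A", 1990), Study("B", 1995), Study("C", 2001), Study("D", 2005)]
previous, updated = split_previous_updated(studies)
print([s.name for s in previous], len(updated))  # ['A', 'B', 'C'] 4
print(meets_structural_criteria(studies))        # True
```

Here study D (2005) is the most recent, so only studies from 2002 or earlier enter the previous meta-analysis.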
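The article does not name the random-effects estimator. As a sketch, assuming the common DerSimonian–Laird estimator and risk ratios for dichotomous outcomes, a pooled random-effects log risk ratio and its two-sided p-value can be computed with the Python standard library alone. The function names and the 0.5 continuity correction for zero cells are assumptions, and the example 2×2 tables are made up.

```python
import math

def log_risk_ratio(events_t, n_t, events_c, n_c):
    """Log risk ratio and its variance for one 2x2 table.
    Adds 0.5 to every cell when any cell is zero (assumed correction)."""
    if 0 in (events_t, events_c, n_t - events_t, n_c - events_c):
        events_t, events_c = events_t + 0.5, events_c + 0.5
        n_t, n_c = n_t + 1, n_c + 1
    y = math.log((events_t / n_t) / (events_c / n_c))
    v = 1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c
    return y, v

def dersimonian_laird(ys, vs):
    """Random-effects pooled estimate, its SE, tau^2 and two-sided p-value."""
    w = [1 / v for v in vs]
    sum_w = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum_w
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(ys) - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in vs]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    p = math.erfc(abs(y_re / se) / math.sqrt(2))  # two-sided normal p-value
    return y_re, se, tau2, p

# Hypothetical tables: (events_t, n_t, events_c, n_c)
tables = [(12, 100, 20, 100), (30, 250, 41, 250), (8, 60, 9, 58)]
ys, vs = zip(*(log_risk_ratio(*t) for t in tables))
y, se, tau2, p = dersimonian_laird(list(ys), list(vs))
print(f"RR = {math.exp(y):.2f}, p = {p:.3f}")
```

The random-effects weights `1 / (v + tau2)` are what make this more conservative than a fixed-effect analysis whenever tau² > 0.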
Outcome measures and data analysis
The main outcomes of this study were the agreement between methods regarding the need for updating, and the ease of calculation and practicality of each method.
We calculated the agreement in identifying out-of-date SRs among all methods using a pooled Kappa statistic and its corresponding 95% confidence interval (CI). Frequencies and survival time to update, with 95% CIs, were used to describe the characteristics of the study cohort. Data analyses were conducted using R software and STATA.
Our study assessed whether null meta-analyses in CPCG reviews were out-of-date. The search identified 415 active review titles, which were screened as depicted in Figure 1. Ultimately, 80 reviews were included.
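The article reports a "pooled Kappa" without naming the estimator. One common choice for more than two raters is Fleiss' kappa, sketched below under that assumption: each review is an item, each method is a rater, and each method flags the review as out-of-date (1) or not (0). The function name is hypothetical, and the sketch does not compute the confidence interval.

```python
def fleiss_kappa(flags):
    """Fleiss' kappa for binary ratings.

    flags: one list per review, holding a 0/1 out-of-date flag per method.
    Every review must be rated by the same number of methods (>= 2).
    """
    n_items = len(flags)
    n_raters = len(flags[0])
    totals = [0, 0]  # overall counts of 0-flags and 1-flags
    per_item_agreement = []
    for row in flags:
        counts = [row.count(0), row.count(1)]
        totals[0] += counts[0]
        totals[1] += counts[1]
        # proportion of agreeing rater pairs for this review
        agree = (sum(c * c for c in counts) - n_raters) / (n_raters * (n_raters - 1))
        per_item_agreement.append(agree)
    p_obs = sum(per_item_agreement) / n_items
    p_cat = [t / (n_items * n_raters) for t in totals]
    p_exp = sum(p * p for p in p_cat)
    return (p_obs - p_exp) / (1 - p_exp)  # undefined if every flag is identical

# Hypothetical flags: [recursive CMA, Barrowman, Ottawa] for six reviews
example = [[1, 1, 1], [0, 0, 1], [0, 0, 1], [0, 0, 0], [0, 1, 1], [0, 0, 0]]
print(round(fleiss_kappa(example), 2))
```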
Characteristics of included CPCG reviews
Among the 80 reviews, 55% were published in 2010, and 95% reported the primary outcome as dichotomous data, of which 89% presented treatment effects as risk ratios. The median numbers of included studies before and after updating were 4 and 5.5, respectively, and the median numbers of participants before and after updating were 1,346 and 2,274, respectively (Table 2). The most common review topic was induction of labour (14/80). Sixty percent (48/80) of the reviews had been updated previously, up to three times. Of these 48 updated Cochrane reviews, 8 reported that their conclusions had changed after updating.
Comparing out-of-date detection using the five methods
Applying the five statistical methods to the 80 reviews, the Ottawa method identified 34 reviews (Appendix S4; 5, 8, 10, 12, 14, 16, 18, 20, 23, 25–30, 32–34, 39–40, 44, 46–47, 49, 52–53, 58–59, 61, 67, 72, 74, 78–79) as out-of-date, while the recursive CMA and Barrowman methods each identified 7 reviews as out-of-date. The CMA for sufficiency and stability method and the simulation-based power method did not identify any review as out-of-date. Brief results of each method for the 10 reviews with the highest magnitude of the indicators are presented in Appendix S3, Tables S3 to S7.
Recursive CMA method
Seven of the 80 reviews yielded a signal indicating a need for updating: 4 had an out-of-date ratio greater than 1.5, and the remaining 3 had a ratio less than 0.5 (Appendix S3, Table S3). Of these 7 reviews, 3 showed a change in the direction of the treatment effect in the updated meta-analysis.
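The recursive CMA decision rule described above can be written as a small function. This is a sketch, assuming the out-of-date ratio is the updated pooled effect divided by the previous pooled effect on the ratio scale (e.g., risk ratio), with this study's 0.5 and 1.5 cut-offs as defaults; the function name and the example effects are hypothetical.

```python
def recursive_cma_signal(effect_previous, effect_updated, lower=0.5, upper=1.5):
    """Return (ratio, flagged) for the recursive CMA rule.

    ratio = updated pooled effect / previous pooled effect (assumed definition);
    the review is flagged as out-of-date when the ratio falls outside [lower, upper].
    """
    ratio = effect_updated / effect_previous
    return ratio, (ratio < lower or ratio > upper)

print(recursive_cma_signal(0.90, 0.30))  # flagged: ratio is about 0.33
print(recursive_cma_signal(0.90, 0.60))  # not flagged with the 0.5/1.5 cut-offs
print(recursive_cma_signal(0.90, 0.60, lower=0.75, upper=1.25))  # flagged with a 25% range
```

Because the cut-offs are arguments, the narrower 25% up-to-date range considered later in the article only changes `lower` and `upper`.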
CMA for sufficiency and stability method
All 80 reviews yielded a failsafe ratio less than 1, suggesting that too few additional studies were available to update the previous meta-analysis. The stability of the effect size could not be explored because too few studies were identified in the three-year interim period.
Appendix S3, Table S4 shows that the number of ‘hidden’ studies (Nfs) was smaller than the benchmark (e.g., Abalos E, 2007 (Appendix S4; 1) had Nfs = 20 studies against a benchmark of 110 studies, giving a failsafe ratio of 0.18). Because sufficiency was lacking, the stability of the effect size could not be determined, and none of the 80 reviews could be identified as out-of-date using this method.
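The failsafe ratio compares Nfs with a benchmark. The sketch below assumes Rosenthal's fail-safe N (one-tailed α = 0.05, so the critical z is 1.645) and Rosenthal's tolerance level of 5k + 10 as the benchmark, where k is the number of studies; the article does not spell out either formula, and the function names are hypothetical. Under that assumed benchmark, the Abalos example (Nfs = 20, benchmark = 110) would correspond to k = 20.

```python
def rosenthal_failsafe_n(z_values, z_crit=1.645):
    """Rosenthal's fail-safe N: number of additional null-result studies needed
    to bring the combined (Stouffer) Z, sum(Z) / sqrt(k + N), down to z_crit."""
    k = len(z_values)
    return max(0.0, sum(z_values) ** 2 / z_crit ** 2 - k)

def failsafe_ratio(nfs, k):
    """Nfs divided by the assumed 5k + 10 benchmark; values below 1 indicate
    insufficient evidence under this criterion."""
    return nfs / (5 * k + 10)

print(round(failsafe_ratio(20, 20), 2))  # 0.18, as in the Abalos example
```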
Barrowman method
Seven of the 80 reviews were deemed to be out-of-date using this method. The highest participant ratio was 34.9, with the treatment effect measured as mean difference (MD). The results of this method are shown in Appendix S3, Table S5. Although the participant ratios of the 7 reviews identified as out-of-date exceeded unity, only 2 of them (Appendix S4; 20, 32) yielded significant results in the updated meta-analyses (not shown in Table S5).
Ottawa method
This method identified 34 (43%) reviews as out-of-date. Three reviews were detected by the first quantitative signal (change in statistical significance) and 31 by the second (change in effect size of at least 50%). The maximum and minimum RRR or MD ratios were 33.1 and −15.5. Thirty-one reviews reported relative risk, and 3 reported mean difference. Ten reviews (Appendix S4; 18, 29–30, 34, 39, 44, 53, 58, 59, 78) showed a change in the direction of the treatment effect compared with the previous meta-analyses (not shown in Appendix S3, Table S6).
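The two quantitative signals can be checked as below. This sketch assumes that a "change in statistical significance" means the p-value crosses 0.05 in either direction, and that a "change in effect size of at least 50%" means the ratio of the updated to the previous RRR (1 − RR) or MD differs from 1 by at least 0.5; the function name and example values are hypothetical.

```python
def ottawa_quantitative_signals(p_prev, p_upd, effect_prev, effect_upd,
                                alpha=0.05, min_change=0.5):
    """Return (significance_signal, effect_size_signal, ratio).

    effect_prev/effect_upd are RRRs (1 - RR) for dichotomous outcomes or
    mean differences for continuous outcomes (assumed inputs; effect_prev != 0).
    """
    significance_signal = (p_prev < alpha) != (p_upd < alpha)
    ratio = effect_upd / effect_prev
    effect_size_signal = abs(ratio - 1) >= min_change  # a negative ratio also signals
    return significance_signal, effect_size_signal, ratio

# RR moves from 0.90 to 0.80, so the RRR roughly doubles (0.10 -> 0.20)
print(ottawa_quantitative_signals(0.30, 0.20, 1 - 0.90, 1 - 0.80))
```

A negative ratio, such as the minimum of −15.5 reported above, corresponds to a change in the direction of the effect and always satisfies the effect-size signal under this reading.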
Simulation-based power method
No review was identified as being out-of-date using this method. The maximum power of an updated meta-analysis was only 63% (Appendix S3, Table S7).
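A heavily simplified version of the simulation-based power idea is sketched below: simulate one new two-arm trial many times, add it to the existing log risk ratios, re-run a DerSimonian–Laird random-effects meta-analysis, and count how often the updated result is significant. Sutton et al's method simulates the new study from the existing meta-analysis, whereas this sketch simply fixes the true control-arm risk and risk ratio. The function names, equal arm sizes, the 0.5 zero-cell correction, and the example inputs are all assumptions.

```python
import math
import random

def _log_rr(a, n1, c, n2):
    """Log risk ratio and variance; adds 0.5 to all cells if any is zero."""
    if 0 in (a, c, n1 - a, n2 - c):
        a, c, n1, n2 = a + 0.5, c + 0.5, n1 + 1, n2 + 1
    return math.log((a / n1) / (c / n2)), 1 / a - 1 / n1 + 1 / c - 1 / n2

def _dl_p_value(ys, vs):
    """Two-sided p-value of a DerSimonian-Laird random-effects meta-analysis."""
    w = [1 / v for v in vs]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (len(ys) - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in vs]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    z = y_re * math.sqrt(sum(w_re))  # estimate / standard error
    return math.erfc(abs(z) / math.sqrt(2))

def simulated_update_power(ys, vs, p_control, rr_true, n_per_arm,
                           n_sims=1000, alpha=0.05, seed=1):
    """Share of simulated updates whose meta-analysis has p < alpha."""
    rng = random.Random(seed)
    p_treat = min(1.0, p_control * rr_true)
    significant = 0
    for _ in range(n_sims):
        # binomial event counts in each arm of the simulated new trial
        a = sum(rng.random() < p_treat for _ in range(n_per_arm))
        c = sum(rng.random() < p_control for _ in range(n_per_arm))
        y_new, v_new = _log_rr(a, n_per_arm, c, n_per_arm)
        if _dl_p_value(list(ys) + [y_new], list(vs) + [v_new]) < alpha:
            significant += 1
    return significant / n_sims

power = simulated_update_power([math.log(0.85), math.log(0.9)], [0.04, 0.05],
                               p_control=0.2, rr_true=0.85, n_per_arm=150)
print(power, power >= 0.8)  # 80% power threshold used in this study
```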
Agreement between methods
Thirty-seven reviews were identified as out-of-date by one or more of the three discriminating methods (recursive CMA, Barrowman, and Ottawa), and agreement among these methods was slight (Kappa = 0.14; 95% CI 0.05 to 0.23). Only one review (Appendix S4; 25) was identified as out-of-date by all three methods.
Across the three pairs of methods that discriminated between reviews potentially needing updating in this work, the number of reviews that both methods classified as not needing updating ranged from 43 to 69, and the number that both classified as out-of-date ranged from 3 to 5. Fair agreement was observed between the recursive CMA and Barrowman methods (Kappa = 0.37; 95% CI 0.03 to 0.72; see Appendix S3, Table S8).
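Pairwise agreement of this kind is usually summarized with Cohen's kappa, computed from the 2×2 table of the two methods' classifications. The counts below (3 reviews flagged by both methods, 4 by each method alone, and 69 by neither) are not taken from Table S8; they are inferred from figures in the text (7 reviews flagged by each method, 80 reviews in total, and at most 69 agreed negatives) and reproduce the reported Kappa of 0.37. The function name is hypothetical, and no confidence interval is computed.

```python
def cohens_kappa(both_yes, only_a, only_b, both_no):
    """Cohen's kappa for two methods' binary out-of-date classifications."""
    n = both_yes + only_a + only_b + both_no
    p_obs = (both_yes + both_no) / n          # observed agreement
    a_yes = (both_yes + only_a) / n           # method A's flag rate
    b_yes = (both_yes + only_b) / n           # method B's flag rate
    p_exp = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)  # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)

print(round(cohens_kappa(3, 4, 4, 69), 2))  # 0.37
```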
The practicalities of methods
Practical methods were defined as those requiring less intensive analysis and only straightforward data. The recursive CMA and Ottawa methods were the most practical, because they require only two parameters to calculate the indicator (pooled treatment effects, or p-values, from the current and updated meta-analyses). These parameters are calculated automatically by any meta-analysis software. The Barrowman method does not require the updated meta-analysis to be performed at all; only the sample sizes and Z-statistics of the additional studies are required.
A growing number of statistical methods aim to detect signals that systematic reviews need updating. Of the five approaches compared here, three detected potentially out-of-date SRs in our sample of 80 CPCG reviews.
A cut-off for the recursive CMA of a relative change in treatment effect of <0.5 or >1.5 is arbitrary and seems quite large. With a narrower up-to-date range of less than 25% change (also an arbitrary cut-off), this method detected five more out-of-date reviews (Appendix S4; 32, 45–46, 49, 73), for a total of 12.
The CMA for sufficiency and stability method is the most stringent test for detecting an out-of-date review; none of the 80 reviews had a sufficient number of new studies. On average, the updated SRs in our sample added two studies, whereas the CMA for sufficiency and stability method indicated that six studies (Nfs) would be needed to overturn the significance of the meta-analysis. These six hidden studies are, however, far fewer than the average benchmark (41 studies), which is why all 80 reviews had a failsafe ratio below unity. Conversely, the Ottawa method is sensitive in detecting potentially out-of-date reviews: 34 reviews were predicted to be out-of-date by this method. Although the simulation-based power method detected no out-of-date reviews at a power threshold of 80%, its sensitivity increases at lower thresholds; with a threshold of 60%, it would identify 4 reviews as out-of-date.
Our findings show fair agreement between the recursive CMA and Barrowman methods in identifying out-of-date systematic reviews. Sutton et al compared two statistical methods, the Barrowman method and the simulation-based power method, across 12 reviews and found that the Barrowman method identified 5 reviews as out-of-date, while the simulation-based power method detected only one. In our sample, the Barrowman method identified seven (of 80) reviews as out-of-date, while the simulation-based power method identified none. The review identified by the simulation-based power method in Sutton et al had the highest power (89%) and a p-value of 0.01 after updating; 5 studies with 7,397 participants were added to the original 7 studies. In our study, the highest power was 63.4%, with a p-value after updating of 0.21. Only a single study was added to the previous two studies, contributing 377 additional participants, only 5% (377/7,397) of the number added in the Sutton example.
A recent study compared the Ottawa method (using modified qualitative and quantitative signals) with the RAND method (a combination of a literature search and assessment by content experts) across four systematic reviews covering 77 outcomes, and reported substantial agreement between the methods (Kappa = 0.64, 95% CI 0.45 to 0.83). Our study found less agreement between methods, possibly because Shekelle et al applied both quantitative and qualitative approaches, whereas we used only quantitative signals of the need to update SRs.
To explore further the 37 reviews identified as needing an update by one or more statistical methods, we compared our results with the eight updated reviews whose conclusions had changed in The Cochrane Library. Of the 34 out-of-date reviews detected by the Ottawa method, 3 had changed conclusions in the updated Cochrane review. The Barrowman method identified 7 out-of-date reviews, 3 of which had changed conclusions in the updated Cochrane review. None of the seven out-of-date reviews identified by the recursive CMA method had changed conclusions. Only 2 reviews with changed conclusions (Appendix S4; 30, 52) included the same study(ies), comparison, and outcome as the present work. Closer examination of the 8 updated Cochrane reviews with changed conclusions showed that discrepancies arose from differences in updating periods, which resulted in mismatches between the studies included in our “previous” meta-analyses and those in the Cochrane ones.
Because of the low power to detect out-of-date reviews, and because our study design did not match the updating periods in The Cochrane Library, few out-of-date reviews identified by the Ottawa and Barrowman methods, and none identified by the recursive CMA method, corresponded to changed conclusions after updating in The Cochrane Library. Further research would ideally use prospective data collection and analysis to flag reviews that are potentially out-of-date using a range of statistical methods, and correlate these predictions with subsequent changes in conclusions following updating.
Limitations of this work include the study design noted above. We applied the five statistical methods retrospectively to 80 reviews with non-significant meta-analyses at the 5% significance level, using an artificial updating time created by removing the included studies from the most recent 3 years. The number and types of reviews meant that we could not explore subgroups such as agreement between methods according to the type of effect measure (none of the reviews reported treatment effects as relative difference or standardized mean difference). In addition, the study cohort was restricted to CPCG reviews.
The practical methods identified in this study (recursive CMA, Ottawa, and Barrowman methods) can be used for surveillance of the need to update systematic reviews. However, there is currently no standard approach to determining whether a SR needs updating. The statistical methods examined in this study were not consistent with one another; overall agreement among them was only slight. These methods are all based on changes in statistical significance and precision, and do not take into account other important factors that contribute to a decision to update, such as the emergence of a superior alternative treatment, new information on the benefits or harms of a treatment, or the potential risk of bias in new trial evidence. Thus our findings provide additional information rather than a solid basis for the decision.
An approach to calculate the power of updated meta-analysis.
Formula for estimating probability of an event, and mean in the treatment arm.
This study was conducted as part of a doctoral dissertation of PP in Public Health at Khon Kaen University, Khon Kaen, Thailand. Dr. David Moher (DM) is supported in part by a University of Ottawa Research Chair.
We would like to thank Professor Alex Sutton for giving his valuable time to explain, and for providing the STATA code for his method. We would like to gratefully acknowledge Dr. Joanne McKenzie, and Asst. Prof. Dr. Jiraporn Khiewyoo for their valuable suggestions and comments. Special thanks to Dr. Meg Sears for proofreading.
Conceived and designed the experiments: PP ML DM. Performed the experiments: PP PL CN. Analyzed the data: PP. Wrote the paper: PP ML DM. Interpreted the data: PP ML DM.
- 1. Liberati A, Altman D, Tetzlaff J, Mulrow C, Gøtzsche P, et al. (2009) The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration. BMJ 339: b2700. doi:10.1136/bmj.b2700.
- 2. Cook D, Mulrow C, Haynes R (1997) Systematic reviews: Synthesis of best evidence for clinical decisions. Ann Intern Med 126: 376–380.
- 3. Chalmers I, Haynes B (1994) Systematic Reviews: Reporting, updating, and correcting systematic reviews of the effects of health care. BMJ 309: 862–865.
- 4. Bastian H, Glasziou P, Chalmers I (2010) Seventy-Five Trials and Eleven Systematic Reviews a Day: How Will We Ever Keep Up? PLoS Med 7: e1000326.
- 5. Jadad A, Cook D, Jones A, Klassen T, Tugwell P, et al. (1998) Methodology and reports of systematic reviews and meta-analyses: a comparison of Cochrane reviews with articles published in paper-based journals. JAMA 280: 278–280.
- 6. French S, McDonald S, McKenzie J, Green S (2005) Investing in updating: how do conclusions change when Cochrane systematic reviews are updated? BMC Med Res Methodol 5: 33.
- 7. Moher D, Tetzlaff J, Tricco A, Sampson M, Altman D (2007) Epidemiology and reporting characteristics of systematic reviews. PLoS Med 4: e78.
- 8. Garritty C, Tsertsvadze A, Tricco AC, Sampson M, Moher D (2010) Updating Systematic Reviews: An International Survey. PLoS ONE 5: e9914.
- 9. Moher D, Tsertsvadze A, Tricco A, Eccles M, Grimshaw J, et al. (2008) When and how to update systematic reviews. Cochrane Database Syst Rev MR000023.
- 10. Moher D, Tsertsvadze A, Tricco AC, Eccles M, Grimshaw J, et al. (2007) A systematic review identified few methods and strategies describing when and how to update systematic reviews. Journal of Clinical Epidemiology 60: 1095.e1–1095.e11.
- 11. Ioannidis JPA, Contopoulos-Ioannidis D, Lau J (1999) Recursive cumulative meta-analysis: a diagnostic for the evolution of total randomized evidence from group and individual patient data. J Clin Epidemiol 52: 281–291.
- 12. Mullen B, Muellerleile P, Bryant B (2001) Cumulative Meta-Analysis: A Consideration of Indicators of Sufficiency and Stability. Personality and Social Psychology Bulletin 27: 1450–1462.
- 13. Barrowman N, Fang M, Sampson M, Moher D (2003) Identifying null meta-analyses that are ripe for updating. BMC Med Res Methodol 3: 13. Epub 2003 Jul 23.
- 14. Shojania K, Sampson M, Ansari M, Ji J, Doucette S, et al. (2007) How quickly do systematic reviews go out of date? A survival analysis. Ann Intern Med 147: 224–233. Epub 2007 Jul 16.
- 15. Sutton A, Cooper N, Jones D, Lambert P, Thompson J, et al. (2007) Evidence-based sample size calculations based upon updated meta-analysis. Stat Med 26: 2479–2500.
- 16. Green S, Higgins J, Alderson P, Clarke M, Mulrow C, et al. (2008) Chapter 1: Introduction. In: Higgins J, Green S, editors. Cochrane Handbook for Systematic Reviews of Interventions. Version 5.0.1: The Cochrane Collaboration.
- 17. Higgins J, Green S, Scholten R (2008) Chapter 3: Maintaining reviews: updates, amendments and feedback. In: Higgins J, Green S, editors. Cochrane Handbook for Systematic Reviews of Interventions. Version 5.0.1: The Cochrane Collaboration.
- 18. Borm GF, Donders ART (2009) Updating meta-analyses leads to larger type I errors than publication bias. Journal of Clinical Epidemiology 62: 825–830.e10.
- 19. Sutton A, Donegan S, Takwoingi Y, Garner P, Gamble C, et al. (2009) An encouraging assessment of methods to inform priorities for updating systematic reviews. J Clin Epidemiol 62: 241–251. Epub 2008 Sep 10.
- 20. Jaidee W, Moher D, Laopaiboon M (2010) Time to update and quantitative changes in the results of cochrane pregnancy and childbirth reviews. PLoS One 5: e11553.
- 21. Berlin JA, Laird NM, Sacks HS, Chalmers TC (1989) A comparison of statistical methods for combining event rates from clinical trials. Stat Med 8: 141–151.
- 22. R Development Core Team (2010) R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
- 23. StataCorp (2007) Stata Statistical Software: Release 10. College Station, Texas: StataCorp LP.
- 24. Shekelle P, Newberry S, Wu H, Suttorp M, Motala A, et al. (2011) Identifying Signals for Updating Systematic Reviews: A Comparison of Two Methods [Internet]. Agency for Healthcare Research and Quality (US). AHRQ Methods for Effective Health Care.