
How trustworthy and applicable is the evidence from systematic reviews of depression treatments: Protocol for systematic examination

  • Iwo Fober,

    Roles Conceptualization, Methodology, Writing – original draft, Writing – review & editing

    Affiliation Meta-Research Centre, University of Wroclaw, Wroclaw, Poland

  • Lidia Baran,

    Roles Conceptualization, Methodology, Writing – original draft, Writing – review & editing

    Affiliations Meta-Research Centre, University of Wroclaw, Wroclaw, Poland, Institute of Psychology, University of Wroclaw, Wroclaw, Poland

  • Myrto Samara,

    Roles Conceptualization, Writing – review & editing

    Affiliation Department of Psychiatry, Faculty of Medicine, University of Thessaly, Larissa, Greece

  • Spyridon Siafis,

    Roles Conceptualization, Methodology, Writing – review & editing

    Affiliation Technical University of Munich, TUM School of Medicine and Health, TUM University Hospital, Department of Psychiatry and Psychotherapy, Munich, Germany

  • David Robert Grimes,

    Roles Conceptualization, Methodology, Writing – review & editing

    Affiliation TCD Biostatistics Unit, School of Medicine, Trinity College Dublin, Dublin, Ireland

  • Bartosz Helfer

    Roles Conceptualization, Funding acquisition, Methodology, Supervision, Writing – original draft, Writing – review & editing

    bartosz.helfer@gmail.com

    Affiliations Meta-Research Centre, University of Wroclaw, Wroclaw, Poland, Institute of Psychology, University of Wroclaw, Wroclaw, Poland

Abstract

Background

Depression is a common mental disorder significantly impacting daily functioning. Standard treatments include drugs, psychotherapies, or a combination of both. Treatment selection relies on scientific evidence, though the trustworthiness and applicability of this evidence can vary.

Objectives

This protocol presents a method to evaluate evidence from systematic reviews for pharmacological and psychological treatments for depression, focusing on trustworthiness and applicability structured into five components: quality of conduct and reporting, risk of bias, spin in abstract conclusions, robustness of meta-analytical results, and heterogeneity and clinical diversity.

Methods

We will conduct a systematic search of systematic reviews in MEDLINE, Embase, PsycInfo, and Cochrane Database of Systematic Reviews. Our focus will be on systematic reviews of first-line treatments for depression in adults, including antidepressants, psychotherapy, or combined treatments, compared to either active or inactive comparators. We will extract information needed for a comprehensive methodological evaluation using qualitative tools, including AMSTAR 2, ROBIS, Conflict-of-Interest assessment, Referencing Framework for SRs, Spin Measure, and heterogeneity exploration assessment. For quantitative analyses, such as Fragility Index, Ellipse of Insignificance, Region of Attainable Redaction, GRIM test, Leave-N-Out analysis, and prediction intervals, we will select and recalculate two meta-analyses per review. We define a set of outcomes to enable practical and intuitive interpretation of these analyses’ results. Descriptive statistics, non-parametric statistical tests, and narrative summaries will be used to synthesize and compare outcomes across several pre-specified subgroups.

Expected outcomes

We expect these analyses to provide an enhanced perspective on the practice of evidence synthesis in the field of mental health, offer methodological guidance for future systematic reviews and meta-analyses, and contribute to improved informed decision-making by clinicians and patients.

OSF registration

osf.io/7f9cj and osf.io/ynejs

Introduction

Background

With approximately 280 million people worldwide affected by depressive disorders [1], there is a constant demand for effective interventions. According to evidence-based practice (EBP) and evidence-based medicine (EBM), the utilization of the best evidence in clinical decision-making is not only ethical but also necessary to address worldwide treatment demands [2,3]. The ethicality of applying these approaches in mental health is subject to debate [4]; nevertheless, systematic reviews (SRs) and randomized controlled trials (RCTs) still occupy the highest tiers in many evidence hierarchies and are instrumental in forming recommendations and treatment guidelines, including psychotherapy and pharmacotherapy for depression [5–9]. However, despite decades of research on depression treatments [10,11], concerns persist regarding the evidence base.

For instance, reports of RCTs often lack detailed descriptions of control conditions such as ‘treatment as usual’ [12], and industry-funded studies tend to report higher effectiveness, especially in pharmacological research [13]. Many studies do not reflect real-world conditions [14,15], and underpowered trials limit the detection of true treatment effects [16]. Additionally, the risk of bias due to issues such as unblinded designs or inadequate randomization is common [17,18]. A high risk of bias and low quality of evidence from trials can, in turn, lead to incorrect estimation of treatment effects [19].

In evidence synthesis, it is essential to address and account for the limitations of primary studies when drawing conclusions; otherwise, the findings may be misleading. However, in addition to inheriting some methodological flaws from the primary studies, SRs also face review-specific challenges that reduce their trustworthiness and applicability. Specifically, SRs are susceptible to multiple factors impacting their quality of conduct and reporting. These include issues at the design stage (e.g., failure to account for previous reviews, lack of prespecified methods, conflicts of interest), with a prospectively registered protocol being particularly important [20], and at the execution stage (e.g., insufficiently comprehensive search strategies, study selection and data extraction performed by a single reviewer) [21]. Risk of bias might be introduced into an SR by inappropriate or unclear inclusion criteria or flawed data synthesis and analysis, among other factors [22]. Of particular concern are inaccuracies in interpretation, which can lead to spin – the presentation or interpretation of findings in a way that emphasizes favourable results or downplays unfavourable ones, potentially leading to biased conclusions [23]. When a quantitative synthesis is performed, the robustness of the overall effect size estimation should receive as much attention as its magnitude and statistical significance, as many meta-analyses are fragile to even slight changes in trial results, or their statistical significance relies on a single study [24,25]. Robustness may also be compromised when data in primary studies are redacted or when studies with reporting anomalies are excluded – issues that have increasingly drawn the attention of the public and researchers in recent years, particularly in psychology and biomedical sciences [26–28].
Finally, evidence-based approaches have been criticized for focusing on average treatment effects, which may not adequately reflect individual patient outcomes due to variability in responses [29,30]. In addition, the diagnosis of mental health issues, which forms the basis for inclusion in clinical trials, may rely on self-reported scales, clinical judgment, or structured interviews based on various sets of criteria that are periodically updated. The lack of biomarkers further complicates the standardization of patient samples. Moreover, psychological treatments are complex interventions, which are difficult to standardize and may be compared against a range of heterogeneous comparators (e.g., other psychotherapies, waiting lists, or treatment as usual). To address these limitations, it is crucial to sufficiently account for between-study heterogeneity in SRs while considering the clinical diversity of patients seeking help for mental health issues. This approach is essential for gaining a better understanding of treatment effects, ensuring the generalizability of evidence, and translating it into practice effectively [31–33].

Hypotheses

Based on the overview of common issues in systematic reviews [34,35], and drawing on prior analyses conducted in other areas of medicine [36–42] and mental health [43–46], which found critically low quality in 53%–99% and 68%–88% of SRs, respectively (see S1 Appendix), we hypothesize that a significant proportion of SRs on depression treatments will show low overall quality of conduct and reporting, with issues such as a lack of pre-registered protocols, incomplete risk of bias evaluations, and insufficient justification for excluded studies. Most SRs will demonstrate low or unclear overall risk of bias, with some shortcomings in reporting eligibility criteria and search strategies, reflecting previous findings [47–49]. A substantial portion of SRs will exhibit spin, potentially distorting the interpretation of findings, aligning with previous evidence from psychotherapy [50] and adolescent depression trials [51]. Many meta-analyses will display low robustness, where statistically significant results could be easily reversed with minimal changes in trial events [24], or the removal of a single study significantly alters the overall effect, undermining the stability of conclusions [52,53]. Supplementing meta-analytic results with prediction intervals (PIs) will alter or add to conclusions about the safety and efficacy of depression treatments. Specifically, in some reviews PIs will be much wider than reported confidence intervals, offering a broader perspective on result uncertainty, potentially including null effects or effects in the opposite direction to those reported, capturing both positive and negative effects of similar size within the interval. We also anticipate that clinical diversity and statistical heterogeneity will be inadequately addressed, limiting the generalizability of findings [31–33,54–56]. Conclusions of these analyses will vary between subgroups of reviews defined by factors such as interventions, comparators, or methodological quality.

Anticipated new evidence

This study aims to evaluate trustworthiness and applicability structured into five components: quality of conduct and reporting, risk of bias, spin in abstract conclusions, robustness of meta-analytical results, and heterogeneity and clinical diversity.

The qualitative assessment was designed to replicate and expand on findings from analyses partially overlapping with our project [43–46]. To date, SRs in mental health have primarily been evaluated using the AMSTAR tool. Our analysis incorporates ROBIS, as well as additional tools that enable a more nuanced examination of key concepts we believe warrant closer attention – namely referencing, conflicts of interest, spin, and heterogeneity exploration practices.

To the best of our knowledge, the quantitative assessments we propose have not been applied to such a large body of evidence on depression or any other mental disorder. Examining the fragility of meta-analyses will enrich the assessment of evidence certainty by adding a new dimension of robustness – an aspect whose importance is increasingly recognized in other areas of medicine, both in the evaluation of primary studies [57] and meta-analyses [24,58–61], as well as in its influence on clinical guidelines [62–67]. In addition to calculating fragility for each meta-analysis, we will also contextualize it using dropout rates from the included clinical trials. Calculating prediction intervals will enhance understanding of the impact of heterogeneity on conclusions drawn from meta-analyses of depression treatments [68]. To interpret them meaningfully, we adopted the most practical approaches based on analysing the relationship between prediction intervals, confidence intervals, and the line of null effect. This is particularly valuable given the clear need among mental healthcare professionals for a framework that supports the implementation of EBM and EBP in clinical practice [69].

Through additional analyses, we will be able to offer further insight into certain issues specific to clinical research and evidence synthesis in psychiatry and clinical psychology, such as differences in evaluating pharmaceutical and psychological interventions for the same indication, and the impact of how the research question was framed or which comparator was selected.

Our approach, grounded in systematic and transparent evaluation using state-of-the-art tools that are rarely or never applied in this field, will provide a new, in-depth perspective on the implementation of EBP and EBM methods in mental health. Ultimately, this will reduce research waste by offering methodological guidance for future systematic reviews and meta-analyses, strengthen the evidence base for clinical guidelines, and support more informed clinical decision-making and personalized patient care.

Materials and methods

This observational meta-research study employs a systematic approach for data collection, analysis, and reporting, and follows PRISMA guidelines [70]. A visual summary of the materials and methods is presented in Figs 1 and 2.

thumbnail
Fig 1. Study workflow leading to the assessment of quality of conduct and reporting, risk of bias and spin in abstract conclusions.

https://doi.org/10.1371/journal.pone.0325384.g001

thumbnail
Fig 2. Study workflow leading to the assessment of heterogeneity and clinical diversity, and robustness of meta-analytical results.

https://doi.org/10.1371/journal.pone.0325384.g002

Given the exploratory nature of this analysis and the need to balance the significance of our study’s conclusions with the feasibility of the project, we defined our eligibility criteria to obtain a sample of systematic reviews applicable to the broadest possible patient population, focusing on well-established first-line treatments and addressing the fundamental questions of efficacy and safety. Reviews must clearly indicate in the title or abstract that their primary focus is on treatments for depression or depressive symptoms, as defined below. To ensure a systematic and transparent study selection process while maintaining feasibility, we decided that eligibility for this study would be determined based on the inclusion and exclusion criteria prespecified in the reviews’ methods sections, rather than the characteristics of the trials actually included. Table 1 contains a summary of eligibility criteria.

Inclusion criteria

Population.

The inclusion criteria for this study will encompass systematic reviews focused on depressive disorders, regardless of how they are described or defined by the authors (e.g., ‘depression,’ ‘major depressive disorder,’ ‘unipolar depression,’ ‘elevated depression symptoms’). We will include reviews based on self-reports, structured diagnostic criteria or clinical judgment, as well as those that do not specify the diagnostic criteria for depression in their inclusion criteria. Additionally, we will include reviews investigating specific features of depression in populations where the primary diagnosis is a depressive disorder. The target population is adults (over 18 years of age), and if age is not explicitly mentioned and the target group is unclear, we will assume the reviews focus on adults.

Interventions.

We will include reviews on the acute treatment of depression. If the phase of treatment is not explicitly mentioned, we will assume the review is on acute treatment. Eligible pharmacotherapies include drugs commonly referred to as ‘antidepressants,’ whether in mono- or polytherapy. This includes individual drugs (e.g., ‘sertraline’, ‘mirtazapine’), pharmacological groups (e.g., ‘tricyclic antidepressants’), or antidepressants as a drug class. Additionally, we will include psychological treatments in the form of psychotherapies (‘talking therapies’ that consist mainly of verbal communication with a specialist) of all formats (e.g., ‘individual’, ‘group’, or ‘family therapies’) and modes of delivery (e.g., ‘face-to-face’, ‘videoconference’, ‘telephone call’). Eligible theoretical approaches will include but will not be limited to cognitive behavioural therapy, third-wave cognitive behavioural therapy (e.g., ‘dialectical behaviour therapy,’ ‘acceptance and commitment therapy’), problem-solving therapy, interpersonal therapy, psychodynamic therapy, behavioural activation therapy, and life review therapy. Reviews of combinations of all the above treatments (used simultaneously or sequentially) will be included.

Comparators.

We will include reviews that consider any of the interventions described above as an active comparator. Additionally, control conditions such as ‘care as usual,’ ‘minimal treatment,’ ‘no treatment,’ ‘placebo pill,’ ‘psychological placebo,’ and ‘waiting list’ will be included.

Outcomes.

The focus of eligible reviews must be on safety and/or efficacy, measured by any related outcomes (e.g., ‘response,’ ‘remission,’ ‘symptom reduction,’ ‘dropout rate,’ ‘adverse events’). We will include reviews aiming to explore moderators or predictors only if the overall effect size is reported.

Trials’ design.

Only reviews of RCTs will be included.

Review type and data availability.

In the qualitative assessment (i.e., quality of conduct and reporting, risk of bias, spin in abstract conclusions), we will include any study meeting the eligibility criteria and identified by the authors as a systematic review, regardless of whether it incorporates a meta-analysis. Additionally, studies that broadly follow a systematic approach will also be eligible. Specifically, they should address a clearly defined research question. It should be evident that the included studies were identified through searches conducted in medical databases or clinical trial registries, as opposed to, for example, analysing studies conducted by a single drug manufacturer. Eligibility criteria should be defined at least in terms of population, intervention, and comparator. The results should report the number and characteristics of the included studies. If no meta-analysis was performed, the authors should provide an unbiased narrative summary of the review’s findings, covering aspects such as efficacy, safety, quality, or quantity of available evidence, and offering recommendations for clinical practice or future research. Reviews employing Network Meta-Analyses (NMAs) will also be included in the qualitative assessment.

For the quantitative analysis (i.e., robustness of meta-analytical results, heterogeneity evaluation with prediction intervals), we will include reviews that report pairwise meta-analyses, provided that the extraction of necessary data is possible either from forest plots or from the original reports of included primary studies. We aim to maintain a focus on direct pairwise comparisons with clear methodological assumptions. Therefore, Network Meta-Analyses (NMAs) and Individual Participant Data Meta-Analyses (IPDMAs) will be excluded from this part of the project. However, a recommended and widely adopted practice when conducting a review with NMA is to first perform a standard direct pairwise meta-analysis. Some IPDMAs also report such results. In these cases, NMAs and IPDMAs will be treated as sources of eligible meta-analyses, which will be extracted and included in the quantitative assessment as if derived from a standard SR.

Exclusion criteria

Population.

We will exclude reviews focused on treatment-resistant depression, psychotic depression, schizoaffective disorder, and bipolar depression. Additionally, reviews targeting specific adult groups (e.g., ‘older adults,’ ‘late life depression’, ‘peripartum depression,’ ‘depression in patients with physical illness,’ ‘students,’ ‘specific races or ethnic groups’) will be excluded. Reviews that include both eligible and ineligible populations will be excluded. Reviews with a broad (e.g., ‘depression and anxiety disorders’) or unspecified scope (e.g., ‘common mental disorders’) will be excluded. The exclusion of other affective disorders with a depressive component and specific populations is due to differences in presumed pathophysiology, clinical presentation, management, and prognosis [71–75]. Their inclusion would also conflict with our core objective of assessing evidence applicable to the broadest possible population and would negatively impact the feasibility of the study.

Interventions.

We will exclude reviews focused on continuation and maintenance treatment, relapse, and recurrence prevention. Additionally, pharmacotherapies involving drugs other than antidepressants (e.g., ‘antipsychotics’, ‘mood stabilizers’), phytopharmaceuticals, and dietary supplements (e.g., ‘St. John’s wort,’ ‘fatty acids’), as well as psychedelics (e.g., ‘psilocybin’, ‘LSD’, ‘ketamine’) and psychedelic-assisted psychotherapy, will be excluded, as they are not first-line treatments. Self-guided and Internet-delivered programs (without therapist engagement) will be excluded. Reviews focused on both the included and excluded therapies mentioned above will be excluded.

Comparators.

Reviews considering comparators other than those specified in the inclusion criteria for this study (e.g., ‘acupuncture,’ ‘physical exercises’, ‘self-guided programs,’ ‘light therapy’), or both eligible and ineligible comparators, will be excluded.

Outcomes.

Reviews reporting solely outcomes irrelevant to safety and/or efficacy will be excluded. We will exclude reviews exploring moderators and predictors (unless the overall effect size is reported) and methodological reviews (addressing aspects of depression research such as ‘differences in baseline severity’ or ‘types of outcome measures’).

Trials’ design.

Reviews including studies other than RCTs, or both RCTs and non-RCTs, will be excluded.

Review type and data availability.

We will exclude publications synthesising trials selected in a non-systematic manner (e.g., ‘pooled analysis’ of all trials performed by a manufacturer). IPDMAs will be excluded unless they report a pairwise meta-analysis, as described in the inclusion criteria.

Information sources and search strategy

We performed a comprehensive search in electronic databases, including MEDLINE (via PubMed), Embase, and the Cochrane Library, as the most extensive and widely used general medical databases for evidence synthesis, and PsycINFO, as a subject-specific database – as recommended by the Cochrane Handbook [76]. The search strategy combined general and specific terms associated with depression, psychotherapy, and pharmacotherapy. For PubMed and Embase, we used a validated methodological filter for systematic reviews [77,78]. We used built-in filters for systematic reviews in the Cochrane Library and PsycNet. We combined free-text search terms with structured vocabularies (MeSH and Emtree). For the antidepressant search component, we used the strategy published in Cipriani et al. [79]. No time or language restrictions were applied. Search strategies are reported in the S2 Appendix.

Reviews selection

Search results were pooled and deduplicated using EndNote software, following the Bramer Method [80]. The screening was facilitated by Covidence software and occurred in two stages: 1) title and abstract screening, with two independent reviewers evaluating titles and abstracts for eligibility based on the inclusion and exclusion criteria; 2) full-text screening, where reviews were assessed independently by two reviewers, with discrepancies resolved through consensus (discussion or consultation with a third reviewer). The PRISMA Flow Diagram is reported in S3 Appendix. In total, we included 153 reviews (see S4 Appendix for the list of included reviews and S5 Appendix for excluded reviews with reasons for exclusion).

Data extraction

Data extraction will be conducted independently by two reviewers, using standardized forms in Google Sheets. Disagreements will be resolved through consensus or, if necessary, by consulting a third reviewer. Extracted data will include SRs’ characteristics, populations, interventions, comparators, and information required for qualitative assessments. The list of extracted variables is reported in the S6 Appendix. We will also select two meta-analyses per review – one of a binary outcome and one of a continuous outcome, regardless of the measure of effect size (e.g., odds ratio, risk ratio, standardized mean difference). The following predefined selection approach will be applied. We will use a meta-analysis of the primary outcome relevant to efficacy or safety. If multiple comparisons are eligible, we will prioritize the comparison to inactive control conditions. If the primary outcome is not defined or not relevant, we will use the first relevant meta-analysis reported. If the first selected outcome is binary, we will select the first relevant continuous outcome as the second analysis, and vice versa. We will use one meta-analytic result if only one type of outcome is analysed. For the selected meta-analyses, we will extract overall results and the results of included individual studies (numbers of participants in experimental and control groups, means and standard deviations or numbers of events, number of dropouts in both groups). This will be done from forest plots (if available) or original study reports.
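The selection rules above amount to a small decision procedure. As an illustration only, the following Python sketch applies the same precedence (primary outcome first, then inactive comparator, then reporting order); the field names are hypothetical, invented for this example, and the actual selection will be performed manually by reviewers.

```python
def select_meta_analyses(mas):
    """Apply the protocol's selection rules to a review's meta-analyses.

    Each item is a dict with hypothetical keys: 'outcome_type'
    ('binary' or 'continuous'), 'primary' (bool), 'relevant' (bool,
    i.e., related to efficacy or safety), and 'inactive_comparator'
    (bool). Items are assumed to be listed in reporting order.
    """
    relevant = [m for m in mas if m["relevant"]]
    if not relevant:
        return []

    def pick(pool):
        # Prefer the primary outcome; among eligible comparisons,
        # prefer inactive controls; otherwise take the first reported.
        primary = [m for m in pool if m["primary"]]
        pool = primary or pool
        inactive = [m for m in pool if m["inactive_comparator"]]
        return (inactive or pool)[0]

    first = pick(relevant)
    # Second analysis: first relevant meta-analysis of the other type.
    other_type = "continuous" if first["outcome_type"] == "binary" else "binary"
    rest = [m for m in relevant if m["outcome_type"] == other_type]
    return [first] + ([rest[0]] if rest else [])
```

If only one outcome type is analysed in a review, the function returns a single meta-analysis, mirroring the rule in the text.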

Assessment methods and tools

The tools we will use to assess each component of trustworthiness and applicability, and their descriptions, are presented in Table 2. Two reviewers will perform assessments independently, and conflicts will be resolved by consensus with a third reviewer. A pilot assessment will be conducted on a sample of studies to ensure consistency. In cases where systematic reviews do not contain a protocol or a list of excluded studies, we will contact the corresponding authors to request this information, as the presence of both is assessed within the so-called critical domains of the AMSTAR 2 tool – domains that, if rated poorly, can significantly affect the overall quality rating. These actions are based on the assumption that the lack of access to a pre-registered protocol or a list of excluded studies, while a limitation, does not necessarily mean that they were not prepared. Allowing authors to share this information helps prevent unfairly negative bias against reviews, particularly older ones conducted before protocol registration and reporting guidelines became widespread. First, we will email the corresponding author using the published contact information. If no response is received within one week, we will follow up by emailing both the corresponding author and one other author (preferably the first or second). If there is no response after two weeks, we will consider the attempt concluded. The absence of other essential information for assessment (e.g., search strategy) in the review report or protocol (whether published or obtained through author contact) will not be supplemented in this manner and will result in a lower rating.

thumbnail
Table 2. Tools to assess trustworthiness and applicability components of evidence from systematic reviews of depression treatments.

https://doi.org/10.1371/journal.pone.0325384.t002

Calculations, outcomes and data analysis

Calculations will be performed for meta-analyses selected at the data extraction stage.

Calculations for robustness analyses

For meta-analyses of trials with 2 × 2 dichotomous outcome data, the iterative method of Atal et al. [24] will be employed to estimate the fragility of systematic reviews. This will be cross-validated with a meta-analytic extension of the Ellipse of Insignificance (EOI) analysis [25] to determine fragility analytically and, where applicable, a Region of Attainable Redaction (ROAR) analysis [26] to estimate the effects of missing data.

EOI analysis will be used in our study in two ways. First, we will apply it to data from individual RCTs included in the meta-analyses, extracted from the original trial reports, to ascertain how robust the constituent trials are at an individual level. Second, we will also apply it to aggregate data. For all meta-analyses, the crude risk ratio (RR-Crude) and the Cochran-Mantel-Haenszel risk ratio (RR-CMH) will be calculated. When these differ by less than 10%, it is appropriate to treat the data as pooled, and an EOI analysis can then be performed on the pooled data to ascertain the fragility fraction of the aggregated studies. Alongside this, we will deploy Atal’s method for estimating meta-analytic fragility. This is in effect a greedy algorithm: at each step, it finds the study in which flipping an event status would cause the largest movement toward changing the result, applies that modification, and re-evaluates until the significance threshold is crossed. This greedy approach makes the algorithm much faster than brute force, but it may not always find the absolute minimum number of changes if a different set of edits would have been more optimal. It does, however, tend to find a concrete, near-minimal set of specific modifications that would flip the results of a meta-analysis, whereas EOI finds the general degree of recoding required to flip conclusions. Thus, Atal’s algorithm is deployed here to estimate the minimal set of modifications, while EOI serves to estimate the pooled fragility of all studies.
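To make the greedy event-flipping loop concrete, here is a simplified Python sketch. It is illustrative only: the significance check is a crude pooled risk-difference z-test standing in for the full meta-analytic recalculation used in Atal et al.'s actual implementation, and the function names are our own.

```python
from itertools import product
from math import sqrt, erf

def pooled_p(studies):
    # Crude pooled risk-difference z-test -- a stand-in for refitting
    # the actual meta-analytic model at each step.
    e1 = sum(s[0] for s in studies); n1 = sum(s[1] for s in studies)
    e2 = sum(s[2] for s in studies); n2 = sum(s[3] for s in studies)
    p1, p2 = e1 / n1, e2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    if se == 0:
        return 1.0
    z = abs(p1 - p2) / se
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))  # two-sided normal p-value

def greedy_fragility(studies, alpha=0.05, max_steps=200):
    # studies: list of [events_treat, n_treat, events_ctrl, n_ctrl]
    studies = [list(s) for s in studies]
    start_sig = pooled_p(studies) < alpha
    for step in range(1, max_steps + 1):
        best_score, best_move = None, None
        # Candidate moves: add/remove one event in either arm of any study.
        for i, (arm, d) in product(range(len(studies)),
                                   [(0, 1), (0, -1), (2, 1), (2, -1)]):
            if not 0 <= studies[i][arm] + d <= studies[i][arm + 1]:
                continue
            studies[i][arm] += d
            p = pooled_p(studies)
            studies[i][arm] -= d  # undo the trial move
            score = p if start_sig else -p  # push p across alpha
            if best_score is None or score > best_score:
                best_score, best_move = score, (i, arm, d)
        if best_move is None:
            return None
        i, arm, d = best_move
        studies[i][arm] += d
        if (pooled_p(studies) < alpha) != start_sig:
            return step  # event flips needed to change significance
    return None
```

The returned count plays the role of a fragility index for the pooled result; because the search is greedy, it approximates rather than guarantees the true minimum.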

To assess potential anomalies in the reported results of studies included in the SRs, we will apply the GRIM test [28] to the chosen continuous outcomes. For each study, we will extract the relevant summary statistics (sample size, mean, and SD) and evaluate whether the reported results are mathematically consistent with these parameters. Studies that fail the GRIM test will be excluded in sensitivity analyses, and the meta-analytic results will be recalculated to assess whether the overall effect size and statistical significance remain robust to their removal.
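The core of the GRIM check on means is simple arithmetic: with n integer-valued responses, a mean can only take values k/n for integer k, so a reported mean must round-match one of them. A minimal Python sketch of the mean check (SD consistency requires the related GRIMMER procedure, not shown):

```python
def grim_consistent(mean, n, decimals=2):
    """GRIM test for a mean of n integer-valued responses reported to
    `decimals` places: is there an integer sum k such that k/n rounds
    to the reported mean? Checking the nearest k suffices, because
    attainable means are spaced 1/n apart."""
    tolerance = 0.5 / 10 ** decimals  # half a unit in the last reported digit
    k = round(mean * n)               # nearest attainable integer sum
    return abs(k / n - mean) <= tolerance + 1e-12
```

For example, a reported mean of 3.45 is attainable with n = 20 (69/20) but not with n = 17, where no integer sum yields a mean rounding to 3.45.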

We will also implement a leave-N-out sensitivity analysis to quantify the sensitivity of the SRs’ results. For this, we will systematically exclude N studies from the analysis, iterating through all possible combinations (up to N = 5). For each recalculated meta-analytic result, we will record whether the significance or direction of the effect changes. The fragility of the result will be defined as the minimum number of excluded studies required to change the overall statistical significance.
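As an illustration of the leave-N-out loop, here is a hedged Python sketch. A fixed-effect inverse-variance pooled z-test stands in for recalculating each review's original model, and only significance changes are checked (the protocol also tracks direction changes); the function names are our own.

```python
from itertools import combinations
from math import sqrt, erf

def _pooled_p(effects, ses, idx):
    # Fixed-effect inverse-variance pooled z-test over the studies in idx.
    w = [1 / ses[i] ** 2 for i in idx]
    est = sum(wi * effects[i] for wi, i in zip(w, idx)) / sum(w)
    z = abs(est) * sqrt(sum(w))  # est / pooled SE, where SE = 1/sqrt(sum w)
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

def leave_n_out_fragility(effects, ses, max_n=5, alpha=0.05):
    """Smallest number of excluded studies that changes the statistical
    significance of the pooled estimate; None if stable up to max_n."""
    k = len(effects)
    base_sig = _pooled_p(effects, ses, range(k)) < alpha
    for n in range(1, min(max_n, k - 1) + 1):
        # Iterate over all combinations of n excluded studies.
        for removed in combinations(range(k), n):
            kept = [i for i in range(k) if i not in removed]
            if (_pooled_p(effects, ses, kept) < alpha) != base_sig:
                return n
    return None
```

Because the number of combinations grows quickly, capping N (here at 5, as in the protocol) keeps the search tractable.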

Calculations for heterogeneity analyses

We will recalculate selected meta-analytic findings using a random-effects model with the Hartung-Knapp-Sidik-Jonkman (HKSJ) method. The HKSJ method is a random-effects meta-analytic approach that adjusts for small sample sizes and accounts for uncertainty in heterogeneity, making it superior to conventional random-effects models that may underestimate this variability [68,86,87]. In cases where meta-analyses calculated confidence intervals at confidence levels other than 95%, we will use the original levels to examine the effect of the HKSJ method on changes in statistical significance, and then calculate 95% confidence intervals. All PIs will be calculated at the 95% level using the metagen function from the R package meta [88]. In addition, we will calculate the probabilities that the true effect in a new study will be below the null effect or of the opposite sign with a similar size, using the pt function in R (a t-distribution with k-2 degrees of freedom) together with the metafor package [89].
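The prediction-interval arithmetic is straightforward once a pooled estimate and a heterogeneity estimate are available. A self-contained Python sketch follows, using DerSimonian-Laird tau-squared and the simple t-based PI on k-2 degrees of freedom; this is an assumption-laden stand-in for the actual analyses, which will use meta::metagen in R, and the HKSJ variance adjustment is omitted here.

```python
from math import sqrt, gamma, pi

def t_cdf(x, df, steps=4000):
    # Student-t CDF via trapezoidal integration (sketch-quality accuracy).
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    f = lambda u: c * (1 + u * u / df) ** (-(df + 1) / 2)
    lo = -40.0
    if x <= lo:
        return 0.0
    h = (x - lo) / steps
    s = 0.5 * (f(lo) + f(x)) + sum(f(lo + i * h) for i in range(1, steps))
    return s * h

def t_quantile(p, df):
    # Inverse t CDF by bisection.
    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def prediction_interval(effects, ses, level=0.95):
    """Random-effects pooled estimate with DerSimonian-Laird tau^2,
    a t-based prediction interval on k-2 df, and the probability that
    the true effect in a new study falls below the null (zero)."""
    k = len(effects)
    w = [1 / s ** 2 for s in ses]
    mu_fe = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - mu_fe) ** 2 for wi, e in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_re = [1 / (s ** 2 + tau2) for s in ses]   # random-effects weights
    mu = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
    se_mu = sqrt(1 / sum(w_re))
    sd_new = sqrt(tau2 + se_mu ** 2)            # spread of a new study's true effect
    t_crit = t_quantile(1 - (1 - level) / 2, k - 2)
    return {"estimate": mu,
            "pi": (mu - t_crit * sd_new, mu + t_crit * sd_new),
            "p_below_null": t_cdf(-mu / sd_new, k - 2)}
```

The PI widens with tau-squared, so for heterogeneous meta-analyses it can straddle the null even when the confidence interval does not, which is exactly the interpretive contrast the protocol exploits.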

These analyses will be automated using R scripts.

Outcomes

The outcomes for the qualitative parts of our study will be the percentages of systematic reviews in each category, based on the results of assessments using specific tools (e.g., presence or absence of spin, level of risk of bias, use of meta-regression to explore heterogeneity). The outcomes for the quantitative parts will include both categorical outcomes (based on prespecified criteria) and continuous outcomes. Definitions of the most important outcomes are presented in Table 3. Tools for assessments are described in Table 2.

Table 3. Outcomes for the trustworthiness and applicability evaluation.

https://doi.org/10.1371/journal.pone.0325384.t003

We will use the CINeMA approach [90] to assess heterogeneity in pairwise meta-analyses; because it relies on forest plot interpretation and does not incorporate indirectness, it is suitable for this purpose despite having been developed for NMAs. We will also analyse the impact of the HKSJ method on the imprecision of pooled effect estimates. As with the assessment of heterogeneity's impact, we will first determine the proportion of meta-analyses whose pooled effect estimate changed from significant to non-significant (or vice versa) or showed no change. To gain more practical insight, we will then assess the proportion of meta-analyses for which the imprecision rating would change under the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach, a prominent framework for assessing certainty of evidence in healthcare recommendations [91]. Additionally, we will characterize the calculated PIs in terms of their width, both alone and relative to the corresponding confidence intervals. Finally, the results of the robustness analysis obtained with the ellipse of insignificance (EOI) will be cross-validated against the method of Atal et al., which employs the fragility index (FI).

Data analysis

We will use descriptive statistics and narrative summaries to synthesize the data. To explore associations between the outcomes and the review characteristics listed in Table 4, we will use odds ratios. In addition, we will use regression analysis to examine the relationship between the outcomes and the year of publication. We will conduct subgroup analyses to explore how the results vary between groups defined by the factors outlined in Table 4. Where possible, we will use non-parametric tests to explore differences between groups: the Kruskal-Wallis test (across three groups, with post-hoc pairwise comparisons if significant) and the Mann-Whitney U test (for two groups). We will use the chi-squared test to analyse the proportion of reviews in each category, as described above. Given the exploratory nature of these analyses, no adjustments for multiple testing will be made.
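As a toy illustration of the planned group comparisons (all data, group labels, and counts below are invented for the example; the protocol's analyses will use the extracted review data):

```python
# Hypothetical illustration of the planned statistical tests.
from scipy import stats

# Invented quality scores for reviews grouped by intervention type
drug = [9, 11, 8, 10, 12, 7]
psych = [6, 8, 7, 9, 5, 8]
combined = [10, 9, 11, 12, 10, 13]

# Kruskal-Wallis across the three groups
h, p_kw = stats.kruskal(drug, psych, combined)
if p_kw < 0.05:
    # Post-hoc pairwise comparison only if the omnibus test is significant
    u, p_mw = stats.mannwhitneyu(drug, psych, alternative="two-sided")

# Chi-squared test on the proportion of reviews in each category,
# e.g., spin vs. no spin across two publication periods (invented counts)
table = [[30, 70],   # earlier period: spin / no spin
         [15, 85]]   # later period: spin / no spin
chi2, p_chi, dof, expected = stats.chi2_contingency(table)

# Odds ratio for the association between period and presence of spin
odds_ratio = (30 * 85) / (70 * 15)
```

The Kruskal-Wallis and Mann-Whitney tests make no normality assumption, which suits the skewed score distributions typically seen in quality appraisals.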

Data management, open science, and dissemination plan

Upon completion of the analyses, and after peer review and publication, we will make the data and analysis scripts openly available through a suitable data repository or as supplementary materials, adhering to open science principles to facilitate transparency and reproducibility.

Ethical considerations

As this study involves the analysis of published data, ethical approval is not required. There are no safety considerations applicable to this study.

Status and timeline

At the time of this protocol submission, the project has successfully completed pilot testing of the data extraction processes and assessments, and the OSF protocols have been developed and registered. Data collection is projected to finish in May 2025, with project completion in the second quarter of 2025.

Discussion

This study aims to address challenges in evaluating treatments for depression by conducting a methodologically focused analysis of systematic reviews. We propose a framework to assess trustworthiness and applicability, structured into five components: quality of conduct and reporting, risk of bias, spin in abstract conclusions, robustness of meta-analytical results, and heterogeneity and clinical diversity.

The sample of evidence has both strengths and limitations. It will include a large selection of systematic reviews and meta-analyses, whose findings are expected to be generalizable to a broad population of adults with depression. However, narrowing the scope to psychotherapies and antidepressants may overlook other interventions that are effective and relevant to clinical practice, such as antipsychotics or transcranial stimulation. Additionally, selecting reviews based solely on their eligibility criteria, while beneficial for the feasibility of the study, might exclude certain articles considered seminal. These limitations could be addressed in future research of this kind by focusing on the most influential reviews, such as those informing clinical guidelines, an approach we plan to undertake in subsequent studies. Excluding NMAs from part of the analyses can be seen as another limitation, as they provide indirect evidence and comparisons between multiple interventions. However, direct pairwise comparisons are easier to interpret and rely on simpler, clearer methodological assumptions. NMAs require complex assumptions about transitivity and homogeneity, which are harder to verify and can introduce additional bias. By focusing on direct comparisons, our study design favours robust, reliable, and clinically relevant results, free from the added complexities often found in NMAs.

The tools selected for this analysis also have their limitations. The qualitative tools provide a multifaceted perspective on the evidence but occasionally require subjective judgments. To mitigate the influence of such judgments and ensure transparency, we will document and publicly share the rationale behind them. Quantitative tools, on the other hand, rely on various assumptions which will be carefully considered when interpreting the results.

We define a set of outcomes that will enable comparison and evaluation of the utility and relevance of state-of-the-art meta-research tools, as well as an intuitive and meaningful interpretation of the assessment results. Our planned subgroup analyses may help identify directions for investigating the reasons behind potential methodological shortcomings. However, all analyses remain observational in nature and are intended as exploratory.

Our analysis, while comprehensive, does not address all methodological challenges encountered in evidence synthesis in mental health. For instance, publication bias is considered a potentially significant factor when drawing conclusions about the efficacy of both medications and psychotherapies [93,94]. However, we have opted not to analyse it due to the limitations of existing methods for its detection and correction, whose validity is particularly affected by between-study heterogeneity [95], which we anticipate will be substantial in the meta-analyses included in our review.

Regarding dissemination plans, upon completion and after peer review and publication, we will make our data and analysis scripts openly available through a suitable data repository, adhering to open science principles. This transparency will facilitate reproducibility and allow other researchers to build upon our work, contributing to improved methodological standards in the field.

Any amendments to the study protocol will be documented transparently. Should adjustments be necessary – such as changes to eligibility criteria or analytical methods – we will update our registered protocol accordingly and report these modifications in our final publication, providing justifications and discussing potential impacts on our findings.

In conclusion, our intended study seeks to evaluate how trustworthy and applicable the evidence from systematic reviews of depression treatments is. Despite inherent limitations, we believe the results of this analysis will be well-positioned to form a basis for future recommendations on enhancing the methodological rigor of SRs of treatments for depression by addressing key issues of conduct, reporting, and interpretation. By highlighting methodological strengths and weaknesses in current SRs, we aim not to criticize the laudable efforts of past reviewers, but to strengthen their toolkit and contribute to the development of future methods for evidence evaluation. Such methodological advances are far from trivial: as the continuous development of evidence-based medicine and research synthesis shows, they often lead to improved treatment strategies for individuals with depression.

References

1. World Health Organization. Depressive disorder (depression). Accessed 2025 January 29. https://www.who.int/news-room/fact-sheets/detail/depression
2. McKibbon KA. Evidence-based practice. Bull Med Libr Assoc. 1998;86(3):396–401. pmid:9681176
3. Tenny S, Varacallo M. Evidence-based medicine. In: StatPearls. Treasure Island (FL): StatPearls Publishing; 2024.
4. Gupta M. Is evidence-based psychiatry ethical? Oxford: Oxford University Press; 2014.
5. Cleare A, Pariante CM, Young AH, Anderson IM, Christmas D, Cowen PJ, et al. Evidence-based guidelines for treating depressive disorders with antidepressants: a revision of the 2008 British Association for Psychopharmacology guidelines. J Psychopharmacol. 2015;29(5):459–525. pmid:25969470
6. Parikh SV, Segal ZV, Grigoriadis S, Ravindran AV, Kennedy SH, Lam RW. Canadian Network for Mood and Anxiety Treatments (CANMAT) clinical guidelines for the management of major depressive disorder in adults. II. Psychotherapy alone or in combination with antidepressant medication. J Affect Disord. 2009;117(Suppl 1):S15–25.
7. Depression in adults: treatment and management. London: National Institute for Health and Care Excellence (NICE); 2022. http://www.ncbi.nlm.nih.gov/books/NBK583074/
8. World Health Organization. Mental Health Gap Action Programme (mhGAP) guideline for mental, neurological and substance use disorders. Geneva: World Health Organization; 2023: 1.
9. Gelenberg AJ, Freeman MP, Markowitz JC, Rosenbaum JF, Thase ME, Trivedi MH. Practice guideline for the treatment of patients with major depressive disorder. 2010. https://psychiatryonline.org/pb/assets/raw/sitewide/practice_guidelines/guidelines/mdd-1410197717630.pdf
10. Cuijpers P. Four decades of outcome research on psychotherapies for adult depression: an overview of a series of meta-analyses. Canadian Psychology/Psychologie Canadienne. 2017;58(1):7–19.
11. Luo Y, Chaimani A, Furukawa TA, Kataoka Y, Ogawa Y, Cipriani A, et al. Visualizing the evolution of evidence: cumulative network meta-analyses of new generation antidepressants in the last 40 years. Res Synth Methods. 2021;12(1):74–85. pmid:32352639
12. Petersson E-L, Forsén E, Björkelund C, Hammarbäck L, Hessman E, Weineland S, et al. Examining the description of the concept “treatment as usual” for patients with depression, anxiety and stress-related mental disorders in primary health care research - A systematic review. J Affect Disord. 2023;326:1–10. pmid:36708952
13. Cristea IA, Gentili C, Pietrini P, Cuijpers P. Sponsorship bias in the comparative efficacy of psychotherapy and pharmacotherapy for adult depression: meta-analysis. Br J Psychiatry. 2017;210(1):16–23. pmid:27810891
14. Rutherford BR, Cooper TM, Persaud A, Brown PJ, Sneed JR, Roose SP. Less is more in antidepressant clinical trials: a meta-analysis of the effect of visit frequency on treatment response and dropout. J Clin Psychiatry. 2013;74(7):703–15. pmid:23945448
15. Weisz JR, Gray JS. Evidence-based psychotherapy for children and adolescents: data from the present and a model for the future. Child Adolesc Ment Health. 2008;13(2):54–65. pmid:32847169
16. Cuijpers P. Are all psychotherapies equally effective in the treatment of adult depression? The lack of statistical power of comparative outcome studies. Evid Based Ment Health. 2016;19(2):39–42. pmid:26984413
17. Arroll B, Chin W-Y, Martis W, Goodyear-Smith F, Mount V, Kingsford D, et al. Antidepressants for treatment of depression in primary care: a systematic review and meta-analysis. J Prim Health Care. 2016;8(4):325–34. pmid:29530157
18. Furukawa TA, Noma H, Caldwell DM, Honyashiki M, Shinohara K, Imai H, et al. Waiting list may be a nocebo condition in psychotherapy trials: a contribution from network meta-analysis. Acta Psychiatr Scand. 2014;130(3):181–92.
19. Cuijpers P, Miguel C, Harrer M, Plessen CY, Ciharova M, Papola D, et al. Psychological treatment of depression: a systematic overview of a ‘Meta-Analytic Research Domain.’ J Affect Disord. 2023;335:141–51.
20. Ge L, Tian J-H, Li Y-N, Pan J-X, Li G, Wei D, et al. Association between prospective registration and overall reporting and methodological quality of systematic reviews: a meta-epidemiological study. J Clin Epidemiol. 2018;93:45–55. pmid:29111471
21. Shea BJ, Reeves BC, Wells G, Thuku M, Hamel C, Moran J. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ. 2017;358:j4008.
22. Whiting P, Savović J, Higgins JPT, Caldwell DM, Reeves BC, Shea B, et al. ROBIS: A new tool to assess risk of bias in systematic reviews was developed. J Clin Epidemiol. 2016;69:225–34. pmid:26092286
23. Yavchitz A, Ravaud P, Altman DG, Moher D, Hrobjartsson A, Lasserson T, et al. A new classification of spin in systematic reviews and meta-analyses was developed and ranked according to the severity. J Clin Epidemiol. 2016;75:56–65. pmid:26845744
24. Atal I, Porcher R, Boutron I, Ravaud P. The statistical significance of meta-analyses is frequently fragile: definition of a fragility index for meta-analyses. J Clin Epidemiol. 2019;111:32–40. pmid:30940600
25. Grimes DR. The ellipse of insignificance, a refined fragility index for ascertaining robustness of results in dichotomous outcome trials. Boonstra P, Zaidi M, Boonstra P, Jiang F, editors. eLife. 2022;11:e79573.
26. Grimes DR. Region of attainable redaction, an extension of ellipse of insignificance analysis for gauging impacts of data redaction in dichotomous outcome trials. eLife. 2024;13:e93050.
27. Grimes DR, Heathers J. The new normal? Redaction bias in biomedical science. Royal Society Open Science. 2021;8(12):211308.
28. Brown NJL, Heathers JAJ. The GRIM test: a simple technique detects numerous anomalies in the reporting of results in psychology. Social Psychol Personality Science. 2017;8(4):363–9.
29. Kravitz RL, Duan N, Braslow J. Evidence-based medicine, heterogeneity of treatment effects, and the trouble with averages. Milbank Q. 2004;82(4):661–87. pmid:15595946
30. Siegel JS, Zhong J, Tomioka S, Ogirala A, Faraone SV, Szabo ST. Estimating heterogeneity of treatment effect in psychiatric clinical trials. medRxiv. 2024.
31. Gagnier JJ, Moher D, Boon H, Beyene J, Bombardier C. Investigating clinical heterogeneity in systematic reviews: a methodologic review of guidance in the literature. BMC Med Res Methodol. 2012;12:111. pmid:22846171
32. Barbateskovic M, Koster TM, Eck RJ, Maagaard M, Afshari A, Blokzijl F, et al. A new tool to assess Clinical Diversity In Meta-analyses (CDIM) of interventions. J Clin Epidemiol. 2021;135:29–41. pmid:33561529
33. Chess LE, Gagnier JJ. Applicable or non-applicable: investigations of clinical heterogeneity in systematic reviews. BMC Med Res Methodol. 2016;16:19. pmid:26883215
34. Uttley L, Quintana DS, Montgomery P, Carroll C, Page MJ, Falzon L, et al. The problems with systematic reviews: a living systematic review. J Clin Epidemiol. 2023;156:30–41. pmid:36796736
35. Uttley L, Weng Y, Falzon L. Yet another problem with systematic reviews: a living review update. J Clin Epidemiol. 2025;177.
36. Storman M, Storman D, Jasinska KW, Swierz MJ, Bala MM. The quality of systematic reviews/meta-analyses published in the field of bariatrics: a cross-sectional systematic survey using AMSTAR 2 and ROBIS. Obes Rev. 2020;21(5):e12994. pmid:31997545
37. Ou SL, Luo J, Wei H, Qin XL, Du SY, Wang S. Safety and efficacy of programmed cell death 1 and programmed death ligand-1 inhibitors in the treatment of cancer: an overview of systematic reviews. Front Immunol. 2022;13:953761.
38. Pereira A, Martins C, Campos J, Faria S, Notaro S, Poklepović-Peričić T. Critical appraisal of systematic reviews of intervention studies in periodontology using AMSTAR 2 and ROBIS tools. J Clin Exp Dent. 2023:e678–94.
39. Ferri N, Ravizzotti E, Bracci A, Carreras G, Pillastrini P, Di Bari M. The confidence in the results of physiotherapy systematic reviews in the musculoskeletal field is not increasing over time: a meta-epidemiological study using AMSTAR 2 tool. J Clin Epidemiol. 2024;169:111303. pmid:38402999
40. Rotta I, Diniz JA, Fernandez-Llimos F. Assessing methodological quality of systematic reviews with meta-analysis about clinical pharmacy services: a sensitivity analysis of AMSTAR-2. Res Social Adm Pharm. 2025;21(2):110–5. pmid:39643474
41. Karakasis P, Bougioukas KI, Pamporis K, Fragakis N, Haidich A-B. Appraisal methods and outcomes of AMSTAR 2 assessments in overviews of systematic reviews of interventions in the cardiovascular field: a methodological study. Res Synth Methods. 2024;15(2):213–26. pmid:37956538
42. Rainkie DC, Abedini ZS, Abdelkader NN. Reporting and methodological quality of systematic reviews and meta-analysis with protocols in Diabetes Mellitus Type II: a systematic review. PLoS One. 2020;15(12):e0243091. pmid:33326429
43. De Santis KK, Lorenz RC, Lakeberg M, Matthias K. The application of AMSTAR2 in 32 overviews of systematic reviews of interventions for mental and behavioural disorders: a cross-sectional study. Res Synth Methods. 2022;13(4):424–33. pmid:34664766
44. Chung VCH, Wu XY, Feng Y, Ho RST, Wong SYS, Threapleton D. Methodological quality of systematic reviews on treatments for depression: a cross-sectional study. Epidemiol Psychiatr Sci. 2018;27(6):619–27. pmid:28462754
45. Desaunay P, Eude L-G, Dreyfus M, Alexandre C, Fedrizzi S, Alexandre J, et al. Benefits and risks of antidepressant drugs during pregnancy: a systematic review of meta-analyses. Paediatr Drugs. 2023;25(3):247–65. pmid:36853497
46. Matthias K, Rissling O, Pieper D, Morche J, Nocon M, Jacobs A. The methodological quality of systematic reviews on the treatment of adult major depression needs improvement according to AMSTAR 2: a cross-sectional study. Heliyon. 2020;6(9):e04776.
47. Health Quality Ontario. Internet-delivered cognitive behavioural therapy for major depression and anxiety disorders: a health technology assessment. Ont Health Technol Assess Ser. 2019;19(6):1–199. pmid:30873251
48. Ribeiro ELA, de Mendonça Lima T, Vieira MEB, Storpirtis S, Aguiar PM. Efficacy and safety of aripiprazole for the treatment of schizophrenia: an overview of systematic reviews. Eur J Clin Pharmacol. 2018;74(10):1215–33. pmid:29905899
49. Mangolini VI, Andrade LH, Lotufo-Neto F, Wang Y-P. Treatment of anxiety disorders in clinical practice: a critical overview of recent systematic evidence. Clinics (Sao Paulo). 2019;74:e1316. pmid:31721908
50. Stoll M, Mancini A, Hubenschmid L, Dreimüller N, König J, Cuijpers P, et al. Discrepancies from registered protocols and spin occurred frequently in randomized psychotherapy trials-A meta-epidemiologic study. J Clin Epidemiol. 2020;128:49–56. pmid:32828837
51. Narum S. Antidepressiva til ungdom - en kritisk analyse av bivirkningsbeskrivelser og nytte-risikovurderinger i en RCT. En dokumentanalyse [Antidepressants for adolescents - a critical analysis of adverse-effect descriptions and benefit-risk assessments in an RCT. A document analysis]. 2018. https://www.duo.uio.no/handle/10852/67187
52. Gan DZQ, McGillivray L, Han J, Christensen H, Torok M. Effect of engagement with digital interventions on mental health outcomes: a systematic review and meta-analysis. Front Digit Health. 2021;3:764079. pmid:34806079
53. Gong X, Fenech B, Blackmore C, Chen Y, Rodgers G, Gulliver J. Association between noise annoyance and mental health outcomes: a systematic review and meta-analysis. Int J Environ Res Public Health. 2022;19(5):2696.
54. Brini S, Brudasca NI, Hodkinson A, Kaluzinska K, Wach A, Storman D. Efficacy and safety of transcranial magnetic stimulation for treating major depressive disorder: An umbrella review and re-analysis of published meta-analyses of randomised controlled trials. Clin Psychol Rev. 2023;100:102236.
55. Siemens W, Meerpohl JJ, Rohe MS, Buroh S, Schwarzer G, Becker G. Reevaluation of statistically significant meta-analyses in advanced cancer patients using the Hartung-Knapp method and prediction intervals-A methodological study. Res Synth Methods. 2022;13(3):330–41. pmid:34932271
56. Faggion CM Jr, Atieh MA, Tsagris M, Seehra J, Pandis N. A case study evaluating the effect of clustering, publication bias, and heterogeneity on the meta-analysis estimates in implant dentistry. Eur J Oral Sci. 2024;132(1):e12962. pmid:38030576
57. Holek M, Bdair F, Khan M, Walsh M, Devereaux PJ, Walter SD, et al. Fragility of clinical trials across research fields: a synthesis of methodological reviews. Contemp Clin Trials. 2020;97:106151. pmid:32942056
58. Mun KT, Bonomo JB, Liebeskind DS, Saver JL. Fragility index meta-analysis of randomized controlled trials shows highly robust evidential strength for benefit of <3 hour intravenous alteplase. Stroke. 2022;53(6):2069–74.
59. Anand S, Kainth D. Fragility index of recently published meta-analyses in pediatric urology: a striking observation. Cureus. 2021;13(7):e16225.
60. Schröder A, Muensterer OJ, Oetzmann von Sochaczewski C. Meta-analyses in paediatric surgery are often fragile: implications and consequences. Pediatr Surg Int. 2021;37(3):363–7. pmid:33454848
61. Won J. Robustness of meta-analysis results in Cochrane systematic reviews: a case for acupuncture trials. Integr Med Res. 2022;11(4):100890. pmid:36338607
62. Sorigue M, Kuittinen O. Robustness and pragmatism of the evidence supporting the European Society for Medical Oncology guidelines for the diagnosis, treatment, and follow-up of follicular lymphoma. Expert Rev Hematol. 2021;14(7):655–68.
63. Huang X, Chen B, Thabane L, Adachi JD, Li G. Fragility of results from randomized controlled trials supporting the guidelines for the treatment of osteoporosis: a retrospective analysis. Osteoporos Int. 2021;32(9):1713–23. pmid:33595680
64. Tignanelli CJ, Napolitano LM. The fragility index in randomized clinical trials as a means of optimizing patient care. JAMA Surg. 2019;154(1):74–9. pmid:30422256
65. Dey S, Saikia P, Choupoo NS, Das SK. How robust are the evidences that formulate surviving sepsis guidelines? An analysis of fragility and reverse fragility of randomized controlled trials that were referred in these guidelines. Indian J Crit Care Med. 2021;25(7):773–9.
66. Otalora-Esteban M, Delgado-Ramirez MB, Gil F, Thabane L. Assessing the fragility index of randomized controlled trials supporting perioperative care guidelines: a methodological survey protocol. PLoS One. 2024;19(9):e0310092. pmid:39264894
67. Gaudino M, Hameed I, Biondi-Zoccai G, Tam DY, Gerry S, Rahouma M, et al. Systematic evaluation of the robustness of the evidence supporting current guidelines on myocardial revascularization using the fragility index. Circ Cardiovasc Qual Outcomes. 2019;12(12):e006017. pmid:31822120
68. IntHout J, Ioannidis JPA, Rovers MM, Goeman JJ. Plea for routinely presenting prediction intervals in meta-analysis. BMJ Open. 2016;6(7):e010247. pmid:27406637
69. Steele RG, McGuire AB, Kingston N. The meta-analysis application worksheet: a practical guide for the application of meta-analyses to clinical cases. Prof Psychol Res Pr. 2024;55(5):405–16. pmid:39619795
70. Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71.
71. Husain-Krautter S, Ellison JM. Late life depression: the essentials and the essential distinctions. Focus (Am Psychiatr Publ). 2021;19(3):282–93. pmid:34690594
72. Mitchell B, Martin N, Medland SE. Genetic and environmental predictors of treatment resistant depression. European Neuropsychopharmacol. 2024;87:25–6.
73. Mullen S. Major depressive disorder in children and adolescents. Mental Health Clin. 2018;8(6):275–83.
74. Field T. Prenatal depression risk factors, developmental effects and interventions: a review. J Preg Child Health. 2017;04(01).
75. Goodwin GM. Depression and associated physical diseases and symptoms. Dialogues Clin Neurosci. 2006;8(2):259–65. pmid:16889110
76. Chapter 4: Searching for and selecting studies [Internet]. [cited 2025 April 24]. https://training.cochrane.org/handbook/current/chapter-04.
77. Avau B, Van Remoortel H, De Buck E. Translation and validation of PubMed and Embase search filters for identification of systematic reviews, intervention studies, and observational studies in the field of first aid. J Med Libr Assoc. 2021;109(4).
78. Salvador-Oliván JA, Marco-Cuenca G, Arquero-Avilés R. Development of an efficient search filter to retrieve systematic reviews from PubMed. J Med Libr Assoc. 2021;109(4).
79. Cipriani A, Furukawa TA, Salanti G, Chaimani A, Atkinson LZ, Ogawa Y, et al. Comparative efficacy and acceptability of 21 antidepressant drugs for the acute treatment of adults with major depressive disorder: a systematic review and network meta-analysis. Lancet. 2018;391(10128):1357–66. pmid:29477251
80. Bramer WM, Giustini D, De Jonge GB, Holland L, Bekhuis T. De-duplication of database search results for systematic reviews in EndNote. J Med Libr Assoc. 2016;104(3).
81. Helfer B, Leonardi-Bee J, Mundell A, Parr C, Ierodiakonou D, Garcia-Larsen V, et al. Conduct and reporting of formula milk trials: systematic review. BMJ. 2021;375:n2202. pmid:34645600
82. Helfer B, Prosser A, Samara MT, Geddes JR, Cipriani A, Davis JM. Recent meta-analyses neglect previous systematic reviews and meta-analyses about the same topic: a systematic examination. BMC Med. 2015;13:82.
83. Lunny C, Higgins JPT, White IR, Dias S, Hutton B, Wright JM, et al. Risk of Bias in Network Meta-Analysis (RoB NMA) tool. BMJ. 2025;388:e079839.
84. Bero L, Oostvogel F, Bacchetti P, Lee K. Factors associated with findings of published trials of drug-drug comparisons: why some statins appear more efficacious than others. PLoS Med. 2007;4(6):e184. pmid:17550302
85. Borenstein M. Avoiding common mistakes in meta-analysis: understanding the distinct roles of Q, I-squared, tau-squared, and the prediction interval in reporting heterogeneity. Res Synth Methods. 2024;15(2):354–68. pmid:37940120
86. IntHout J, Ioannidis JPA, Borm GF. The Hartung-Knapp-Sidik-Jonkman method for random effects meta-analysis is straightforward and considerably outperforms the standard DerSimonian-Laird method. BMC Med Res Methodol. 2014;14:25. pmid:24548571
87. Wang Z, Alzuabi MA, Morgan RL, Mustafa RA, Falck-Ytter Y, Dahm P. Different meta-analysis methods can change judgements about imprecision of effect estimates: a meta-epidemiological study. BMJ Evid Based Med. 2023;28(2):126–32.
88. Balduzzi S, Rücker G, Schwarzer G. How to perform a meta-analysis with R: a practical tutorial. Evid Based Ment Health. 2019;22(4):153–60. pmid:31563865
89. Viechtbauer W. Conducting meta-analyses in R with the metafor package. J Stat Soft. 2010;36(3).
90. Nikolakopoulou A, Higgins JPT, Papakonstantinou T, Chaimani A, Del Giovane C, Egger M, et al. CINeMA: an approach for assessing confidence in the results of a network meta-analysis. PLoS Med. 2020;17(4):e1003082. pmid:32243458
91. Guyatt GH, Oxman AD, Vist GE, Kunz R, Falck-Ytter Y, Alonso-Coello P. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ. 2008;336(7650).
92. Moher D, Liberati A, Tetzlaff J, Altman DG, The PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. PLoS Med. 2009;6(7):e1000097.
93. Turner EH. Publication bias, with a focus on psychiatry: causes and solutions. CNS Drugs. 2013;27(6):457–68. pmid:23696308
94. Driessen E, Hollon SD, Bockting CLH, Cuijpers P, Turner EH. Does publication bias inflate the apparent efficacy of psychological treatment for major depressive disorder? A systematic review and meta-analysis of US National Institutes of Health-Funded Trials. PLoS One. 2015;10(9):e0137864. pmid:26422604
95. van Aert RCM, Wicherts JM, van Assen MALM. Publication bias examined in meta-analyses from psychology and medicine: a meta-meta-analysis. PLoS One. 2019;14(4):e0215052. pmid:30978228