Publication Bias in Recent Meta-Analyses

Introduction
Positive results have a greater chance of being published, and outcomes that are statistically significant have a greater chance of being fully reported. One consequence of research underreporting is that it may influence the sample of studies that is available for a meta-analysis. In published meta-analyses, smaller studies are often characterized by larger effects, which may be explained by publication bias. We investigated the association between the statistical significance of the results and the probability of being included in recent meta-analyses.

Methods
For meta-analyses of clinical trials, we defined the relative risk as the ratio of the probability of including statistically significant results favoring the treatment to the probability of including other results. For meta-analyses of other studies, we defined the relative risk as the ratio of the probability of including biologically plausible statistically significant results to the probability of including other results. We applied a Bayesian selection model to meta-analyses that included at least 30 studies and were published in four major general medical journals (BMJ, JAMA, Lancet, and PLOS Medicine) between 2008 and 2012.

Results
We identified 49 meta-analyses. The estimate of the relative risk was greater than one in 42 meta-analyses, greater than two in 16, greater than three in eight, and greater than five in four. In 10 of 28 meta-analyses of clinical trials, there was strong evidence that statistically significant results favoring the treatment were more likely to be included. In 4 of 19 meta-analyses of observational studies, there was strong evidence that plausible statistically significant outcomes had a higher probability of being included.

Conclusions
Publication bias was present in a substantial proportion of large meta-analyses that were recently published in four major medical journals.


Introduction
When some study outcomes are more likely to be published than others, the literature that is available to doctors, scientists, and policy makers provides misleading information. The tendency to decide to publish a study based on its results has long been acknowledged as a major threat to the validity of conclusions from medical research [1,2]. During the past 25 years, the phenomenon of research underreporting has been extensively investigated. It is clear that statistically significant results supporting the hypothesis of the researcher often have a greater chance of being published and fully reported [3-7].
Meta-analysis, a statistical approach to estimate a parameter of interest based on multiple studies, plays an essential role in medical research. One consequence of research underreporting is that it influences the sample of studies that is available for a meta-analysis [8,9]. This causes a bias, unless the process of study selection is modeled correctly [10]. Such modeling requires strong assumptions about the nature of the publication bias, especially when the size of a meta-analysis is not very large and when robust techniques cannot be used [11-13]. As a result, when publication bias occurs, the validity of the meta-analysis is uncertain.
It is well-known that smaller studies are often characterized by larger effects in published meta-analyses [14-16]. Publication bias is one of the possible explanations of this phenomenon [17]. Although a meta-analysis is typically preceded by an investigation of the presence of publication bias, the standard detection methods are characterized by low power [11,18-22]. Therefore, the sample of included studies may be unrepresentative of the population of all conducted studies even when publication bias has not been detected. In this study, we investigated whether statistically significant outcomes that showed a positive effect of the treatment (in the case of clinical trials) and plausible statistically significant outcomes (in the case of observational studies and interventional studies) had a greater probability of being included in recent meta-analyses than other outcomes. We considered all meta-analyses of aggregate data that included at least 30 effect sizes from individual studies and were published between 2008 and 2012 in four major general medical journals: BMJ, JAMA, Lancet, and PLOS Medicine. We applied a Bayesian approach, which allows estimation of a parametric function that describes the selection process [23].

Methods
For meta-analyses of clinical trials, we a priori decided to estimate the ratio of the probability of including statistically significant results favoring the treatment to the probability of including other results. For other meta-analyses, we a priori decided to estimate the ratio of the probability of including plausible statistically significant results to the probability of including other results. The definition of plausibility was straightforward and chosen a priori. For meta-analyses of an association between a risk factor and an undesired outcome (disease, mortality, etc.), results showing a positive association were regarded as plausible. For meta-analyses of an association between the absence of a risk factor and an undesired outcome, results showing a negative association were regarded as plausible. In meta-analyses of associations between alcohol consumption and cardiovascular parameters, we estimated the ratio of the probability of including statistically significant results to the probability of including statistically non-significant results, because both positive and negative associations are biologically plausible [24,25]. A two-sided significance level of 0.05 was assumed. Throughout the rest of the article, we refer to this ratio of probabilities as the relative risk (RR).
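The classification underlying the RR can be sketched in a few lines of Python. This is an illustrative sketch (the function names and the normal approximation for the p-value are our assumptions, not part of the authors' procedure), and it assumes effect sizes are coded so that positive values favor the treatment:

```python
import math

def two_sided_p(estimate: float, se: float) -> float:
    """Two-sided p-value from the normal approximation z = estimate / se."""
    z = abs(estimate / se)
    # Standard normal survival function via the error function.
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

def favors_treatment_significantly(estimate: float, se: float,
                                   alpha: float = 0.05) -> bool:
    """True if the result is statistically significant (two-sided test at
    level alpha) AND the effect favors the treatment (positive estimate,
    by the sign convention assumed here)."""
    return estimate > 0 and two_sided_p(estimate, se) < alpha
```

The RR is then the ratio of the inclusion probability of results for which this classifier returns True to the inclusion probability of all other results.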

Identification of meta-analyses
We used PubMed to identify meta-analyses published between 2008 and 2012 in four general medical journals (BMJ, JAMA, Lancet, and PLOS Medicine). The term 'meta-analysis' was required to appear in the title or the abstract. Meta-analyses that combined at least 30 estimates from individual studies were considered. Only large meta-analyses were included because a substantial sample size is required to distinguish between the effects of heterogeneity and publication bias [11,26]. We considered only meta-analyses of aggregate data because a selection model based on p-values seemed inappropriate for studying the complicated process of study selection in individual participant data meta-analyses.

Statistical model
For each identified meta-analysis, we applied a hierarchical selection model [23,27]. In this approach, a weight function is incorporated into the meta-analysis to model the probability that an individual study is included. As in a standard random effects meta-analysis, the conditional distribution of the observed study effect Y_i (i = 1, ..., N), given the true study effect α_i and the within-study variance σ_i², is assumed to be normal: Y_i | α_i, σ_i² ~ N(α_i, σ_i²), with density f(y_i | α_i, σ_i²). The true study effect comes from a normal distribution N(µ, τ²), where µ is the mean effect size and τ² is the between-study variance. When a selection process is present, some studies may fail to enter the meta-analysis sample. To model the process of study selection, the probability that any specific study enters the sample is assumed to be proportional to a non-negative weight function of its observed effect. In this case, the observed study effects X_j that enter the meta-analysis sample (j = 1, ..., n, n ≤ N) have the weighted density f_w(x_j | α_j, σ_j²) = w(x_j) f(x_j | α_j, σ_j²) / ∫ w(x) f(x | α_j, σ_j²) dx, where w(x) is the weight function. We applied a step weight function that took two values: ϒ_1 for statistically significant results favoring the treatment (in the case of clinical trials) or plausible statistically significant results (in the case of other studies), and ϒ_2 for other results, so that the RR equaled ϒ_1/ϒ_2. Maximum likelihood estimation is one possible approach to fit the model described above [27,28]. We used Bayesian inference because it produces valid results when the sample size is small [29] and allows straightforward interval estimation and examination of the sensitivity of the findings to the distribution of the random effects. Similarly to Silliman [23], we used diffuse uniform priors U(0,1) for the parameters of the weight function.
We declared a diffuse prior N(0,1000) for the mean effect size and, following a recommendation of Gelman [30], a uniform prior for the between-study standard deviation. We used Gibbs sampling [31] to obtain samples from the posterior distribution of ϒ_1/ϒ_2. We applied the algorithm described by Silliman, who considered a general class of hierarchical selection models [23]. In our specific case, the full conditional distributions needed by the Gibbs sampler involve the normalizing constant c(α_j, σ_j, ϒ) = ∫ w(x) f(x | α_j, σ_j²) dx, where x = (x_1, ..., x_n), σ = (σ_1, ..., σ_n), α = (α_1, ..., α_n), ϒ = (ϒ_1, ϒ_2), and s is the number of statistically significant results favoring the treatment (in the case of meta-analyses of clinical trials) or plausible statistically significant results (in the case of meta-analyses of other studies). A burn-in of 10,000 iterations was sufficient to achieve convergence for all meta-analyses. The estimates were based on the subsequent 50,000 iterations. For point estimation, we used the median of the posterior distribution of the RR. For interval estimation, we used the 95% equal-tail credible interval (CI). When the posterior probability that the RR exceeded 1 was greater than 0.95, we concluded that there was strong evidence that the RR was greater than 1. An R program that was used to fit the models can be found in Appendix S1.
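To make the weighted density concrete, here is a minimal self-contained sketch (in Python, not the authors' R program from Appendix S1) of the step-weight selection density for a single study, assuming effects are coded so that positive values favor the treatment and using the two-sided 0.05 threshold (z = 1.96):

```python
import math

def norm_pdf(x: float, mu: float, sd: float) -> float:
    """Density of N(mu, sd^2) at x."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def norm_cdf(x: float, mu: float, sd: float) -> float:
    """CDF of N(mu, sd^2) at x, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))

def weighted_density(x: float, alpha: float, sigma: float,
                     g1: float, g2: float, z_crit: float = 1.96) -> float:
    """Selection-model density f_w(x | alpha, sigma^2).

    The step weight w(x) is g1 for significant results favoring the
    treatment (x / sigma > z_crit) and g2 otherwise. The denominator is
    the normalizing constant c = integral of w(x) f(x | alpha, sigma^2) dx,
    which for a two-step weight reduces to a weighted sum of two
    normal tail probabilities.
    """
    w = g1 if x / sigma > z_crit else g2
    p_sig = 1.0 - norm_cdf(z_crit * sigma, alpha, sigma)  # P(X/sigma > z_crit)
    c = g1 * p_sig + g2 * (1.0 - p_sig)
    return w * norm_pdf(x, alpha, sigma) / c
```

When g1 = g2 the weight cancels and the density reduces to the plain normal likelihood; when g1 > g2, probability mass shifts toward significant positive effects, which is exactly the selection mechanism the RR = ϒ_1/ϒ_2 quantifies.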

Quality of the statistical model
In order to evaluate whether the statistical model was suitable for the objectives of the study, we performed simulations. The settings were based on the characteristics of the meta-analyses of clinical trials included in the study (see Appendix S2). The posterior distribution provided reliable information about the size of the RR (Table 1). Although the estimate of the RR based on the median of the posterior distribution was characterized by substantial variability and was biased in some scenarios, it gave a correct indication of the existence of publication bias. The model tended to underestimate the RR when the mean effect size was small and to overestimate the RR when the mean effect size was large. However, it was able to distinguish the RR from the mean effect size well in all simulation settings, as indicated by the quality of the interval estimates of the RR. For almost all scenarios, the lower bound of the 95% equal-tail credible interval was smaller than the assumed true value of the RR in at least 95% of the simulations. Although the upper bound was sometimes too small when the mean effect size was small, it was greater than the true RR in more than 95% of the simulations for most of the scenarios (Table 1). The performance of the model did not depend on the size of the between-study heterogeneity. The model was robust to the presence of small-study effects (i.e., larger true effects in smaller studies; Table 1).
Additionally, we compared the ability of our model to detect a selection process based on statistical significance with publication bias methods widely used in medical research: Egger's test [32], the rank correlation test of Begg and Mazumdar [33], and the trim and fill method [34]. When statistically significant outcomes had a higher probability of being included, the Bayesian selection model showed much higher detection rates than the standard methods (Table 2). This difference was especially apparent when small-study effects were absent. Furthermore, in contrast to the standard methods, the Bayesian selection model was characterized by low false positive rates, even in the presence of small-study effects (Table 2).
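For reference, the funnel-plot regression idea behind Egger's test can be sketched in a few lines. This is a simplified illustration, not the exact published procedure [32]: the standardized effect y_i/σ_i is regressed on the precision 1/σ_i, and an intercept far from zero suggests funnel-plot asymmetry.

```python
def egger_intercept(effects, ses):
    """Egger-style regression: standardized effect on precision.

    Returns (intercept, t_statistic) from an ordinary least-squares fit.
    A large |t| for the intercept suggests funnel-plot asymmetry. The
    published test compares t to a t-distribution with n - 2 degrees of
    freedom; that final step is omitted in this sketch.
    """
    n = len(effects)
    xs = [1.0 / s for s in ses]                   # precision
    ys = [e / s for e, s in zip(effects, ses)]    # standardized effect
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    # Residual variance and the standard error of the intercept.
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    s2 = rss / (n - 2)
    se_intercept = (s2 * (1.0 / n + xbar ** 2 / sxx)) ** 0.5
    return intercept, intercept / se_intercept
```

As the simulations above indicate, such funnel-plot-based statistics can be misled by small-study effects, which is precisely the weakness the selection model avoids.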

Sensitivity analysis
In order to investigate the robustness of the findings, two alternative models were considered. Because the conclusions drawn from a hierarchical model may be sensitive to the choice of the prior for the variance of the random effects [35], in the first model we replaced the uniform prior for τ with a 1/τ² prior for τ² (for an R program, see Appendix S1). In the second model, the assumption of a normal distribution of α_i was relaxed by allowing it to follow a t-distribution; a prior U(2,100) was declared for the number of degrees of freedom. A WinBUGS program that was used to fit this model can be found in Appendix S3.

Results

Identification of meta-analyses
Out of 406 articles that were identified in the initial search, 88 articles did not report meta-analyses of an association. We excluded 280 articles because they did not describe a meta-analysis including at least 30 effect sizes. Further, 14 articles did not report any results from a meta-analysis of aggregate data. Finally, four articles were excluded because they did not report the effect sizes from the individual studies and the corresponding author did not respond to a request to provide them. Twenty reports including 49 meta-analyses were used in this study (Figure 1 and Figure 2, references: Appendix S4, raw data available at www.plosone.org).

RR in meta-analyses of clinical trials
We estimated the ratio of the probability of including statistically significant results favoring the treatment to the probability of including other results in 28 large meta-analyses of clinical trials that were described in nine articles published in BMJ, JAMA, Lancet, or PLOS Medicine from 2008 to 2012 (Figure 1). In 25 of the 28 meta-analyses, the estimate of the RR was greater than 1. In 10 meta-analyses, there was strong evidence that statistically significant results favoring the treatment had a higher probability of being included than other outcomes (Figure 1). Trials that demonstrated the efficacy of adding lidocaine for the prevention of pain on injection of propofol were estimated to be between 2.45 and 141 times more likely to be included in the meta-analysis than other trials (Figure 3A). Studies favoring pretreatment with lidocaine were estimated to have a 2.21- to 61.5-fold higher probability of being included in the meta-analysis than other studies (Figure 3B). Changing the prior distribution for the between-study variance and the distribution of the true study effects had little effect on the estimates (Appendix S5).

RR in other meta-analyses
We identified 10 articles describing 19 meta-analyses of observational studies and one article describing two meta-analyses of interventional studies that were published in BMJ, JAMA, Lancet, or PLOS Medicine between 2008 and 2012. In four meta-analyses, there was strong evidence that plausible statistically significant results had a higher probability of being included than other outcomes (Figure 2). Studies that showed a statistically significant positive association between the C-reactive protein level and cardiovascular events were estimated to have a 6.97- to 30.7-fold higher probability of being included in the meta-analysis than studies showing other outcomes (Figure 4A). Statistically significant results showing a positive association were estimated to be between 1.92 and 18.1 times more likely to be included in the meta-analysis on the association between child physical abuse and depressive disorders (Figure 4B). The results were robust to the choice of the prior distribution for the between-study variance and the assumption about the distribution of the true study effects (Appendix S6).

Discussion
Clinical trials showing statistically significant results favoring the treatment and observational studies showing plausible statistically significant outcomes often had a higher probability of being included in the recent meta-analyses than studies showing other results. The magnitude of the publication bias differed greatly between the meta-analyses and was very large in some cases. For example, in a meta-analysis of the association between the C-reactive protein level and cardiovascular events, statistically significant outcomes showing a positive association were estimated to be between 6.97 and 30.7 times more likely to be included in the analyzed sample than other results.
The effect of the higher inclusion probability for statistically significant outcomes on the combined estimates is unknown because of a lack of information about the exact nature of the bias. However, it is clear that the fundamental assumption of a lack of systematic bias in the process of study selection was strongly violated. Consequently, the validity of a substantial proportion of the recent meta-analyses published in major general medical journals is uncertain due to the presence of publication bias. The presence of a publication bias was acknowledged in the article for only 3 [36-38] of the 14 meta-analyses in which we found evidence that statistically significant outcomes had a higher probability of being included.
The study demonstrates an application of an attractive alternative to the standard publication bias detection methods for studying a selection process based on statistical significance in large meta-analyses. Widely used publication bias methods such as the trim and fill method [34,39], Egger's test [32], the rank correlation test [33], and their modifications [18,19,40-42] are based on funnel plot asymmetry. These approaches have two major disadvantages. First, funnel plot asymmetry may be caused by processes other than publication bias, such as between-study variability or small-study effects (i.e., larger true effects in smaller studies) [17,32]. As a result, these methods may incorrectly suggest that a publication bias is present [13,22,43,44]. Second, some selection processes introduce little asymmetry to the funnel plot. As a result, widely used publication bias tests often have low power [18,20,22]. In contrast to these methods, selection models do not rely on the funnel plot but incorporate a model for publication bias in the random effects meta-analysis. While standard approaches investigate the association between effect sizes and some measure of precision to draw conclusions about publication bias, selection models allow the parameters that describe the selection process to be estimated directly.

Table 2 footnote: RR: relative risk, the ratio of the probability of including statistically significant outcomes favoring the treatment to the probability of including other outcomes (for RR=1, all results had the same probability of being included); SSE: small study effect; I²: proportion of the total variability due to heterogeneity; µ: mean effect size. The proportion of meta-analyses in which publication bias was identified is presented. For the Bayesian selection model, publication bias was indicated when the posterior probability that the RR was larger than 1 exceeded 95%. For Egger's test and the rank correlation test, one-sided procedures were used with a 0.05 significance level. For the trim and fill method, publication bias was indicated when the number of missing studies estimated by the R estimator in the first step of the algorithm was greater than 3 [34]. doi: 10.1371/journal.pone.0081823.t002
Several alternatives to the methods based on the funnel plot have been suggested. Iyengar and Greenhouse introduced selection models in meta-analysis [45]. Hedges proposed a class of selection models that incorporated between-study variance [27]. The advantage of these two frequentist methods over the class of Bayesian hierarchical selection models described by Silliman [23] is their computational simplicity. We chose the Bayesian approach because it produces valid conclusions when the sample size is small [29] and allows straightforward interval estimation and examination of the sensitivity of the findings to the distribution of the random effects. When a selection process based on the p-values is of interest but the weight function is difficult to specify a priori, non-parametric selection models can be used [46,47]. In this study, a parametric model was applied because the aim was to estimate a specific parametric function. Ioannidis and Trikalinos introduced a method based on a comparison of the numbers of expected and observed statistically significant results in a meta-analysis [48]. A major advantage of this approach is that it does not require a large sample size. An advantage of selection models over the method of Ioannidis and Trikalinos is that they allow between-study variance to be taken into account.
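The comparison of expected and observed significant results can be sketched as follows. This is a simplified illustration under a fixed-effect normal approximation, not the exact test of Ioannidis and Trikalinos [48]: the expected number of significant studies is the sum of each study's power at the pooled effect, which is then compared with the observed count.

```python
import math

def std_norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def expected_significant(mu: float, ses, z_crit: float = 1.96) -> float:
    """Expected number of two-sided-significant studies if every study
    estimates a common true effect mu (fixed-effect approximation).

    For a study with standard error se, the estimate's z-statistic is
    approximately N(mu / se, 1), so its probability of significance is
    P(Z < -z_crit) + P(Z > z_crit) with Z shifted by mu / se.
    """
    total = 0.0
    for se in ses:
        shift = mu / se
        power = std_norm_cdf(-z_crit - shift) + 1.0 - std_norm_cdf(z_crit - shift)
        total += power
    return total

def observed_significant(effects, ses, z_crit: float = 1.96) -> int:
    """Observed number of two-sided-significant studies."""
    return sum(1 for e, s in zip(effects, ses) if abs(e / s) > z_crit)
```

A substantial excess of observed over expected significant studies is the signal the test formalizes; unlike the selection models used in this study, this sketch ignores between-study variance.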
Different selection mechanisms have been considered. Dear and Begg developed a selection model based on the assumption that the probability of publishing can be described with a step function with discontinuities at alternate observed p-values [46]. Rufibach described a method that imposed a monotonicity constraint on this function [47]. The trim and fill method handles publication bias defined as the absence of the studies with the most extreme negative estimates [34,39]. The model of Copas and Shi assumes that the selection probability, given the size of the study, is an increasing function of the observed study effect [49]. Ioannidis and Trikalinos developed a test to investigate an excess of statistically significant findings [48]. We investigated whether statistically significant results favoring the treatment had a higher probability of being included in the meta-analyses of clinical trials. We focused on this selection process because its existence in the medical literature is well documented by empirical studies following research from inception or from submission to a regulatory authority [4,6]. In the case of other meta-analyses, we estimated the ratio of the probability of including biologically plausible statistically significant results to the probability of including other results. As demonstrated by the simulation study, our model performed well in detecting a selection process based on the statistical significance and direction of the effect. However, the power of the model to detect publication bias may be lower when a selection mechanism of a different nature occurs.
The main limitation of the study is that we focused on the largest meta-analyses. Possibly, the size of the association between the statistical significance of the results and the probability of inclusion is different for small and medium-sized meta-analyses than for the largest meta-analyses that we considered.
When publication bias is detected, an analyst can attempt to account for it [23,28,39,47,49-51]. Although the methods for conducting a meta-analysis in the presence of publication bias provide a powerful sensitivity analysis tool, their validity depends on the correctness of strong and unverifiable assumptions [12]. In light of the studies on publication bias in medicine, including the one presented here, it is clear that the quality of evidence from medical research greatly benefits from policies that aim to reduce underreporting. Several measures that regulate clinical trials have recently been taken. Since 2005, the International Committee of Medical Journal Editors has required prospective public registration of clinical trials as a condition for publication. Since 2007, the U.S. Food and Drug Administration has also required the registration of trial results. Similar initiatives are needed for observational studies in order to make a clear distinction between predefined hypothesis testing and exploratory analysis [52]. Prospective registration of all study protocols, including a detailed description of the data analysis, a requirement of consistency between the protocol and the study report, and an obligatory disclosure of the results are recommended to further improve the quality of the medical literature.

Figure 2. Publication bias in meta-analyses of studies other than clinical trials.
The RR is the ratio of the probability of including plausible statistically significant results to the probability of including other results. The median of the posterior distribution was used for point estimation. The interval estimate is the 95% equal-tail credible interval. P(RR>1) is the posterior probability that plausible statistically significant results had a higher chance of being included in the meta-analysis than other results.

Figure 3. The higher the value of the density function, the more likely a given value of the RR is in light of the prior knowledge (no prior knowledge was assumed) and the data from the meta-analysis. For both meta-analyses, there was much certainty that the RR was greater than 1, indicating that statistically significant results favoring the treatment had a greater probability of being included in the meta-analysis than other results.

Figure 4. The higher the value of the density function, the more likely a given value of the RR is in light of the prior knowledge (no prior knowledge was assumed) and the data from the meta-analysis. For both meta-analyses, there was much certainty that the RR was greater than 1, indicating that plausible statistically significant results had a greater probability of being included in the meta-analysis than other results.
doi: 10.1371/journal.pone.0081823.g004