Publication bias jeopardizes evidence-based medicine, mainly through biased literature syntheses. Publication bias may also affect laboratory animal research, but evidence is scarce.
To assess the opinions of laboratory animal researchers on the magnitude, drivers, consequences and potential solutions for publication bias, and to explore the impact of the size of the animals used, seniority of the respondent, working in a for-profit organization and type of research (fundamental, pre-clinical, or both) on those opinions.
Main Outcome Measure(s)
Median (interquartile ranges) strengths of beliefs on 5 and 10-point scales (1: totally unimportant to 5 or 10: extremely important).
Overall, 454 researchers participated. They considered publication bias a problem in animal research (7 (5 to 8)) and thought that about 50% (32–70) of animal experiments are published. Employees (n = 21) of for-profit organizations estimated that 10% (5 to 50) are published. Lack of statistical significance (4 (4 to 5)), technical problems (4 (3 to 4)), supervisors (4 (3 to 5)) and peer reviewers (4 (3 to 5)) were considered important reasons for non-publication (all on 5-point scales). Respondents thought that mandatory publication of study protocols and results, or the reasons why no results were obtained, may increase scientific progress but expected increased bureaucracy. These opinions did not depend on size of the animal used, seniority of the respondent or type of research.
Non-publication of “negative” results appears to be prevalent in laboratory animal research. If statistical significance is indeed a main driver of publication, the collective literature on animal experimentation will be biased. This will impede the performance of valid literature syntheses. Effective, yet efficient systems should be explored to counteract selective reporting of laboratory animal research.
Citation: ter Riet G, Korevaar DA, Leenaars M, Sterk PJ, Van Noorden CJF, Bouter LM, et al. (2012) Publication Bias in Laboratory Animal Research: A Survey on Magnitude, Drivers, Consequences and Potential Solutions. PLoS ONE 7(9): e43404. doi:10.1371/journal.pone.0043404
Editor: Erik von Elm, IUMSP, University Hospital Lausanne, Switzerland.
Received: March 7, 2012; Accepted: July 19, 2012; Published: September 5, 2012
Copyright: © ter Riet et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: The authors have no support or funding to report.
Competing interests: The authors have declared that no competing interests exist.
Publication bias jeopardizes evidence-based medicine through biased literature syntheses of clinical studies. It is conceivable that non-publication practices affect laboratory animal research too. In particular, non-reporting of “negative” research findings may hamper progress in laboratory animal research (LAR) through unnecessary duplications of experiments and may lead to premature first-in-man studies. Data on the extent of non-publication in LAR are scarce. Historically, the outlook on publishing may be different between clinical and laboratory animal research. For example, in his book ‘Introduction à l'étude de la médecine expérimentale’, the founding father of experimental physiology, Claude Bernard, argued that “[...] in physiology we must never make average descriptions of experiments because the true relations of phenomena disappear in the average; [...] we must [...] present our most perfect experiment as a type”. More recently, Lemon and Dunnett, arguing against the use of systematic reviews for LAR, wrote that “no mechanism exists for so called negative results to be published. [...]. This is not just an issue of publication bias [...]. Scientific experiments are designed to test for evidence in favour of a particular experimental hypothesis and to abandon it if insufficient evidence is acquired.” Against this background, we assessed laboratory animal researchers’ opinions about magnitude, drivers, consequences and potential solutions for publication bias in The Netherlands (575,278 animals used in experiments in 2010). We explored the impact of animal size, researcher seniority, working in a for-profit organization and type of research on those opinions.
We approached respondents via a two-step procedure: (i) the Dutch professional association of animal welfare officers, and (ii) all animal welfare officers. In March 2011, we sent a standard letter of invitation to participate by email to all animal welfare officers in the Netherlands via one liaison person (ML) who had access to them through their professional association. Since the animal welfare officers have address lists of all researchers who (had) performed LAR in their institutes, they were asked to send the invitation letter to all their animal researchers. The invitation letters contained the link to the internet-based survey and explained that confidentiality was guaranteed for the respondent and their institute. In the months prior to the survey’s launch, two authors (GtR and LH) had secured informal commitment to the survey from animal welfare officers in ten institutes.
The survey (Appendix S2) addressed five background features: (i) field of expertise, (ii) affiliation, (iii) size of the animals used [small (e.g. birds, rodents, fish, amphibians, reptiles) versus large (otherwise) or both], (iv) seniority of the respondent, as expressed by the number of (co-)authored publications, and (v) the type of research of the respondent, where we defined pre-clinical research as LAR to investigate whether a drug, procedure or treatment may have an effect in humans; other research was deemed basic. These background variables were also used in one-way stratified analyses to assess whether results varied by these variables. The estimates for the publication rates were analyzed using bootstrapped quantile regression (200 repetitions) to adjust for the simultaneous effects of the four stratification variables. Items were scored on 5- or 10-point scales, in which 1 indicated totally unimportant and 5 or 10 extremely important.
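As a rough illustration of the resampling idea behind the bootstrapped estimates, the sketch below computes a percentile bootstrap interval for a plain median with 200 repetitions. It is a simplified stand-in and not the adjusted quantile regression used in the study: it ignores the four stratification variables, and the function name `bootstrap_median_ci`, the seed and the example values are assumptions made for illustration only.

```python
import random
import statistics


def bootstrap_median_ci(values, repetitions=200, level=0.95, seed=0):
    """Percentile bootstrap interval for the median (unadjusted sketch)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement and record the median of each resample.
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(repetitions)
    )
    tail = (1 - level) / 2
    lower = medians[int(tail * (repetitions - 1))]
    upper = medians[int((1 - tail) * (repetitions - 1))]
    return statistics.median(values), lower, upper


# Hypothetical publication-rate estimates in percent (not study data).
example_estimates = [10, 20, 35, 40, 50, 50, 50, 60, 70, 70, 80, 90]
point, ci_low, ci_high = bootstrap_median_ci(example_estimates)
print(point, ci_low, ci_high)
```

A regression version would additionally model the stratification variables as dummy variables and refit the median regression on every resample.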
We used SurveyMonkey software (www.surveymonkey.com) and Stata software (version 10.1). Mann-Whitney and Kruskal-Wallis tests (in combination with Tukey’s post-hoc test) were used to assess statistical significance (alpha = 0.05). We applied the Bonferroni correction to adjust for multiple testing.
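For readers unfamiliar with the Mann-Whitney test, the following minimal sketch shows how its U statistic is obtained by comparing every score in one group with every score in the other, counting ties as one half. It computes the statistic only, not a p-value, and it is not the Stata routine used in the study; the function name and the example scores are hypothetical.

```python
def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: pairs where a > b, with ties counted as 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical 5-point importance scores for two respondent groups.
not_for_profit = [2, 2, 3, 1, 2]
for_profit = [4, 3, 5, 4]

u_a = mann_whitney_u(not_for_profit, for_profit)
u_b = mann_whitney_u(for_profit, not_for_profit)
# The two U statistics always sum to the number of pairs.
print(u_a, u_b, len(not_for_profit) * len(for_profit))
```

Dividing U by the number of pairs gives the proportion of pairwise comparisons won by the first group, which is often easier to interpret than U itself.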
Through the Dutch professional association of animal welfare officers, all animal welfare officers (n = 39) received the invitation letter and an internet link for the survey. We estimate that between 2,000 and 3,500 laboratory animal researchers received the invitation to participate. Between 17 March and 30 July 2011, 474 (between 14% and 24%) laboratory animal researchers returned the survey. Fifty-one respondents did not fill in the survey completely. Of these, we excluded 20 because of absence of (at least) their background data. This left 454 participants for the analysis. The variation in the number of respondents across Table 1 is caused by the other 31 respondents.
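The response-rate range and the analysis sample follow directly from the counts reported above. The short sketch below reproduces that arithmetic; the function and variable names are illustrative.

```python
def response_rate_bounds(returned, invited_low, invited_high):
    """Lowest and highest response rate given an uncertain number of invitees."""
    return returned / invited_high, returned / invited_low


returned = 474
rate_low, rate_high = response_rate_bounds(returned, 2000, 3500)
print(f"{rate_low:.1%} to {rate_high:.1%}")  # 13.5% to 23.7%, reported as 14% to 24%

incomplete = 51
excluded = 20  # incomplete surveys lacking background data
analysed = returned - excluded
partially_complete = incomplete - excluded
print(analysed, partially_complete)  # 454 31
```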
Table 1 shows the main results. Table S1 shows the results stratified by four background variables. On average, those working in not-for-profit institutes (n = 421) estimated that 50 percent (interquartile range (IQR) 35 to 70) of all conducted laboratory animal experiments are published. Researchers in a for-profit environment (n = 21) estimated that only 10 percent (5 to 50) is published. Researchers in not-for-profit institutes reported that 80 percent (60 to 90) of their own work had been published against 10 percent (5 to 39) of the work of researchers in a for-profit environment. Respondents working only with large animals thought that their own work was published in 90 percent of cases (79 to 100) against 75 percent (50 to 90) for those working with small animals only. Results from the multivariable analyses change these results only slightly (Tables S2 and S3). In particular, respondents who had co-authored more than 5 papers estimated publication rates 10 percentage points higher than respondents who had published fewer (95% CI from 0.8 to 19.1). Respondents working with large animals only estimated the publication rate of work they had been involved in personally 10 percentage points higher (95% CI from 1.1 to 18.9). Statistical non-significance and technical problems are considered to be the main drivers for non-publication. Supervisors, editors, and reviewers were considered responsible for non-publication. As expected, funders were considered more important in a for-profit environment (2 versus 4). Overall, respondents considered publication bias an important problem for LAR (7 (IQR 5–8)) and for research duplication, literature syntheses and well-timed initiation of phase-1 clinical trials in humans in particular. Table 1 shows that respondents thought mandatory publication of study protocols or results may help avoid unnecessary duplication, increase validity of literature syntheses and scientific progress, but at the cost of increased bureaucracy.
These opinions did not depend on size of the animal used, seniority of the respondent or type of research.
Publication bias is an important problem in laboratory animal research (LAR) according to laboratory animal researchers. We estimate that only fifty percent of LAR is published, but it may be far less in for-profit organizations given that their employees estimated that only ten percent of LAR gets published overall, including their own. Lack of statistical significance, technical problems, the opinions of supervisors and peer reviewers were considered important drivers of non-publication. Respondents thought that mandatory publication of study protocols, research results or the reasons why results could not be obtained may accelerate scientific progress.
To our knowledge, this is the first survey among laboratory animal researchers focusing on publication bias. This survey has several limitations. First, we estimate the response rate to this survey to be between 14 and 24 percent. We do not know to what extent the results are representative of the Dutch LAR community, let alone of the wider LAR community. The number of laboratory animal researchers in The Netherlands is unknown. We were unable to obtain exact information from the institutes on the number of e-mail addresses to which the survey had been sent. Another difficulty is that such address lists may not always be fully up to date. In particular, researchers who retire or change jobs may be listed in error. Second, the survey was restricted to one country. Third, only a few researchers in for-profit organizations participated. Fourth, our results are reminiscent of the joke about surveys on driving ability in which 90% of respondents think that they belong to the group of people whose driving abilities are above average. Likewise, it seems somewhat paradoxical that our respondents estimate the publication rate of their own work (which, in theory, they could have calculated) as much higher than the overall rate. Another explanation may be that the 50% rate mentioned in the introduction to the survey acted as an anchor that made respondents estimate the overall rates as too low. That would imply that a non-publication rate of 20% is closer to the truth. This issue is related to the next one. Fifth, our study investigated researchers’ opinions, which may not reflect the true rate(s) of non-publication. Sixth, due to the large number of statistical significance tests (n = 121), application of the Bonferroni correction for multiple testing (at alpha = 0.05) implies that only p-values below 0.0004 should be considered statistically significant (see also the legend to Table S1). The assessment of the effects of the four stratification variables should be considered explorative. Seventh, we were unable to assess the impact of scientific sub-discipline on the results since the free text field (survey item A.1, Appendix S2) yielded imprecise data with large variation.
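The Bonferroni threshold mentioned in the sixth limitation is simply the overall alpha divided by the number of tests. A minimal sketch of that calculation, with illustrative function names and hypothetical example p-values, is:

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


def is_significant(p_value, alpha=0.05, n_tests=121):
    """True if a single p-value survives the Bonferroni correction."""
    return p_value < bonferroni_threshold(alpha, n_tests)


threshold = bonferroni_threshold(0.05, 121)
print(round(threshold, 4))    # 0.0004
print(is_significant(0.001))  # False: below 0.05 but above the corrected threshold
print(is_significant(0.0001)) # True
```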
Data on non-publication rates in LAR are scarce. Sena et al., using the statistical “trim and fill” technique on a large number of animal experiments on acute ischemic stroke, estimated the non-publication rate to be 13.6 percent, which was associated with a 30% overstatement of efficacy. Evidence from clinical research on humans suggests that between 46 and 67 percent of studies are not published, and that in those published, positive findings are over-emphasized. The emergence of trial registration, and the joint statement of the International Committee of Medical Journal Editors on publication of randomized trials being conditional on a trial having a public trial registration number, may have reduced these numbers. We agree with Sena et al., who argued that “non-publication is unethical since it deprives researchers of the accurate data they need to estimate the potential of novel therapies in clinical trials, but also because the included animals are wasted because they do not contribute to accumulating knowledge. In addition, research syntheses that overstate effects may lead to further unnecessary animal experiments testing poorly founded hypotheses.”
Measures against the suppression of “negative” results can be categorized from the source, via upstream to more downstream measures. Since, in The Netherlands, all experiments must pass an Institutional Animal Care and Use Committee (IACUC) for ethics approval, IACUCs may play a crucial role in the registration of all LAR and prevention of publication bias. A system ensuring periodic follow-up of each experiment’s fate would reinforce such registration. It may be challenging to build a watertight system that simultaneously minimizes bureaucracy. Application of modern information technology may be crucial. One option to prevent study results from affecting the editorial decision is to initially submit manuscripts without any results. Editors and peer reviewers would judge the importance of submissions through the background, hypotheses and methods sections. This would ensure that acceptance is not conditional on the results. More downstream measures include special journals, journal sections or repositories for “negative” results, such as the Journal of Negative Results in Biomedicine, The All Results Journals and Negative Results in Gynecological Oncology. In addition, two journals, the Journal of Cerebral Blood Flow and Metabolism and Neurobiology of Aging, feature Negative Results sections with a very similar flavor. The Journal of Cerebral Blood Flow and Metabolism describes this section as follows: “Negative Results is intended to provide a forum for data that did not substantiate [...] a difference between the experimental groups, and/or did not reproduce published findings. Since the net effect of a Negative Result is to discourage repetition, the standards for acceptance as a Negative Result will be highly demanding. Typically, Type II error considerations are mandatory.”
What are the implications for further research? As we have learnt from randomized trials in humans, follow-up of cohorts of study protocols may help us understand the magnitude and the causes of publication bias in LAR, which in turn may affect the research community’s motivation to deal with it. In the meantime, more research into statistical correction of publication bias seems useful. Specifically, various methods to deal with publication bias statistically, such as trim and fill, regression-based methods, and capture-recapture, may be compared in simulation studies to assess their strengths and weaknesses in various situations.
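To make the regression-based approach concrete, the sketch below fits the regression underlying Egger's funnel-plot asymmetry test: each study's standard normal deviate (effect divided by its standard error) is regressed on its precision (one divided by the standard error). An intercept far from zero suggests small-study effects such as publication bias, while the slope estimates the underlying effect. This is an illustrative sketch only: it omits the significance test on the intercept, and the function name and the data are hypothetical.

```python
def egger_regression(effects, standard_errors):
    """Ordinary least squares of effect/SE on 1/SE; returns (intercept, slope)."""
    precision = [1.0 / se for se in standard_errors]
    deviates = [e / se for e, se in zip(effects, standard_errors)]
    n = len(effects)
    mean_x = sum(precision) / n
    mean_y = sum(deviates) / n
    sxx = sum((x - mean_x) ** 2 for x in precision)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(precision, deviates))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical studies constructed so that less precise studies report larger
# effects: effect = 0.5 + 2.0 * SE, giving an intercept of 2 and a slope of 0.5.
example_ses = [0.1, 0.2, 0.25, 0.4, 0.5]
example_effects = [0.5 + 2.0 * se for se in example_ses]
intercept, slope = egger_regression(example_effects, example_ses)
print(intercept, slope)
```

In a simulation study of the kind proposed above, a function like this would be applied to many simulated literatures with a known degree of selective publication, so that its behaviour can be compared with trim and fill and capture-recapture.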
Statistically significant differences between medians are highlighted with an asterisk (*). An asterisk in the column that corresponds to the first level of one of the four variables (affiliation, size of animals, number of papers published, focus of experiments), refers to the difference between the medians of the first and second level. An asterisk in the second column refers to the difference between the second and third level. An asterisk in the third column refers to the difference between the first and the third level.
All numbers are medians after bootstrapping the analysis 200 times. † Bootstrapped quantile regression on the median, simultaneously adjusting for all four stratification variables, which were modeled as dummy variables. CI denotes confidence interval. The intercept of the fully adjusted model, that is, the estimate for the median proportion of papers published of not-for-profit researchers working with small animals, having co-authored between 0 and 5 papers, and working on both fundamental and pre-clinical topics was 50 percent (95% CI 43.5–56.5).
All numbers are medians after bootstrapping the analysis 200 times. † Bootstrapped quantile regression on the median, simultaneously adjusting for all four stratification variables, which were modeled as dummy variables. CI denotes confidence interval. The intercept of the fully adjusted model, that is, the estimate for the median proportion of papers published of not-for-profit researchers working with small animals, having co-authored between 6 and 20 papers, and working on both fundamental and pre-clinical topics was 80 percent (95% CI 73.9–86.1). § The group that (co-)authored 0–5 studies was excluded from this row because very junior investigators very often had either zero or 100 percent of their papers published.
We thank the Dutch Professional Association of Animal Welfare Officers and its members for their willingness to cooperate and their support for this survey.
Conceived and designed the experiments: GtR DAK LH ML. Performed the experiments: GtR DK ML. Analyzed the data: DK GtR LH. Wrote the paper: GtR DK LH. Critical revision of the manuscript: CJFVN ML PJS LMB RL RPOE. Contributed important intellectual content: CJFVN ML PJS LMB RL RPOE.
- 1. Vedula SS, Bero L, Scherer RW, Dickersin K (2009) Outcome reporting in industry-sponsored trials of gabapentin for off-label use. N Engl J Med 361: 1963–1971. 361/20/1963 [pii];10.1056/NEJMsa0906126 [doi].
- 2. Turner EH, Matthews AM, Linardatos E, Tell RA, Rosenthal R (2008) Selective publication of antidepressant trials and its influence on apparent efficacy. N Engl J Med 358: 252–260. 358/3/252 [pii];10.1056/NEJMsa065779 [doi].
- 3. Hackam DG, Redelmeier DA (2006) Translation of research evidence from animals to humans. JAMA 296: 1731–1732. 296/14/1731 [pii];10.1001/jama.296.14.1731 [doi].
- 4. Knight J (2003) Negative results: Null and void. Nature 422: 554–555. 10.1038/422554a [doi];422554a [pii].
- 5. Macleod MR, O’Collins T, Howells DW, Donnan GA (2004) Pooling of animal experimental data reveals influence of study design and publication bias. Stroke 35: 1203–1208.
- 6. Sena ES, van der Worp HB, Bath PM, Howells DW, Macleod MR (2010) Publication bias in reports of animal stroke studies leads to major overstatement of efficacy. PLoS Biol 8: e1000344. 10.1371/journal.pbio.1000344 [doi].
- 7. Mignini LE, Khan KS (2006) Methodological quality of systematic reviews of animal studies: a survey of reviews of basic research. BMC Med Res Methodol 6: 10.
- 8. Bracken MB (2009) Why animal studies are often poor predictors of human reactions to exposure. J R Soc Med 102: 120–122. 102/3/120 [pii];10.1258/jrsm.2008.08k033 [doi].
- 9. Bracken MB (2009) Why are so many epidemiology associations inflated or wrong? Does poorly conducted animal research suggest implausible hypotheses? Ann Epidemiol 19: 220–224. S1047-2797(08)00359-1 [pii];10.1016/j.annepidem.2008.11.006 [doi].
- 10. Korevaar DA, Hooft L, Ter Riet G (2011) Systematic reviews and meta-analyses of preclinical studies: publication bias in laboratory animal experiments. Lab Anim 45: 225–230. la.2011.010121 [pii];10.1258/la.2011.010121 [doi].
- 11. Bernard C (1865) Introduction à l’étude de la médecine expérimentale.
- 12. Lemon R, Dunnett SB (2005) Surveying the literature from animal experiments. BMJ 330: 977–978. 330/7498/977 [pii];10.1136/bmj.330.7498.977 [doi].
- 13. Ministerie van Economische Zaken Landbouw en Innovatie (2010) Rapport nVWA: Zo doende 2010. Jaaroverzicht over dierproeven en proefdieren.
- 14. Easterbrook PJ, Berlin JA, Gopalan R, Matthews DR (1991) Publication bias in clinical research. Lancet 337: 867–872. 0140-6736(91)90201-Y [pii].
- 15. Decullier E, Lheritier V, Chapuis F (2005) Fate of biomedical research protocols and publication bias in France: retrospective cohort study. BMJ 331: 19. bmj.38488.385995.8F [pii];10.1136/bmj.38488.385995.8F [doi].
- 16. de Jong JP, Ter Riet G, Willems DL (2010) Two prognostic indicators of the publication rate of clinical studies were available during ethical review. J Clin Epidemiol 63: 1342–1350. S0895-4356(10)00099-5 [pii];10.1016/j.jclinepi.2010.01.018 [doi].
- 17. Rothstein HR, Sutton AJ, Borenstein M (2005) Publication Bias in Meta-Analysis: Prevention, Assessment and Adjustments. New York: John Wiley & Sons.
- 18. Mathieu S, Boutron I, Moher D, Altman DG, Ravaud P (2009) Comparison of registered and published primary outcomes in randomized controlled trials. JAMA 302: 977–984. 302/9/977 [pii];10.1001/jama.2009.1242 [doi].
- 19. Boutron I, Dutton S, Ravaud P, Altman DG (2010) Reporting and interpretation of randomized controlled trials with statistically nonsignificant results for primary outcomes. JAMA 303: 2058–2064. 303/20/2058 [pii];10.1001/jama.2010.651 [doi].
- 20. De Angelis C, Drazen JM, Frizelle FA, Haug C, Hoey J, et al. (2004) Clinical trial registration: a statement from the International Committee of Medical Journal Editors. Ann Intern Med 141: 477–478. 0000605-200409210-00109 [pii].
- 21. Greenland S (2007) Commentary: on ‘quality in epidemiological research: should we be submitting papers before we have the results and submitting more hypothesis generating research?’. Int J Epidemiol 36: 944–945. 36/5/944 [pii];10.1093/ije/dym174 [doi].
- 22. The All Results Journals website. Available: http://www.arjournals.com/ojs/. Accessed 2012 August 8.
- 23. Journal of Negative Results in Biomedicine website. Available: http://www.jnrbm.com/. Accessed 2012 August 8.
- 24. Journal of Cerebral Blood Flow & Metabolism website. Available: http://mc.manuscriptcentral.com/societyimages/jcbfm/JCBFM%20Guide%20to%20Authors.pdf. Accessed 2012 August 8.
- 25. Neurobiology of Aging website. Available: http://www.elsevier.com/wps/find/journaldescription.cws_home/525480/authorinstructions. Accessed 2012 August 8.
- 26. Goodman S (2008) A dirty dozen: twelve p-value misconceptions. Semin Hematol 45: 135–140. S0037-1963(08)00062-0 [pii];10.1053/j.seminhematol.2008.04.003 [doi].
- 27. Goodman SN (1999) Toward evidence-based medical statistics. 1: The P value fallacy. Ann Intern Med 130: 995–1004. 199906150-00008 [pii].
- 28. Goodman SN (1999) Toward evidence-based medical statistics. 2: The Bayes factor. Ann Intern Med 130: 1005–1013. 199906150-00009 [pii].
- 29. Egger M, Davey Smith G, Schneider M, Minder C (1997) Bias in meta-analysis detected by a simple, graphical test. BMJ 315: 629–634.
- 30. Moreno SG, Sutton AJ, Turner EH, Abrams KR, Cooper NJ, et al. (2009) Novel methods to deal with publication biases: secondary analysis of antidepressant trials in the FDA trial registry database and related journal publications. BMJ 339: b2981.
- 31. Bennett DA, Latham NK, Stretton C, Anderson CS (2004) Capture-recapture is a potentially useful method for assessing publication bias. J Clin Epidemiol 57: 349–357. 10.1016/j.jclinepi.2003.09.015 [doi];S0895435603003809 [pii].
- 32. Duval S, Tweedie R (2000) Trim and fill: A simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics 56: 455–463.
- 33. Moreno SG, Sutton AJ, Ades AE, Stanley TD, Abrams KR, et al. (2009) Assessment of regression-based methods to adjust for publication bias through a comprehensive simulation study. BMC Med Res Methodol 9: 2. 1471-2288-9-2 [pii];10.1186/1471-2288-9-2 [doi].
- 34. Moreno SG, Sutton AJ, Thompson JR, Ades AE, Abrams KR, et al. (2012) A generalized weighting regression-derived meta-analysis estimator robust to small-study effects and heterogeneity. Stat Med. 10.1002/sim.4488 [doi].