The authors have declared that no competing interests exist.
We surveyed 807 researchers (494 ecologists and 313 evolutionary biologists) about their use of Questionable Research Practices (QRPs), including cherry picking statistically significant results, p hacking, and hypothesising after the results are known (HARKing).
All forms of science communication, including traditional journal articles, involve transforming complicated, often messy data into a coherent narrative form. O’Boyle et al [
Forstmeier et al [
The widespread prevalence of Questionable Research Practices (QRPs) is now well documented in psychology [
QRPs refer to activities such as
John et al [
| Ecology Journals | Evolution Journals |
|---|---|
| Trends in Ecology and Evolution | Evolutionary Applications |
| Ecology Letters | Evolution |
| Annual Review of Ecology and Evolution | BMC Evolutionary Biology |
| Frontiers in Ecology and the Environment | Evodevo |
| Global Change Biology | American Naturalist |
| Ecological Monographs | Journal of Evolutionary Biology |
| Methods in Ecology and Evolution | Evolutionary Biology |
| Journal of Ecology | Evolutionary Ecology |
| Global Ecology and Biogeography | Behavioural Ecology |
| ISME | |
| Journal of Applied Ecology | |
Publication bias in this context refers to a bias towards publishing statistically significant, ‘positive’ results and not publishing statistically non-significant (‘negative’ or null) results. The bias exists in many sciences [
The intersection of increasing publication bias and a growing publish-or-perish culture in science may well impact the frequency with which researchers employ QRPs [
Simmons et al [
Publication bias in a publish-or-perish research culture incentivises researchers to engage in QRPs, which inflate the false positive rate leading to a less reproducible research literature. In this sense, QRP rates might be indicators of future reproducibility problems. Arguments about the difficulties in directly evaluating the reproducibility of the ecology and evolution literature have been made elsewhere (e.g., Schnitzer & Carson [
The specific aims of our research were to:
Survey ecology and evolution researchers’ own self-reported frequency of QRP use
Survey ecology and evolution researchers’ estimated rate of QRP use in their field
Compare these rates to those found in other disciplines, particularly psychology, where serious reproducibility problems have been established
Explore, through researchers’ open-ended comments on each QRP in the survey, attitudes, (mis)understandings, pressures and contexts contributing to QRP use in the discipline
We collected the email addresses of corresponding authors from 11 ‘ecology’ and 9 ‘evolutionary biology’ journals (see
We extracted authors’ email addresses from articles published in ecology journals (the first 10 ecology journals listed in
Before we looked at the initial data, we decided to expand our sample to include evolutionary biology researchers, and to add authors from articles in the Journal of Applied Ecology. We collated email addresses from authors of articles in evolutionary biology journals (
We deduplicated our list of email addresses before we sent each survey out, to ensure that individual researchers did not receive our survey more than once. We ultimately emailed a total of 5386 researchers with a link to our online survey, which returned 807 responses (response rate = 15%).
Of the 807 responses, 71% (n = 573) were identified through our ‘ecology’ journal sample and 37% (n = 299) through our ‘evolution’ journal sample. This imbalance is a product of the number of journals in each sample and the order in which email addresses were collected and deduplicated; we first targeted ecology journals, and then decided to add a second group of evolution journals. Recognising that journal classification is only an approximate guide to disciplinary status, we asked researchers to self-identify their discipline; 411 researchers completed this question. Based on this information we adjusted the disciplinary classifications as follows. First, we classified responses whose self-described sub-discipline included any of the following terms as being made by evolution researchers: ‘evolut*’, ‘behav*’, ‘reproductive’, or ‘sexual’. From the remaining set of descriptions, we classified all responses that included any of the following terms as being made by ecology researchers: ‘plant’, ‘*population’, ‘marine biology’, ‘biodiversity’, ‘community’, ‘environment*’, ‘conservation’, ‘ecology’, ‘botany’, ‘mycology’, or ‘zoology’. Researchers who did not use any of these terms, and those who did not complete the self-identified sub-discipline question (n = 396), were left in their original journal discipline category as outlined in
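The term-matching reclassification described above can be sketched as simple substring matching. This is a hypothetical re-implementation, not the authors' code; function and variable names are ours, and wildcard terms such as ‘evolut*’ are treated as plain substrings:

```python
# Hypothetical sketch of the sub-discipline reclassification described above.
# Wildcard terms such as 'evolut*' and '*population' are treated as plain
# substring matches; evolution terms are checked first, as in the text.
EVOLUTION_TERMS = ["evolut", "behav", "reproductive", "sexual"]
ECOLOGY_TERMS = ["plant", "population", "marine biology", "biodiversity",
                 "community", "environment", "conservation", "ecology",
                 "botany", "mycology", "zoology"]

def classify(sub_discipline: str, journal_group: str) -> str:
    """Return 'evolution', 'ecology', or fall back to the journal-based group."""
    text = sub_discipline.lower()
    if any(term in text for term in EVOLUTION_TERMS):  # evolution rules first
        return "evolution"
    if any(term in text for term in ECOLOGY_TERMS):
        return "ecology"
    return journal_group  # no terms matched, or question left blank

print(classify("behavioural ecology", "ecology"))  # evolution rule wins
print(classify("community ecology", "evolution"))
print(classify("paleontology", "ecology"))         # falls back to journal group
```

Note that checking the evolution terms first means a description like “behavioural ecology” is classified as evolution, matching the rule ordering stated above.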
Only 69% (558-560/807) of our sample completed the demographic questions at the end of our survey. Of the 560 who completed the gender question, 69% identified as male, 29% as female, 0.2% identified as non-binary and 1% preferred not to say. Of the 558 who completed the career status question, 6% identified as graduate students, 33% as post-doctoral researchers, 24% as midcareer researchers/academics and 37% as senior researchers/academics. The 559 who completed the age question were divided between age categories as follows: under 30 (11.5%), 30–39 (46.7%), 40–49 (25.9%), 50–59 (9.8%), 60–69 (4.8%), and over 70 (1.3%).
Our research practices survey was administered via Qualtrics (Provo, UT, USA). The survey (
Not reporting studies or variables that failed to reach statistical significance (e.g. p ≤ 0.05) or some other desired statistical threshold.
Not reporting covariates that failed to reach statistical significance (e.g. p ≤ 0.05) or some other desired statistical threshold.
Reporting an unexpected finding or a result from exploratory analysis as having been predicted from the start.
Reporting a set of statistical models as the complete tested set when other candidate models were also tested.
Rounding-off a p value or other quantity to meet a pre-specified threshold (e.g., reporting p = 0.054 as p = 0.05 or p = 0.013 as p = 0.01).
Deciding to exclude data points after first checking the impact on statistical significance (e.g. p ≤ 0.05) or some other desired statistical threshold.
Collecting more data for a study after first inspecting whether the results are statistically significant (e.g. p ≤ 0.05).
Changing to another type of statistical analysis after the analysis initially chosen failed to reach statistical significance (e.g. p ≤ 0.05) or some other desired statistical threshold.
Not disclosing known problems in the method and analysis, or problems with the data quality, that potentially impact conclusions.
Filling in missing data points without identifying those data as simulated.
Questions 1 to 9 were shown in random order but question 10 was always shown last, because it is particularly controversial and we did not want it to influence the responses to other items. For each of these 10 practices, researchers were asked to:
estimate the percentage of ecology (evolution) researchers who they believe have engaged in this practice on at least one occasion (0–100%)
specify how often they had themselves engaged in the practice (never, once, occasionally, frequently, almost always)
specify how often they believe the practice should be used
At the end of each QRP question, researchers had the opportunity to make additional comments under the open-ended question: ‘why do you think this practice should or shouldn’t be used?’.
At the end of the set of 10 QRP questions, researchers were asked “have you ever had doubts about the scientific integrity of researchers in ecology (evolution)?”, and asked to specify the frequency of such doubts, if any, for different sub-groups. Finally, the survey included demographic questions about participants’ career stage, gender, age and sub-discipline, discussed above.
Analyses were preregistered after data collection had commenced but before the data were viewed [
Light columns represent the proportion of evolution researchers and dark columns represent the proportion of ecology researchers who reported having used a practice at least once. The dots show researchers’ mean estimates of suspected use by colleagues in their field. Dots that are much higher than bars may suggest that the QRP is considered particularly socially unacceptable [
Shading indicates the proportion of each use category that identified the practice as acceptable. Error bars are 95% confidence intervals.
n = 555–626.
| Questionable Research Practice# | Psychology Italy | Psychology USA | Ecology | Evolution |
|---|---|---|---|---|
| Not reporting response (outcome) variables that failed to reach statistical significance | 47.9 | 63.4 | 64.1 | 63.7 |
| Collecting more data after inspecting whether the results are statistically significant | 53.2 | 55.9 | 36.9 | 50.7 |
| Rounding-off a p value or other quantity to meet a pre-specified threshold | 22.2 | 22.0 | 27.3 | 17.5 |
| Deciding to exclude data points after first checking the impact on statistical significance | 39.7 | 38.2 | 24.0 | 23.9 |
| Reporting an unexpected finding as having been predicted from the start | 37.4 | 27.0 | 48.5 | 54.2 |
| Filling in missing data points without identifying those data as simulated* | 2.3 | 0.6 | 4.5 | 2.0 |
#note that these statements began with “in a paper,” in John et al. [
*note that this was referred to as “falsifying data” in John et al. [
Overall, researchers in ecology and evolution reported high levels of Questionable Research Practices (
The responses for ecology and evolution researchers were broadly similar to those from the samples of psychologists studied by John et al. [
Broadly, researchers’ self-reported QRP use was closely related to their estimates of prevalence of QRPs in the scientific community (
It was extremely rare for researchers to report high frequency (‘frequently’, ‘almost always’) use of QRPs. Most reported usage was at low frequency (‘once’, ‘occasionally’), with many researchers reporting they had never engaged in these practices (
Age and career stage were not strong predictors of how frequently researchers used Questionable Research Practices (Kendall’s Tau = 0.05, 95% CI = 0.001–0.069, and 0.04, 95% CI = 0.011–0.058, respectively), but there was a considerable correlation between how often participants thought a practice should be used and how often they used it (Kendall’s Tau = 0.6, 95% CI = 0.61–0.65). Those who used practices frequently or almost always were much more likely to indicate that they should be used often.
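A minimal sketch of the kind of correlation reported above, run on hypothetical ordinal data coded 0 (never) to 4 (almost always). The data and the percentile-bootstrap 95% CI are illustrative only, not the authors' analysis, and the simple tau-a form below ignores tie corrections:

```python
import random
from itertools import combinations

def kendall_tau(x, y):
    """Tau-a: (concordant - discordant) / total pairs; tied pairs count as neither."""
    conc = disc = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        s = (xi - xj) * (yi - yj)
        if s > 0:
            conc += 1
        elif s < 0:
            disc += 1
    n = len(x)
    return (conc - disc) / (n * (n - 1) / 2)

# Hypothetical responses: how often the practice "should be used" vs. actual use
random.seed(1)
should = [random.randint(0, 4) for _ in range(150)]
did = [min(4, max(0, s + random.randint(-1, 1))) for s in should]

tau = kendall_tau(should, did)

# Percentile-bootstrap 95% CI on tau
boot = []
for _ in range(200):
    idx = [random.randrange(150) for _ in range(150)]
    boot.append(kendall_tau([should[i] for i in idx], [did[i] for i in idx]))
boot.sort()
lo, hi = boot[int(0.025 * 200)], boot[int(0.975 * 200)]
print(f"tau = {tau:.2f}, 95% CI = {lo:.2f} to {hi:.2f}")
```

In practice a library routine (e.g. a tie-corrected tau-b implementation) would be preferable; the hand-rolled version above only shows the pairwise concordant/discordant counting that underlies the statistic.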
Researchers in ecology and evolution expressed considerable doubts about their community’s scientific integrity (
| | Questionable Research Practices | | | Scientific Misconduct | | |
|---|---|---|---|---|---|---|
| | Never | Once or Twice | Often | Never | Once or Twice | Often |
| Researchers from other institutions | 8.9 | 56.6 | 34.5 | 39.0 | 55.5 | 5.5 |
| Research at your institution | 27.9 | 52.2 | 20.0 | 69.2 | 29.1 | 1.6 |
| Graduate student research at your institution | 31.0 | 48.6 | 20.4 | 72.5 | 25.6 | 1.8 |
| Senior colleagues or collaborators | 31.5 | 50.8 | 17.7 | 73.3 | 24.7 | 2.0 |
| Your own research | 52.2 | 44.6 | 3.2 | 97.9 | 2.0 | 0.0 |
*note that not all researchers answered each component of the table above, so the total sample size differs slightly between cells, ranging from 488 to 539 responses per cell
At the end of each QRP question, researchers had the opportunity to make additional comments on the practice. Overall, we were surprised by the proportion of researchers who made comments. For some QRPs half the researchers left comments, and often substantial ones. Here we have summarised the ecology and evolution groups’ comments together, having not detected any major differences between the groups in a qualitative assessment. We interpret the volume of additional comments positively, as evidence of a research community highly engaged with issues of research practice and scientific integrity.
The most frequently offered justifications for engaging in QRPs were: publication bias; pressure to publish; and the desire to present a neat, coherent narrative (
Columns give the description of the questionable research practice, complaints respondents made about the practice, reasons respondents thought the practice might be tempting, and conditions that respondents identified as justifying the practice.
| Description | Complaints | Temptation | Justifications |
|---|---|---|---|
| QRP 1: Not reporting studies or variables that failed to reach statistical significance | increases false positive rate | hard to publish non-significant results | original method was flawed |

Example comment: “Sometimes lots of data are collected and tested. Often non-significant variables are thrown out if they're not integral to the story. I think this is okay.”

| Description | Complaints | Temptation | Justifications |
|---|---|---|---|
| QRP 3: Reporting an unexpected finding as having been predicted | it is unethical | makes article sexier | new hypotheses arise from better understanding of the system |

Example comment: “well, this is a difficult one—in the statistical sense, this should not happen, but in current times scientists are forced to market their work as best as possible and this is one way to make it more publishable.”

| Description | Complaints | Temptation | Justifications |
|---|---|---|---|
| QRP 5: Rounding-off a p value or other quantity to meet a pre-specified threshold | it is unethical | the 0.05 threshold is arbitrary anyway | all results are presented |

Example comment: “Attempts to conform to strict cut-off significance thresholds demonstrate an adherence to conventional practice over understanding of probability (e.g. the difference between p = 0.013 and 0.010 is and should be viewed as trivial).”
Our results indicate that QRPs are broadly as common in ecology and evolution research as they are in psychology. Of the 807 researchers in our sample, 64% reported cherry picking statistically significant results in at least one publication; 42% reported collecting more data after inspecting whether the results were statistically significant (a form of p hacking); and 51% reported an unexpected finding as having been predicted from the start (HARKing).
Our results are most notable for how similar QRP rates were across disciplines, but a couple of differences are worth noting. Ecology researchers were less likely than evolution researchers or psychologists to report ‘collecting more data after inspecting whether the results are statistically significant’ (QRP 7). We suspect this reflects a difference in the constraints of field versus laboratory research, rather than differences in the integrity of the researchers. It is often not physically possible to collect more data after the fact in ecology (field sites may be distant, and available sites and budgets may be exhausted). This interpretation is supported by evidence that many ecologists who stated that they had ‘never’ engaged in this practice nevertheless indicated that they found it acceptable.
The first nine of the QRPs we asked about were certainly controversial practices, generating mixed responses. The tenth is qualitatively different; it essentially asks about data fabrication. The social unacceptability of this practice is well recognised, and we might therefore expect under-reporting even in an anonymous survey. The comments volunteered by participants largely reflected this, for example “Is that the science of ‘alternative facts’?” and “It is serious scientific misconduct to report results that were not observed”. The proportion of researchers admitting to this was relatively high in ecology (4.5%) compared to evolution (2.0%), US psychology (0.6%) and Italian psychology (2.3%). However, it’s important to note that our wording of this question was quite different to that in the John et al and Agnoli et al surveys. They asked directly about ‘falsifying data’ whereas we asked a softer, less direct question about ‘filling in missing data points without identifying those data as simulated’. Fiedler et al (2015) found that modified question wording changed QRP reporting rates, and we suspect our change to the wording has resulted in an elevated reporting rate. We will not speculate further about ecology researchers reporting a higher rate of this than evolution researchers, because the numbers of researchers admitting to this action are very small in both groups and the 95% CIs on these proportions overlap considerably.
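The overlap of the proportion CIs mentioned above can be illustrated with Wilson score intervals. This is an assumption for illustration: the text does not state which interval method was used, and the counts below are back-calculated from the reported percentages and the approximate group sizes of 494 ecologists and 313 evolutionary biologists:

```python
from math import sqrt

def wilson_ci(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score 95% CI for a proportion of k successes in n trials."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

# Roughly the QRP 10 admission rates: 4.5% of ~494 ecologists,
# 2.0% of ~313 evolutionary biologists (counts back-calculated)
eco = wilson_ci(round(0.045 * 494), 494)
evo = wilson_ci(round(0.020 * 313), 313)

# Two intervals overlap if each lower bound lies below the other's upper bound
overlap = eco[0] < evo[1] and evo[0] < eco[1]
print(f"ecology CI = {eco}, evolution CI = {evo}, overlap = {overlap}")
```

With these inputs the two intervals do overlap, consistent with the caution above about comparing the two small proportions.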
Our results contribute to the broader understanding of researchers’ practices in two important ways. First, our results on reported frequency provide new insight into the regularity with which researchers engage in these practices; previous surveys in psychology did not elicit this information and asked only whether the practice had been used ‘at least once’. Information about frequency of use allows us to better estimate the disruption these practices may have caused to the published literature. We show that while reports of having engaged in QRPs at least once are alarmingly high, virtually no researchers acknowledge using any of the QRPs more than ‘occasionally’. Second, our qualitative results offer new understanding of the perceived acceptability of these practices, and common justifications for their use.
Our qualitative analysis highlights the perceived detrimental influence of the current publish-or-perish culture and of the rigid format currently required by many ecology and evolution journals. Researchers’ comments revealed that they feel pressure to present a short, cohesive story with statistically significant results that confirm a priori hypotheses, rather than a full (and likely messy) account of the research as it was conceptualised and conducted.
Researchers’ qualitative comments also drew attention to grey areas, where the distinction between QRPs and acceptable practice was less clear. For example, in many ecology and evolution articles no hypotheses are overtly stated but the way the background material is described in the introduction can imply that the result was expected; does this constitute HARKing? Similarly, a number of participants answering QRP 6 stated that, although they had technically changed models after investigating statistical significance, their decision to change models was based on finding an error in the original model or discovering that the data did not match the model assumptions. These participants are recorded as using this QRP but whether or not it was ‘questionable’ in their case is unclear.
Discrepancies between individual researchers’ self-identified QRP use and their estimates of others’ use suggest that certain practices are less socially acceptable. When average estimates of others’ use are much higher than average self-report of the practice, it suggests that the practice is particularly socially undesirable and that self-report measures may underestimate prevalence [
Some key limitations need to be considered when interpreting the results from our study. Firstly, our sample of ecology and evolution researchers might be biased. We contacted only researchers who had published in high impact factor journals, which pre-determined some demographics of our sample. For example, it likely limited the number of graduate students (6%). Our results should be understood as reflecting the practices of post-doctoral, midcareer and senior academic researchers almost exclusively. There is also very likely to be a self-selection bias in our sample of survey respondents. Those who are more confident in their practices–and perhaps more quantitatively confident in general–may have been more likely to respond. If this is the case, then it seems most likely that it would result in an underestimate of QRP rates in the broader ecology and evolution community rather than an overestimate.
Another limitation in the data set is that, in order to assure participants of their anonymity, we did not collect any data on their country of origin. Evidence from Agnoli et al [
Lastly, we collected survey responses between November 2016 and July 2017, so it is theoretically possible that the rate of certain QRPs changed over this period. However, because we asked participants whether they had used each of these practices “never”, “once”, “occasionally”, “frequently”, or “almost always”, we suspect that any behaviour change over this time period would not be evident in responses to our survey.
Our results indicate that there is substantial room to improve research practices in ecology and evolution. However, none of these problems are insurmountable. In fact, the correlation we found between acceptability and prevalence of QRPs and the justifications people provided in text (
Some editors in ecology and evolutionary biology have also instigated important changes such as requiring data archiving at a handful of prominent journals [
The use of Questionable Research Practices in ecology and evolution research is high enough to be of concern. The rates of QRPs found in our sample of 807 ecologists and evolutionary biologists are similar to those that have been found in psychology, where the reproducibility rates of published research have been systematically studied and found to be low (36–47% depending on the measure [
Fiona Fidler is supported by an Australian Research Council Future Fellowship (FT150100297). We would like to thank Felix Singleton Thorn, Franca Agnoli and three reviewers for feedback on the manuscript and Aurora Marquette for her assistance in collecting contact author addresses.