
Ten simple rules for good research practice

This is a PLOS Computational Biology Methods paper.

Introduction

The lack of research reproducibility has caused growing concern across various scientific fields [1–5]. Today, there is widespread agreement, within and outside academia, that scientific research is suffering from a reproducibility crisis [6,7]. Researchers reach different conclusions, even when the same data have been processed, simply due to varied analytical procedures [8,9]. As we continue to recognize this problematic situation, some major causes of irreproducible research have been identified. This, in turn, provides the foundation for improvement by identifying and advocating for good research practices (GRPs). Indeed, powerful solutions are available, for example, preregistration of study protocols and statistical analysis plans, sharing of data and analysis code, and adherence to reporting guidelines. Although these and other best practices may facilitate reproducible research and increase trust in science, it remains the responsibility of researchers themselves to actively integrate them into their everyday research practices.

Contrary to ubiquitous specialized training, cross-disciplinary courses focusing on best practices to enhance the quality of research are lacking at universities and are urgently needed. The intersections between disciplines offer a space for peer evaluation, mutual learning, and sharing of best practices. In medical research, interdisciplinary work is inevitable. For example, conducting clinical trials requires experts with diverse backgrounds, including clinical medicine, pharmacology, biostatistics, evidence synthesis, nursing, and implementation science. Bringing researchers with diverse backgrounds and levels of experience together to exchange knowledge and learn about problems and solutions adds value and improves the quality of research.

The present selection of rules was based on our experiences with teaching GRP courses at the University of Zurich, our course participants’ feedback, and the views of a cross-disciplinary group of experts from within the Swiss Reproducibility Network. The list is neither exhaustive, nor does it aim to address and systematically summarize the wide spectrum of issues including research ethics and legal aspects (e.g., related to misconduct, conflicts of interests, and scientific integrity). Instead, we focused on practical advice at the different stages of everyday research: from planning and execution to reporting of research. For a more comprehensive overview of GRPs, we point to the United Kingdom’s Medical Research Council’s guidelines [10] and the Swedish Research Council’s report [11]. While the discussion of the rules may predominantly focus on clinical research, much applies, in principle, to basic biomedical research and research in other domains as well.

The 10 proposed rules can serve multiple purposes: an introduction for researchers to relevant concepts to improve research quality, a primer for early-career researchers who participate in our GRP courses, or a starting point for lecturers who plan a GRP course at their own institutions. The 10 rules are grouped according to planning (5 rules), execution (3 rules), and reporting of research (2 rules); see Fig 1. These principles can (and should) be implemented as a habit in everyday research, just like toothbrushing.

Fig 1. The 10 simple rules for GRP grouped into planning, execution, and reporting of research.

GRP, good research practices.

Research planning

Rule 1: Specify your research question

Coming up with a research question is not always simple and may take time. A successful study requires a narrow and clear research question. In evidence-based research, prior studies are assessed in a systematic and transparent way to identify a research gap for a new study that answers a question that matters [12]. Papers that provide a comprehensive overview of the current state of research in the field, such as systematic reviews, are particularly helpful. Perspective papers may also be useful; for example, there is a paper with the title “SARS-CoV-2 and COVID-19: The most important research questions.” However, a systematic assessment of research gaps deserves more attention than opinion-based publications.

In the next step, a vague research question should be further developed and refined. In clinical research and evidence-based medicine, there is an approach called population, intervention, comparator, outcome, and time frame (PICOT) with a set of criteria that can help frame a research question [13]. From a well-developed research question, subsequent steps will follow, which may include the exact definition of the population, the outcome, the data to be collected, and the sample size that is required. It may be useful to find out if other researchers find the idea interesting as well and whether it might promise a valuable contribution to the field. However, actively involving the public or the patients can be a more effective way to determine what research questions matter.

The level of detail in a research question also depends on whether the planned research is confirmatory or exploratory. In contrast to confirmatory research, exploratory research does not require a well-defined hypothesis from the start. Some examples of exploratory experiments are those based on omics and multi-omics experiments (genomics, bulk RNA-Seq, single-cell, etc.) in systems biology and connectomics and whole-brain analyses in brain imaging. Both exploration and confirmation are needed in science, and it is helpful to understand their strengths and limitations [14,15].

Rule 2: Write and register a study protocol

In clinical research, registration of clinical trials has become a standard since the late 1990s and is now a legal requirement in many countries. Such studies require a study protocol to be registered, for example, with the European Clinical Trials Register or the World Health Organization’s International Clinical Trials Registry Platform. Similar efforts have been made for the registration of systematic reviews (PROSPERO). Study registration has also been proposed for observational studies [16] and more recently in preclinical animal research [17] and is now being advocated across disciplines under the term “preregistration” [18,19].

Study protocols typically document at minimum the research question and hypothesis, a description of the population, the targeted sample size, the inclusion/exclusion criteria, the study design, the data collection, the data processing and transformation, and the planned statistical analyses. The registration of study protocols reduces publication bias and hindsight bias and can safeguard honest research and minimize waste of research [20–22]. Registration ensures that studies can be scrutinized by comparing the reported research with what was actually planned and written in the protocol, and any discrepancies may indicate serious problems (e.g., outcome switching).

Note that registration does not mean that researchers have no flexibility to adapt the plan as needed. Indeed, new or more appropriate procedures may become available or known only after registration of a study. Therefore, a more detailed statistical analysis plan can be amended to the protocol before the data are observed or unblinded [23,24]. Likewise, registration does not exclude the possibility to conduct exploratory data analyses; however, they must be clearly reported as such.

To go even further, registered reports are a novel article type that incentivizes high-quality research, irrespective of the ultimate study outcome [25,26]. With registered reports, peer reviewers make their decision before anyone knows the results of the study, and they have a more active role in being able to influence the design and analysis of the study. Journals from various disciplines increasingly support registered reports [27].

Naturally, preregistration and registered reports also have their limitations and may not be appropriate in a purely hypothesis-generating (explorative) framework. Reports of exploratory studies should indeed not be molded into a confirmatory framework; appropriate rigorous reporting alternatives have been suggested and are starting to be implemented [28,29].

Rule 3: Justify your sample size

Early-career researchers in our GRP courses often identify sample size as an issue in their research. For example, they say that they work with a low number of samples due to slow growth of cells, or they have a limited number of patient tumor samples due to a rare disease. But if your sample size is too low, your study has a high risk of providing a false negative result (type II error). In other words, you are unlikely to find an effect even if there truly was an effect.

Unfortunately, there is more bad news with small studies. When an effect from a small study was selected for drawing conclusions because it was statistically significant, low power increases the probability that an effect size is overestimated [30,31]. The reason is that with low power, studies that due to sampling variation find larger (overestimated) effects are much more likely to be statistically significant than those that happen to find smaller (more realistic) effects [30,32,33]. Thus, in such situations, effect sizes are often overestimated. For the phenomenon that small studies often report more extreme results (in meta-analyses), the term “small-study effect” was introduced [34]. In any case, an underpowered study is a problematic study, no matter the outcome.
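The overestimation described above can be illustrated with a small simulation. The following is a minimal sketch under assumed values that are not taken from the cited studies: a true standardized effect of 0.3, 20 participants per group, a known standard deviation of 1, and a two-sided z-test at the 5% level. The function name `simulate_selected_effects` and all parameters are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def simulate_selected_effects(true_effect=0.3, n_per_group=20,
                              n_studies=5000, alpha=0.05, seed=1):
    """Simulate many small two-group studies and compare the average effect
    estimate across all studies with the average among significant ones."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error of a difference in means when the SD is 1 in both groups.
    se = (2 / n_per_group) ** 0.5
    all_estimates, significant = [], []
    for _ in range(n_studies):
        control = fmean(rng.gauss(0.0, 1.0) for _ in range(n_per_group))
        treated = fmean(rng.gauss(true_effect, 1.0) for _ in range(n_per_group))
        estimate = treated - control
        all_estimates.append(estimate)
        if abs(estimate / se) > z_crit:  # two-sided z-test
            significant.append(estimate)
    share_significant = len(significant) / n_studies  # empirical power
    return fmean(all_estimates), fmean(significant), share_significant


mean_all, mean_significant, power = simulate_selected_effects()
print(f"true effect: 0.30, mean of all estimates: {mean_all:.2f}")
print(f"mean of significant estimates: {mean_significant:.2f}, power: {power:.2f}")
```

Under these assumptions, an estimate must exceed roughly 0.62 in absolute value to reach significance, so the significant studies necessarily report effects well above the true value of 0.3, while the average over all studies stays close to it.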

In conclusion, small sample sizes can undermine research, but when is a study too small? For one study, a total of 50 patients may be fine, but for another, 1,000 patients may be required. Determining how large a study needs to be requires an appropriate sample size calculation, which ensures that enough data are collected to achieve sufficient statistical power (the probability of rejecting the null hypothesis when it is in fact false).

Low-powered studies can be avoided by performing a sample size calculation to find out the required sample size of the study. This requires specifying a primary outcome variable and the magnitude of effect you are interested in (among some other factors); in clinical research, this is often the minimal clinically relevant difference. The statistical power is often set at 80% or larger. A comprehensive list of packages for sample size calculation is available [35], among them the R package “pwr” [36]. There are also many online calculators available, for example, the University of Zurich’s “SampleSizeR” [37].
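As a rough illustration of such a calculation, the sketch below uses the standard normal approximation for a two-sided comparison of two group means, n = 2((z_{1-α/2} + z_{power})·SD/Δ)² per group. The function name `n_per_group` and the example values (a relevant difference of 5 units and a standard deviation of 10) are assumptions for illustration only; dedicated tools such as those cited above use more exact methods and typically return slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided comparison of two
    means, using the normal approximation (not the exact t-based method)."""
    if delta == 0 or sd <= 0:
        raise ValueError("delta must be non-zero and sd must be positive")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = std_normal.inv_cdf(power)          # desired statistical power
    return ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


# Hypothetical example: minimal relevant difference of 5 units, SD of 10.
print(n_per_group(delta=5, sd=10))              # 80% power -> 63 per group
print(n_per_group(delta=5, sd=10, power=0.90))  # 90% power -> 85 per group
```

Because the required sample size scales with the inverse square of the effect, halving the relevant difference roughly quadruples the number of participants needed.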

A worthwhile alternative for planning the sample size that puts less emphasis on null hypothesis testing is based on the desired precision of the study; for example, one can calculate the sample size that is necessary to obtain a desired width of a confidence interval for the targeted effect [38–40]. A general framework for sample size justification beyond a calculation-only approach has been proposed [41]. It is also worth mentioning that some study types have other requirements or need specific methods. In diagnostic testing, one would need to determine the anticipated minimal sensitivity or specificity; in prognostic research, the number of parameters that can be used to fit a prediction model given a fixed sample size should be specified. Designs can also be so complex that a simulation (Monte Carlo method) may be required.
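A precision-based calculation can be sketched in the same spirit. For a difference between two group means with a common, assumed known standard deviation, the confidence interval is the estimate ± z·SD·√(2/n), so a desired full width W requires n = 2(2z·SD/W)² per group. The function name `n_per_group_for_ci_width` and the example numbers are hypothetical, and the normal approximation is a simplification rather than the method of the cited references.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_for_ci_width(sd, full_width, confidence=0.95):
    """Per-group sample size so that the confidence interval for a difference
    in means has (approximately) the requested full width."""
    if sd <= 0 or full_width <= 0:
        raise ValueError("sd and full_width must be positive")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(2 * (2 * z * sd / full_width) ** 2)


# Hypothetical example: SD of 10, desired 95% interval of +/- 4 units (width 8).
print(n_per_group_for_ci_width(sd=10, full_width=8))  # -> 49 per group
# Halving the width to +/- 2 units roughly quadruples the requirement.
print(n_per_group_for_ci_width(sd=10, full_width=4))  # -> 193 per group
```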

Sample size calculations should be done under different assumptions, and the largest estimated sample size is often the safer bet than a best-case scenario. The calculated sample size should further be adjusted to allow for possible missing data. Due to the complexity of accurately calculating sample size, researchers should strongly consider consulting a statistician early in the study design process.
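The two adjustments mentioned above, taking the most conservative scenario and allowing for missing data, can be sketched as follows. The per-group sizes in `scenario_sizes`, the 15% expected loss, and the helper name `inflate_for_missing` are hypothetical values chosen for illustration; a common convention, assumed here, is to divide the required size by the expected proportion of complete observations.

```python
from math import ceil


def inflate_for_missing(n_required, expected_missing):
    """Increase a required sample size so that, after the expected share of
    missing observations, about n_required complete observations remain."""
    if not 0 <= expected_missing < 1:
        raise ValueError("expected_missing must be in [0, 1)")
    return ceil(n_required / (1 - expected_missing))


# Hypothetical per-group sizes obtained under three sets of assumptions.
scenario_sizes = {"optimistic": 44, "expected": 63, "pessimistic": 142}

conservative_n = max(scenario_sizes.values())  # the safer bet
planned_n = inflate_for_missing(conservative_n, expected_missing=0.15)
print(conservative_n, planned_n)  # 142 per group, 168 after allowing for loss
```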

Rule 4: Write a data management plan

In 2020, 2 Coronavirus Disease 2019 (COVID-19) papers in leading medical journals were retracted after major concerns about the data were raised [42]. Today, raw data are more often recognized as a key outcome of research along with the paper. Therefore, it is important to develop a strategy for the life cycle of data, including suitable infrastructure for long-term storage.

The data life cycle is described in a data management plan: a document that describes what data will be collected and how the data will be organized, stored, handled, and protected during and after the end of the research project. Several funders require a data management plan in grant submissions, and publishers like PLOS encourage authors to do so as well. The Wellcome Trust provides guidance in the development of a data management plan, including real examples from neuroimaging, genomics, and social sciences [43]. However, projects do not always allocate funding and resources to the actual implementation of the data management plan.

The Findable, Accessible, Interoperable, and Reusable (FAIR) data principles promote maximal use of data and enable machines to access and reuse data with minimal human intervention [44]. FAIR principles require the data to be retained, preserved, and shared preferably with an immutable unique identifier and a clear usage license. Appropriate metadata will help other researchers (or machines) to discover, process, and understand the data. However, requesting researchers to fully comply with the FAIR data principles in every detail is an ambitious goal.

Multidisciplinary data repositories that support FAIR are, for example, Dryad, EUDAT, OSF, and Zenodo. A number of institutional and field-specific repositories may also be suitable. However, sometimes, authors may not be able to make their data publicly available for legal or ethical reasons. In such cases, a data user agreement can indicate the conditions required to access the data. Journals highlight what are acceptable and what are unacceptable data access restrictions and often require a data availability statement.

Organizing the study artifacts in a structured way greatly facilitates the reuse of data and code within and outside the lab, enhancing collaborations and maximizing the research investment. Support and courses for data management plans are sometimes available at universities. Another 10 simple rules paper for creating a good data management plan is dedicated to this topic [45].

Rule 5: Reduce bias

Bias is a distorted view in favor of or against a particular idea. In statistics, bias is a systematic deviation of a statistical estimate from the (true) quantity it estimates. Bias can invalidate our conclusions, and the more bias there is, the less valid they are. For example, in clinical studies, bias may mislead us into reaching a causal conclusion that the difference in the outcomes was due to the intervention or the exposure. This is a big concern, and, therefore, the risk of bias is assessed in clinical trials [46] as well as in observational studies [47,48].

There are many different forms of bias that can occur in a study, and they may overlap (e.g., allocation bias and confounding bias) [49]. Bias can occur at different stages, for example, immortal time bias in the design of the study, information bias in the execution of the study, and publication bias in the reporting of research. Understanding bias allows us as researchers to remain vigilant of potential sources of bias when peer-reviewing and when designing our own studies. We summarized some common types of bias and some preventive steps in Table 1, but many other forms of bias exist; for a comprehensive overview, see the Oxford University’s Catalogue of Bias [50].

Table 1. Common types of bias that can affect a research study and some measures that may prevent them.

Here are some noteworthy examples of study bias from the literature. An example of information bias was observed when, in 1998, an alleged association between the measles, mumps, and rubella (MMR) vaccine and autism was reported. Recall bias (a subtype of information bias) emerged when parents of autistic children recalled the onset of autism after an MMR vaccination more often than parents of similar children who were diagnosed prior to the media coverage of that controversial and since-retracted study [51]. A study from 2001 showed better survival for Academy Award-winning actors, but this was due to immortal time bias that favors the treatment or exposure group [52,53]. Another study systematically investigated self-reports about musculoskeletal symptoms and found the presence of information bias: participants with little computer time overestimated, and participants with a lot of computer time underestimated, their computer usage [54].

Information bias can be mitigated by using objective rather than subjective measurements. Standard operating procedures (SOPs) and electronic lab notebooks additionally help to follow well-designed protocols for data collection and handling [55]. Even when bias cannot be fully mitigated in a study, complete descriptions of data and methods at least allow the risk of bias to be assessed.

Research execution

Rule 6: Avoid questionable research practices

Questionable research practices (QRPs) can lead to exaggerated findings and false conclusions and thus lead to irreproducible research. Often, QRPs are used with no bad intentions. This becomes evident when methods sections explicitly describe such procedures, for example, to increase the number of samples until statistical significance is reached that supports the hypothesis. Therefore, it is important that researchers know about QRPs in order to recognize and avoid them.

Several QRPs have been named [56,57]. Among them are low statistical power, pseudoreplication, repeated inspection of data, p-hacking [58], selective reporting, and hypothesizing after the results are known (HARKing).

The first 2 QRPs, low statistical power and pseudoreplication, can be prevented by proper planning and designing of studies, including sample size calculation and appropriate statistical methodology to avoid treating data as independent when in fact they are not. Statistical power is not equal to reproducibility, but statistical power is a precondition of reproducibility as the lack thereof can result in false negative as well as false positive findings (see Rule 3).

In fact, many QRPs can be avoided with a study protocol and statistical analysis plan. Preregistration, as described in Rule 2, is considered best practice for this purpose. However, many of these issues can additionally be rooted in institutional incentives and rewards. Both funding and promotion are often tied to the quantity rather than the quality of the research output. At universities, few or no rewards are yet given for writing and registering protocols, sharing data, publishing negative findings, and conducting replication studies. Thus, a wider “culture change” is needed.

Rule 7: Be cautious with interpretations of statistical significance

It would help if more researchers were familiar with correct interpretations and possible misinterpretations of statistical tests, p-values, confidence intervals, and statistical power [59,60]. A statistically significant p-value does not necessarily mean that there is a clinically or biologically relevant effect. Specifically, the traditional dichotomization into statistically significant (p < 0.05) versus statistically nonsignificant (p ≥ 0.05) results is seldom appropriate, can lead to cherry-picking of results and may eventually corrupt science [61]. We instead recommend reporting exact p-values and interpreting them in a graded way in terms of the compatibility of the null hypothesis with the data [62,63]. Moreover, a p-value around 0.05 (e.g., 0.047 or 0.055) provides only little information, as is best illustrated by the associated replication power: The probability that a hypothetical replication study of the same design will lead to a statistically significant result is only 50% [64] and is even lower in the presence of publication bias and regression to the mean (the phenomenon that effect estimates in replication studies are often smaller than the estimates in the original study) [65]. Claims of novel discoveries should therefore be based on a smaller p-value threshold (e.g., p < 0.005) [66], but this really depends on the discipline (genome-wide screenings or studies in particle physics often apply much lower thresholds).
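The 50% figure can be reproduced with a short calculation. The sketch below assumes that the effect estimate is approximately normally distributed, that the true effect equals the original estimate, and that the replication has the same standard error as the original study; only significance in the original direction is counted. These are simplifying assumptions for illustration and not necessarily the exact derivation in [64]. The function name `replication_power` is hypothetical.

```python
from statistics import NormalDist


def replication_power(p_original, alpha=0.05):
    """Probability that a same-design replication is significant in the
    original direction, assuming the true effect equals the original estimate."""
    std_normal = NormalDist()
    z_original = std_normal.inv_cdf(1 - p_original / 2)  # z-value of the original study
    z_critical = std_normal.inv_cdf(1 - alpha / 2)       # two-sided threshold
    # The replication z-value is distributed as Normal(z_original, 1).
    return 1 - std_normal.cdf(z_critical - z_original)


for p in (0.055, 0.05, 0.047, 0.005):
    print(f"original p = {p}: replication power = {replication_power(p):.2f}")
```

Under these assumptions, an original p-value of exactly 0.05 places the estimate right at the significance threshold, so a replication falls above or below it with equal probability, whereas an original p-value of 0.005 corresponds to a replication power of about 80%.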

Generally, there is often too much emphasis on p-values. A statistical index such as the p-value is just the final product of an analysis, the tip of the iceberg [67]. Statistical analyses often include many complex stages, from data processing, cleaning, transformation, addressing missing data, modeling, to statistical inference. Errors and pitfalls can creep in at any stage, and even a tiny error can have a big impact on the result [68]. Also, when many hypothesis tests are conducted (multiple testing), false positive rates may need to be controlled to protect against wrong conclusions, although adjustments for multiple testing are debated [69–71].
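To make the multiple testing problem concrete, the following sketch computes the chance of at least one false positive among m independent tests of true null hypotheses, 1 − (1 − α)^m, and applies two widely used adjustments (Bonferroni and Holm). The p-values are made up for illustration, the function names are hypothetical, and whether and how to adjust remains a judgment call, as noted above.

```python
def familywise_error_rate(m, alpha=0.05):
    """Probability of at least one false positive among m independent tests
    when all null hypotheses are true."""
    return 1 - (1 - alpha) ** m


def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Smallest p-value is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, (m - rank) * p_values[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


print(f"{familywise_error_rate(10):.2f}")  # about 0.40 for 10 tests

p_values = [0.001, 0.012, 0.030, 0.040, 0.200]  # made-up example
for name, adj in (("Bonferroni", bonferroni(p_values)), ("Holm", holm(p_values))):
    print(name, [round(p, 3) for p in adj], sum(p < 0.05 for p in adj))
```

In this made-up example, four of the five unadjusted p-values are below 0.05, but only one remains below 0.05 after the Bonferroni adjustment and two after the Holm adjustment.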

Thus, a p-value alone is not a measure of how credible a scientific finding is [72]. Instead, the quality of the research must be considered, including the study design, the quality of the measurement, and the validity of the assumptions that underlie the data analysis [60,73]. Frameworks exist that help to systematically and transparently assess the certainty in evidence; the most established and widely used one is Grading of Recommendations, Assessment, Development and Evaluations (GRADE) [74].

Training in basic statistics, statistical programming, and reproducible analyses, as well as better involvement of data professionals in academia, is necessary. University departments sometimes have statisticians who can support researchers. Importantly, statisticians need to be involved early in the process and on an equal footing and not just at the end of a project to perform the final data analysis.

Rule 8: Make your research open

In reality, science often lacks transparency. Open science makes the process of producing evidence and claims transparent and accessible to others [75]. Several universities and research funders have already implemented open science roadmaps to advocate free and public science as well as open access to scientific knowledge, with the aim of further developing the credibility of research. Open research allows more eyes to see it and critique it, a principle similar to “Linus’s law” in software development, which says that if enough people test a piece of software, most bugs will be discovered.

As science often progresses incrementally, writing and sharing a study protocol and making data and methods readily available is crucial to facilitate knowledge building. The Open Science Framework (OSF) is a free and open-source project management tool that supports researchers throughout the entire project life cycle. OSF enables preregistration of study protocols and sharing of documents, data, analysis code, supplementary materials, and preprints.

To facilitate reproducibility, a research paper can link to data and analysis code deposited on OSF. Computational notebooks are now readily available that unite data processing, data transformations, statistical analyses, figures and tables in a single document (e.g., R Markdown, Jupyter); see also the 10 simple rules for reproducible computational research [76]. Making both data and code open thus minimizes waste of funding resources and accelerates science.

Open science can also advance researchers’ careers, especially for early-career researchers. The increased visibility, retrievability, and citations of datasets can all help with career building [77]. Therefore, institutions should provide necessary training, and hiring committees and journals should align their core values with open science, to attract researchers who aim for transparent and credible research [78].

Research reporting

Rule 9: Report all findings

Publication bias occurs when the outcome of a study influences the decision whether to publish it. Researchers, reviewers, and publishers often find nonsignificant study results not interesting or worth publishing. As a consequence, outcomes and analyses are only selectively reported in the literature [79], also known as the file drawer effect [80].

The extent of publication bias in the literature is illustrated by the overwhelming frequency of statistically significant findings [81]. A study extracted p-values from MEDLINE and PubMed Central and showed that 96% of the records reported at least 1 statistically significant p-value [82], which seems implausible in the real world. Another study plotted the distribution of more than 1 million z-values from Medline, revealing a huge gap from −2 to 2 [83]. Positive studies (i.e., statistically significant, perceived as striking or showing a beneficial effect) were 4 times more likely to get published than negative studies [84].

Often a statistically nonsignificant result is interpreted as a “null” finding. But a nonsignificant finding does not necessarily mean a null effect; absence of evidence is not evidence of absence [85]. An individual study may be underpowered, resulting in a nonsignificant finding, but the cumulative evidence from multiple studies may indeed provide sufficient evidence in a meta-analysis. Another argument is that a confidence interval that contains the null value often also contains non-null values that may be of high practical importance. Only if all the values inside the interval are deemed unimportant from a practical perspective may it be fair to describe a result as a null finding [61]. We should thus never report “no difference” or “no association” just because a p-value is larger than 0.05 or, equivalently, because a confidence interval includes the “null” [61].

On the other hand, studies sometimes report statistically nonsignificant results with “spin” to claim that the experimental treatment is beneficial, often by focusing their conclusions on statistically significant differences on secondary outcomes despite a statistically nonsignificant difference for the primary outcome [86,87].

Findings that are not being published have a tremendous impact on the research ecosystem, distorting our knowledge of the scientific landscape by perpetuating misconceptions, and jeopardizing judgment of researchers and the public trust in science. In clinical research, publication bias can mislead care decisions and harm patients, for example, when treatments appear useful despite only minimal or even absent benefits reported in studies that were not published and thus are unknown to physicians [88]. Moreover, publication bias also directly affects the formulation and proliferation of scientific theories, which are taught to students and early-career researchers, thereby perpetuating biased research from the core. It has been shown in modeling studies that unless a sufficient proportion of negative studies are published, a false claim can become an accepted fact [89] and the false positive rates influence trustworthiness in a given field [90].

In sum, negative findings are undervalued. They need to be more consistently reported at the study level or be systematically investigated at the systematic review level. Researchers have their share of responsibilities, but there is clearly a lack of incentives from promotion and tenure committees, journals, and funders.

Rule 10: Follow reporting guidelines

Study reports need to faithfully describe the aim of the study and what was done, including potential deviations from the original protocol, as well as what was found. Yet, there is ample evidence of discrepancies between protocols and research reports, and of insufficient quality of reporting [79,91–95]. Reporting deficiencies threaten our ability to clearly communicate findings, replicate studies, make informed decisions, and build on existing evidence, wasting time and resources invested in the research [96].

Reporting guidelines aim to provide the minimum information needed on key design features and analysis decisions, ensuring that findings can be adequately used and studies replicated. In 2008, the Enhancing the QUAlity and Transparency Of Health Research (EQUATOR) network was initiated to provide reporting guidelines for a variety of study designs along with guidelines for education and training on how to enhance quality and transparency of health research. Currently, there are 468 reporting guidelines listed in the network; see the most prominent guidelines in Table 2. Furthermore, following the ICMJE recommendations, medical journals are increasingly endorsing reporting guidelines [97], in some cases making it mandatory to submit the appropriate reporting checklist along with the manuscript.

Table 2. Examples of reporting guidelines for different study types.

The use of reporting guidelines and journal endorsement has led to a positive impact on the quality and transparency of research reporting, but improvement is still needed to maximize the value of research [98,99].

Conclusions

Originally, this paper targeted early-career researchers; however, throughout the development of the rules, it became clear that the present recommendations can serve all researchers irrespective of their seniority. We focused on practical guidelines for planning, conducting, and reporting of research. Others have aligned GRP with similar topics [100,101]. Even though we provide 10 simple rules, the word “simple” should not be taken lightly. Putting the rules into practice usually requires effort and time, especially at the beginning of a research project. However, time can also be recouped, for example, when certain choices can be justified to reviewers by providing a study protocol or when data can be quickly reanalyzed by using computational notebooks and dynamic reports.

Researchers have field-specific research skills, but sometimes are not aware of best practices in other fields that can be useful. Universities should offer cross-disciplinary GRP courses across faculties to train the next generation of scientists. Such courses are an important building block to improve the reproducibility of science.

Acknowledgments

This article was written alongside the Good Research Practice (GRP) courses at the University of Zurich provided by the Center of Reproducible Science. All materials from the course are available online. We appreciated the discussion, development, and refinement of this article within the working group “training” of the SwissRN. We are grateful to Philip Bourne for many valuable comments on earlier versions of the manuscript.

References

  1. Errington TM, Mathur M, Soderberg CK, Denis A, Perfito N, Iorns E, et al. Investigating the replicability of preclinical cancer biology. Elife. 2021;10. pmid:34874005
  2. Camerer CF, Dreber A, Holzmeister F, Ho T-H, Huber J, Johannesson M, et al. Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nat Hum Behav. 2018;2:637–44. pmid:31346273
  3. Camerer CF, Dreber A, Forsell E, Ho T-H, Huber J, Johannesson M, et al. Evaluating replicability of laboratory experiments in economics. Science. 2016;351:1433–6. pmid:26940865
  4. Open Science Collaboration. PSYCHOLOGY. Estimating the reproducibility of psychological science. Science. 2015;349:aac4716. pmid:26315443
  5. Prinz F, Schlange T, Asadullah K. Believe it or not: how much can we rely on published data on potential drug targets? Nat Rev Drug Discov. 2011:712. pmid:21892149
  6. Bespalov A, Barnett AG, Begley CG. Industry is more alarmed about reproducibility than academia. Nature. 2018;563:626. pmid:30487623
  7. Baker M. 1,500 scientists lift the lid on reproducibility. Nature. 2016;533:452–4. pmid:27225100
  8. Botvinik-Nezer R, Holzmeister F, Camerer CF, Dreber A, Huber J, Johannesson M, et al. Variability in the analysis of a single neuroimaging dataset by many teams. Nature. 2020. pmid:32483374
  9. Silberzahn R, Uhlmann EL, Martin DP, Anselmi P, Aust F, Awtrey E, et al. Many Analysts, One Data Set: Making Transparent How Variations in Analytic Choices Affect Results. Adv Methods Pract Psychol Sci. 2018;1:337–56.
  10. Medical Research Council, MRC. Good research practice: principles and guidelines. 2012.
  11. Swedish Research Council. Good Research Practice–What Is It? 2006.
  12. Robinson KA, Brunnhuber K, Ciliska D, Juhl CB, Christensen R, Lund H, et al. Evidence-Based Research Series-Paper 1: What Evidence-Based Research is and why is it important? J Clin Epidemiol. 2021;129:151–7. pmid:32979491
  13. Riva JJ, Malik KMP, Burnie SJ, Endicott AR, Busse JW. What is your research question? An introduction to the PICOT format for clinicians. J Can Chiropr Assoc. 2012;56:167–71. pmid:22997465
  14. Schwab S, Held L. Different worlds: Confirmatory versus exploratory research. Significance. 2020;17:8–9.
  15. Tukey JW. We need both exploratory and confirmatory. Am Stat. 1980;34:23–5.
  16. Loder E, Groves T, MacAuley D. Registration of observational studies. BMJ. 2010:c950. pmid:20167643
  17. van der Naald M, Wenker S, Doevendans PA, Wever KE, Chamuleau SAJ. Publication rate in preclinical research: a plea for preregistration. BMJ Open Science. 2020;4:e100051. pmid:35047690
  18. Nosek BA, Beck ED, Campbell L, Flake JK, Hardwicke TE, Mellor DT, et al. Preregistration Is Hard, And Worthwhile. Trends Cogn Sci. 2019;23:815–8. pmid:31421987
  19. Nosek BA, Ebersole CR, DeHaven AC, Mellor DT. The preregistration revolution. Proc Natl Acad Sci U S A. 2018;115:2600–6. pmid:29531091
  20. Bradley SH, DeVito NJ, Lloyd KE, Richards GC, Rombey T, Wayant C, et al. Reducing bias and improving transparency in medical research: a critical overview of the problems, progress and suggested next steps. J R Soc Med. 2020;113:433–43. pmid:33167771
  21. 21. Macleod MR, Michie S, Roberts I, Dirnagl U, Chalmers I, Ioannidis JPA, et al. Biomedical research: increasing value, reducing waste. Lancet. 2014;383:101–4. pmid:24411643
  22. 22. Chalmers I, Glasziou P. Avoidable waste in the production and reporting of research evidence. Lancet. 2009;374:86–9. pmid:19525005
  23. 23. Yuan I, Topjian AA, Kurth CD, Kirschen MP, Ward CG, Zhang B, et al. Guide to the statistical analysis plan. Paediatr Anaesth. 2019;29:237–42. pmid:30609103
  24. 24. Thomas L, Peterson ED. The value of statistical analysis plans in observational research: defining high-quality research from the start. JAMA. 2012;308:773–4. pmid:22910753
  25. 25. Soderberg CK, Errington TM, Schiavone SR, Bottesini J, Thorn FS, Vazire S, et al. Initial evidence of research quality of registered reports compared with the standard publishing model. Nature Human. Behaviour. 2021:1–8.
  26. 26. Chambers C. What’s next for Registered Reports? Nature. 2019;573:187–9. pmid:31506624
  27. 27. Chambers CD, Mellor DT. Protocol transparency is vital for registered reports. Nat Hum Behav. 2018:791–2. pmid:31558811
  28. 28. Dirnagl U. Preregistration of exploratory research: Learning from the golden age of discovery. PLoS Biol. 2020;18:e3000690. pmid:32214315
  29. 29. McIntosh RD. Exploratory reports: A new article type for Cortex. Cortex. 2017;96:A1–4. pmid:29110814
  30. 30. Button KS, Ioannidis JPA, Mokrysz C, Nosek BA, Flint J, Robinson ESJ, et al. Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci. 2013;14:365–76. pmid:23571845
  31. 31. Ioannidis JPA. Why most published research findings are false. PLoS Med. 2005;2:e124. pmid:16060722
  32. 32. van Zwet E, Schwab S, Greenland S. Addressing exaggeration of effects from single RCTs. Significance. 2021;18:16–21.
  33. 33. van Zwet E, Schwab S, Senn S. The statistical properties of RCTs and a proposal for shrinkage. Stat Med. 2021;40:6107–17. pmid:34425632
  34. 34. Sterne JA, Gavaghan D, Egger M. Publication and related bias in meta-analysis: power of statistical tests and prevalence in the literature. J Clin Epidemiol. 2000;53:1119–29. pmid:11106885
  35. 35. H. G. Zhang EZ. CRAN task view: Clinical trial design, monitoring, and analysis. 20 Jun 2021 [cited 3 Mar 2022]. Available:
  36. 36. Champely S. pwr: Basic Functions for Power Analysis. 2020. Available:
  37. 37. Tarigan B, Furrer R, Cherneva K. SampleSizeR: calculate sample sizes within completely randomized design. Open Science. Framework. 2021.
  38. 38. Rothman KJ, Greenland S. Planning Study Size Based on Precision Rather Than Power. Epidemiology. 2018;29:599–603. pmid:29912015
  39. 39. Bland JM. The tyranny of power: is there a better way to calculate sample size? BMJ. 2009;339:b3985. pmid:19808754
  40. 40. Haynes A, Lenz A, Stalder O, Limacher A. presize: An R-package for precision-based sample size calculation in clinical research. J Open Source Softw. 2021;6:3118.
  41. 41. Lakens D. Sample Size Justification. 2021.
  42. 42. Ledford H, Van Noorden R. High-profile coronavirus retractions raise concerns about data oversight. Nature. 2020;582:160. pmid:32504025
  43. 43. Outputs Management Plan—Grant Funding. In: Wellcome [Internet]. [cited 13 Feb 2022]. Available:
  44. 44. Wilkinson MD, Dumontier M, Aalbersberg IJJ, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016;3:160018. pmid:26978244
  45. 45. Michener WK. Ten Simple Rules for Creating a Good Data Management Plan. PLoS Comput Biol. 2015;11:e1004525. pmid:26492633
  46. 46. Higgins JPT, Altman DG, Gøtzsche PC, Jüni P, Moher D, Oxman AD, et al. The Cochrane Collaboration’s tool for assessing risk of bias in randomised trials. BMJ. 2011;343:d5928. pmid:22008217
  47. 47. Sterne JA, Hernán MA, Reeves BC, Savović J, Berkman ND, Viswanathan M, et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ. 2016;355:i4919. pmid:27733354
  48. 48. Bero L, Chartres N, Diong J, Fabbri A, Ghersi D, Lam J, et al. The risk of bias in observational studies of exposures (ROBINS-E) tool: concerns arising from application to observational studies of exposures. Syst Rev. 2018;7:242. pmid:30577874
  49. 49. Sackett DL. Bias in analytic research. J Chronic Dis. 1979;32:51–63. pmid:447779
  50. 50. Catalogue of bias collaboration. Catalogue of Bias. 2019. Available:
  51. 51. Andrews N, Miller E, Taylor B, Lingam R, Simmons A, Stowe J, et al. Recall bias, MMR, and autism. Arch Dis Child. 2002;87:493–4. pmid:12456546
  52. 52. Sylvestre M-P, Huszti E, Hanley JA. Do OSCAR winners live longer than less successful peers? A reanalysis of the evidence. Ann Intern Med. 2006:361–3discussion 392. pmid:16954361
  53. 53. Yadav K, Lewis RJ. Immortal Time Bias in Observational Studies. JAMA. 2021;325:686–7. pmid:33591334
  54. 54. Chang C-HJ, Menéndez CC, Robertson MM, Amick BC 3rd, Johnson PW, del Pino RJ, et al. Daily self-reports resulted in information bias when assessing exposure duration to computer use. Am J Ind Med. 2010;53:1142–9. pmid:20632313
  55. 55. Kwok R. How to pick an electronic laboratory notebook. Nature. 2018;560:269–70. pmid:30082695
  56. 56. Bishop D. Rein in the four horsemen of irreproducibility. Nature. 2019;568:435. pmid:31019328
  57. 57. Held L, Schwab S. Improving the reproducibility of science. Significance. 2020;17:10–1.
  58. 58. Simmons JP, Nelson LD, Simonsohn U. False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol Sci. 2011;22:1359–66. pmid:22006061
  59. 59. Greenland S, Senn SJ, Rothman KJ, Carlin JB, Poole C, Goodman SN, et al. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. Eur J Epidemiol. 2016;31:337–50. pmid:27209009
  60. 60. Wasserstein RL, Lazar NA. The ASA’s Statement on p-Values: Context, Process, and Purpose. Am Stat. 2016;70:129–33.
  61. 61. Amrhein V, Greenland S, McShane B. Retire statistical significance. Nature. 2019;567:305–7. pmid:30894741
  62. 62. Cox DR, Donnelly CA. Principles of Applied Statistics. Cambridge University Press; 2011. Available:
  63. 63. Amrhein V, Greenland S. Rewriting results in the language of compatibility. Trends Ecol Evol 2022;0. pmid:35227533
  64. 64. Goodman SN. A comment on replication, p-values and evidence. Stat Med. 1992;11:875–9. pmid:1604067
  65. 65. Held L, Pawel S, Schwab S. Replication power and regression to the mean. Significance. 2020;17:10–1.
  66. 66. Benjamin DJ, Berger JO, Johannesson M, Nosek BA, et al. Redefine statistical significance. Nature Human. Behaviour. 2017;2:6–10. pmid:30980045
  67. 67. Leek JT, Peng RD. Statistics: P values are just the tip of the iceberg. Nature. 2015;520:612. pmid:25925460
  68. 68. Schwab S, Held L. Statistical programming: Small mistakes, big impacts. Significance. 2021;18:6–7.
  69. 69. Althouse AD. Adjust for Multiple Comparisons? It’s Not That Simple. Ann Thorac Surg. 2016;101:1644–5. pmid:27106412
  70. 70. Bender R, Lange S. Adjusting for multiple testing—when and how? J Clin Epidemiol 2001;54: 343–349. pmid:11297884
  71. 71. Greenland S. Analysis goals, error-cost sensitivity, and analysis hacking: Essential considerations in hypothesis testing and multiple comparisons. Paediatr Perinat Epidemiol. 2021;35:8–23. pmid:33269490
  72. 72. Nuzzo R. Scientific method: statistical errors. Nature. 2014;506:150–2. pmid:24522584
  73. 73. Goodman SN. Aligning statistical and scientific reasoning. Science. 2016;352:1180–1.
  74. 74. GRADE approach. [cited 3 Mar 2022]. Available:
  75. 75. Munafò MR, Nosek BA, Bishop DVM, Button KS, Chambers CD, Percie du Sert N, et al. A manifesto for reproducible science. Nature Human. Behaviour. 2017;1:0021. pmid:33954258
  76. 76. Sandve GK, Nekrutenko A, Taylor J, Hovig E. Ten simple rules for reproducible computational research. PLoS Comput Biol. 2013;9:e1003285. pmid:24204232
  77. 77. McKiernan EC, Bourne PE, Brown CT, Buck S, Kenall A, Lin J, et al. How open science helps researchers succeed. Elife. 2016;5. pmid:27387362
  78. 78. Schönbrodt F. Training students for the Open Science future. Nat Hum Behav. 2019;3:1031. pmid:31602034
  79. 79. Chan A-W, Hróbjartsson A, Haahr MT, Gøtzsche PC, Altman DG. Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles. JAMA. 2004;291:2457–65. pmid:15161896
  80. 80. Rosenthal R. The file drawer problem and tolerance for null results. Psychol Bull. 1979;86:638–41.
  81. 81. Fanelli D. Negative results are disappearing from most disciplines and countries. Scientometrics. 2012;90:891–904.
  82. 82. Chavalarias D, Wallach JD, Li AHT, Ioannidis JPA. Evolution of Reporting P Values in the Biomedical Literature, 1990–2015. JAMA. 2016;315:1141–8. pmid:26978209
  83. 83. van Zwet EW, Cator EA. The significance filter, the winner’s curse and the need to shrink. Stat Neerl. 2021;75:437–52.
  84. 84. Hopewell S, Loudon K, Clarke MJ, Oxman AD, Dickersin K. Publication bias in clinical trials due to statistical significance or direction of trial results. Cochrane Database Syst Rev. 2009:MR000006. pmid:19160345
  85. 85. Altman DG, Martin BJ. Statistics notes: Absence of evidence is not evidence of absence. BMJ. 1995;311:485.
  86. 86. Boutron I, Dutton S, Ravaud P, Altman DG. Reporting and interpretation of randomized controlled trials with statistically nonsignificant results for primary outcomes. JAMA. 2010;303:2058–64. pmid:20501928
  87. 87. Khan MS, Lateef N, Siddiqi TJ, Rehman KA, Alnaimat S, Khan SU, et al. Level and Prevalence of Spin in Published Cardiovascular Randomized Clinical Trial Reports With Statistically Nonsignificant Primary Outcomes: A Systematic Review. JAMA Netw Open. 2019;2:e192622. pmid:31050775
  88. 88. Egger M, Smith GD. Bias in location and selection of studies. BMJ. 1998;316:61–6. pmid:9451274
  89. 89. Nissen SB, Magidson T, Gross K, Bergstrom CT. Publication bias and the canonization of false facts. Elife. 2016;5. pmid:27995896
  90. 90. Grimes DR, Bauch CT, Ioannidis JPA. Modelling science trustworthiness under publish or perish pressure. R Soc Open Sci. 2018;5:171511. pmid:29410855
  91. 91. Goldacre B, Drysdale H, Dale A, Milosevic I, Slade E, Hartley P, et al. COMPare: a prospective cohort study correcting and monitoring 58 misreported trials in real time. Trials. 2019;20:118. pmid:30760329
  92. 92. Pildal J, Chan A-W, Hróbjartsson A, Forfang E, Altman DG, Gøtzsche PC. Comparison of descriptions of allocation concealment in trial protocols and the published reports: cohort study. BMJ. 2005;330:1049. pmid:15817527
  93. 93. Koensgen N, Rombey T, Allers K, Mathes T, Hoffmann F, Pieper D. Comparison of non-Cochrane systematic reviews and their published protocols: differences occurred frequently but were seldom explained. J Clin Epidemiol. 2019;110:34–41. pmid:30822507
  94. 94. Pocock SJ, Collier TJ, Dandreo KJ, de Stavola BL, Goldman MB, Kalish LA, et al. Issues in the reporting of epidemiological studies: a survey of recent practice. BMJ. 2004;329:883. pmid:15469946
  95. 95. Li G, Abbade LPF, Nwosu I, Jin Y, Leenus A, Maaz M, et al. A systematic review of comparisons between protocols or registrations and full reports in primary biomedical research. BMC Med Res Methodol. 2018;18:9. pmid:29325533
  96. 96. Glasziou P, Altman DG, Bossuyt P, Boutron I, Clarke M, Julious S, et al. Reducing waste from incomplete or unusable reports of biomedical research. Lancet. 2014;383:267–76. pmid:24411647
  97. 97. Shamseer L, Hopewell S, Altman DG, Moher D, Schulz KF. Update on the endorsement of CONSORT by high impact factor journals: a survey of journal “Instructions to Authors” in 2014. Trials. 2016;17:301. pmid:27343072
  98. 98. Turner L, Shamseer L, Altman DG, Schulz KF, Moher D. Does use of the CONSORT Statement impact the completeness of reporting of randomised controlled trials published in medical journals? A Cochrane review Syst Rev. 2012;1:60. pmid:23194585
  99. 99. Stevens A, Shamseer L, Weinstein E, Yazdi F, Turner L, Thielman J, et al. Relation of completeness of reporting of health research to journals’ endorsement of reporting guidelines: systematic review. BMJ. 2014;348:g3804. pmid:24965222
  100. 100. Sarafoglou A, Hoogeveen S, Matzke D, Wagenmakers E-J. Teaching Good Research Practices: Protocol of a Research Master Course. Psychology Learning & Teaching. 2020;19:46–59.
  101. 101. Kabitzke P, Cheng KM, Altevogt B. Guidelines and Initiatives for Good Research Practice. Handb Exp Pharmacol. 2019. pmid:31696346