Peer Review History

Original Submission - January 15, 2025
Decision Letter - Roland Roberts, Editor

Dear Dr Spick,

Thank you for submitting your manuscript entitled "Analysis of NHANES-based research identifies risks of data dredging, false discoveries and misleading findings" for consideration as a Research Article by PLOS Biology.

Your manuscript has now been evaluated by the PLOS Biology editorial staff, as well as by an academic editor with relevant expertise, and I'm writing to let you know that we would like to send your submission out for external peer review.

However, before we can send your manuscript to reviewers, we need you to complete your submission by providing the metadata that is required for full assessment. To this end, please login to Editorial Manager where you will find the paper in the 'Submissions Needing Revisions' folder on your homepage. Please click 'Revise Submission' from the Action Links and complete all additional questions in the submission questionnaire.

Once your full submission is complete, your paper will undergo a series of checks in preparation for peer review. After your manuscript has passed the checks it will be sent out for review. To provide the metadata for your submission, please log in to Editorial Manager (https://www.editorialmanager.com/pbiology) within two working days, i.e. by Jan 21 2025, 11:59 PM.

If your manuscript has been previously peer-reviewed at another journal, PLOS Biology is willing to work with those reviews in order to avoid re-starting the process. Submission of the previous reviews is entirely optional and our ability to use them effectively will depend on the willingness of the previous journal to confirm the content of the reports and share the reviewer identities. Please note that we reserve the right to invite additional reviewers if we consider that additional/independent reviewers are needed, although we aim to avoid this as far as possible. In our experience, working with previous reviews does save time.

If you would like us to consider previous reviewer reports, please edit your cover letter to let us know and include the name of the journal where the work was previously considered and the manuscript ID it was given. In addition, please upload a response to the reviews as a 'Prior Peer Review' file type, which should include the reports in full and a point-by-point reply detailing how you have or plan to address the reviewers' concerns.

During the process of completing your manuscript submission, you will be invited to opt-in to posting your pre-review manuscript as a bioRxiv preprint. Visit http://journals.plos.org/plosbiology/s/preprints for full details. If you consent to posting your current manuscript as a preprint, please upload a single Preprint PDF.

Feel free to email us at plosbiology@plos.org if you have any queries relating to your submission.

Kind regards,

Roli Roberts

Roland Roberts, PhD

Senior Editor

PLOS Biology

rroberts@plos.org

Revision 1
Decision Letter - Roland Roberts, Editor

Dear Dr Spick,

Thank you for your patience while your manuscript "Analysis of NHANES-based research identifies risks of data dredging, false discoveries and misleading findings" went through peer-review at PLOS Biology. Your manuscript has now been evaluated by the PLOS Biology editors, an Academic Editor with relevant expertise, and by four independent reviewers.

You'll see that Reviewer #1 thinks that this study is important, but suggests that you use PMIDs instead of citing the questionable papers (another reviewer strongly agreed with this during cross-commenting), requests several points of discussion, and suggests a standardised reviewer template. Reviewer #2 is also very positive, and just has two points for discussion. Reviewer #3 is positive, but has a number of issues that will need to be addressed in order to be convincing. One is that she doesn't think that the problems will necessarily be solved by multi-factorial analysis, and the other is that the patterns seen can emerge from research assessment structures in countries like China, without necessarily involving paper mills (this may be possible to address by analysis?). She also thinks that you should dwell more on solutions (cross-validation, pre-registration, guidelines for journal editors). Reviewer #4 is also very positive, and suggests a tweak to your search criteria, suggests that you cite his editorial (other reviewers agreed), and recommends mentioning the analogous concerns raised with Mendelian Randomisation studies.

In light of the reviews, which you will find at the end of this email, we are pleased to offer you the opportunity to address the comments from the reviewers in a revision that we anticipate should not take you very long. We will then assess your revised manuscript and your response to the reviewers' comments with our Academic Editor aiming to avoid further rounds of peer-review, although we might need to consult with the reviewers, depending on the nature of the revisions.

We expect to receive your revised manuscript within 1 month. Please email us (plosbiology@plos.org) if you have any questions or concerns, or would like to request an extension.

At this stage, your manuscript remains formally under active consideration at our journal; please notify us by email if you do not intend to submit a revision so that we may withdraw the manuscript.

**IMPORTANT - SUBMITTING YOUR REVISION**

Your revisions should address the specific points made by each reviewer. Please submit the following files along with your revised manuscript:

1. A 'Response to Reviewers' file - this should detail your responses to the editorial requests, present a point-by-point response to all of the reviewers' comments, and indicate the changes made to the manuscript.

*NOTE: In your point-by-point response to the reviewers, please provide the full context of each review. Do not selectively quote paragraphs or sentences to reply to. The entire set of reviewer comments should be present in full and each specific point should be responded to individually.

You should also cite any additional relevant literature that has been published since the original submission and mention any additional citations in your response.

2. In addition to a clean copy of the manuscript, please also upload a 'track-changes' version of your manuscript that specifies the edits made. This should be uploaded as a "Revised Article with Changes Highlighted" file type.

*Resubmission Checklist*

When you are ready to resubmit your revised manuscript, please refer to this resubmission checklist: https://plos.io/Biology_Checklist

To submit a revised version of your manuscript, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' where you will find your submission record.

Please make sure to read the following important policies and guidelines while preparing your revision:

*Published Peer Review*

Please note, while forming your response, that if your article is accepted you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://blogs.plos.org/plos/2019/05/plos-journals-now-open-for-published-peer-review/

*PLOS Data Policy*

Please note that as a condition of publication PLOS' data policy (http://journals.plos.org/plosbiology/s/data-availability) requires that you make available all data used to draw the conclusions arrived at in your manuscript. If you have not already done so, you must include any data used in your manuscript either in appropriate repositories, within the body of the manuscript, or as supporting information (N.B. this includes any numerical values that were used to generate graphs, histograms etc.). For an example see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5

*Blot and Gel Data Policy*

We require the original, uncropped and minimally adjusted images supporting all blot and gel results reported in an article's figures or Supporting Information files. We will require these files before a manuscript can be accepted so please prepare them now, if you have not already uploaded them. Please carefully read our guidelines for how to prepare and upload this data: https://journals.plos.org/plosbiology/s/figures#loc-blot-and-gel-reporting-requirements

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Thank you again for your submission to our journal. We hope that our editorial process has been constructive thus far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Roli Roberts

Roland Roberts, PhD

Senior Editor

PLOS Biology

rroberts@plos.org

----------------------------------------------------------------

REVIEWERS' COMMENTS:

Reviewer #1:

[identifies herself as Jennifer A. Byrne]

This manuscript describes an important problem, namely the possible exploitation of NHANES-based research data for the scaled production of low value research manuscripts. The manuscript describes a steep recent rise in the number of articles published using NHANES data. It is important for these results to be communicated rapidly, as it would seem that the numbers of such manuscripts could continue to rise, possibly overwhelming editorial and peer review processes at some journals.

I should specify that my expertise lies outside the types of statistical analyses performed, so I wasn't able to critically evaluate all the data.

Major issues:

Page 6: I strongly suggest not citing the studies listed in Tables 2 and 3, but instead using PubMed IDs within the Tables and text. Paper mills are likely to value citations to their work, so citing many problematic papers inadvertently supports this model.

Page 10: "The challenges analysed here are different to those for manufactured manuscripts using falsified data"- this warrants further discussion. Arguably, some challenges are the same, ie fabricated and genuine yet low value/derivative manuscripts can both report unreliable results, both classes of manuscripts waste editor and peer reviewer time. It would be worth explaining the similarities and differences between manuscripts generated from AI-ready datasets versus those that are entirely manufactured from fabricated data, so that any specific issues are clearer.

Page 10: "dedicated statistical reviewers"- many journals would indeed benefit from statistics reviewers, however, their time could be easily consumed by low value manuscripts. The authors then call for better manuscript screening processes, which are essential for expert reviewers to focus attention on quality submissions. This point should be expanded, eg by referencing recent publications https://doi.org/10.1210/clinem/dgaf036 (this very recent editorial mentions similar manuscripts to those described here), also https://lipidworld.biomedcentral.com/articles/10.1186/s12944-024-02284-w. More emphasis should be placed on the importance of desk rejections to save editorial and peer reviewer time, as per https://doi.org/10.1210/clinem/dgaf036. Stender and colleagues have proposed a peer reviewer template to recommend rejection of low value 2SMR manuscripts- this could be built upon to provide similar resources for reviewers of NHANES-based manuscripts. This would seem to be a valuable addition.

Minor issues:

Page 1: Manuscript subtitle "Unethical research practices"- could this be "Questionable", "Problematic"? Research ethics doesn't seem to be a focus of the manuscript.

Page 2: The abstract is quite long and includes some details that could be omitted.

Figure 1: The authors could check the n values shown, eg 426-6 does not equal 417.

Figure 1: legend refers to a "systematic review", whereas the text refers to "meta-analysis" (lines 124, 221, 307).

Figure 2: The legend does not describe the colours used, some of which are used more than once. Are the colours significant?

Page 5: "chow test" (used twice), "Chi-square test" are not mentioned in the Methods.

Page 5: "biobank" is used as a keyword to show recent increases in data-driven research (Figure 3B), yet most human health biobanks provide biospecimens and associated data, and hence support laboratory research that's not primarily data-driven. The authors could therefore rethink the use of "biobank" as a control keyword.

Figure 3: Colours used are not defined.

Figure 5: Suggest "Publication count" for Y axis, also "publications" in legend.

Page 8: Suggest inserting a paragraph break at line 194- paragraph is currently quite long.

Page 10: "often peer reviewers will be the last"- this is an important point, however the cited reference from 2008 would have been unlikely to specifically refer to paper mills. This sentence could be reworded for clarity.

Page 10: "early warning lists"- it's unclear what's intended here. Some readers will associate this term with the CAS journal early warning lists, eg: https://ewl.fenqubiao.com/#/en/early-warning-journal-list-2024.

Reviewer #2:

[identifies himself as Nikolaus C Netzer MD PhD]

I fully support this manuscript, agree with the authors on all points, and think it is extremely important to get the message out. I have a few minor requests based on my own experience with the topic.

1. I know that GB is no longer part of the EU, but research ties remain close. I would briefly discuss in the paper the EU's regulations on AI in research, especially since some influential economic experts have recently criticised the EU for being too restrictive with AI regulation, arguing that this would harm the economic and scientific progress and success of EU countries. I think this criticism is purely money-driven and ignores any ethical aspects of the topic (Netzer NC. Artificial intelligence - the Janus-faced tool in our hands. Sleep Breath. 2024 Oct;28(5):1861-1862. doi: 10.1007/s11325-024-03129-7. Epub 2024 Aug 5. PMID: 39098968; PMCID: PMC11449974; see also "Artificial Intelligence (AI) in Science", European Commission, https://research-and-innovation.ec.europa.eu/research-area/industrial-research-and-innovation/).

2. The problem with single-parameter correlations is not only that they might deliver false results. As a reviewer, I came across an MR NHANES manuscript where the AI hallucinated by simply turning the hypothesis around between two single parameters. The first version correlated NHANES data on depression in subjects with data on sleep apnea, and the paper's (the AI's) conclusion was that sleep apnea leads to depression. That is scientifically unproven (most patients have daytime sleepiness but do not develop depression from it) but possible. I then received the same manuscript again to review, with the authors in reversed order and the conclusion turned around (same correlation): that depression leads to sleep apnea. Physiologically and pathophysiologically, that is total nonsense. I described that problem in the above-listed editorial in Sleep and Breathing.

You may want to build that problem of reversed single-parameter correlations into your arguments, discussion, or introduction.

Reviewer #3:

[identifies herself as Dorothy Bishop]

Summary

After a move to encourage researchers to adopt open data, it is becoming clear that an unintended negative consequence is data dredging of open datasets that leads to pointless analyses being published. The authors demonstrate this phenomenon with the NHANES dataset, showing that there has been a rapid growth in papers reporting associations between single predictors and single outcomes. A high proportion of the recent papers originate from China. The authors argue that this research is flawed because it does not take into account the multifactorial origins of most health conditions, and that AI-ready datasets may be misused by paper mills. They conclude with some recommendations for overcoming this situation.

Overall evaluation

The authors have conducted a systematic analysis of papers from the NHANES dataset, and provide useful empirical evidence to support the view that single-factor analyses have skyrocketed in recent years. This is useful but I see two aspects that I am not convinced by and that I'd like to see addressed: (a) the need for multifactor analyses and (b) the involvement of paper mills. In addition, I think the recommendations could be a lot stronger.

Multifactor analysis

While I agree that single factor accounts of most health conditions are implausible, I'm not sure the problem identified here would be solved by researchers adopting multi-factorial analyses. It could be argued that it would just make the problem worse, given that the number of potential associations increases multiplicatively as more factors are included in the analysis. In other words, unless analyses are pre-registered and theoretically or empirically motivated, encouraging authors to just include yet more variables in the analysis will make matters worse rather than better, because the chance of false positive findings increases dramatically. So I think the focus on single vs multiple factors is a bit misleading here.
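A minimal numerical illustration of this concern, assuming independent tests at alpha = 0.05 and no true effects (the test counts below are hypothetical):

alpha = 0.05  # nominal per-test false positive rate

# Probability of at least one false positive across n independent null tests
for n_tests in (1, 10, 45, 190):  # e.g. 45 and 190 = all pairs among 10 or 20 factors
    p_any = 1 - (1 - alpha) ** n_tests
    print(f"{n_tests:>4} tests -> P(at least one false positive) = {p_any:.2f}")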

Involvement of paper mills

Paper mills were mentioned at several points during the article, but no evidence was provided for their involvement in the explosion of single factor papers. Given that these mostly originate in China, where the incentives for publishing in Western journals have changed dramatically, it's not clear that we are necessarily seeing the result of paper mill activities. See this article that discusses how incentives can lead to misconduct but which does not feature paper mills as a mechanism:

Zhang, X., & Wang, P. (2024). Research misconduct in China: Towards an institutional analysis. Research Ethics, 17470161241247720. https://doi.org/10.1177/17470161241247720

Diagnosing paper mills is not a straightforward business, and I can see it would be beyond the scope of this article to attempt to do such an analysis. Nevertheless, it might be feasible to look at two indicators: the geographical diversity of authors, and the use of non-institutional emails, both of which have been identified as red flags for paper mills: Van Noorden, R. (2023). How big is science's fake-paper problem? Nature, 623(7987), 466-467. https://doi.org/10.1038/d41586-023-03464-x. Alternatively, the paper could be rewritten to make it clear that the statements about paper mills are conjecture and not supported by evidence.

Recommendations for action

An additional recommendation would be to provide access to just half of the dataset, which could be used for exploratory analysis. The researcher would then have to show that the results replicated in the other half - a kind of cross-validation sample.
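A minimal sketch of this split-half approach, using a generic pandas DataFrame with hypothetical column names rather than actual NHANES variables:

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Stand-in dataset; in practice this would be the curated NHANES extract
df = pd.DataFrame(rng.normal(size=(1000, 2)), columns=["exposure", "outcome"])

# Randomly split into an exploratory half and a held-out confirmatory half
idx = rng.permutation(len(df))
explore, confirm = df.iloc[idx[:500]], df.iloc[idx[500:]]

# An association "discovered" in the exploratory half...
r_explore = explore["exposure"].corr(explore["outcome"])
# ...should only be reported if it replicates in the confirmatory half
r_confirm = confirm["exposure"].corr(confirm["outcome"])
print(f"exploratory r = {r_explore:.3f}, confirmatory r = {r_confirm:.3f}")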

Also, the OpenSafely model used for accessing NHS data provides good protection against data dredging: Nab, L., Schaffer, A. L., Hulme, W., DeVito, N. J., Dillingham, I., Wiedemann, M., Andrews, C. D., Curtis, H., Fisher, L., Green, A., Massey, J., Walters, C. E., Higgins, R., Cunningham, C., Morley, J., Mehrkar, A., Hart, L., Davy, S., Evans, D., … Goldacre, B. (2024). OpenSAFELY: A platform for analysing electronic health records designed for reproducible research. Pharmacoepidemiology and Drug Safety, 33(6), e5815. https://doi.org/10.1002/pds.5815

Requiring authors to pre-register their research hypothesis and making data access contingent on this sounds similar to the option 2 discussed by the authors, but I was not clear how far that required explicit specification of a hypothesis.

In genetics, where data dredging led to a decade or so of non-replicable candidate gene studies, it became mandatory to replicate the results in a fresh sample. This is similar to the cross-validation idea above, but may be more suited to a case where the sample size is small.

The kind of rapid growth in a methodology described here is reminiscent of what has been reported in other areas: Stender, S., Gellert-Kristensen, H., & Smith, G. D. (2024). Reclaiming mendelian randomization from the deluge of papers and misleading findings. Lipids in Health and Disease, 23(1), 286. https://doi.org/10.1186/s12944-024-02284-w. Those authors had pretty stringent recommendations for journal editors that might be applicable also in this case.

More minor points

p 2 line 26, "Second,..." this is not a sentence; just needs rewriting

line 108. I wasn't sure what was meant by the sentence starting "There was a substantial.."

para using Chow test and chi square test on p 5. The authors have criticised others for reporting p-values after data dredging, but the statistical results reported here have a similar flavour. That is, it is not clear that there was a hypothesis, in which case p-values seem inappropriate. I think the trends are very obvious in any case and that purely descriptive data would be fine here.

p 6, line 146. I was puzzled as to how the FDR correction was applied. This eventually became clear after reading the methods. It would make more sense if the Methods preceded the Results, if the journal allows this. In any case, it seems that the use of FDR correction involves some strong assumptions that may not be justified. One issue is that we don't know how many tests were conducted but went unreported. And as the authors note later on, determining the number of potential hypotheses is not trivial. This is one reason why I think the key issue is not so much whether you have single factor or multi factor analyses, but rather whether you have clearly defined a priori hypotheses.
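A minimal illustration of why the assumed total number of tests matters for FDR control, using the Benjamini-Hochberg step-up procedure with made-up p-values:

def benjamini_hochberg_threshold(p_values, m, q=0.05):
    """Largest p-value that survives BH when m tests were run in total."""
    threshold = 0.0
    for k, p in enumerate(sorted(p_values), start=1):
        if p <= (k / m) * q:
            threshold = p
    return threshold

reported_p = [0.001, 0.004, 0.012, 0.030]

# If these four were the only tests run, all four survive at q = 0.05
print(benjamini_hochberg_threshold(reported_p, m=4))    # 0.03

# If 100 tests were actually run and only the best four reported, none survive
print(benjamini_hochberg_threshold(reported_p, m=100))  # 0.0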

p 8 line 172. Again, I don't like referring to a "statistically significant increase" when no hypothesis has been stated. Sorry for being pedantic, but I'd really prefer to just have a descriptive statement here, maybe referring to a N-fold increase.

p 8 line 181. para on single-factor design. This is where I think you need to be careful to not encourage multifactor data dredging. A multifactor analysis should be motivated either by reference to prior literature and/or by theoretical considerations; just throwing lots of variables into the analysis would make matters worse I think.

p 10: I suggest a new header at the end of para 1.

Reviewer #4:

[identifies himself as Stefan Stender]

This study analyzed the rapid growth in publications stemming from single-factor association analyses of the publicly available National Health and Nutrition Examination Survey (NHANES) cohort. A literature search using a set of criteria identified 341 NHANES-derived papers from 2014-2024, each proposing an association between a predictor and a health condition. The number of papers per year has risen rapidly in recent years. The authors find evidence of data dredging and hypothesizing after results are known ('HARKing'). They highlight the potential for these formulaic NHANES papers to be made (wholly or partly) by AI methods and/or paper mills.

Finally, the authors describe a set of best practices to address the concern of this type of paper flooding the literature.

The study deals with an important and very timely topic. The paper is well written. The findings and arguments are compelling.

I have a few minor comments:

1) The search criteria are quite specific: 'NHANES AND (correlation* OR association*) AND (cross-sectional OR population)'

I realize this is probably required to limit the number of hits. However, I think it's worth reporting the total number of NHANES papers identified with the simple search 'NHANES', by year:

https://pubmed.ncbi.nlm.nih.gov/?term=NHANES&sort=pubdate&timeline=expanded

This is striking. The number per year was rising steadily from 2000-2017, plateauing from 2018 to 2023 at around 4700 per year, and then spiking to N=7818 in 2024!

After 60 days of 2025 we are already at N=1485. So, the projected number of NHANES papers for 2025 is probably > 10000! In other words, there is evidence that the explosion in NHANES papers is on a much larger scale than the numbers reported from the articles identified by the authors using the narrow search criteria. I think this is worth pointing out.
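A minimal sketch of how such per-year counts could be pulled programmatically from PubMed via the NCBI E-utilities esearch endpoint (counts will drift as PubMed is updated; an API key is advisable for heavier use):

import requests

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

for year in range(2014, 2026):
    params = {
        "db": "pubmed",
        "term": f"NHANES AND {year}[dp]",  # [dp] restricts to date of publication
        "retmode": "json",
        "retmax": 0,  # only the hit count is needed, not the record IDs
    }
    count = requests.get(ESEARCH, params=params, timeout=30).json()["esearchresult"]["count"]
    print(year, count)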

2) The issue with NHANES papers was to my knowledge first noted in an editorial in 2024:

https://link.springer.com/article/10.1007/s11325-024-03129-7

This editorial should be cited.

3) A very similar issue is plaguing the field of Mendelian randomization (MR), where we see an explosion in the number of papers using publicly available summary data in recent years. This explosion is driven by papers based on two-sample MR (2SMR), where both the exposure and outcome data are based on summary statistics from published GWAS. The rise is entirely driven by papers from China, and there is evidence that some are being produced by paper mill-like factories and using AI. Mass-produced 2SMR papers have been the focus of several recent editorials and comments: PMID: 39244551, PMID: 39311417, PMID: 39407214. It would be relevant to highlight this analogous situation in the discussion of the present manuscript.

Revision 2

Attachments
Attachment
Submitted filename: ReviewResponse_PBIOLOGY-D-25-00160R1_5Mar2025.pdf
Decision Letter - Roland Roberts, Editor

Dear Dr Spick,

Thank you for your patience while we considered your revised manuscript "Analysis of NHANES-based research identifies risks of data dredging, false discoveries and misleading findings" for publication as a Research Article at PLOS Biology. This revised version of your manuscript has been evaluated by the PLOS Biology editors and the Academic Editor.

Based on our Academic Editor's assessment of your revision, we are likely to accept this manuscript for publication, provided you satisfactorily address the following data and other policy-related requests.

IMPORTANT - please attend to the following:

a) Please change your Title to something that brings the importance of the advance more to the fore. We suggest: "Explosion of formulaic research articles, including inappropriate study designs and false discoveries, based on the NHANES US national health database"

b) Please address my Data Policy requests below; specifically, we need you to supply the numerical values underlying Figs 3AB, 4, 5, either as a supplementary data file or as a permanent DOI’d deposition.

c) Please cite the location of the data clearly in all relevant main and supplementary Figure legends, e.g. “The data underlying this Figure can be found in S1 Data” or “The data underlying this Figure can be found in https://zenodo.org/records/XXXXXXXX”.

d) Please make any custom code available, either as a supplementary file or as part of your data deposition.

As you address these items, please take this last chance to review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the cover letter that accompanies your revised manuscript.

In addition to these revisions, you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests shortly.

We expect to receive your revised manuscript within two weeks.

To submit your revision, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' to find your submission record. Your revised submission must include the following:

- a cover letter that should detail your responses to any editorial requests, if applicable, and whether changes have been made to the reference list

- a Response to Reviewers file that provides a detailed response to the reviewers' comments (if applicable; if not applicable, please do not delete your existing 'Response to Reviewers' file)

- a track-changes file indicating any changes that you have made to the manuscript.

NOTE: If Supporting Information files are included with your article, note that these are not copyedited and will be published as they are submitted. Please ensure that these files are legible and of high quality (at least 300 dpi) in an easily accessible file format. For this reason, please be aware that any references listed in an SI file will not be indexed. For more information, see our Supporting Information guidelines:

https://journals.plos.org/plosbiology/s/supporting-information

*Published Peer Review History*

Please note that you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://plos.org/published-peer-review-history/

*Press*

Should you, your institution's press office or the journal office choose to press release your paper, please ensure you have opted out of Early Article Posting on the submission form. We ask that you notify us as soon as possible if you or your institution is planning to press release the article.

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Please do not hesitate to contact me should you have any questions.

Sincerely,

Roli Roberts

Roland Roberts, PhD

Senior Editor

rroberts@plos.org

PLOS Biology

------------------------------------------------------------------------

DATA POLICY:

You may be aware of the PLOS Data Policy, which requires that all data be made available without restriction: http://journals.plos.org/plosbiology/s/data-availability. For more information, please also see this editorial: http://dx.doi.org/10.1371/journal.pbio.1001797

Note that we do not require all raw data. Rather, we ask that all individual quantitative observations that underlie the data summarized in the figures and results of your paper be made available in one of the following forms:

1) Supplementary files (e.g., excel). Please ensure that all data files are uploaded as 'Supporting Information' and are invariably referred to (in the manuscript, figure legends, and the Description field when uploading your files) using the following format verbatim: S1 Data, S2 Data, etc. Multiple panels of a single or even several figures can be included as multiple sheets in one excel file that is saved using exactly the following convention: S1_Data.xlsx (using an underscore).

2) Deposition in a publicly available repository. Please also provide the accession code or a reviewer link so that we may view your data before publication.

Regardless of the method selected, please ensure that you provide the individual numerical values that underlie the summary data displayed in the following figure panels as they are essential for readers to assess your analysis and to reproduce it: Figs 3AB, 4, 5. NOTE: the numerical data provided should include all replicates AND the way in which the plotted mean and errors were derived (it should not present only the mean/average values).

IMPORTANT: Please also ensure that figure legends in your manuscript include information on where the underlying data can be found, and ensure your supplemental data file/s has a legend.

Please ensure that your Data Statement in the submission system accurately describes where your data can be found.

------------------------------------------------------------------------

CODE POLICY

Per journal policy, if you have generated any custom code during the course of this investigation, please make it available without restrictions. Please ensure that the code is sufficiently well documented and reusable, and that your Data Statement in the Editorial Manager submission system accurately describes where your code can be found.

Please note that we cannot accept sole deposition of code in GitHub, as this could be changed after publication. However, you can archive this version of your publicly available GitHub code to Zenodo. Once you do this, it will generate a DOI number, which you will need to provide in the Data Accessibility Statement (you are welcome to also provide the GitHub access information). See the process for doing this here: https://docs.github.com/en/repositories/archiving-a-github-repository/referencing-and-citing-content

------------------------------------------------------------------------

DATA NOT SHOWN?

- Please note that per journal policy, we do not allow the mention of "data not shown", "personal communication", "manuscript in preparation" or other references to data that is not publicly available or contained within this manuscript. Please either remove mention of these data or provide figures presenting the results and the data underlying the figure(s).

------------------------------------------------------------------------

Revision 3

Attachments
Attachment
Submitted filename: ReviewResponse_PBIOLOGY-D-25-00160R2_22Mar2025.pdf
Decision Letter - Roland Roberts, Editor

Dear Matt,

Thank you for the submission of your revised Meta-Research Article "Explosion of formulaic research articles, including inappropriate study designs and false discoveries, based on the NHANES US national health database" for publication in PLOS Biology. On behalf of my colleagues and the Academic Editor, Marcus Munafo, I'm pleased to say that we can in principle accept your manuscript for publication, provided you address any remaining formatting and reporting issues. These will be detailed in an email you should receive within 2-3 business days from our colleagues in the journal operations team; no action is required from you until then. Please note that we will not be able to formally accept your manuscript and schedule it for publication until you have completed any requested changes.

Please take a minute to log into Editorial Manager at http://www.editorialmanager.com/pbiology/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production process.

PRESS: We frequently collaborate with press offices. If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximise its impact. If the press office is planning to promote your findings, we would be grateful if they could coordinate with biologypress@plos.org. If you have previously opted in to the early version process, we ask that you notify us immediately of any press plans so that we may opt out on your behalf.

We also ask that you take this opportunity to read our Embargo Policy regarding the discussion, promotion and media coverage of work that is yet to be published by PLOS. As your manuscript is not yet published, it is bound by the conditions of our Embargo Policy. Please be aware that this policy is in place both to ensure that any press coverage of your article is fully substantiated and to provide a direct link between such coverage and the published work. For full details of our Embargo Policy, please visit http://www.plos.org/about/media-inquiries/embargo-policy/.

Thank you again for choosing PLOS Biology for publication and supporting Open Access publishing. We look forward to publishing your study. 

Best wishes,

Roli

Roland G Roberts, PhD

Senior Editor

PLOS Biology

rroberts@plos.org

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.