Data Availability
Introduction
PLOS journals require authors to make all data necessary to replicate their study’s findings publicly available without restriction at the time of publication. When specific legal or ethical restrictions prohibit public sharing of a data set, authors must indicate how others may obtain access to the data.
When submitting a manuscript, authors must provide a Data Availability Statement describing compliance with PLOS' data policy. If the article is accepted for publication, the Data Availability Statement will be published as part of the article.
Acceptable data sharing methods are listed below, accompanied by guidance for authors as to what must be included in their Data Availability Statement and how to follow best practices in research reporting.
PLOS believes that sharing data fosters scientific progress. Data availability allows and facilitates:
- Validation, replication, reanalysis, new analysis, reinterpretation or inclusion into meta-analyses;
- Reproducibility of research;
- Efforts to ensure data are archived, increasing the value of the investment made in funding scientific research;
- Reduction of the burden on authors in preserving and finding old data, and managing data access requests;
- Citation and linking of research data and their associated articles, enhancing visibility and ensuring recognition for authors, data producers and curators.
Publication is conditional on compliance with this policy. If restrictions on access to data come to light after publication, we reserve the right to post a Correction or an Editorial Expression of Concern, contact the authors' institutions and funders, or, in extreme cases, retract the publication.
PLOS journals highlight repository use with our Accessible Data icon, an experimental feature that appears on any research article with a link to selected repositories in its Data Availability Statement.
Minimal Data Set Definition
Authors must share the “minimal data set” for their submission. PLOS defines the minimal data set to consist of the data required to replicate all study findings reported in the article, as well as related metadata and methods. Additionally, PLOS requires that authors comply with field-specific standards for preparation, recording, and deposition of data when applicable.
For example, authors should submit the following data:
- The values behind the means, standard deviations and other measures reported;
- The values used to build graphs;
- The points extracted from images for analysis.
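As a rough illustration of the first item above, the sketch below stores every individual measurement alongside the summary statistics computed from it, so that a reader could recompute the reported mean and standard deviation. The group names, values, and function names are hypothetical and invented for this example; they are not part of the PLOS policy.

```python
import csv
import io
import statistics


def summarize(values):
    """Return the mean and sample standard deviation of the raw values."""
    return statistics.mean(values), statistics.stdev(values)


def raw_values_csv(groups):
    """Serialize every individual measurement as CSV text, one row per value."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["group", "value"])
    for group, values in groups.items():
        for value in values:
            writer.writerow([group, value])
    return buffer.getvalue()


# Hypothetical measurements, invented purely for illustration.
groups = {"control": [4.0, 5.0, 6.0], "treated": [7.0, 9.0, 11.0]}

# The table of raw values is what would be shared; the summaries are what
# the article reports and what a reader should be able to recompute.
table = raw_values_csv(groups)
summaries = {group: summarize(values) for group, values in groups.items()}
```

Sharing the per-value table rather than only the summaries is the design point here: the reported statistics can then be checked or reanalyzed by others.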
Authors do not need to submit their entire data set if only a portion of the data was used in the reported study. Also, authors do not need to submit the raw data collected during an investigation if the standard in the field is to share data that have been processed.
PLOS does not permit references to “data not shown.” Authors should deposit relevant data in a public data repository or provide the data in the manuscript.
We require authors to provide sample image data in support of all reported results (e.g., immunohistochemistry or fMRI images), either with the submission files or in a public repository.
For manuscripts submitted to PLOS Biology, PLOS ONE, PLOS Climate, PLOS Water, PLOS Global Public Health or PLOS Mental Health on or after July 1, 2019, authors must provide original, uncropped and minimally adjusted images supporting all blot and gel results reported in the article’s figures and Supporting Information files. Whilst it is not necessary to provide original images at time of initial submission, we will require these files during the peer review process or before a submission can be accepted for publication.
When reviewing concerns arising after publication in relation to images shown, we may request available underlying data for any image files depicted in the article, as needed to resolve the concern(s).
Acceptable Data Sharing Methods
Deposition within data repository (strongly recommended)
All data and related metadata underlying reported findings should be deposited in appropriate public data repositories, unless already provided as part of a submitted article. Repositories may be either subject-specific repositories that accept specific types of structured data, or cross-disciplinary generalist repositories that accept multiple data types.
If field-specific standards for data deposition exist, PLOS requires authors to comply with these standards. Authors should select repositories appropriate to their field of study (for example, ArrayExpress or GEO for microarray data; GenBank, EMBL, or DDBJ for gene sequences).
The Data Availability Statement must list the name of the repository or repositories as well as digital object identifiers (DOIs), accession numbers or codes, or other persistent identifiers for all relevant data. Authors should also provide licensing information, where available.
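Before submission, authors may find it useful to check that each deposited data set has both a repository name and a persistent identifier recorded. The following is a minimal sketch of such a check; the `DepositedDataSet` class, its field names, and the `missing_details` helper are assumptions made for illustration and do not correspond to any PLOS tool or submission form.

```python
from dataclasses import dataclass


@dataclass
class DepositedDataSet:
    """Hypothetical record of one deposited data set."""
    repository: str          # name of the repository
    identifier: str          # DOI, accession number, or other persistent identifier
    license_name: str = ""   # licensing information, where available


def missing_details(data_sets):
    """Return a list of problems; the list is empty when nothing is missing."""
    problems = []
    if not data_sets:
        problems.append("no repository listed")
    for index, data_set in enumerate(data_sets, start=1):
        if not data_set.repository.strip():
            problems.append(f"data set {index}: repository name missing")
        if not data_set.identifier.strip():
            problems.append(f"data set {index}: persistent identifier missing")
    return problems


# Example using the repository and DOI from the data citation example in this policy.
complete = [DepositedDataSet("Dryad Digital Repository", "10.5061/dryad.bq068jr")]
incomplete = [DepositedDataSet("Dryad Digital Repository", "")]
```

The check only confirms that the fields are filled in; it does not verify that an identifier resolves or that the repository is appropriate for the field of study.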
Data citation
PLOS encourages authors to cite any publicly available research data in their reference list. References to data sets (data citations) must include a persistent identifier (such as a DOI). Citations of data sets, when they appear in the reference list, should include the minimum information recommended by DataCite and follow journal style.
Example:
Andrikou C, Thiel D, Ruiz-Santiesteban JA, Hejnol A. Active mode of excretion across digestive tissues predates the origin of excretory organs. 2019. Dryad Digital Repository. https://doi.org/10.5061/dryad.bq068jr.
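For authors who assemble reference lists programmatically, the ordering of fields in the example above can be sketched as follows. The function name and parameters are hypothetical, and the field order is inferred from this single example only; journal style and the DataCite recommendations remain the authoritative guides.

```python
def format_data_citation(authors, title, year, repository, doi):
    """Join citation fields in the order used by the example above:
    authors, title, year, repository, then the DOI as a resolvable URL."""
    author_list = ", ".join(authors)
    return f"{author_list}. {title}. {year}. {repository}. https://doi.org/{doi}."


citation = format_data_citation(
    authors=["Andrikou C", "Thiel D", "Ruiz-Santiesteban JA", "Hejnol A"],
    title=(
        "Active mode of excretion across digestive tissues "
        "predates the origin of excretory organs"
    ),
    year=2019,
    repository="Dryad Digital Repository",
    doi="10.5061/dryad.bq068jr",
)
```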
PLOS supports the data citation roadmap for scientific publishers developed by the Publishers Early Adopters Expert Group as part of the Data Citation Implementation Pilot (DCIP) project, an initiative of FORCE11 and the NIH BioCADDIE program.
Data in Supporting Information files
Although authors are encouraged to directly deposit data in appropriate repositories, data can be included in Supporting Information files. When including data in Supporting Information files, authors should submit data in file formats that are standard in their field and allow wide dissemination. If there are currently no standards in the field, authors should maximize the accessibility and reusability of the data by selecting a file format from which data can be efficiently extracted (for example, spreadsheets are preferable to PDFs or images when providing tabulated data).
Upon publication, PLOS uploads all Supporting Information files associated with an article to the figshare repository to increase compliance with the FAIR principles (Findable, Accessible, Interoperable, Reusable).
Supporting Information files are published exactly as provided and are not copyedited. Each file should be less than 20 MB.
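The format and size guidance above can be sketched as a small pre-submission step: write tabulated data to CSV, a format from which values can be extracted easily, and confirm that the file is below the 20 MB limit. The file name `S1_Table.csv`, the sample rows, and the interpretation of 20 MB as 20,000,000 bytes are assumptions made for this example.

```python
import csv
import os
import tempfile

# Assumption: 20 MB is read here as decimal megabytes (20,000,000 bytes).
MAX_BYTES = 20 * 1000 * 1000


def write_table(path, header, rows):
    """Write tabulated data as a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def within_size_limit(path, limit=MAX_BYTES):
    """Return True when the file is smaller than the limit, in bytes."""
    return os.path.getsize(path) < limit


# Hypothetical file name and rows, written to a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "S1_Table.csv")
    write_table(path, ["sample", "value"], [["a", 1.5], ["b", 2.5]])
    ok = within_size_limit(path)
```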
Data Management Plans
Some funding agencies have policies on the preparation and sharing of Data Management Plans (DMPs), and authors who receive funding from some agencies may be required to prepare DMPs as a condition of grants.
PLOS encourages authors to prepare DMPs before conducting their research and encourages authors to make those plans available to editors, reviewers and readers who wish to assess them.
The following resources may also be consulted for guidance on DMPs:
- Funders and institutions
- Digital Curation Centre
- DMPTool
- Data Stewardship Wizard
Acceptable Data Access Restrictions
PLOS recognizes that, in some instances, authors may not be able to make their underlying data set publicly available for legal or ethical reasons. This data policy does not overrule local regulations, legislation or ethical frameworks. Where these frameworks prevent or limit data release, authors must make these limitations clear in the Data Availability Statement at the time of submission. Acceptable restrictions on public data sharing are detailed below.
Please note it is not acceptable for an author to be the sole named individual responsible for ensuring data access.
Third-party data
For studies involving third-party data, we encourage authors to share any data specific to their analyses that they can legally distribute. PLOS recognizes, however, that authors may be using third-party data they do not have the rights to share. When third-party data cannot be publicly shared, authors must provide all information necessary for interested researchers to apply to gain access to the data, including:
- A description of the data set and the third-party source
- If applicable, verification of permission to use the data set
- All necessary contact information others would need to apply to gain access to the data
Authors should properly cite and acknowledge the data source in the manuscript. Please note, if data have been obtained from a third-party source, we require that other researchers would be able to access the data set in the same manner as the authors.
Human research participant data and other sensitive data
For studies involving human research participant data or other sensitive data, we encourage authors to share de-identified or anonymized data. However, when data cannot be publicly shared, we allow authors to make their data sets available upon request. In such cases, the Data Availability Statement should:
- Explain the restrictions in detail (e.g., data contain potentially identifying or sensitive patient information)
- Provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent
General guidelines for human research participant data
Prior to sharing human research participant data, authors should consult with an ethics committee to ensure data are shared in accordance with participant consent and all applicable local laws.
Data sharing should never compromise participant privacy. It is therefore not appropriate to publicly share personally identifiable data on human research participants. The following are examples of data that should not be shared:
- Name, initials, physical address
- Internet protocol (IP) address
- Specific dates (birth dates, death dates, examination dates, etc.)
- Contact information such as phone number or email address
- Location data
Data that are not directly identifying may also be inappropriate to share, as in combination they can become identifying. For example, data collected from a small group of participants, vulnerable populations, or private groups should not be shared if they involve indirect identifiers (such as sex, ethnicity, or location) that may risk the identification of study participants.
Steps necessary to protect privacy may include de-identifying data, adding noise, or blocking portions of the database. Where this is not possible, data sharing could be restricted by license agreements directed specifically at privacy concerns. Additional guidance on preparing human research participant data for publication, including information on how to properly de-identify these data, can be found here:
The following resources may also be consulted for guidance on sharing human research participant data:
- Sharing Clinical Trial Data: Maximizing Benefits, Minimizing Risk
- European Medicines Agency: Publication and access to clinical-trial data
- US National Institutes of Health: Protecting the Rights and Privacy of Human Subjects
- Canadian Institutes of Health Research Best Practices for Protecting Privacy in Health Research
- UK Data Archive: Anonymisation Overview
- Australian National Data Service: Ethics, Consent and Data Sharing
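Two of the privacy steps described above, removing directly identifying fields and adding noise to a numeric value, can be sketched as follows. The field names, the record, and the noise level are hypothetical. This is an illustration of the idea only and is not sufficient on its own to de-identify a real data set, since indirect identifiers can remain identifying in combination.

```python
import random

# Hypothetical field names for the kinds of data listed above as unsuitable to share.
DIRECT_IDENTIFIERS = {"name", "initials", "address", "ip_address", "phone", "email", "location"}
DATE_FIELDS = {"birth_date", "death_date", "exam_date"}


def deidentify(record, noise_fields=(), noise_sd=1.0, rng=None):
    """Return a copy of the record without direct identifiers or specific dates,
    with Gaussian noise added to the numeric fields named in noise_fields."""
    rng = rng or random.Random()
    cleaned = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIERS or key in DATE_FIELDS:
            continue  # block this portion of the record entirely
        if key in noise_fields:
            value = value + rng.gauss(0.0, noise_sd)
        cleaned[key] = value
    return cleaned


# Hypothetical participant record, invented for illustration.
record = {
    "name": "Jane Doe",
    "email": "jane@example.org",
    "birth_date": "1980-02-03",
    "group": "control",
    "weight_kg": 70.0,
}
shared = deidentify(record, noise_fields=("weight_kg",), rng=random.Random(0))
```

Passing a seeded `random.Random` makes the example repeatable; whether added noise is acceptable at all depends on the analysis and should be decided with the relevant ethics committee.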
Guidelines for qualitative data
For studies analyzing data collected as part of qualitative research, authors should make excerpts of the transcripts relevant to the study available in an appropriate data repository, within the paper, or upon request if they cannot be shared publicly. If even sharing excerpts would violate the agreement to which the participants consented, authors should explain this restriction and what data they are able to share in their Data Availability Statement.
See the Qualitative Data Repository for more information about managing and depositing qualitative data.
Other sensitive data
Some data that do not describe human research participants may also be sensitive and inappropriate to share. For studies analyzing other types of sensitive data, authors should share data as appropriate after consulting established field guidelines and all applicable local laws. Examples of sensitive data that may be subject to restrictions include, but are not limited to, data from field studies in protected areas, locations of sensitive archaeological sites, and locations of endangered or threatened species.
Additional help
Please contact the journal office (globalpubhealth@plos.org) if:
- You have concerns about the ethics or legality of sharing your data
- Your institution does not have an established point of contact to field external requests for access to sensitive data
- You feel unable to share data for reasons not specified above
Unacceptable Data Access Restrictions
PLOS journals will not consider manuscripts for which the following factors influence authors’ ability to share data:
- Authors will not share data because of personal interests, such as patents or potential future publications.
- The conclusions depend solely on the analysis of proprietary data. We consider proprietary data to be data owned by individuals, organizations, funders, institutions, commercial interests, or other parties that the data owners will not share. If proprietary data are used and cannot be accessed by others in the same manner by which the authors obtained them, the manuscript must include an analysis of publicly available data that validates the study’s conclusions so that others can reproduce the analysis and build on the study’s findings.
FAQs
General questions
Why do we not allow an author to be the only point of contact for fielding requests for access to restricted data?
When possible, we recommend authors deposit restricted data to a repository that allows for controlled data access. If this is not possible, directing data requests to a non-author institutional point of contact, such as a data access or ethics committee, helps guarantee long-term stability and availability of data. Providing interested researchers with a durable point of contact ensures data will be accessible even if an author changes email addresses or institutions, or becomes unavailable to answer requests.
When was the current data policy implemented?
The data policy was implemented on March 3, 2014. Any paper submitted before that date will not have a Data Availability Statement. For all manuscripts submitted or published before this date, data must be made available upon reasonable request.
What if my article does not contain any data?
All articles must include a Data Availability Statement but some submissions, such as Registered Report Protocols and Lab or Study Protocol articles, may not contain data. For manuscripts that do not report data, authors must state in their Data Availability Statement that their article does not report data and the data availability policy is not applicable to their article.
Depositing data
What if I cannot provide accession numbers or DOIs for my data set at submission?
Authors may submit their manuscript and include placeholder language in their Data Availability Statement indicating that accession numbers and/or DOIs will be made available after acceptance. The journal office will contact authors prior to publication to ask for this information and will hold the paper until it is received.
Providing private data access to reviewers and editors during the peer review process is acceptable. Many repositories permit private access for review purposes, and have policies for public release at publication.
Is PLOS integrated with any repositories?
PLOS partners with repositories to support data sharing and compliance with the PLOS data policy. Our submission system is integrated with partner repositories to ensure that the article and its underlying data are paired, published together and linked. Current partners include Dryad and FlowRepository.
Partner repositories may have a data submission fee. PLOS is not able to cover this fee and authors are under no obligation to use any specific repository. PLOS does not gain financially from our association with any integrated partners.
Additionally, PLOS uploads all Supporting Information files associated with an article to the figshare repository to increase compliance with the FAIR principles (Findable, Accessible, Interoperable, Reusable).
How do I deposit data with a data repository integration partner?
When authors deposit data in the integrated repository, they receive a provisional data set DOI along with a private reviewer URL link. Upon submission to PLOS, authors should include the data set DOI in the Data Availability Statement. They should also provide the reviewer URL, which will permit restricted access to the data during peer review. If a manuscript is editorially accepted by a PLOS journal, the publication of the article and public release of the data set will be automatically coordinated.
I cannot afford the cost of depositing a large amount of data. What should I do?
PLOS encourages authors to investigate all options and to contact their institutions if they have difficulty providing access to the data underlying the research. There are several repositories recommended by PLOS that specialize in handling large data sets.
What are acceptable licenses for my data deposition?
If authors use repositories with stated licensing policies, the policies should not be more restrictive than the Creative Commons Attribution (CC BY) license.
PLOS Data Advisory Board
PLOS has formed an external board of advisors across many fields of research published in PLOS journals. This board will work with us to develop community standards for data sharing across various fields, provide input and advice on especially complex data-sharing situations submitted to the journals, define data-sharing compliance, and proactively work to refine our policy. If you have any questions or feedback, we welcome you to write to us at data@plos.org.
Greg Barsh
Stephen Koslow
Marc Robinson-Rechavi