
Data Availability

Introduction

PLOS journals require authors to make all data necessary to replicate their study’s findings publicly available without restriction at the time of publication. When specific legal or ethical restrictions prohibit public sharing of a data set, authors must indicate how others may obtain access to the data.

When submitting a manuscript, authors must provide a Data Availability Statement describing compliance with PLOS' data policy. If the article is accepted for publication, the Data Availability Statement will be published as part of the article.

Acceptable data sharing methods are listed below, accompanied by guidance for authors as to what must be included in their Data Availability Statement and how to follow best practices in research reporting.

PLOS believes that sharing data fosters scientific progress. Data availability allows and facilitates:

  • Validation, replication, reanalysis, new analysis, reinterpretation or inclusion into meta-analyses;
  • Reproducibility of research;
  • Efforts to ensure data are archived, increasing the value of the investment made in funding scientific research;
  • Reduction of the burden on authors in preserving and finding old data, and managing data access requests;
  • Citation and linking of research data and their associated articles, enhancing visibility and ensuring recognition for authors, data producers and curators.

Publication is conditional on compliance with this policy. If restrictions on access to data come to light after publication, we reserve the right to post a Correction or an Editorial Expression of Concern, contact the authors' institutions and funders, or, in extreme cases, retract the publication.

 

Minimal Data Set Definition

Authors must share the “minimal data set” for their submission. PLOS defines the minimal data set as the data required to replicate all study findings reported in the article, together with related metadata and methods. Additionally, PLOS requires that authors comply with field-specific standards for preparation, recording, and deposition of data when applicable.

Related policy: Materials and Software Sharing

For example, authors should submit the following data:

  • The values behind the means, standard deviations and other measures reported;
  • The values used to build graphs;
  • The points extracted from images for analysis.

Authors do not need to submit their entire data set if only a portion of the data was used in the reported study. Also, authors do not need to submit the raw data collected during an investigation if the standard in the field is to share data that have been processed.

PLOS does not permit references to “data not shown.” Authors should deposit relevant data in a public data repository or provide the data in the manuscript.

We require authors to provide sample image data in support of all reported results (e.g. for immunohistochemistry images, fMRI images, etc.), either with the submission files or in a public repository.

 

Acceptable Data Sharing Methods

Deposition within data repository (strongly recommended)

All data and related metadata underlying reported findings should be deposited in appropriate public data repositories, unless already provided as part of a submitted article. Repositories may be either subject-specific repositories that accept specific types of structured data, or cross-disciplinary generalist repositories that accept multiple data types.

If field-specific standards for data deposition exist, PLOS requires authors to comply with these standards. Authors should select repositories appropriate to their field of study (for example, ArrayExpress or GEO for microarray data; GenBank, EMBL, or DDBJ for gene sequences).

The Data Availability Statement must list the name of the repository or repositories as well as digital object identifiers (DOIs), accession numbers or codes, or other persistent identifiers for all relevant data.
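As an illustration of the requirement above, an author could check that a draft Data Availability Statement contains at least one DOI before submission. The following is a minimal Python sketch, not a PLOS tool: the function name `find_dois`, the regular expression, and the wording of the sample statement are assumptions made for this example, and accession numbers or other persistent identifiers would need their own patterns.

```python
import re

# Assumed pattern: "10.", a 4-9 digit registrant code, "/", then a suffix.
# This covers common modern DOIs but is not an official or exhaustive grammar.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def find_dois(statement: str) -> list[str]:
    """Return DOI-like strings found in a Data Availability Statement."""
    # Strip sentence punctuation that often trails a DOI in running text.
    return [m.rstrip(".,;)") for m in DOI_PATTERN.findall(statement)]


# Hypothetical statement wording; the DOI is the one cited in this policy.
statement = (
    "All data are available from the Dryad Digital Repository "
    "(https://doi.org/10.5061/dryad.bq068jr)."
)
dois = find_dois(statement)
print(dois)
```

An empty result would suggest that the statement still needs a persistent identifier or placeholder language.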

Additional guidance on acceptable repositories can be found here.

Data citation

PLOS encourages authors to cite any publicly available research data in their reference list. References to data sets (data citations) must include a persistent identifier (such as a DOI). Citations of data sets, when they appear in the reference list, should include the minimum information recommended by DataCite and follow journal style.

Example:
Andrikou C, Thiel D, Ruiz-Santiesteban JA, Hejnol A. Active mode of excretion across digestive tissues predates the origin of excretory organs. 2019. Dryad Digital Repository. https://doi.org/10.5061/dryad.bq068jr.
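The structure of the example above can be reproduced programmatically when many data sets need to be cited. This is a minimal sketch in Python: the `DataCitation` class and its field names are hypothetical and are not taken from the DataCite schema or from PLOS journal style guidance, so the output should still be checked against journal style.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    # Field names are assumptions for this sketch, not a DataCite schema.
    creators: list[str]
    title: str
    year: int
    repository: str
    doi: str


def format_citation(c: DataCitation) -> str:
    """Render a data citation in the order used by the example above."""
    authors = ", ".join(c.creators)
    return f"{authors}. {c.title}. {c.year}. {c.repository}. https://doi.org/{c.doi}."


citation = DataCitation(
    creators=["Andrikou C", "Thiel D", "Ruiz-Santiesteban JA", "Hejnol A"],
    title=("Active mode of excretion across digestive tissues predates "
           "the origin of excretory organs"),
    year=2019,
    repository="Dryad Digital Repository",
    doi="10.5061/dryad.bq068jr",
)
print(format_citation(citation))
```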

PLOS supports the data citation roadmap for scientific publishers developed by the Publishers Early Adopters Expert Group as part of the Data Citation Implementation Pilot (DCIP) project, an initiative of FORCE11 and the NIH BioCADDIE program.

Data in Supporting Information files

Although authors are encouraged to directly deposit data in appropriate repositories, data can be included in Supporting Information files. When including data in Supporting Information files, authors should submit data in file formats that are standard in their field and allow wide dissemination. If there are currently no standards in the field, authors should maximize the accessibility and reusability of the data by selecting a file format from which data can be efficiently extracted (for example, spreadsheets are preferable to PDFs or images when providing tabulated data).

Upon publication, PLOS uploads all Supporting Information files associated with an article to the figshare repository to increase compliance with the FAIR principles (Findable, Accessible, Interoperable, Reusable).

Supporting Information files are published exactly as provided and are not copyedited. Each file should be less than 20 MB.
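Authors preparing many Supporting Information files may want to check the size limit before uploading. The sketch below is one possible check in Python. It assumes that "20 MB" means 20 × 1024 × 1024 bytes, which the policy text does not specify, and the file names and sizes are invented for illustration.

```python
# Assumption: "20 MB" is read as 20 * 1024 * 1024 bytes; the policy text
# does not say whether decimal or binary megabytes are meant.
MAX_BYTES = 20 * 1024 * 1024


def oversized_files(sizes_in_bytes: dict[str, int]) -> list[str]:
    """Return names of files that are not under the size limit."""
    return [name for name, size in sizes_in_bytes.items() if size >= MAX_BYTES]


# Hypothetical file names and sizes, for illustration only.
files = {
    "S1_Table.csv": 350_000,
    "S2_Dataset.xlsx": 19_500_000,
    "S3_Video.avi": 48_000_000,
}
print(oversized_files(files))
```

In practice the sizes could be read from disk with `os.path.getsize`, and files that are flagged might be better placed in a repository.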

Data Management Plans

Some funding agencies have policies on the preparation and sharing of Data Management Plans (DMPs), and authors who receive funding from some agencies may be required to prepare DMPs as a condition of grants.

PLOS encourages authors to prepare DMPs before conducting their research and encourages authors to make those plans available to editors, reviewers and readers who wish to assess them.

The following resources may also be consulted for guidance on DMPs:

 

Acceptable Data Access Restrictions

PLOS recognizes that, in some instances, authors may not be able to make their underlying data set publicly available for legal or ethical reasons. This data policy does not overrule local regulations, legislation or ethical frameworks. Where these frameworks prevent or limit data release, authors must make these limitations clear in the Data Availability Statement at the time of submission. Acceptable restrictions on public data sharing are detailed below.

Please note it is not acceptable for an author to be the sole named individual responsible for ensuring data access.

Third-party data

For studies involving third-party data, we encourage authors to share any data specific to their analyses that they can legally distribute. PLOS recognizes, however, that authors may be using third-party data they do not have the rights to share. When third-party data cannot be publicly shared, authors must provide all information necessary for interested researchers to apply to gain access to the data.

For any third-party data that the authors cannot legally distribute, they should include the following information in their Data Availability Statement upon submission:
  • A description of the data set and the third-party source
  • If applicable, verification of permission to use the data set
  • All necessary contact information others would need to apply to gain access to the data

Authors should properly cite and acknowledge the data source in the manuscript. Please note, if data have been obtained from a third-party source, we require that other researchers be able to access the data set in the same manner as the authors.

Human research participant data and other sensitive data

For studies involving human research participant data or other sensitive data, we encourage authors to share de-identified or anonymized data. However, when data cannot be publicly shared, we allow authors to make their data sets available upon request.

If there are ethical or legal restrictions on sharing a sensitive data set, authors should do the following within their Data Availability Statement upon submission:
  • Explain the restrictions in detail (e.g., data contain potentially identifying or sensitive patient information)
  • Provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent

General guidelines for human research participant data

Prior to sharing human research participant data, authors should consult with an ethics committee to ensure data are shared in accordance with participant consent and all applicable local laws.

Data sharing should never compromise participant privacy. It is therefore not appropriate to publicly share personally identifiable data on human research participants. The following are examples of data that should not be shared: 

  • Name, initials, physical address
  • Internet protocol (IP) address
  • Specific dates (birth dates, death dates, examination dates, etc.)
  • Contact information such as phone number or email address
  • Location data

Data that are not directly identifying may also be inappropriate to share, as in combination they can become identifying. For example, data collected from a small group of participants, vulnerable populations, or private groups should not be shared if they involve indirect identifiers (such as sex, ethnicity, location, etc.) that may risk the identification of study participants. 
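The direct identifiers listed above, and the risk posed by small groups that share indirect identifiers, can both be screened for before data are deposited. The following Python sketch is only an illustration and is not a PLOS-endorsed procedure or a substitute for consulting an ethics committee: the column names, the sample records, and the group-size threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical column names standing in for the direct identifiers listed above.
DIRECT_IDENTIFIERS = {"name", "address", "ip_address", "birth_date", "email", "phone"}


def drop_direct_identifiers(records: list[dict]) -> list[dict]:
    """Return copies of the records without direct-identifier fields."""
    return [{k: v for k, v in r.items() if k not in DIRECT_IDENTIFIERS} for r in records]


def small_groups(records: list[dict], indirect: list[str], min_size: int = 5) -> list[tuple]:
    """List combinations of indirect identifiers shared by fewer than min_size records."""
    counts = Counter(tuple(r[f] for f in indirect) for r in records)
    return sorted(combo for combo, n in counts.items() if n < min_size)


# Invented sample records, for illustration only.
records = [
    {"name": "A", "email": "a@example.org", "sex": "F", "region": "North", "score": 3},
    {"name": "B", "email": "b@example.org", "sex": "F", "region": "North", "score": 5},
    {"name": "C", "email": "c@example.org", "sex": "M", "region": "South", "score": 4},
]
cleaned = drop_direct_identifiers(records)
print(cleaned[0])
print(small_groups(cleaned, ["sex", "region"], min_size=2))
```

Any combination reported by `small_groups` describes so few participants that the indirect identifiers may need to be coarsened or withheld.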

Steps necessary to protect privacy may include de-identifying data, adding noise, or blocking portions of the database. Where this is not possible, data sharing could be restricted by license agreements directed specifically at privacy concerns. Additional guidance on preparing human research participant data for publication, including information on how to properly de-identify these data, can be found here:

 The following resources may also be consulted for guidance on sharing human research participant data:

Guidelines for qualitative data

For studies analyzing data collected as part of qualitative research, authors should make excerpts of the transcripts relevant to the study available in an appropriate data repository, within the paper, or upon request if they cannot be shared publicly. If even sharing excerpts would violate the agreement to which the participants consented, authors should explain this restriction and what data they are able to share in their Data Availability Statement.

See the Qualitative Data Repository for more information about managing and depositing qualitative data.

Other sensitive data

Some data that do not describe human research participants may also be sensitive and inappropriate to share. For studies analyzing other types of sensitive data, authors should share data as appropriate after consulting established field guidelines and all applicable local laws. Examples of sensitive data that may be subject to restrictions include, but are not limited to, data from field studies in protected areas, locations of sensitive archaeological sites, and locations of endangered or threatened species.

Additional help

Please contact the journal office (plospathogens@plos.org) if:

  • You have concerns about the ethics or legality of sharing your data
  • Your institution does not have an established point of contact to field external requests for access to sensitive data
  • You feel unable to share data for reasons not specified above

 

 

Unacceptable Data Access Restrictions

PLOS journals will not consider manuscripts for which the following factors influence authors’ ability to share data:

  • Authors will not share data because of personal interests, such as patents or potential future publications.
  • The conclusions depend solely on the analysis of proprietary data. We consider proprietary data to be data owned by individuals, organizations, funders, institutions, commercial interests, or other parties that the data owners will not share. If proprietary data are used and cannot be accessed by others in the same manner by which the authors obtained them, the manuscript must include an analysis of publicly available data that validates the study’s conclusions so that others can reproduce the analysis and build on the study’s findings.

 

FAQs

General questions

Why do we not allow an author to be the only point of contact for fielding requests for access to restricted data?

When possible, we recommend authors deposit restricted data to a repository that allows for controlled data access. If this is not possible, directing data requests to a non-author institutional point of contact, such as a data access or ethics committee, helps guarantee long-term stability and availability of data. Providing interested researchers with a durable point of contact ensures data will be accessible even if an author changes email addresses or institutions, or becomes unavailable to answer requests.

When was the current data policy implemented?

The data policy was implemented on March 3, 2014. Any paper submitted before that date will not have a Data Availability Statement. For all manuscripts submitted or published before this date, data must be made available upon reasonable request.

Depositing data

What if I cannot provide accession numbers or DOIs for my data set at submission?

Authors may submit their manuscript and include placeholder language in their Data Availability Statement indicating that accession numbers and/or DOIs will be made available after acceptance. The journal office will contact authors prior to publication to ask for this information and will hold the paper until it is received.

Providing private data access to reviewers and editors during the peer review process is acceptable. Many repositories permit private access for review purposes, and have policies for public release at publication.

Is PLOS integrated with any repositories?

PLOS partners with repositories to support data sharing and compliance with the PLOS data policy. Our submission system is integrated with partner repositories to ensure that the article and its underlying data are paired, published together and linked. Current partners include Dryad and FlowRepository.

Partner repositories may have a data submission fee. PLOS is not able to cover this fee and authors are under no obligation to use any specific repository. PLOS does not gain financially from our association with any integrated partners.

Additionally, PLOS uploads all Supporting Information files associated with an article to the figshare repository to increase compliance with the FAIR principles (Findable, Accessible, Interoperable, Reusable).

How do I deposit data with a data repository integration partner?

When authors deposit data in the integrated repository, they receive a provisional data set DOI along with a private reviewer URL. Upon submission to PLOS, authors should include the data set DOI in the Data Availability Statement. They should also provide the reviewer URL, which will permit restricted access to the data during peer review. If a manuscript is editorially accepted by a PLOS journal, the publication of the article and public release of the data set will be automatically coordinated.

I cannot afford the cost of depositing a large amount of data. What should I do?

PLOS encourages authors to investigate all options and to contact their institutions if they have difficulty providing access to the data underlying the research. There are several repositories recommended by PLOS that specialize in handling large data sets.

What are acceptable licenses for my data deposition?

If authors use repositories with stated licensing policies, the policies should not be more restrictive than the Creative Commons Attribution (CC BY) license.

 

PLOS Data Advisory Board

PLOS has formed an external board of advisors across many fields of research published in PLOS journals. This board will work with us to develop community standards for data sharing across various fields, provide input and advice on especially complex data-sharing situations submitted to the journals, define data-sharing compliance, and proactively work to refine our policy. If you have any questions or feedback, we welcome you to write to us at data@plos.org.

Greg Barsh
Philip Bourne
Jake Carlson
Bob Cook
Andy Farke
Paul Gardner
Sam Gilbert
Carole Goble
Melissa Haendel
Amy Hodge
Lisa Johnston
James Kazura

Stephen Koslow
Sune Lehmann
Michael Lichten
Christopher Lortie
Malcolm Macleod
Harmit Malik
Jo McEntyre
James Meador
Gregory Petsko
William Phillips
Andreas Prlić
José Ramasco

Marc Robinson-Rechavi
Hank Seifert
Ida Sim
Patricia Soranno
Jason Swedlow
Hua Tang
Brett Tyler
Ethan White
Jelte Wicherts
James Wilson
Yu-Feng Zang
Jean Zenklusen

Webpage reorganized on December 5, 2019. The substance of the data policy has not changed unless otherwise noted.
