Data Availability

The following policy applies to all PLOS journals, unless otherwise noted.

PLOS journals require authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception.

When submitting a manuscript online, authors must provide a Data Availability Statement describing compliance with PLOS's policy. If the article is accepted for publication, the data availability statement will be published as part of the final article.

Refusal to share data and related metadata and methods in accordance with this policy will be grounds for rejection. PLOS journal editors encourage researchers to contact them if they encounter difficulties in obtaining data from articles published in PLOS journals. If restrictions on access to data come to light after publication, we reserve the right to post a correction, to contact the authors' institutions and funders, or in extreme cases to retract the publication.

Methods acceptable to PLOS journals with respect to data sharing are listed below, accompanied by guidance for authors as to what must be indicated in their data availability statement and how to follow best practices in reporting. If authors did not collect data themselves but used another source, this source must be credited as appropriate. Authors who have questions or difficulties with the policy, or readers who have difficulty accessing data, are encouraged to contact the relevant journal office or data@plos.org.

Acceptable Data-Sharing Methods

Data deposition (strongly recommended)

All data and related metadata underlying the findings reported in a submitted manuscript should be deposited in an appropriate public repository, unless already provided as part of the submitted article. Repositories may be either subject-specific (where these exist) and accept specific types of structured data, or generalist repositories that accept multiple data types, such as Dryad. Guidance on acceptable repositories is included below.

The Data Availability Statement must specify that data are deposited publicly and list the name(s) of repositories along with digital object identifiers or accession numbers for the relevant datasets. In some cases authors may not be able to obtain DOIs or accession numbers until the manuscript is accepted; in these cases, the authors must provide these numbers at acceptance. In all other cases, these numbers must be provided at submission.

Data in Supporting Information files

For smaller datasets and certain data types, authors may upload data as Supporting Information files accompanying the manuscript. (See also additional information regarding appropriate use of Supporting Information files.) Authors should take care to maximize the accessibility and reusability of the data by selecting a file format from which data can be efficiently extracted (for example, spreadsheets are preferable to PDF when providing tabulated data).

If data deposition or provision in Supporting Information is not ethical or legal (e.g., underlying data pose privacy or legal concerns, or include human participants), the following two methods may be acceptable alternatives, subject to case-by-case evaluation:

Data made available to all interested researchers upon request

The Data Availability Statement must specify “Data available on request” and identify the group to which requests should be submitted (e.g., a named data access committee or named ethics committee). The reasons for restrictions on public data deposition must also be specified. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

Data available from third party

In the case of a primary dataset that was not originally generated by the authors of the submitted manuscript, appropriate data sharing may require that interested researchers obtain third-party data independently from the named original source. In this case, the Data Availability Statement must state the source of the data with full citation and, if the dataset cannot be provided, indicate “Data available from (named source).” The reasons for restrictions on public data deposition must also be specified.

Unacceptable Data Access Restrictions

PLOS journals will not consider manuscripts for which the following factors influence the authors' ability to share data:

  • Authors will not share data because of personal interests, such as patents or potential future publications.
  • The conclusions depend solely on the analysis of proprietary data (e.g., data owned by commercial interests, or copyrighted data). If proprietary data are used, the manuscript must include an analysis of public data that validates the conclusions so that others can reproduce the analysis and build on the findings.

Explanatory Notes and Guidance

A compilation of frequently asked questions about the PLOS Data Policy is available and is updated periodically.

Definition of data that must be shared

PLOS defines the “minimal dataset” to consist of the dataset used to reach the conclusions drawn in the manuscript with related metadata and methods, and any additional data required to replicate the reported study findings in their entirety. Core descriptive data, methods, and study results should be included within the main paper, regardless of data deposition. PLOS does not accept references to “data not shown”. Editors and reviewers may require particular data types for certain articles on a case-by-case basis. Authors who have datasets too large for sharing via repositories or uploaded files should contact the relevant journal for advice.

Guidance on data repositories

PLOS requires that authors comply with field-specific standards for preparation and recording of data and select repositories appropriate to their field, for example deposition of microarray data in ArrayExpress or GEO; deposition of gene sequences in GenBank, EMBL or DDBJ; and deposition of ecological data in Dryad. Authors are encouraged to select repositories that meet accepted criteria as trustworthy digital repositories, such as criteria of the Centre for Research Libraries or Data Seal of Approval. Large, international databases are more likely to persist than small, local ones. Copyright licensing for data held in repositories may be unclear. If authors use repositories with stated licensing policies, those policies should not be more restrictive than CC BY.

Guidance on sharing datasets that derive from clinical studies or other work involving human participants

For studies involving human participants, data must be handled so as to not compromise study participants' privacy. PLOS recommends that researchers follow established guidance and applicable local laws in ensuring they do not compromise participant privacy. Resources which researchers may consult for guidance include:

Steps necessary to protect privacy may include de-identification, blocking portions of the database, or license agreements directed specifically at privacy concerns. Authors should indicate, as part of the ethics statement, the ways in which the study participants' privacy was preserved. If license agreements apply, authors should note the process necessary for other researchers to obtain a license.

FAQs for Data Policy

Policy overview

Why does PLOS have a data policy?

PLOS believes that making data available fosters scientific progress. Data availability allows and facilitates:

  • Validation, replication, reanalysis, new analysis, reinterpretation or inclusion into meta-analyses
  • Reproducibility of research
  • Efforts to ensure data are archived, increasing the value of the investment made in funding scientific research
  • Reduction of the burden on authors in unearthing old data, retaining old hard drives and answering email requests
  • Easier citation of data as well as research articles, enhancing visibility and ensuring recognition for authors

PLOS understands that some authors may not want to share data, just as some choose not to make their articles available Open Access, but we believe that authors publish their work precisely in order to allow others to benefit from it. More importantly, researchers want to see their work used and cited by others.

To what data does this policy apply?

The policy applies to the dataset used to reach the conclusions drawn in the manuscript with related metadata and methods, and any additional data required to replicate the reported study findings in their entirety. Authors do not need to submit their entire dataset, or all the raw data collected during an investigation, only the data underlying the conclusions of the present manuscript.

My manuscript was under consideration or published at PLOS before March 3, 2014. Does it need to adhere to the new policy?

Manuscripts submitted prior to March 3, 2014, will not be required to include a Data Availability Statement. However, for manuscripts submitted or published before this date, data must be available upon reasonable request.

Exceptions

What are the exceptions to making the data publicly available?

  • Data cannot be publicly available for ethical or legal reasons, e.g., public availability would compromise patient privacy.
  • Data deposition could present some other threat, such as specific locations of fossil deposits or endangered species.
  • Data were obtained from a third party, i.e., the authors did not generate the primary dataset themselves.

Please contact the relevant journal office or data@plos.org if you have concerns about the ethics or legality of sharing your data.

Does the Data Policy override ethical compliance or national privacy standards?

The policy is not intended to overrule local regulations, legislation or ethical frameworks. Where these frameworks prevent or limit data release, authors should make these limitations clear in the Data Availability Statement at the time of submission. PLOS is very keen to work with the relevant bodies to help educate researchers on their local obligations and how they might need to adapt or declare limitations on data access when they publish their work.

Will papers with the conclusions based on third-party proprietary data be considered?

PLOS will not consider submissions in which the conclusions are based on proprietary data. We consider proprietary data to be data that the data owners will not share, e.g., data from a pharmaceutical company that will share the data only with regulatory agencies for purposes of drug approval, but not with researchers. We will not consider submissions in which the conclusions are drawn from data that cannot be made available to researchers in the same manner by which the authors obtained it.

I don't have the right to share all the data from an analysis of a study. What should I do?

In this case, authors should share the data specific to their analysis that they can legally distribute; they do not need to share the data from the study that they cannot legally share. If an author does not own the data set or have the right to distribute it, they must include in the Data Availability Statement a contact to whom researchers may send data requests. Please note that authors are responsible for ensuring that data will be available from the data owner, in the same manner the authors obtained the data, post publication.

Clinical data

My study analyzes qualitative data and the patients did not consent to have their full transcripts made publicly available. What should I do?

The data policy exception related to privacy concerns pertains in this case. However, if requested, at least the excerpts of the transcripts relevant to the study would need to be shared. In this case, authors should include the contact information where requests may be sent in their Data Availability Statement, and state that excerpts of data are available on request. If even sharing excerpts would violate the terms of the patients' consent, then please inform the relevant journal office.

My study was conducted in humans and my minimum dataset includes information on individuals. What should I do?

Adherence to the PLOS data policy must never breach patient confidentiality. Authors should ensure that the data shared are in accordance with patient consent. Authors should provide only the data that are used in the specific study. Individual patient data should not include the following personal data (see Publication and access to clinical-trial data and Preparing raw clinical data for publication: guidance for journal editors, authors, and peer reviewers):

  • name, initials, address including full or partial postal code;
  • telephone or fax numbers or other contact information, email address;
  • unique identifying numbers, vehicle identifiers, medical device identifiers;
  • Web or internet protocol addresses;
  • biometric data, facial photograph or comparable image, audiotapes;
  • names of relatives;
  • dates related to an individual, including birthdate.

The following may not be appropriate to include depending on what other information is provided:

  • place of treatment or health professional responsible for care;
  • gender;
  • rare disease or treatment;
  • sensitive data, such as illicit drug use or risky behavior;
  • place of birth;
  • socioeconomic data, such as occupation or place of work, income, or education;
  • household and family composition;
  • anthropometric measures;
  • number of pregnancies;
  • ethnicity;
  • year of birth or age;
  • verbatim transcripts or responses.

Also potentially inappropriate to include, depending on the type of information provided, are data on population sizes of less than 100 or those with small numerators, e.g., event counts less than 3. (Information from Hrynaszkiewicz I, Norton M L, Vickers A J, Altman D G. (2010). Preparing raw clinical data for publication: guidance for journal editors, authors, and peer reviewers. BMJ 340:c181. http://www.bmj.com/content/340/bmj.c181.long)
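
As a rough illustration of the checklist above, the following Python sketch drops direct-identifier columns from tabular records and flags small aggregate cells. The column names, the DIRECT_IDENTIFIERS set, and both helper functions are hypothetical; only the two thresholds (population size under 100, event count under 3) come from the guidance cited above. It is a starting point, not a substitute for a proper disclosure-risk review.

```python
# Hypothetical column names; adapt them to the dataset at hand.
DIRECT_IDENTIFIERS = {
    "name", "initials", "address", "postal_code", "phone", "fax",
    "email", "ip_address", "relative_names", "birthdate",
}

MIN_POPULATION = 100  # population sizes below this may be identifying
MIN_EVENTS = 3        # event counts below this may be identifying


def strip_direct_identifiers(records):
    """Return copies of the records without direct-identifier columns."""
    return [
        {key: value for key, value in record.items()
         if key not in DIRECT_IDENTIFIERS}
        for record in records
    ]


def flag_small_cells(cells):
    """Return the labels of aggregate cells that fall under either threshold.

    Each cell is a dict with 'label', 'population' and 'events' keys.
    """
    return [
        cell["label"]
        for cell in cells
        if cell["population"] < MIN_POPULATION or cell["events"] < MIN_EVENTS
    ]


records = [{"name": "A. Person", "email": "a@example.org", "outcome": 1}]
cells = [
    {"label": "site A", "population": 250, "events": 12},
    {"label": "site B", "population": 80, "events": 5},
    {"label": "site C", "population": 400, "events": 2},
]

cleaned = strip_direct_identifiers(records)
flagged = flag_small_cells(cells)
print(cleaned)   # records with identifier columns removed
print(flagged)   # labels of cells that need review before sharing
```

Indirect identifiers (the second list above) are deliberately not removed automatically here, because whether they are appropriate depends on what other information is provided.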

I do not have the right to share data except with other collaborators whose work will help the research. What should I do?

We hope that data will be available to all interested researchers, but we do understand that ethical and legal restrictions may prohibit this. We hope that institutions will recognize the importance of preserving data and making it available, given concerns over data preservation and reproducibility, and support the researchers at their institutions. However, since many institutions do not have committees in place to help with this process, we will work with authors to find a solution in the meantime. Please contact the relevant journal office or data@plos.org if this is the case.

My funding agency or government law only permits sharing of human participant data with researchers with whom they have a written agreement. What should I do?

Please provide the contact information of the individuals and institution(s) where an interested researcher would need to apply to start the process to gain access to the data in the Data Availability Statement. If you have questions, please contact the relevant journal or data@plos.org.

Depositing data

What if I cannot provide accession numbers or DOIs for my data set at submission?

If this is the case, authors may submit their manuscript and note in their Data Availability statement that their accession numbers or DOIs will not be made available until acceptance. The journal office will contact you at acceptance to provide this information, and will hold your paper upon acceptance until we receive these identifiers for your dataset.

Providing ‘private’ access for reviewers and editors during the peer review process is acceptable. Many repositories permit private access for review purposes, and have policies for public release at publication. If this is not possible, authors can provide the data via other means, such as zipped files via email, Dropbox etc. Please contact the relevant journal office or data@plos.org for assistance.

Is PLOS integrated with any repositories?

PLOS has a Data Repository Integration Partner Program that integrates our submission process with partner data repositories to better support data sharing and author compliance with the PLOS data policy. Our submission system is integrated with partner repositories to ensure that the article and its underlying data are paired, published together and linked. The integration facilitates deposition of data alongside article submission, which may also facilitate consideration and peer review of submissions.

Current partners include Dryad and figshare. We are expanding the current selection of partners to integrate with more data centers. PLOS is repository agnostic; provided that data centers meet our baseline criteria (license and availability, reliability, preservation) that ensure trustworthiness and good stewardship of data, we would accept data submitted in those locations.

Partner repositories may have a data submission fee. PLOS is not able to cover this fee and authors are under no obligation to use any specific repository. PLOS does not gain financially from our association with any integrated partners. More information on the program can be found here.

How do I deposit data with a data repository integration partner?

Once an author deposits data in the integrated repository, they receive a provisional dataset DOI along with a private reviewer URL. Upon submission to PLOS, authors must include the data DOI in the Data Availability Statement. They should also provide the reviewer URL, which will permit restricted access to the data during peer review. If a manuscript is editorially accepted by a PLOS journal, the publication of the article and public release of the dataset will be automatically coordinated.

I cannot afford the cost of depositing a very large amount of data. What should I do?

PLOS encourages authors to investigate all options and to contact their institutions if they have difficulty providing access to the data underlying the research. Authors facing these challenges are encouraged to submit their manuscript and PLOS will work with them to find a solution. If this is the case, please email the relevant journal office.

What are acceptable licenses for my data deposition?

Data should be covered by a CC BY license or a less restrictive license.

Submitting to PLOS

What is the data availability statement and what should I write?

Upon submission to a PLOS journal, authors are asked to enter the location and availability of their data in Editorial Manager. What is written in this text box will be published as is, should the paper be accepted.

If data are freely available, we ask that authors note this and state the location of their data:

  • Within the paper, in supporting information files, or in a public repository (include the DOI or accession number)

If data are freely available and owned by a third party, please state:

  • The owner of the data set, to whom requests may be sent

Note: If data have been obtained from a third party, we require that any researcher will be able to obtain the data set in the same manner by which the authors obtained it.

If there are any approved restrictions on the data set, for ethical or legal reasons, please state:

  • The availability of the data;
  • A brief description of the ethical or legal restrictions on the data set;
  • A contact to whom requests for the data may be sent.
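
The three cases above can be sketched as a small pre-submission check on a draft statement. This is an illustrative sketch only: the REQUIRED_FIELDS mapping, the field names, and the missing_fields helper are hypothetical and are not part of Editorial Manager or any PLOS tool.

```python
# Hypothetical field names for each of the three cases described above.
REQUIRED_FIELDS = {
    "public": ["location"],  # paper, supporting information, or repository
    "third_party": ["owner"],
    "restricted": ["availability", "restriction_reason", "contact"],
}


def missing_fields(case, statement):
    """List the fields a draft statement still lacks for the given case."""
    missing = [
        field for field in REQUIRED_FIELDS[case] if not statement.get(field)
    ]
    # A public repository deposit also needs a DOI or an accession number.
    if case == "public" and statement.get("location") == "repository":
        if not (statement.get("doi") or statement.get("accession")):
            missing.append("doi or accession")
    return missing


public_draft = {"location": "repository"}
restricted_draft = {
    "availability": "on request",
    "restriction_reason": "patient privacy",
}

public_gaps = missing_fields("public", public_draft)
restricted_gaps = missing_fields("restricted", restricted_draft)
print(public_gaps)
print(restricted_gaps)
```

A gap reported for a repository deposit is not necessarily a blocker: as noted under "Depositing data", identifiers that cannot be obtained before acceptance may be supplied at acceptance.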

What data are required and what is meant by minimal data set?

We ask that authors make available the data that are necessary to replicate the findings in the present manuscript. Please submit the following data:

  • The values behind the means, standard deviations and other measures reported;
  • The values used to build graphs;
  • The points extracted from images for analysis.

NOTE:

  • Authors are not required to make all images available, but we do require a sample Western Blot, Immunohistochemistry image, fMRI image, etc. to be included with the submission files or in a public repository.
  • Authors are not required to submit the entire dataset, or absolutely all raw data collected during an investigation, but they must provide the portion that is relevant to the specific study.

What format should I use for my data?

The file format used to submit data should follow the standards in the field. If there are currently no standards in the field, please submit the data in an accessible format from which data can be efficiently extracted (e.g., Excel rather than PDF).

How do I submit data as Supporting Information files?

Upon submission and at revision, authors have the opportunity to upload Supporting Information files. There is a 10 MB limit per file, but that is unlikely to be exceeded with Excel files or anything similar. If the files do exceed this amount, authors should zip or otherwise compress the files before submission.
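
A minimal sketch of the size check described above, using only the Python standard library. The function name, the sample file name, and the reading of "10 MB" as 10 × 1024 × 1024 bytes are assumptions made for illustration; the submission system may count the limit differently.

```python
import os
import tempfile
import zipfile

# Assumption: "10 MB" is read as 10 * 1024 * 1024 bytes.
SIZE_LIMIT_BYTES = 10 * 1024 * 1024


def prepare_supporting_file(path, limit=SIZE_LIMIT_BYTES):
    """Return the path to upload: the file itself, or a zipped copy if too large."""
    if os.path.getsize(path) <= limit:
        return path
    zip_path = path + ".zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Store only the file name, not the full directory path.
        archive.write(path, arcname=os.path.basename(path))
    return zip_path


# Demonstration with a tiny limit so that no large file is needed.
with tempfile.TemporaryDirectory() as workdir:
    sample = os.path.join(workdir, "S1_Data.csv")
    with open(sample, "w") as handle:
        handle.write("value\n" + "0\n" * 1000)
    small = prepare_supporting_file(sample)
    zipped = prepare_supporting_file(sample, limit=100)
    small_name = os.path.basename(small)
    zipped_name = os.path.basename(zipped)
    zipped_is_smaller = os.path.getsize(zipped) < os.path.getsize(sample)

print(small_name, zipped_name, zipped_is_smaller)
```

The sketch does not verify that the compressed file ends up under the limit; data that are already compressed may shrink very little, in which case a repository deposit is likely the better route.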

In choosing between supplementary files and a repository, please refer to our blog post on uses of supplementary files.

What if data are found to not be accessible or other issues are found after publication?

PLOS will follow up with the authors and take action as necessary. PLOS reserves the right to issue corrections, notifications or retractions when authors do not comply with our policies.