PLOS journals require authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception.
When submitting a manuscript online, authors must provide a Data Availability Statement describing compliance with PLOS's policy. If the article is accepted for publication, the data availability statement will be published as part of the final article.
Refusal to share data and related metadata and methods in accordance with this policy will be grounds for rejection. PLOS journal editors encourage researchers to contact them if they encounter difficulties in obtaining data from articles published in PLOS journals. If restrictions on access to data come to light after publication, we reserve the right to post a correction, to contact the authors' institutions and funders, or in extreme cases to retract the publication.
Methods acceptable to PLOS journals with respect to data sharing are listed below, accompanied by guidance for authors as to what must be indicated in their data availability statement and how to follow best practices in reporting. If authors did not collect data themselves but used another source, this source must be credited as appropriate. Authors who have questions or difficulties with the policy, or readers who have difficulty accessing data, are encouraged to contact the relevant journal office or firstname.lastname@example.org.
The data policy was implemented on March 3, 2014. Any paper submitted before that date will not have a data availability statement. However, for all manuscripts submitted or published before this date, data must be available upon reasonable request.
Data deposition (strongly recommended)
All data and related metadata underlying the findings reported in a submitted manuscript should be deposited in an appropriate public repository, unless already provided as part of the submitted article. Repositories may be either subject-specific (where these exist) and accept specific types of structured data, or generalist repositories that accept multiple data types, such as Dryad. Guidance on acceptable repositories is included below.
The Data Availability Statement must specify that data are deposited publicly and list the name(s) of repositories along with digital object identifiers or accession numbers for the relevant data sets. Read more about accession numbers.
Data in supporting information files
For smaller data sets and certain data types, authors may upload data as supporting information files accompanying the manuscript. (See also additional information regarding appropriate use of supporting information files.) Authors should take care to maximize the accessibility and reusability of the data by selecting a file format from which data can be efficiently extracted (for example, spreadsheets are preferable to PDF when providing tabulated data).
If data deposition or provision in supporting information is not ethical or legal (that is, the underlying data pose privacy or legal concerns, for example where data might reveal the identity or location of participants), the following two methods may be acceptable alternatives, subject to case-by-case evaluation:
Data made available to all interested researchers upon request
The Data Availability Statement must specify “Data available on request” and identify the group to which requests should be submitted (e.g., a named data access committee or named ethics committee). The reasons for restrictions on public data deposition must also be specified. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.
Data available from a third party
We consider third-party data to be data not owned by the authors. Authors should share any data specific to their analysis that they can legally distribute. If an author does not own the data set, they must include in the Data Availability Statement all necessary contact information where an interested researcher would need to apply to gain access to the relevant data.
If permission was required to use a third-party data set (e.g., very large unpublished genome data or similar), authors must include the third-party source and verification of permission in the Data Availability Statement, as well as proper acknowledgment in the manuscript.
Please note that authors are responsible for ensuring that data will be available from the data owner post-publication, in the same manner as the authors obtained the data.
PLOS journals will not consider manuscripts for which the following factors influence ability to share data:
- Authors will not share data because of personal interests, such as patents or potential future publications.
- The conclusions depend solely on the analysis of proprietary data, whether these data are owned by the authors, by their funders or institutions, or by other parties. We consider proprietary data to be data owned by commercial interests, or copyrighted data that the data owners will not share, e.g., data from a pharmaceutical company that will share the data only with regulatory agencies for purposes of drug approval, but not with researchers. If proprietary data are used and cannot be accessed by others (in the same manner by which the authors obtained them), the manuscript must include an analysis of public data that validates the conclusions so that others can reproduce the analysis and build on the findings.
A compilation of frequently asked questions about the PLOS Data Policy is available and is updated periodically.
Definition of data that must be shared
PLOS defines the “minimal data set” to consist of the data set used to reach the conclusions drawn in the manuscript with related metadata and methods, and any additional data required to replicate the reported study findings in their entirety. Authors do not need to submit their entire data set if only a portion of the data were used in the reported study. Also, authors do not need to submit the raw data collected during an investigation if the standard in the field is to share data that have been processed.
Please note that PLOS does not permit references to “data not shown.” Authors should provide the relevant data within the manuscript, the Supporting Information files, or in a public repository. If the data are not a core part of the research study being presented, we ask that authors remove any references to these data.
Guidance on sharing data sets that derive from clinical studies or other work involving human participants
For studies involving human participants, data must be handled so as to not compromise study participants' privacy. PLOS recommends that researchers follow established guidance and applicable local laws in ensuring they do not compromise participant privacy. Resources which researchers may consult for guidance include:
- US National Institutes of Health: Protecting the Rights and Privacy of Human Subjects
- Canadian Institutes of Health Research Best Practices for Protecting Privacy in Health Research
- UK Data Archive: Anonymisation Overview
- Australian National Data Service: Ethics, Consent and Data Sharing
Steps necessary to protect privacy may include de-identification, blocking portions of the database, or license agreements directed specifically at privacy concerns. Authors should indicate, as part of the ethics statement, the ways in which the study participants' privacy was preserved. If license agreements apply, authors should note the process necessary for other researchers to obtain a license.
PLOS requires that authors comply with field-specific standards for preparation and recording of data and select repositories appropriate to their field, for example deposition of microarray data in ArrayExpress or GEO; deposition of gene sequences in GenBank, EMBL or DDBJ; and deposition of ecological data in Dryad. Authors are encouraged to select repositories that meet accepted criteria as trustworthy digital repositories.
PLOS has identified a set of established repositories below, which are recognized and trusted within their respective communities. Additionally, the Registry of Research Data Repositories (Re3Data) is a full-scale resource of registered repositories across subject areas. Re3Data provides information on an array of criteria to help researchers identify the ones most suitable for their needs (licensing, certificates & standards, policy, etc.). Authors are encouraged to select the repository most appropriate for their research. PLOS does not dictate repository selection for the data access policy. If authors use repositories with stated licensing policies, the policies should not be more restrictive than the Creative Commons Attribution (CC BY) license. Read more about our content license.
If no specialized community-endorsed open repository exists, institutional repositories that use open licenses permitting free and unrestricted use or public domain, and that adhere to best practices pertaining to responsible data sharing, sustainable digital preservation, proper citation, and openness are also suitable for data deposition.
- Database of Genomic Variants Archive (DGVa)
- DNA DataBank of Japan (DDBJ)
- EBI Metagenomics
- EMBL Nucleotide Sequence Database (ENA)
- European Variation Archive (EVA)
- NCBI Sequence Read Archive (SRA)
- NCBI Trace Archive
- Biological General Repository for Interaction Datasets (BioGRID)
- Database of Interacting Proteins (DIP)
- The European Genome-phenome Archive (EGA)
- IntAct Molecular Interaction Database
- Gene Expression Omnibus (GEO)
- GPM DB
- Proteomics Identifications (PRIDE)
- Biological Magnetic Resonance Data Bank (BMRB)
- Crystallography Open Database (COD)
- Coherent X-ray Imaging Data Bank (CXIDB)
- Electron Microscopy Data Bank (EMDB)
- Protein Circular Dichroism Data Bank (PCDDB)
- Worldwide Protein Data Bank (wwPDB)
- Functional Connectomes Project International Neuroimaging Data-Sharing Initiative (FCP/INDI)
- Eukaryotic Pathogen Database Resources (EuPathDB)
- Mouse Genome Informatics (MGI)
- Rat Genome Database (RGD)
- The Arabidopsis Information Resource (TAIR)
- Zebrafish Model Organism Database (ZFIN)
- Integrated Taxonomic Information System (ITIS)
- Global Biodiversity Information Facility (GBIF)
- NCBI Taxonomy
- The Knowledge Network for Biocomplexity
- Influenza Research Database
- National Addiction & HIV Data Archive Program (NAHDAP)
- National Database for Autism Research (NDAR)
- The Cancer Imaging Archive (TCIA)
- Virtual Skeleton Database (SICAS medical image repository)
- Kinetic Models of Biological Systems (KiMoSys)
- The Mass spectrometry Interactive Virtual Environment (MassIVE)
- Australian Antarctic Data Centre (AADC)
- Cold and Arid Regions Science Data Center (CARD)
- Long Term Ecological Research (LTER) Network Data Repository
- National Climatic Data Center (NCDC)
- Natural Environment Research Council Data Centres (NERC)
- Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC)
- Reaction Database Standard Search Interface
- SIMBAD Astronomical Database
- UK Solar System Data Centre
- World Data Center for Climate at DKRZ (WDCC)
PLOS would like to thank the open access Nature Publishing Group journal, Scientific Data, for their own list of recommended repositories.
Why does PLOS have a data policy?
PLOS believes that making data available fosters scientific progress. Data availability allows and facilitates:
- Validation, replication, reanalysis, new analysis, reinterpretation or inclusion into meta-analyses
- Reproducibility of research
- Efforts to ensure data are archived, increasing the value of the investment made in funding scientific research
- Reduction of the burden on authors in unearthing old data, retaining old hard drives and answering email requests
- Easier citation of data as well as research articles, enhancing visibility and ensuring recognition for authors
PLOS understands that some authors may not want to share data, just as some choose not to make their articles available Open Access, but we believe that authors publish their work precisely in order to allow others to benefit from it. More importantly, researchers want to see their work used and cited by others.
We hope that data will be publicly available to all interested researchers, but we do understand that ethical and legal restrictions may prohibit this. The policy is not intended to overrule local regulations, legislation or ethical frameworks. Where these frameworks prevent or limit data release, authors should make these limitations clear in the Data Availability Statement at the time of submission.
Possible exceptions to making data publicly available include:
- Data cannot be made publicly available for ethical or legal reasons, e.g., public availability would compromise patient confidentiality or participant privacy.
- Data deposition could present some other threat, such as revealing the locations of fossil deposits, endangered species, or farms and other animal enclosures.
We hope that institutions will recognize the importance of preserving data and making it available, especially given concerns over data preservation and reproducibility, and that they will support their researchers in making data available. We encourage researchers and their institutions to consider whether a Data Access Committee could be convened to hold data and respond to requests for data. Since many institutions do not have committees in place to help with this process, we will work with authors to try to identify a solution in the meantime.
Please contact the relevant journal office or email@example.com to discuss:
- if you feel unable to share data for reasons not specified above, or
- if you have concerns about the ethics or legality of sharing your data.
My study uses proprietary data; what should I do?
We consider proprietary data to be data owned by commercial interests, or copyrighted data that the data owners will not share, e.g., data from a pharmaceutical company that will share the data only with regulatory agencies for purposes of drug approval, but not with researchers.
PLOS will not consider submissions where the conclusions depend solely on the analysis of proprietary data, whether these data are owned by the authors, by their funders or institutions, or by other parties. If proprietary data are used and cannot be accessed by others (in the same manner by which the authors obtained them), the manuscript must include an analysis of public data that validates the conclusions so that others can reproduce the analysis and build on the findings.
My study analyzes qualitative data and the participants did not consent to have their full transcripts made publicly available. What should I do?
The data policy exception related to privacy concerns pertains in this case. However, if requested, at least the excerpts of the transcripts relevant to the study would need to be shared. In this case, authors should include the contact information where requests may be sent in their Data Availability Statement, and state that excerpts of data are available on request. If even sharing excerpts would violate the agreement to which the participants consented, then please inform the relevant journal office.
My study was conducted in humans and my minimum data set includes information on individuals. What should I do?
Adherence to the PLOS data policy must never breach patient confidentiality. Authors should ensure that the data shared are in accordance with patient consent. Authors should provide only the data that are used in the specific study. Individual patient data should not include the following personal data (see Publication and access to clinical-trial data and Preparing raw clinical data for publication: guidance for journal editors, authors, and peer reviewers):
- name, initials, address including full or partial postal code;
- telephone or fax numbers or other contact information, email address;
- unique identifying numbers, vehicle identifiers, medical device identifiers;
- Web or internet protocol addresses;
- biometric data, facial photograph or comparable image, audiotapes;
- names of relatives;
- dates related to an individual, including birthdate.
The following may not be appropriate to include depending on what other information is provided:
- place of treatment or health professional responsible for care;
- rare disease or treatment;
- sensitive data, such as illicit drug use or risky behavior;
- place of birth;
- socioeconomic data, such as occupation or place of work, income, or education;
- household and family composition;
- anthropometric measures;
- number of pregnancies;
- year of birth or age;
- verbatim transcripts or responses.
Also potentially inappropriate to include, depending on the type of information provided, are data on population sizes of less than 100 or those with small numerators, e.g., event counts less than 3. (Information from Hrynaszkiewicz I, Norton M L, Vickers A J, Altman D G. (2010). Preparing raw clinical data for publication: guidance for journal editors, authors, and peer reviewers. BMJ 340:c181. http://www.bmj.com/content/340/bmj.c181.long)
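As a rough illustration of the guidance above, the following Python sketch drops direct-identifier columns from a table and flags summary cells that fall under the small-population and small-numerator thresholds. It is a minimal sketch, not a PLOS tool: the column names in `DIRECT_IDENTIFIERS`, the row layout, and both function names are hypothetical and would need to be adapted to the actual data set. Passing such a check does not by itself establish that a data set is safe to share.

```python
# Hypothetical column names for direct identifiers; adapt to your own data set.
DIRECT_IDENTIFIERS = {
    "name", "initials", "address", "postal_code", "telephone", "fax",
    "email", "ip_address", "birthdate", "relatives",
}


def strip_direct_identifiers(rows):
    """Return copies of the rows without any direct-identifier columns."""
    return [
        {key: value for key, value in row.items()
         if key.lower() not in DIRECT_IDENTIFIERS}
        for row in rows
    ]


def flag_small_cells(table, min_population=100, min_events=3):
    """Return the groups whose population or event count is below threshold.

    Each entry in `table` is assumed to be a dict with the keys
    'group', 'population' and 'events' (an illustrative layout).
    """
    flagged = []
    for cell in table:
        if cell["population"] < min_population or cell["events"] < min_events:
            flagged.append(cell["group"])
    return flagged
```

Flagged groups are candidates for aggregation or suppression; whether the indirect identifiers listed above are acceptable still depends on what other information accompanies them.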
The data from my study relates to a potential medicine that will be submitted to the European Medicines Agency (EMA) for approval. Do I need to wait until after the approvals process to make the data from my article available?
The data shared according to the PLOS policy likely represents only a small proportion of the evidence submitted to the EMA for approval and so should not interfere with approvals processes of the EMA. The EMA’s Policy 0070, which specifies data release after authorization, applies only to the data held by the regulatory agency submitted as part of a marketing authorization application. As such, the PLOS data policy is fully compatible with the data sharing policies of the EMA. Therefore, authors should make the data underlying the findings described in their manuscript available at the time of publication.
What if I cannot provide accession numbers or DOIs for my data set at submission?
If this is the case, authors may submit their manuscript and note in their Data Availability Statement that their accession numbers or DOIs will not be made available until acceptance. The journal office will contact authors at acceptance to request this information, and will hold the paper upon acceptance until the identifiers for the data set are received.
Providing ‘private’ access for reviewers and editors during the peer review process is acceptable. Many repositories permit private access for review purposes, and have policies for public release at publication. If this is not possible, authors can provide the data via other means, such as zipped files sent by email or shared through Dropbox. Please contact the relevant journal office or firstname.lastname@example.org for assistance.
Is PLOS integrated with any repositories?
PLOS has a Data Repository Integration Partner Program that integrates our submission process with partner data repositories to better support data sharing and author compliance with the PLOS data policy. Our submission system is integrated with partner repositories to ensure that the article and its underlying data are paired, published together and linked. The integration facilitates deposition of data alongside article submission, which may also facilitate consideration and peer review of submissions.
Current partners include Dryad and FlowRepository. We are expanding the current selection of partners to integrate with more data centers. PLOS is repository agnostic; provided that data centers meet our baseline criteria (license and availability, reliability, preservation) that ensure trustworthiness and good stewardship of data, we would accept data submitted in those locations.
Partner repositories may have a data submission fee. PLOS is not able to cover this fee and authors are under no obligation to use any specific repository. PLOS does not gain financially from our association with any integrated partners. More information on the program can be found here.
How do I deposit data with a data repository integration partner?
Once authors deposit data in the integrated repository, they receive a provisional data set DOI along with a private reviewer URL. Upon submission to PLOS, authors must include the data DOI in the Data Availability Statement. They should also provide the reviewer URL, which will permit restricted access to the data during peer review. If a manuscript is editorially accepted by a PLOS journal, the publication of the article and public release of the data set will be automatically coordinated.
I cannot afford the cost of depositing a very large amount of data. What should I do?
PLOS encourages authors to investigate all options and to contact their institutions if they have difficulty providing access to the data underlying the research. Authors facing these challenges are encouraged to submit their manuscript and PLOS will work with them to find a solution. If this is the case, please email the relevant journal office.
What are acceptable licenses for my data deposition?
Data should be covered by a CC BY license or a less restrictive license.
What is the data availability statement and what should I write?
Upon submission to a PLOS journal, authors are asked to enter the location and availability of their data in the submission system. What is written in this text box will be published as is, should the paper be accepted.
If data are freely available, we ask that authors note this and state the location of their data:
- Within the paper, in supporting information files, or in a public repository (include the DOI or accession number)
If data are freely available and owned by a third party, please state:
- The owner of the data set, to whom requests may be sent
Note: If data have been obtained from a third party, we require that any researcher will be able to obtain the data set in the same manner by which the authors obtained it.
If there are any approved restrictions on the data set, for ethical or legal reasons, please state:
- The availability of the data;
- A brief description of the ethical or legal restrictions on the data set;
- A contact to whom requests for the data may be sent.
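The three cases above can be read as a checklist. The sketch below, which assumes a hypothetical dictionary layout and a hypothetical helper named `missing_elements`, reports which required elements are absent from a draft Data Availability Statement. The field names are illustrative only and are not part of the PLOS submission system.

```python
# Elements each kind of statement should contain, following the three cases above.
# The case labels and field names are illustrative assumptions.
REQUIRED_FIELDS = {
    "public": ["location"],
    "third_party": ["owner_contact"],
    "restricted": ["availability", "restriction_reason", "request_contact"],
}


def missing_elements(statement):
    """Return the names of required elements missing from a draft statement."""
    case = statement.get("case")
    if case not in REQUIRED_FIELDS:
        raise ValueError(f"unknown availability case: {case!r}")
    missing = [field for field in REQUIRED_FIELDS[case]
               if not statement.get(field)]
    # Data held in a public repository should also carry a DOI or accession number.
    if (case == "public"
            and statement.get("location") == "repository"
            and not statement.get("identifier")):
        missing.append("identifier")
    return missing
```

An empty list means only that the listed elements are present; the journal office still evaluates whether any restriction is acceptable.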
What data are required and what is meant by minimal data set?
PLOS defines the “minimal data set” to consist of the data set used to reach the conclusions drawn in the manuscript with related metadata and methods, and any additional data required to replicate the reported study findings in their entirety. Authors do not need to submit their entire data set, or the raw data collected during an investigation. Please submit the following data:
- The values behind the means, standard deviations and other measures reported;
- The values used to build graphs;
- The points extracted from images for analysis.
Authors are not required to make all images available, but we do require a sample Western Blot, Immunohistochemistry image, fMRI image, etc. to be included with the submission files or in a public repository.
Please note that PLOS does not permit references to “data not shown.” Authors should provide the relevant data within the manuscript, the Supporting Information files, or in a public repository. If the data are not a core part of the research study being presented, we ask that authors remove any references to these data.
What format should I use for my data?
The file format used to submit data should follow the standards in the field. If there are currently no standards in the field, please submit the data in an accessible format from which data can be efficiently extracted (e.g., Excel rather than PDF).
How do I submit data as supporting information files?
Upon submission and at revision, authors have the opportunity to upload supporting information files. There is a 10 MB limit per file, but that is unlikely to be exceeded with Excel files or anything similar. If the files do exceed this amount, authors should zip or otherwise compress the files before submission.
In choosing between supporting information files and a repository, please refer to our blog post on uses of supporting information files.
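The per-file size limit mentioned above can be checked before upload. The following Python sketch uses only the standard library to test a file against the limit and zip it if needed. It assumes that "10 MB" means 10 × 1024 × 1024 bytes, which the policy does not specify, and the function names are hypothetical.

```python
import os
import zipfile

# Assumption: "10 MB" is read as 10 * 1024 * 1024 bytes; the policy does not say.
SIZE_LIMIT_BYTES = 10 * 1024 * 1024


def exceeds_limit(path, limit=SIZE_LIMIT_BYTES):
    """Return True if the file at `path` is larger than `limit` bytes."""
    return os.path.getsize(path) > limit


def compress(path):
    """Write `path` into a deflate-compressed zip beside it; return the zip path."""
    zip_path = path + ".zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.write(path, arcname=os.path.basename(path))
    return zip_path


def prepare_for_upload(path, limit=SIZE_LIMIT_BYTES):
    """Return the file to upload: the original if small enough, otherwise a zip."""
    if not exceeds_limit(path, limit):
        return path
    zip_path = compress(path)
    if exceeds_limit(zip_path, limit):
        raise ValueError(f"{zip_path} still exceeds {limit} bytes after compression")
    return zip_path
```

If a file is still too large after compression, depositing it in a repository is likely the more practical route.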
What if data are found to not be accessible or other issues are found after publication?
PLOS will follow up with the authors and take action as necessary. PLOS reserves the right to issue corrections, notifications or retractions when authors do not comply with our policies.