PLOS journals require authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception.
When submitting a manuscript online, authors must provide a Data Availability Statement describing compliance with PLOS's policy. If the article is accepted for publication, the data availability statement will be published as part of the final article.
Refusal to share data and related metadata and methods in accordance with this policy will be grounds for rejection. PLOS journal editors encourage researchers to contact them if they encounter difficulties in obtaining data from articles published in PLOS journals. If restrictions on access to data come to light after publication, we reserve the right to post a correction, to contact the authors' institutions and funders, or in extreme cases to retract the publication.
Methods acceptable to PLOS journals with respect to data sharing are listed below, accompanied by guidance for authors as to what must be indicated in their data availability statement and how to follow best practices in reporting. If authors did not collect data themselves but used another source, this source must be credited as appropriate. Authors who have questions or difficulties with the policy, or readers who have difficulty accessing data, are encouraged to contact the relevant journal office or firstname.lastname@example.org.
The data policy was implemented on March 3, 2014. Papers submitted before that date will not have a data availability statement. However, for all manuscripts submitted or published before this date, data must be available upon reasonable request.
Data deposition (strongly recommended)
All data and related metadata underlying the findings reported in a submitted manuscript should be deposited in an appropriate public repository, unless already provided as part of the submitted article. Repositories may be either subject-specific (where these exist) and accept specific types of structured data, or generalist repositories that accept multiple data types, such as Dryad. Guidance on acceptable repositories is included below.
The Data Availability Statement must specify that data are deposited publicly and list the name(s) of repositories along with digital object identifiers or accession numbers for the relevant data sets. Read more about accession numbers.
Data in supporting information files
For smaller data sets and certain data types, authors may upload data as supporting information files accompanying the manuscript. (See also additional information regarding appropriate use of supporting information files.) Authors should take care to maximize the accessibility and reusability of the data by selecting a file format from which data can be efficiently extracted (for example, spreadsheets are preferable to PDF when providing tabulated data).
If data deposition or provision in supporting information is not ethical or legal (i.e., the underlying data pose privacy or legal concerns, for example where data might reveal the identity or location of participants), the following two methods may be acceptable alternatives, subject to case-by-case evaluation:
Data made available to all interested researchers upon request
The Data Availability Statement must specify “Data available on request” and identify the group to which requests should be submitted (e.g., a named data access committee or named ethics committee). The reasons for restrictions on public data deposition must also be specified. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.
Data available from a third party
We consider third-party data to be data not owned by the authors. Authors should share any data specific to their analysis that they can legally distribute. If an author does not own the data set, they must include in the Data Availability Statement all the contact information an interested researcher would need to apply for access to the relevant data.
If permission was required to use a third-party data set (e.g., very large unpublished genome data or similar), authors must include the third-party source and verification of permission in the Data Availability Statement, as well as proper acknowledgment in the manuscript.
Please note that authors are responsible for ensuring that data will be available from the data owner post-publication, in the same manner as the authors obtained the data.
PLOS journals will not consider manuscripts for which the following factors influence ability to share data:
- Authors will not share data because of personal interests, such as patents or potential future publications.
- The conclusions depend solely on the analysis of proprietary data, whether these data are owned by the authors, by their funders or institutions, or by other parties. We consider proprietary data to be data owned by commercial interests, or copyrighted data that the data owners will not share, e.g., data from a pharmaceutical company that will share the data only with regulatory agencies for purposes of drug approval, but not with researchers. If proprietary data are used and cannot be accessed by others (in the same manner by which the authors obtained them), the manuscript must include an analysis of public data that validates the conclusions so that others can reproduce the analysis and build on the findings.
A compilation of frequently asked questions about the PLOS Data Policy is available and is updated periodically.
Definition of data that must be shared
PLOS defines the “minimal data set” to consist of the data set used to reach the conclusions drawn in the manuscript with related metadata and methods, and any additional data required to replicate the reported study findings in their entirety. Authors do not need to submit their entire data set if only a portion of the data were used in the reported study. Also, authors do not need to submit the raw data collected during an investigation if the standard in the field is to share data that have been processed.
Please note that PLOS does not permit references to “data not shown.” Authors should provide the relevant data within the manuscript, the Supporting Information files, or in a public repository. If the data are not a core part of the research study being presented, we ask that authors remove any references to these data.
Guidance on sharing data sets that derive from clinical studies or other work involving human participants
For studies involving human participants, data must be handled so as to not compromise study participants' privacy. PLOS recommends that researchers follow established guidance and applicable local laws in ensuring they do not compromise participant privacy. Resources which researchers may consult for guidance include:
- US National Institutes of Health: Protecting the Rights and Privacy of Human Subjects
- Canadian Institutes of Health Research Best Practices for Protecting Privacy in Health Research
- UK Data Archive: Anonymisation Overview
- Australian National Data Service: Ethics, Consent and Data Sharing
Steps necessary to protect privacy may include de-identification, blocking portions of the database, or license agreements directed specifically at privacy concerns. Authors should indicate, as part of the ethics statement, the ways in which the study participants' privacy was preserved. If license agreements apply, authors should note the process necessary for other researchers to obtain a license.
PLOS requires that authors comply with field-specific standards for preparation and recording of data and select repositories appropriate to their field, for example deposition of microarray data in ArrayExpress or GEO; deposition of gene sequences in GenBank, EMBL or DDBJ; and deposition of ecological data in Dryad. Authors are encouraged to select repositories that meet accepted criteria as trustworthy digital repositories.
PLOS has identified a set of established repositories below, which are recognized and trusted within their respective communities. For further information on environmental and biomedical science repositories and field standards, we suggest utilizing BioSharing; we have also created a BioSharing page of PLOS recommended data repositories. Additionally, the Registry of Research Data Repositories (Re3Data) is a full scale resource of registered repositories across subject areas. Both BioSharing and Re3Data provide information on an array of criteria to help researchers identify the repositories most suitable for their needs (licensing, certificates and standards, policy, etc.).
Authors are encouraged to select the repository most appropriate for their research. PLOS does not dictate repository selection for the data access policy. If authors use repositories with stated licensing policies, the policies should not be more restrictive than the Creative Commons Attribution (CC BY) license. More information about the content license can be found in our licenses and copyright policy.
If no specialized community-endorsed open repository exists, institutional repositories that use open licenses permitting free and unrestricted use or public domain, and that adhere to best practices pertaining to responsible data sharing, sustainable digital preservation, proper citation, and openness are also suitable for data deposition.
- Database of Genomic Variants Archive (DGVa)
- DNA DataBank of Japan (DDBJ)
- EBI Metagenomics
- EMBL Nucleotide Sequence Database (ENA)
- European Variation Archive (EVA)
- NCBI Sequence Read Archive (SRA)
- NCBI Trace Archive
- Biological General Repository for Interaction Datasets (BioGRID)
- Database of Interacting Proteins (DIP)
- The European Genome-phenome Archive (EGA)
- IntAct Molecular Interaction Database
- Gene Expression Omnibus (GEO)
- GPM DB
- Proteomics Identifications (PRIDE)
- Biological Magnetic Resonance Data Bank (BMRB)
- Crystallography Open Database (COD)
- Coherent X-ray Imaging Data Bank (CXIDB)
- Electron Microscopy Data Bank (EMDB)
- Protein Circular Dichroism Data Bank (PCDDB)
- Worldwide Protein Data Bank (wwPDB)
- Functional Connectomes Project International Neuroimaging Data-Sharing Initiative (FCP/INDI)
- Eukaryotic Pathogen Database Resources (EuPathDB)
- Mouse Genome Informatics (MGI)
- Rat Genome Database (RGD)
- The Arabidopsis Information Resource (TAIR)
- Zebrafish Model Organism Database (ZFIN)
- Integrated Taxonomic Information System (ITIS)
- Global Biodiversity Information Facility (GBIF)
- NCBI Taxonomy
- The Knowledge Network for Biocomplexity
- Influenza Research Database
- National Addiction & HIV Data Archive Program (NAHDAP)
- National Database for Autism Research (NDAR)
- The Cancer Imaging Archive (TCIA)
- Virtual Skeleton Database (SICAS medical image repository)
- Kinetic Models of Biological Systems (KiMoSys)
- The Mass spectrometry Interactive Virtual Environment (MassIVE)
- Australian Antarctic Data Centre (AADC)
- Cold and Arid Regions Science Data Center (CARD)
- Long Term Ecological Research (LTER) Network Data Repository
- National Climatic Data Center (NCDC)
- Natural Environment Research Council Data Centres (NERC)
- Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC)
- Reaction Database Standard Search Interface
- SIMBAD Astronomical Database
- UK Solar System Data Centre
- World Data Center for Climate at DKRZ (WDCC)
- Inter-university Consortium for Political and Social Research (ICPSR)
- Qualitative Data Repository
- Swedish National Data Service
- Data Archiving and Networking Services (DANS)
PLOS would like to thank the open access Nature Publishing Group journal, Scientific Data, for its own list of recommended repositories.
Why does PLOS have a data policy?
PLOS believes that making data available fosters scientific progress. Data availability allows and facilitates:
- Validation, replication, reanalysis, new analysis, reinterpretation or inclusion into meta-analyses
- Reproducibility of research
- Efforts to ensure data are archived, increasing the value of the investment made in funding scientific research
- Reduction of the burden on authors in unearthing old data, retaining old hard drives and answering email requests
- Easier citation of data as well as research articles, enhancing visibility and ensuring recognition for authors
PLOS understands that some authors may not want to share data, just as some choose not to make their articles available Open Access, but we believe that authors publish their work precisely in order to allow others to benefit from it. More importantly, researchers want to see their work used and cited by others.
We hope that data will be publicly available to all interested researchers, but we do understand that ethical and legal restrictions may prohibit this. The policy is not intended to overrule local regulations, legislation or ethical frameworks. Where these frameworks prevent or limit data release, authors should make these limitations clear in the Data Availability Statement at the time of submission.
Possible exceptions to making data publicly available include:
- Data cannot be made publicly available for ethical or legal reasons, e.g., public availability would compromise patient confidentiality or participant privacy.
- Data deposition could present some other threat, such as revealing the locations of fossil deposits, endangered species, or farms or other animal enclosures.
We hope that institutions will recognize the importance of preserving data and making it available, especially given concerns over data preservation and reproducibility, and that they will support their researchers in making data available. We encourage researchers and their institutions to consider whether a Data Access Committee could be convened to hold data and respond to requests for data. Since many institutions do not have committees in place to help with this process, we will work with authors to try to identify a solution in the meantime.
Please contact the relevant journal office or email@example.com to discuss:
- if you feel unable to share data for reasons not specified above, or
- if you have concerns about the ethics or legality of sharing your data.
My study uses proprietary data; what should I do?
We consider proprietary data to be data owned by commercial interests, or copyrighted data that the data owners will not share, e.g., data from a pharmaceutical company that will share the data only with regulatory agencies for purposes of drug approval, but not with researchers.
PLOS will not consider submissions where the conclusions depend solely on the analysis of proprietary data, whether these data are owned by the authors, by their funders or institutions, or by other parties. If proprietary data are used and cannot be accessed by others (in the same manner by which the authors obtained them), the manuscript must include an analysis of public data that validates the conclusions so that others can reproduce the analysis and build on the findings.
My study analyzes qualitative data and the participants did not consent to have their full transcripts made publicly available. What should I do?
The data policy exception related to privacy concerns applies in this case. However, if requested, at least the excerpts of the transcripts relevant to the study would need to be shared. In this case, authors should include in their Data Availability Statement the contact information to which requests may be sent, and state that excerpts of data are available on request. If even sharing excerpts would violate the agreement to which the participants consented, then please inform the relevant journal office.
My study was conducted in humans and my minimum data set includes information on individuals. What should I do?
Adherence to the PLOS data policy must never breach patient confidentiality. Authors should ensure that the data shared are in accordance with patient consent. Authors should provide only the data that are used in the specific study. Individual patient data should not include the following personal data (see Publication and access to clinical-trial data and Preparing raw clinical data for publication: guidance for journal editors, authors, and peer reviewers):
- name, initials, address including full or partial postal code;
- telephone or fax numbers or other contact information, email address;
- unique identifying numbers, vehicle identifiers, medical device identifiers;
- Web or internet protocol addresses;
- biometric data, facial photograph or comparable image, audiotapes;
- names of relatives;
- dates related to an individual, including birthdate.
The following may not be appropriate to include depending on what other information is provided:
- place of treatment or health professional responsible for care;
- rare disease or treatment;
- sensitive data, such as illicit drug use or risky behavior;
- place of birth;
- socioeconomic data, such as occupation or place of work, income, or education;
- household and family composition;
- anthropometric measures;
- number of pregnancies;
- year of birth or age;
- verbatim transcripts or responses.
Also potentially inappropriate to include, depending on the type of information provided, are data on population sizes of less than 100 or those with small numerators, e.g., event counts less than 3. (Information from Hrynaszkiewicz I, Norton M L, Vickers A J, Altman D G. (2010). Preparing raw clinical data for publication: guidance for journal editors, authors, and peer reviewers. BMJ 340:c181. http://www.bmj.com/content/340/bmj.c181.long)
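As a rough illustration of the lists above, the following Python sketch drops direct identifiers from a record and flags small cells. The field names and helper names are hypothetical and would need to be adapted to the actual data set; the thresholds of 100 and 3 are taken from the guidance quoted above. This is not a complete de-identification procedure: the indirect identifiers listed above still need case-by-case review.

```python
# Hypothetical field names for direct identifiers; adapt to the actual data set.
DIRECT_IDENTIFIERS = {
    "name", "initials", "address", "postal_code", "telephone", "fax",
    "email", "patient_id", "ip_address", "photo", "relatives", "birthdate",
}


def strip_direct_identifiers(record):
    """Return a copy of the record without direct-identifier fields."""
    return {key: value for key, value in record.items()
            if key not in DIRECT_IDENTIFIERS}


def is_small_cell(population_size, event_count):
    """Flag groups with a population below 100 or an event count below 3."""
    return population_size < 100 or event_count < 3


# Made-up record, for illustration only.
record = {"name": "A. Example", "email": "a@example.org",
          "treatment": "drug A", "outcome": 1}
shared = strip_direct_identifiers(record)
print(shared)
print(is_small_cell(population_size=250, event_count=2))
```

A flagged group may need to be aggregated or withheld before the data are shared.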
The data from my study relates to a potential medicine that will be submitted to the European Medicines Agency (EMA) for approval. Do I need to wait until after the approvals process to make the data from my article available?
The data shared according to the PLOS policy likely represent only a small proportion of the evidence submitted to the EMA for approval and so should not interfere with the approvals processes of the EMA. The EMA’s Policy 0070, which specifies data release after authorization, applies only to the data held by the regulatory agency and submitted as part of a marketing authorization application. As such, the PLOS data policy is fully compatible with the data sharing policies of the EMA. Therefore, authors should make the data underlying the findings described in their manuscript available at the time of publication.
What if I cannot provide accession numbers or DOIs for my data set at submission?
If this is the case, authors may submit their manuscript and note in their Data Availability Statement that their accession numbers or DOIs will not be made available until acceptance. The journal office will contact you at acceptance to provide this information, and will hold your paper upon acceptance until we receive these identifiers for your data set.
Providing ‘private’ access for reviewers and editors during the peer review process is acceptable. Many repositories permit private access for review purposes, and have policies for public release at publication. If this is not possible, authors can provide the data via other means, such as zipped files via email, Dropbox etc. Please contact the relevant journal office or firstname.lastname@example.org for assistance.
Is PLOS integrated with any repositories?
PLOS has a Data Repository Integration Partner Program that integrates our submission process with partner data repositories to better support data sharing and author compliance with the PLOS data policy. Our submission system is integrated with partner repositories to ensure that the article and its underlying data are paired, published together and linked. The integration facilitates deposition of data alongside article submission, which may also facilitate consideration and peer review of submissions.
Current partners include Dryad and FlowRepository. We are expanding the current selection of partners to integrate with more data centers. PLOS is repository agnostic; provided that data centers meet our baseline criteria (license and availability, reliability, preservation) that ensure trustworthiness and good stewardship of data, we would accept data submitted in those locations.
Partner repositories may have a data submission fee. PLOS is not able to cover this fee and authors are under no obligation to use any specific repository. PLOS does not gain financially from our association with any integrated partners. More information on the program can be found here.
How do I deposit data with a data repository integration partner?
Once authors deposit data in the integrated repository, they receive a provisional data set DOI along with a private reviewer URL. Upon submission to PLOS, authors must include the data DOI in the Data Availability Statement. They should also provide the reviewer URL, which will permit restricted access to the data during peer review. If a manuscript is editorially accepted by a PLOS journal, the publication of the article and public release of the data set will be automatically coordinated.
I cannot afford the cost of depositing a very large amount of data. What should I do?
PLOS encourages authors to investigate all options and to contact their institutions if they have difficulty providing access to the data underlying the research. Authors facing these challenges are encouraged to submit their manuscript and PLOS will work with them to find a solution. If this is the case, please email the relevant journal office.
What are acceptable licenses for my data deposition?
Data should be covered by a CC BY license or a less restrictive license.
What is the data availability statement and what should I write?
Upon submission to a PLOS journal, authors are asked to enter the location and availability of their data in the submission system. What is written in this text box will be published as is, should the paper be accepted.
If data are freely available, we ask that authors note this and state the location of their data:
- Within the paper, in supporting information files, or in a public repository (include the DOI or accession number)
If data are freely available and owned by a third party, please state:
- The owner of the data set, to whom requests may be sent
Note: If data have been obtained from a third party, we require that any researcher will be able to obtain the data set in the same manner by which the authors obtained it.
If there are any approved restrictions on the data set, for ethical or legal reasons, please state:
- The availability of the data;
- A brief description of the ethical or legal restrictions on the data set;
- A contact to whom requests for the data may be sent.
What data are required and what is meant by minimal data set?
PLOS defines the “minimal data set” to consist of the data set used to reach the conclusions drawn in the manuscript with related metadata and methods, and any additional data required to replicate the reported study findings in their entirety. Authors do not need to submit their entire data set, or the raw data collected during an investigation. Please submit the following data:
- The values behind the means, standard deviations and other measures reported;
- The values used to build graphs;
- The points extracted from images for analysis.
Authors are not required to make all images available, but we do require a sample Western Blot, Immunohistochemistry image, fMRI image, etc. to be included with the submission files or in a public repository.
Please note that PLOS does not permit references to “data not shown.” Authors should provide the relevant data within the manuscript, the Supporting Information files, or in a public repository. If the data are not a core part of the research study being presented, we ask that authors remove any references to these data.
What format should I use for my data?
The file format used to submit data should follow the standards in the field. If there are currently no standards in the field, please submit the data in an accessible format from which data can be efficiently extracted (e.g., Excel rather than PDF).
How do I submit data as supporting information files?
Upon submission and at revision, authors have the opportunity to upload supporting information files. There is a 10 MB limit per file, but that is unlikely to be exceeded with Excel files or anything similar. If the files do exceed this amount, authors should zip or otherwise compress the files before submission.
In choosing between supporting information files and a repository, please refer to our blog post on uses of supporting information files.
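The 10 MB per-file limit mentioned above can be checked before upload. The following Python sketch is one possible way to do so; the function name is hypothetical, and reading "10 MB" as 10 × 1024 × 1024 bytes is an assumption, since the policy does not define the unit precisely.

```python
import os
import zipfile

# Assumption: "10 MB" is read as 10 * 1024 * 1024 bytes.
SIZE_LIMIT_BYTES = 10 * 1024 * 1024


def prepare_for_upload(path):
    """Return the path to upload: the file itself, or a zipped copy if it is too large."""
    if os.path.getsize(path) <= SIZE_LIMIT_BYTES:
        return path
    zip_path = path + ".zip"
    # ZIP_DEFLATED compresses the contents; the default mode only stores them.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.write(path, arcname=os.path.basename(path))
    return zip_path
```

Compression does not guarantee that the archive falls under the limit, so the size of the resulting file should be checked again; data sets that remain too large are better suited to a repository.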
What if data are found to not be accessible or other issues are found after publication?
PLOS will follow up with the authors and take action as necessary. PLOS reserves the right to issue corrections, notifications or retractions when authors do not comply with our policies.
What is the data availability policy at PLOS Genetics and how does it relate to other PLOS journals?
Open access to data that underlies the results we publish has been a cornerstone of PLOS since its inception. The “new” data policy (described in an editorial in PLOS Medicine) supports that principle by requiring authors to specify what and how data will be shared at the time of submission to all PLOS journals; this information then appears as a “data availability statement” with the published article.
PLOS Genetics fully agrees with the PLOS-wide policy that “all data underlying the findings” of a manuscript should be publicly available; however, many manuscripts published by PLOS Genetics feature genome-scale data and/or human subjects for which there are special issues. We are particularly interested in promoting usability in addition to accessibility. These issues and more are discussed in a recent PLOS Genetics editorial.
How will PLOS Genetics implement the data availability policy?
In addition to listing what, how, and where data will be shared, the data availability statement should explain and justify any restrictions on data sharing. For example, authors of a human GWAS or meta-GWAS might state something like, “Summary statistics (p values, betas, and covariates for all SNPs that were tested) are provided as a tab-delimited text file in the Supporting Information. To the extent permitted by existing consents, individual genotypes and phenotypes are being submitted to dbGaP and EGA, and a complete set of identifiers to these data will be provided as a Supplementary Table in the final version of the manuscript.”
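A tab-delimited summary statistics file of the kind mentioned in the example statement could be written as in the following Python sketch. The column names, the function name, and the single row of values are hypothetical placeholders, not a format required by PLOS Genetics.

```python
import csv

# Hypothetical column names; use whatever the analysis actually reports.
COLUMNS = ["snp", "chromosome", "position", "effect_allele", "beta", "p_value"]

# Made-up placeholder row, for illustration only.
example_rows = [
    {"snp": "rs0000001", "chromosome": "1", "position": 12345,
     "effect_allele": "A", "beta": 0.12, "p_value": 3.2e-8},
]


def write_summary_statistics(path, rows):
    """Write one row per tested SNP to a tab-delimited text file with a header."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
```

Calling `write_summary_statistics("summary_statistics.tsv", example_rows)` produces a plain-text file that can be opened in a spreadsheet or parsed by other tools.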
The data availability statement will be evaluated at the time of submission by editorial staff to confirm that the manuscript conforms to our data availability policy, and that there is sufficient information available to allow meaningful review. Occasionally, questions about those issues will require communication between the editorial staff and authors, and, rarely, communication with a senior editor will be needed, e.g., when potential restrictions on data availability are excessive or not adequately justified, or when there are concerns that data sharing does not meet minimal standards.
Can I state that some data is not shown, available on request, and/or just put it up on our lab website?
No. We recommend that any data used to support the manuscript findings be presented as a component of the published manuscript, in the Supporting Information, or deposited in a publicly accessible repository.
How will the journal respond to breaches of the data sharing policy?
PLOS journal editors encourage researchers to contact them if they encounter difficulties in obtaining data from articles published in PLOS journals. If restrictions on access to data come to light after publication, we reserve the right to post a correction, to contact the authors’ institutions and funders, or in extreme cases to retract the publication.
How is data-sharing taken into account during the review process?
Reviewers will be asked to consider how information provided in the data availability statement affects the potential impact of the manuscript with regard to functional utility. The goal is to more explicitly and consistently emphasize this important aspect of the review process that impacts directly on the vision of PLOS to make research results usable as well as accessible. Thus, in addition to the minimal standards of sharing required for publication, the extent to which the data can be used by other scientists, e.g., the format, the metadata, the extent of analysis, will be considered as part of the editorial evaluation of a manuscript’s potential impact and value to the community.
Are the extent, mechanisms, and utility of data sharing appropriate aspects for a reviewer to consider when assessing the suitability of a manuscript for publication in PLOS Genetics?
Yes. To the extent that papers constitute resources for future scientific discovery, data sharing is clearly relevant to the ultimate impact and interest that any given paper may generate. Readily accessible, well formatted, clearly documented, and comprehensive datasets that facilitate new, often open-ended, analyses would constitute a major strength to a manuscript under evaluation, increase the potential impact of the work, and therefore influence the reviewers' and editors' assessment of the strength of advance.
Is it possible for the data availability policy to result in rejection of an otherwise high-quality, appropriate manuscript?
Yes, if it is determined (by reviewers or editors) that the extent or mechanisms of data sharing are insufficient for replication and evaluation of the results in a manuscript. While we would work to determine a policy-compliant data sharing solution that is feasible and appropriate within the context of the study, we recognize that in very rare cases there may exist circumstances where such a solution is impossible, and such circumstances would prevent publication.
How should ethical or legal restrictions to data sharing be described in the “data availability statement”?
We intend that data availability statements serve not only as a mechanism to evaluate individual manuscripts, but also as a platform for the community to develop consensus and standards for the sometimes thorny balance between privacy/regulatory burdens and the goal of complete open access. To this end, we ask authors to provide a succinct justification and rationale for ANY restrictions on complete open access. In general, we do not need or want to review IRB approvals or consent documents, although in unusual situations, e.g., a consent process that explicitly restricts sharing of all summary statistics (and, in our opinion, runs counter to current community standards), we may ask for additional information.
Can you make an exception for intellectual property?
No. We respect the balance that commercial entities need to strike between securing intellectual property and publishing, we enthusiastically encourage submissions by scientists in the commercial sector, and we also understand that some members of the academic community rely heavily on partnerships with private industry as the basis for their research. Nonetheless, Open Access is a foundational principle at PLOS Genetics. For manuscripts based on data that underlies intellectual property, we encourage protection of the IP prior to submission so that the data can be fully shared. Potential exceptions and any licensing requirements must be fully disclosed in the data availability statement.
Is submitting my human subject data to dbGaP (or EGA or other controlled access repositories) sufficient?
Controlled access repositories with data access committees are useful when there are legitimate privacy concerns, as often exist for individual level genotype and phenotype data collected from individuals who did not wish to have their complete data openly shared. However, controlled access can also compromise usability, and the extent to which true open access is provided to data that do not represent a substantive threat to privacy will be considered in the context of potential impact of the work during the review process. Controlled access mechanisms used for data in PLOS Genetics should be transparent and free of any potential conflict of interest, e.g., data access committees must operate independently from the scientists who have generated the data.
What does PLOS Genetics expect for experimental genetic studies in model organisms?
Individual experiment-level data that underlie quantitative summary statistics in tables or graphs (e.g., tetrad classes, genotype ratios, fraction of cells exhibiting a particular phenotype, band intensity on a gel or blot) should be made available in table or spreadsheet format. In most cases, this can be provided as an item of Supporting Information during the submission process. For very large datasets, it may be necessary to submit the data separately to a public repository (such as Dryad). For example, for a graph that reports an average value ± standard deviation (such as foci per cell, or relative levels of a protein measured on a Western blot) versus time or under different conditions, the reporting requirement would be met by a supplementary table containing the ensemble of individual values for each time point or condition.
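As a minimal sketch of what such a supplementary table might look like, the Python example below writes one row per individual measurement and recomputes the plotted mean and standard deviation from those rows. The measurement values, the column names (time_h, replicate, foci_per_cell) and the function names are assumptions made for illustration only; they are not a format prescribed by the journal.

```python
import csv
import io
import statistics

# Hypothetical individual measurements: foci per cell at each time point (hours).
# All values are invented purely for illustration.
individual_values = {
    0: [2, 3, 1, 2],
    2: [5, 7, 6, 6],
    4: [10, 12, 11, 9],
}

FIELDNAMES = ["time_h", "replicate", "foci_per_cell"]


def supplementary_rows(values_by_condition):
    """Yield one row per individual measurement (long format)."""
    for condition, values in sorted(values_by_condition.items()):
        for replicate, value in enumerate(values, start=1):
            yield {"time_h": condition, "replicate": replicate, "foci_per_cell": value}


def summarize(values_by_condition):
    """Recompute the mean and sample standard deviation shown in the graph."""
    return {
        condition: (statistics.mean(values), statistics.stdev(values))
        for condition, values in values_by_condition.items()
    }


def to_csv(rows):
    """Serialize the rows as CSV text suitable for a Supporting Information file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


table = to_csv(supplementary_rows(individual_values))
summary = summarize(individual_values)
print(table)
```

Sharing the long-format table rather than only the summary lets a reader recompute the plotted values or apply a different statistic.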
What is expected for GWAS studies?
With rare exception, summary statistics (p values, betas, allele frequencies, and/or other related measures) for all SNVs should be available without restriction to reviewers and readers; such data are necessary to evaluate reproducibility and quality of the study and to enable useful new analyses (e.g., disease meta-analyses, pathway or annotation enrichment analyses). In most cases, individual level genotype and phenotype data will require some level of restricted access; the approach and justification should be described in the data availability statement.
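One way an author might check a summary statistics table before sharing it is sketched below in Python. The column names in REQUIRED_COLUMNS, the function name and the example rows are hypothetical; the policy names the kinds of measures expected but does not prescribe a file layout.

```python
# Hypothetical column names; the policy does not prescribe a specific layout.
REQUIRED_COLUMNS = ("variant_id", "effect_allele", "beta", "p_value", "effect_allele_freq")


def missing_summary_fields(rows, required=REQUIRED_COLUMNS):
    """Return (row_index, missing_columns) pairs for rows lacking a required value."""
    problems = []
    for index, row in enumerate(rows):
        # A value of 0.0 is valid; only absent or empty entries count as missing.
        missing = [col for col in required if row.get(col) in (None, "")]
        if missing:
            problems.append((index, missing))
    return problems


# Invented example rows, for illustration only.
example_rows = [
    {"variant_id": "var1", "effect_allele": "A", "beta": 0.0,
     "p_value": 0.42, "effect_allele_freq": 0.31},
    {"variant_id": "var2", "effect_allele": "T", "beta": -0.12,
     "p_value": "", "effect_allele_freq": 0.05},
]

print(missing_summary_fields(example_rows))
```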
We expect that most such studies will involve restricted access to individual level genotype and phenotype data due to privacy concerns that are inherent in human subjects research. At the same time, and at a minimum, some level of basic genetic information will be required for evaluation and interpretation, and must be shared. This will generally include the genomic locations and specific alleles for all variants relevant to the conclusions at hand.
For studies that evaluate common variants, typical GWAS expectations apply (i.e., summary statistics for all SNVs). For studies that evaluate rare variants, we expect all candidate variants identified by the methods of choice to be explicitly presented and shared. For example, in “gene burden” studies involving gene, locus, or other aggregate tests of variant association with a phenotype, we would expect all mutations identified in all significantly associated genes or loci, and preferably summary measures attached to all variants in all genes that were counted or evaluated. In rare disease diagnosis or gene discovery efforts, we expect all candidates subject to manual review to be shared, including those ultimately deemed irrelevant, particularly in light of the idea that concluding that a specific variant is causal often requires evaluating the presence or absence of other plausible candidate variants.
In addition to these minimal requirements, and as stated in the PLOS Genetics editorial, we strongly encourage authors to use strategies for data sharing that maximize public access while still protecting privacy. For example, depositing causal and candidate variants to resources like ClinVar allows other groups to systematically identify and compare variants they identify to previously discovered variants attached to some minimal phenotype. Typically, such variants are unlinked by sample ID and attached to only general phenotypic terms so as to impede individual-level reconstruction and reduce privacy exposure risks to extremely low levels.
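The unlinking step described above can be illustrated with a short Python sketch that collapses per-sample calls into a variant-level list carrying only a general phenotype term. The sample IDs, coordinates, field names and phenotype label are all invented for illustration, and this is not the submission format of ClinVar or any other resource.

```python
# Hypothetical per-sample calls: (sample_id, chromosome, position, ref, alt).
# Sample IDs and coordinates are invented for illustration.
per_sample_calls = [
    ("S001", "chr1", 12345, "A", "G"),
    ("S002", "chr1", 12345, "A", "G"),
    ("S002", "chr7", 67890, "C", "T"),
    ("S003", "chr7", 67890, "C", "T"),
    ("S003", "chrX", 13579, "G", "A"),
]


def unlink_variants(calls, phenotype_term):
    """Collapse per-sample calls into unique variants with no sample IDs attached."""
    unique_variants = {(chrom, pos, ref, alt) for _sample, chrom, pos, ref, alt in calls}
    return [
        {"chrom": chrom, "pos": pos, "ref": ref, "alt": alt, "phenotype": phenotype_term}
        for chrom, pos, ref, alt in sorted(unique_variants)
    ]


shareable = unlink_variants(per_sample_calls, "general phenotype term")
for record in shareable:
    print(record)
```

Because each output record holds only the variant and a general term, the list cannot be used on its own to tell which variants co-occur in a single individual.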
What is expected for functional genomic studies in human samples (e.g., RNA-Seq, ChIP-Seq, etc.)?
When consent allows, we expect all sequence reads to be deposited into an appropriate publicly accessible database along with complete “finished products”; for example, all peaks and their associated statistics for a ChIP-Seq study, or all transcript models and their associated expression levels for an RNA-Seq study. For studies in which consent does not allow unrestricted sharing, we expect at a minimum the locations and summary statistics for all features (peaks, transcripts, etc.) directly relevant to the claims at hand, along with enough information about other loci (randomly selected or otherwise) to allow readers and reviewers to evaluate the global quality and accuracy of the data and any claims about significance in a whole-genome context. As stated above, sharing of specific data can often be achieved with a minimal risk to privacy by unlinking variants or transcripts from the individual/sample IDs in which they were identified.
As described in the PLOS Genetics editorial, the minimal amount of data sharing required to evaluate a manuscript will necessarily depend on the substance of the claims and conclusions. For example, a novel method for quantifying RNA or identifying transcription-factor binding sites is certain to require unrestricted access to all sequence reads. On the other hand, applying standard methods to examine genomic regulatory mechanisms in a large cohort of affected individuals may not require full sharing of sequence reads, assuming aggregate quality measures are available and sufficient numbers of specific examples are shared so as to allow readers to evaluate general data quality, accuracy, and significance.
What is expected for next-generation sequencing studies (genetic, functional genomic, or otherwise) in model organisms?
We expect all reads and key intermediate or finished products (e.g., variant lists, peaks, or transcript level measurements) to be made publicly available.
Isn’t the statement about summary statistics (for, e.g., GWAS and next-gen sequencing studies) at odds with the PLOS-wide policy that all data must be shared?
As described in the PLOS Genetics editorial, the question of what constitutes “raw data” is dynamic, context-dependent, and largely dictated by community standards and the associated implications for functional utility. Further, it must be recognized that unrestricted sharing is often impossible due to ethical, legal, and privacy concerns, but that such restrictions do not necessarily prevent meaningful or useful sharing of results and data consistent with the PLOS Genetics mission. At the same time, it is important to recognize that the larger genetics community can use data in unforeseen and ingenious ways; thus, erring on the side of sharing more, not less, should be the norm.
What if my open access data question isn’t covered in this FAQ?
The FAQ is intended to be a dynamic document, not only providing guidelines but also serving as a platform to develop solutions for the community. Specific questions regarding submitted manuscripts should be directed to the editor handling the submission via the journal office. General questions, clarifications, or suggestions of additional topics to address in this FAQ are welcomed; if you would like to contribute, please contact the journal staff (email@example.com).
PLOS has formed an external board of advisors across many fields of research published in PLOS journals. This board will work with us to develop community standards for data sharing across various fields, provide input and advice on especially complex data-sharing situations submitted to the journals, define data-sharing compliance, and proactively work to refine our policy. If you have any questions or feedback, we welcome you to write to us at firstname.lastname@example.org.