Researcher Perspectives on Publication and Peer Review of Data

Data “publication” seeks to appropriate the prestige of authorship in the peer-reviewed literature to reward researchers who create useful and well-documented datasets. The scholarly communication community has embraced data publication as an incentive to document and share data. However, numerous new and ongoing experiments in implementation have not yet resolved what a data publication should be, when data should be peer-reviewed, or how data peer review should work. While researchers have been surveyed extensively regarding data management and sharing, their perceptions and expectations of data publication are largely unknown. To bring this important yet neglected perspective into the conversation, we surveyed ∼250 researchers across the sciences and social sciences, asking what expectations “data publication” raises and what features would be useful to evaluate the trustworthiness, evaluate the impact, and enhance the prestige of a data publication. We found that researcher expectations of data publication center on availability, generally through an open database or repository. Few respondents expected published data to be peer-reviewed, but peer-reviewed data enjoyed much greater trust and prestige. The importance of adequate metadata was acknowledged, in that almost all respondents expected data peer review to include evaluation of the data’s documentation. Formal citation in the reference list was affirmed by most respondents as the proper way to credit dataset creators. Citation count was viewed as the most useful measure of impact, but download count was seen as nearly as valuable. These results offer practical guidance for data publishers seeking to meet researcher expectations and enhance the value of published data.

Your responses will be recorded anonymously, unless you choose to provide your contact information. In that case, information linking you to your responses will be restricted to only two researchers. We will maintain confidentiality when reporting survey results. Results will be compiled in a peer-reviewed publication, presentation(s), and report(s). Absolute confidentiality cannot be guaranteed, because research documents are subject to subpoena.
There is no direct benefit to you anticipated from participation in this study. However, results from this study will be used in the development of services to encourage and support sharing of research data.
Your participation in this research is voluntary, and you may decline to participate without risk. While it is useful to be complete in your responses to the survey, you are free to withdraw from the survey at any time.

Are you familiar with any data journals? If so, what titles?
A data journal is a journal that publishes datasets and/or data papers. A data paper describes a dataset, including the collection and processing methods used and the rationale, but doesn't provide any analysis or attempt to draw any conclusions.

Data Sharing
6. How important is it to share data that underlies a published study? Mark only one oval.

10. Which of the following did you include with your data? Check all that apply.
A traditional research paper based on the data (with analysis and conclusions)
A data paper describing the data (without analysis or conclusions)
Informal text describing the data
Formal metadata describing the data (e.g. as XML)
Computer code used to process or generate the data
Shared with no additional documentation
Other:

11. Have you been listed as an author on a peer-reviewed paper in the last 5 years? Mark only one oval.
Yes
No
Not sure / Not applicable

12. Have you generated any data in the last 5 years? Mark only one oval.
No
Not sure / Not applicable

13. If yes, roughly what percentage of the data is publicly available? (e.g. in a public database, as supplemental material, or on your lab's website) Include newly generated data that is not yet public, but that will be made available soon.
14. Has your data been reused by anyone outside your research group / collaborators? (e.g. reanalysed, used to draw new conclusions, or incorporated into a larger dataset) Mark only one oval.

22. If a colleague described one dataset as "published" and another as "shared", how might you expect the "published" dataset to differ from the "shared" one? Check all that apply.
Openly available, without contacting the author(s)
Deposited in a database or repository
Assigned a unique identifier, such as a DOI
A traditional research paper is based on the data
A data paper (without conclusions) describes the data
Packaged with a thorough description of the data
Packaged with formal metadata describing the data (e.g. as XML)
Dataset is "peer reviewed"
I don't see any difference
Other:

23. If a colleague described a dataset as "peer reviewed", which of the following would you expect to have been part of the process? Check all that apply.
Collection and processing methods were evaluated
Descriptive text is thorough enough to use or replicate the dataset
Necessary metadata is standardized (e.g., in XML)
Technical details have been checked (e.g., no missing files, no missing values)
Plausibility considered, based on area expertise
Novelty/impact considered
Other:

Assessing data
24. If you were thinking about using someone else's dataset in your work, how much confidence in the data would each of these things inspire? Mark only one oval per row.

Complete confidence
High confidence
Some confidence
Little confidence
No confidence

Described in a traditional paper (with conclusions)
Described in a data paper (description only)
Peer review of the dataset
Reused by others

25. In assessing a dataset's value/impact, how useful is each of the following metrics?
Mark only one oval per row.