Citation: Piwowar HA, Becich MJ, Bilofsky H, Crowley RS, on behalf of the caBIG Data Sharing and Intellectual Capital Workspace (2008) Towards a Data Sharing Culture: Recommendations for Leadership from Academic Health Centers. PLoS Med 5(9): e183. doi:10.1371/journal.pmed.0050183
Published: September 2, 2008
Copyright: © 2008 Piwowar et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported in part by contracts from the National Cancer Institute Center for Bioinformatics caBIG Program to the University of Pittsburgh (#79207CBS10) and University of Pennsylvania (#79580CBS10), and in part by National Library of Medicine Training Grant Number 5T15-LM007059-19. The funders had no role in the decision to submit the article or in its preparation.
Competing interests: The authors have declared that no competing interests exist.
Abbreviations: AHC, academic health center; caBIG, Cancer Biomedical Informatics Grid; IRB, institutional review board; NIH, National Institutes of Health
Provenance: Not commissioned; externally peer reviewed
Sharing biomedical research and health care data is important but difficult. Recognizing this, many initiatives facilitate, fund, request, or require researchers to share their data [1–5]. These initiatives address the technical aspects of data sharing, but rarely focus on incentives for key stakeholders. Academic health centers (AHCs) have a critical role in enabling, encouraging, and rewarding data sharing. The leaders of medical schools and academic-affiliated hospitals can play a unique role in supporting this transformation of the research enterprise. We propose that AHCs can and should lead the transition towards a culture of biomedical data sharing.
Benefits of Data Sharing for Academic Health Centers
The benefits of data sharing and reuse have been widely reported. We summarize them here, from the perspective of an AHC.
The predominant benefit of data sharing is accelerated scientific progress. Advances are clearly valuable to an AHC when translated into improved patient outcomes, reduced research costs, and decreased time in moving discoveries from the bench to the bedside.
Of more immediate benefit to AHCs and their researchers, sharing data increases the visibility and relevance of research output. Sharing data generates opportunities for additional publications through collaboration, and may increase the citation rate of primary publications. Since publication history and citation impact are often considered in future funding decisions, these benefits are likely to accelerate research programs, and thus enhance the reputation of the academic institutions.
Data sharing can also benefit an AHC in its roles of educator and employer. Health care professionals trained in clinical informatics benefit from exposure to real-world data. By embracing data sharing goals, an AHC becomes more appealing to cutting-edge researchers, and thereby more able to recruit the talent required for future successes.
Finally, the widespread adoption of a data sharing culture needs leaders, and thus provides an opportunity for AHCs to demonstrate excellence.
A Leadership Role
Despite the anticipated benefits, sharing research data has yet to be widely adopted in biomedicine [11,12]. Through their interwoven roles in education, research, and policy, AHCs can lead the development of best practices for establishing a data sharing culture. Practical steps with potentially powerful impact are discussed below and summarized in Box 1.
Box 1: Recommendations for Academic Health Centers to Encourage Data Sharing
- Commit to sharing research data as openly as possible, given privacy constraints. Streamline IRB, technology transfer, and information technology policies and procedures accordingly.
- Recognize data sharing contributions in hiring and promotion decisions, perhaps as a bonus to a publication's impact factor. Use concrete metrics when available.
- Educate trainees and current investigators on responsible data sharing and reuse practices through class work, mentorship, and professional development. Promote a framework for deciding upon appropriate data sharing mechanisms.
- Encourage data sharing practices as part of publication policies. Lobby for explicit and enforceable policies in journal and conference instructions, to both authors and peer reviewers.
- Encourage data sharing plans as part of funding policies. Lobby for appropriate data sharing requirements by funders, and recommend that they assess a proposal's data sharing plan as part of its scientific contribution.
- Fund the costs of data sharing, support for repositories, adoption of sharing infrastructure and metrics, and research into best practices through federal grants and AHC funds.
- Publish experiences in data sharing to facilitate the exchange of best practices.
Measure, recognize, and reward data sharing contributions.
The lack of recognition incentives is regarded as a crucial and unresolved obstacle to establishing a data sharing culture [13,14]. All research institutions, including AHCs, should develop and track metrics for data sharing contributions as part of their academic research environments. Data sharing contributions should be explicitly considered during hiring, tenure, and promotion decisions, perhaps by providing a bonus to a publication's impact factor if the authors have shared the raw research data. Promotion committees should encourage investigators to list their shared datasets on their CVs, in their grant applications, and anywhere they communicate information about their research accomplishments.
Department chairs should encourage their faculty to monitor the purposes for which their data are reused. This would allow investigators to quantify the value of their contribution, as well as personally motivate future sharing. To this end, we encourage the development and general adoption of a data sharing citation index, a concrete metric for tracking the reuse and citation of datasets, as envisioned by the Cancer Biomedical Informatics Grid (caBIG) Data Sharing and Intellectual Capital Workspace and others [17,18].
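As an illustration only, one simple form such an index might take is a count of the distinct publications that reuse each shared dataset, summed per investigator. The sketch below is a minimal example under that assumption; it is not a specification from the caBIG workspace or from the cited proposals, and the record type, field names, and identifiers are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReuseRecord:
    """One observed reuse of a shared dataset (hypothetical record type)."""
    dataset_id: str  # hypothetical identifier of the shared dataset
    citing_pub: str  # hypothetical identifier of the publication reusing it


def reuse_counts(records):
    """Return, for each dataset, the number of distinct reusing publications."""
    citing = defaultdict(set)
    for record in records:
        # A set ignores duplicate reports of the same publication.
        citing[record.dataset_id].add(record.citing_pub)
    return {dataset: len(pubs) for dataset, pubs in citing.items()}


def investigator_index(datasets_by_investigator, counts):
    """Sum reuse counts over the datasets each investigator has shared.

    Datasets with no recorded reuse contribute zero.
    """
    return {
        investigator: sum(counts.get(dataset, 0) for dataset in datasets)
        for investigator, datasets in datasets_by_investigator.items()
    }
```

A promotion committee could, for example, pass a list of reuse records gathered from a registry to `reuse_counts` and then call `investigator_index` with a mapping from each faculty member to the datasets they have shared. A real index would likely also need to weight the type of reuse and to handle datasets with several contributors; both are deliberately left out of this sketch.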
Integrate data sharing education into curricula and practice.
Data sharing must be articulated as a foundational principle of research conduct. Standardized and comprehensive education is likely to be an important factor in decreasing data withholding; data sharing should be included in the curricula of introductory research courses and throughout mentored research. Discussing the ethics of data sharing in clinical and translational research during medical training and graduate research studies can cement a deeper “appreciation that sharing of raw data may lead to techniques or findings or further research that could help alleviate human distress”. Simultaneously, education must appropriately place data sharing within the context of the federal regulations that guard protected health information [20,21] and the ethical obligation to maintain patient privacy by highlighting the distinction between openly sharable scientific data and protected health information.
Addressing these subjects at institution-wide colloquia, as case studies in ethics seminars, or as satellite symposia will provide scientists an opportunity to hear viewpoints they might not otherwise consider. Topics could include the ethical obligation to patients to both maintain privacy and achieve the maximum authorized scientific benefit [19,23,24], the personal struggles felt by investigators when trusting peers to be responsible in data reuse, and the impact of reorienting discussions from data ownership to data control.
AHCs also play a vital role in educating researchers about the consumer side of the data sharing relationship—responsible data reuse. AHC policies, best-practice guidelines, and guided mentorship can help new trainees take advantage of the enormous opportunities when reusing data while avoiding misappropriation and misinterpretation. Furthermore, understanding the needs and benefits of data reuse will inspire investigators to share their own data with the documentation and annotations that make it most useful for future reuse.
Recommend best-practice mechanisms for data sharing.
As biomedical funders begin to require data sharing plans, they often leave the mechanism for data sharing unspecified. Although this choice provides valuable flexibility, the myriad of options can be daunting for investigators. The choice is important: an appropriate mechanism is crucial for effective and rewarding data sharing.
An AHC's office of research can help its investigators choose best-practice solutions by recommending a framework for evaluating data sharing alternatives. To develop such a framework, IRB (institutional review board) directors, chief privacy and security officers, chief information and technology officers, technology transfer officers, and a wide range of patient advocates and investigators must articulate the trade-offs inherent in various models from the perspectives of privacy, security, intellectual property, scalability, openness, and equity across the complete spectrum of stakeholders. We illustrate three dimensions of these trade-offs in Table 1, and recommend several excellent reviews for further reading [27–29].
Fund and maintain infrastructure for data sharing.
Education, training, and support are needed again once a scientist has decided to share data. Investigators may appreciate detailed suggestions on what to include in a data sharing plan, such as those provided by the National Institutes of Health (NIH) and caBIG. Mentorship and training through the institution's research office are also crucial when estimating a data sharing budget, since “currently, these costs are chronically underestimated and under-awarded”. This funding is crucial to pay for the process of sharing data.
It is often difficult for investigators to decide where to share types of data that do not have a public, centralized, and well-recognized database. We recommend that research leadership in AHCs support solutions that optimize data persistence, visibility, ease of interpretation and integration, privacy, accountability, and openness. Such solutions could involve participating in data sharing collaborative projects, choosing information technology solutions that facilitate data sharing and provide required access logs, hosting data sources that do not have a more appropriate home, adopting syntactic and semantic standards, providing consultation to investigators who need help sharing their research effectively, encouraging participation in professional societies such as the HealthGrid (http://www.healthgrid.org/), or lobbying for national networked infrastructure.
Revise policies and guidelines to reflect data sharing goals.
We encourage AHCs to recognize the importance of data sharing across the organization, and then take steps to harmonize all relevant policies and guidelines with their data sharing goals. Many of the issues are clear, such as ensuring that data sharing goals are consistent with material transfer agreements, industrial partnerships, intellectual property policies, technology-transfer guidelines, IRB review criteria, and de-identification tools and policies. Other issues are often overlooked. For example, AHCs need to ensure that data sharing agreements contain appropriate remedies and are enforced whenever investigators are unwilling or unable to fulfill their commitments.
Today's spirit of translational research does not stop at the boundaries of the AHC. Departments of physics and computer science have a successful history of data sharing and may be able to provide guidance. Other departments within science, engineering, business, librarianship, and law are addressing the same issues; it may be possible to forge alliances that advance data sharing. Involving key officials at the University level, such as Vice Presidents of Research and university legal counsel, could yield more consistent policies across campus.
Engage national leadership in data sharing decisions.
AHCs are actively involved with many members of the biomedical community. Firmly establishing a data sharing culture will require joint efforts between AHCs, funders, publishers, academic societies, industry, legislators, patient advocates, clinicians, and researchers. We recommend that AHC faculty and staff leverage their roles in the community to promote philosophies and policies that facilitate data sharing. This could involve promoting new funding mechanisms to support data sharing and data archiving, working with journal editors to raise the level of data sharing deemed appropriate and necessary for publication, supporting legislation to encourage privacy-protected data sharing, developing standards for appropriate reuse of health care data [26,37], establishing grant review guidelines for evaluating data sharing plans as part of the scientific contribution of a proposal, expanding NIH guidance and support for data sharing across all data types, encouraging the study of incentives for team science, developing methods to quantify the extent and impact of data sharing and reuse, and finally, encouraging programs and funding that enable investigators to share data with accuracy, accountability, responsibility, and recognition. We further recommend that AHCs publish their experiences in data sharing to facilitate the development of best practices.
We recognize that there are real and perceived impediments to sharing biomedical research data. Some individual donors may have personal interests in privacy and confidentiality that exceed their desire to contribute to new methods of detecting and treating disease. Investigators may restrict access to data to maximize their professional and economic benefit. Academic health centers may view data sharing as a threat to intellectual property, possibly impeding entrepreneurial spin-offs and technology transfers that bring revenue and act as incubators for future research. AHCs may also worry that the data could be used to critique their health care practices rather than advance the research frontier. Industrial sponsorship can hinder plans for sharing data, and the regulatory environment may necessitate stringent oversight to ensure compliance and minimize risk.
These issues can and must be addressed as we work to embrace a data sharing culture. The hurdles may not be as high as we think: 99% of senior technology transfer officers at highly funded NIH universities agree that academic scientists should freely share data with other academic scientists after publication. The systems and architectures in Table 1 provide a future vision of research in which data are more universally available and interoperable. Recent initiatives for making research publications freely available [42–45] demonstrate a political and academic commitment “to help advance science and improve human health” by widely sharing research results.
Academic health centers will benefit by leading the transition towards a culture of biomedical data sharing. More widespread awareness of these benefits can motivate key stakeholders to take concrete steps to enable, inspire, and reward data sharing within and beyond their institutions.
We thank the members of the caBIG Data Sharing and Intellectual Capital Workspace for their insightful comments.
- 1. National Institutes of Health (2003) Final NIH statement on sharing research data [NOT-OD-03-032]. Available: http://grants.nih.gov/grants/guide/notice-files/not-od-03-032.html. Accessed 30 July 2008.
- 2. National Institutes of Health (2007) Policy for sharing of data obtained in NIH supported or conducted genome-wide association studies [NOT-OD-07-088]. Available: http://grants.nih.gov/grants/guide/notice-files/NOT-OD-07-088.html. Accessed 30 July 2008.
- 3. caBIG Strategic Planning Workspace (2007) The Cancer Biomedical Informatics Grid (caBIG): Infrastructure and applications for a worldwide research community. Stud Health Technol Inform 12: 330–334.
- 4. Grethe JS, Baru C, Gupta A, James M, Ludaescher B, et al. (2005) Biomedical informatics research network: Building a national collaboratory to hasten the derivation of new understanding and treatment of disease. Stud Health Technol Inform 112: 100–109.
- 5. Piwowar HA, Chapman WW (2008) A review of journal policies for sharing research data.
- 6. [No authors listed] (2005) Let data speak to data. Nature 438: 531–531.
- 7. Piwowar HA, Day RS, Fridsma DB (2007) Sharing detailed research data is associated with increased citation rate. PLoS ONE 2: e308. doi:10.1371/journal.pone.0000308.
- 8. American Medical Informatics Association (2007) AMIA 10x10 Program. Available: http://www.amia.org/10x10. Accessed 30 July 2008.
- 9. Butler D (2007) Data sharing: The next generation. Nature 446: 10–11.
- 10. [No authors listed] (2007) Time for leadership. Nat Biotechnol 25: 821.
- 11. Blumenthal D, Campbell EG, Gokhale M, Yucel R, Clarridge B, et al. (2006) Data withholding in genetics and the other life sciences: Prevalences and predictors. Acad Med 81: 137–145.
- 12. Teeters J, Harris K, Millman K, Olshausen B, Sommer F (2008) Data sharing for computational neuroscience. Neuroinformatics 6: 47–55.
- 13. [No authors listed] (2007) Got data. Nat Neurosci 10: 931–931.
- 14. [No authors listed] (2007) Compete, collaborate, compel. Nat Genet 39: 931.
- 15. Davies HD, Langley JM, Speert DP (1996) Rating authors' contributions to collaborative research: The PICNIC survey of university departments of pediatrics. Pediatric Investigators' Collaborative Network on Infections in Canada. Can Med Assoc J 155: 877–882.
- 16. Rashid A, Ling K, Tassone R, Resnick P, Kraut R, et al. (2006) Motivating participation by displaying the value of contribution.
- 17. Altman M, King G (2007) A proposed standard for the scholarly citation of quantitative data. D-Lib Magazine 13: 3/4. Available: http://gking.harvard.edu/files/cite.pdf. Accessed 5 August 2008.
- 18. Piwowar HA, Chapman WW (2008) Envisioning a data reuse registry [poster]. AMIA 2008 Annual Symposium. Available: http://sharescienceideas.wikispaces.com/Data+Reuse+Registry. Accessed 5 August 2008.
- 19. Vickers A (2006) Whose data set is it anyway? Sharing raw data from randomized trials. Trials 7: 15.
- 20. US Department of Health and Human Services (2003) Standards for privacy of individually identifiable health information and security standards for the protection of electronic protected health information (HIPAA privacy and security rules). 45 CFR Parts 160 and 164. Available: http://www.hipaadvisory.com/REGS/finalprivacy/. Accessed 30 July 2008.
- 21. US Department of Health and Human Services: Office for Human Research Protections (2005) Basic HHS policy for protection of human research subjects. 45 CFR Part 46 Subpart A. Available: http://www.hhs.gov/ohrp/humansubjects/guidance/45cfr46.htm. Accessed 30 July 2008.
- 22. Liu Y, Ascoli GA (2007) Value added by data sharing: Long-term potentiation of neuroscience research: A commentary on the 2007 SfN Satellite Symposium on Data Sharing. Neuroinformatics 5: 143–145.
- 23. Foster MW, Sharp RR (2007) Share and share alike: Deciding how to distribute the scientific and social benefits of genomic data. Nat Rev Genet 8: 633–639.
- 24. Fienberg SE (1994) Sharing statistical data in the biomedical and health sciences: Ethical, institutional, legal, and professional dimensions. Annu Rev Public Health 15: 1–18.
- 25. [No authors listed] (2001) ‘Send me all of your reagents and ideas. We want to work on the same experiments.’ By Caveman. J Cell Sci 114: 1037–1038.
- 26. Safran C, Bloomrosen M, Hammond WE, Labkoff S, Markel-Fox S, et al. (2007) Toward a national framework for the secondary use of health data: An American Medical Informatics Association White Paper. J Am Med Inform Assoc 14: 1–9.
- 27. Committee on Responsibilities of Authorship in the Biological Sciences, National Research Council (2003) Sharing publication-related data and materials: Responsibilities of authorship in the life sciences. The National Academy of Sciences. Available: http://www.nap.edu/catalog.php?record_id=10613. Accessed 30 July 2008.
- 28. Sinnott RO, Macdonald A, Lord PW, Ecklund D, Jones A (2005) Large-scale data sharing in the life sciences: Data standards, incentives, barriers and funding models (The Joint Data Standards Study). The Biotechnology and Biological Sciences Research Council, The Department of Trade and Industry, The Joint Information Systems Committee for Support for Research, The Medical Research Council, The Natural Environment Research Council and The Wellcome Trust. Available: http://www.mrc.ac.uk/Utilities/Documentrecord/index.htm?d=MRC002552. Accessed 30 July 2008.
- 29. Lowrance W (2006) Access to collections of data and materials for health research: A report to the Medical Research Council and the Wellcome Trust. Available: http://www.wellcome.ac.uk/About-us/Publications/Books/Biomedical-ethics/WTX030843.htm. Accessed 30 July 2008.
- 30. National Institutes of Health (2007) Guidance for developing data-sharing plans for GWAS. Available: http://grants.nih.gov/grants/gwas/gwas_data_sharing_plan.pdf. Accessed 30 July 2008.
- 31. caBIG (2008) Data sharing plan content guideline draft. Available: https://cabig.nci.nih.gov/working_groups/DSIC_SLWG/Documents/caBIG_Data_Sharing_Plan_Guideline_20080109.pdf. Accessed 30 July 2008.
- 32. Ball CA, Sherlock G, Brazma A (2004) Funding high-throughput data sharing. Nat Biotechnol 22: 1179–1183.
- 33. Ruttenberg A, Clark T, Bug W, Samwald M, Bodenreider O, et al. (2007) Advancing translational research with the Semantic Web. BMC Bioinformatics 8: S2.
- 34. Detmer DE (2003) Building the national health information infrastructure for personal health, health care services, public health, and research. BMC Med Inform Decis Mak 3: 1.
- 35. Theologis A, Davis RW (2004) To give or not to give? That is the question. Plant Physiol 135: 4–9.
- 36. Altman RB, Benowitz N, Gurwitz D, Lunshof J, Relling M, et al. (2007) Genetic nondiscrimination legislation: A critical prerequisite for pharmacogenomics data sharing. Pharmacogenomics 8: 519.
- 37. National Committee on Vital and Health Statistics (2007) Enhanced protections for uses of health data: A stewardship framework for “secondary uses” of electronically collected and transmitted health data. Available: http://www.centerforhit.org/x2125.xml. Accessed 30 July 2008.
- 38. National Institutes of Health (2007) Points to consider for IRBs and institutions in their review of data submission plans for institutional certifications under NIH's policy for sharing of data obtained in NIH Supported or conducted genome-wide association studies (GWAS). Available: http://grants.nih.gov/grants/gwas/gwas_ptc.pdf. Accessed 30 July 2008.
- 39. Haga S (2007) Exploring attitudes about data disclosure and data-sharing in genomics research. NIH grant number 1R03HG004312-01.
- 40. Gardner D, Toga AW, Ascoli GA, Beatty JT, Brinkley JF, et al. (2003) Towards effective and rewarding data sharing. Neuroinformatics 1: 289–295.
- 41. Campbell EG, Bendavid E (2003) Data-sharing and data-withholding in genetics and the life sciences: Results of a national survey of technology transfer officers. J Health Care Law Policy 6: 241–255.
- 42. Harvard Faculty of Arts and Sciences (2008) Harvard to collect, disseminate scholarly articles for faculty. Available: http://www.fas.harvard.edu/home/news_and_events/releases/scholarly_02122008.html. Accessed 30 July 2008.
- 43. SPARC (2008) Berkeley steps forward with bold initiative to pay authors' open-access charges. Available: http://www.arl.org/sparc/publications/articles/memberprofile-berkeley.shtml. Accessed 30 July 2008.
- 44. University of Wisconsin–Madison (2005) Seed money for open access publishing. Available: http://www.library.wisc.edu/scp/openaccess/response.html#fund. Accessed 30 July 2008.
- 45. University of North Carolina–Chapel Hill (2005) Open access authors' fund. Available: http://www.hsl.unc.edu/Collections/ScholCom/OAFundAnnounce.cfm. Accessed 30 July 2008.
- 46. National Institutes of Health (2008) Revised policy on enhancing public access to archived publications resulting from NIH-funded research [NOT-OD-08-033]. Available: http://grants.nih.gov/grants/guide/notice-files/not-od-08-033.html. Accessed 30 July 2008.