Make Data Sharing Routine to Prepare for Public Health Emergencies

Jean-Paul Chretien and colleagues argue that recent Ebola and Zika virus outbreaks highlight the importance of data sharing in scientific research.

In February 2016, the Wellcome Trust organized a pledge among leading scientific organizations and health agencies encouraging researchers to release data relevant to the Zika outbreak as rapidly and widely as possible [1]. This initiative echoed a September 2015 World Health Organization (WHO) consultation that assessed data sharing during the recent West Africa Ebola outbreak and called on researchers to make data publicly available during public health emergencies [2]. These statements were necessary because the traditional way of communicating research results (publication in peer-reviewed journals, often months or years after data collection) is too slow during an emergency.
The acute health threat of outbreaks provides a strong argument for more complete, quick, and broad sharing of research data during emergencies. But the Ebola and Zika outbreaks suggest that data sharing cannot be limited to emergencies without compromising emergency preparedness. To prepare for future outbreaks, the scientific community should expand data sharing for all health research.

Summary Points
• The recent outbreaks caused by Ebola and Zika viruses highlighted the importance of medical and public health research in accelerating outbreak control and prompted calls for researchers to share data rapidly and widely during public health emergencies.
• Effective preparation for emergencies requires the routine practice of data sharing in scientific research.
• Key impediments to data sharing, such as long-standing academic norms and human and technical resource limitations, cannot immediately be surmounted when an emergency occurs.
• Ongoing research that does not directly relate to an emergency now may be critical for the next unpredictable outbreak.
• As part of emergency preparedness, the scientific community should support ongoing initiatives that address major obstacles to data sharing and should embrace open science practices in both emergency and nonemergency research.

Open Science, Ebola, and Zika
Recent calls for data sharing during public health emergencies can be viewed as part of a broader movement towards open science (Box 1). In the health sciences, an important milestone for openness was achieved 20 years ago, as genetic sequencing began to generate massive amounts of data and scientists agreed to deposit sequences in public databases almost as they were produced. Encouraged by the discoveries this facilitated, life science leaders convened summits that extended the call for openness to other types of datasets [8]. Major public health research funders agreed to increase the availability of research data and to promote the use of those data to accelerate advances in public health [9]. Today, the movement towards open science is evident across the health sciences landscape (Box 2), including in recent emergencies.
During the Ebola outbreak, researchers unaffiliated with official response efforts transformed surveillance reports into machine-readable formats and shared them in public repositories [20], and some teams assisting the response rapidly deposited Ebola virus genetic sequences into public databases [21]. These efforts allowed many scientists to contribute analytical insights: 80% of peer-reviewed epidemiological modeling studies published during the outbreak used only open data [22]. Many researchers also shared the computer code for their models online.
Pharmaceutical company leaders acknowledged that "depending on the circumstances for the emergency, preliminary data could be made available with clear descriptions of the verifications that are ongoing and the remaining risks to data integrity" [23]. WHO officials noted that research teams generated and exchanged critical data for novel vaccines faster than ever [24].
As the Zika epidemic highlighted major deficiencies in knowledge of the virus and disease, leading scientific journals agreed to make all Zika-related content free to access and not to penalize submissions for prepublication release of data or results [1]. Scientists organized a call for papers describing and releasing datasets related to Zika, to be considered for online publication in a peer-reviewed journal [25]. As during Ebola, scientists established a public repository for sharing Zika data [26]. One leading virology laboratory, inspired by rapid sharing of genomic data during the Ebola response, is releasing data from its animal model experiments with Zika virus online in real time [27].

Box 1. Open Science
Various definitions of open science converge on the concept of unlimited access to all aspects of research, to allow anyone to follow, use, and participate in science. Open science comprises a growing list of other "opens," such as open access (scholarly literature not only is freely available online but may be reproduced, distributed, and otherwise reused by others, typically according to the terms of a public copyright license that accompanies the article); open data (data, including data underlying publications, are freely available online and may be used and shared); and open source (software is freely available and may be modified and distributed) [3][4][5]. Sometimes, scientific products meet criteria for open access, open data, or open source after some time has elapsed since their production (e.g., "delayed open access" journals make content available only to subscribers initially and then make it open access later, typically after 1-2 years). An expansive version of open science is open notebook science, in which the entire primary record of research, including the researcher's personal or laboratory notebook, is freely available online as it is recorded [6]. Open government aims to improve citizens' access to government data and proceedings [7] and advances open science, especially for government-funded research.

Data-Sharing Challenges
Despite these successes, the Ebola and Zika responses also highlight challenges to effective data sharing. Three major impediments limit data sharing and provide compelling reasons why emergency preparedness requires data sharing before emergencies occur.
First, there are no established standards for data users to credit data providers. In one example, researchers in Brazil who deposited Zika virus genome sequences in a public database felt they were not credited appropriately when another group used those sequences for a paper published 2 weeks later [28].
The scientific community has not yet established standards that could have prevented the disagreement. In one survey of clinical and basic science researchers, 50% of those who shared data were not credited in any way in the resulting publication or were recognized only in the acknowledgments section [29]. Opinions diverge over whether data providers should review results before publication, collaborate on the analysis, approve the analysis plan in advance, or limit conditions of data reuse [30]. Community-wide standards are needed so that the risk of uncredited secondary analysis will not dissuade scientists from sharing.

Box 2. Examples of Increasing Openness in Health Sciences
Funding agencies: Major research sponsors have implemented policies that encourage or require data sharing. In 2003, the United States National Institutes of Health (NIH) began requiring a data-sharing plan for grant applications with annual costs over US $500,000; a 2013 national survey found that 65% of life science researchers thought the NIH policies had been influential in increasing data sharing [10]. The US Centers for Disease Control and Prevention also adopted a data-sharing policy in 2003 [11], more recently requiring grantees to include a data release plan [12]. The Bill & Melinda Gates Foundation, beginning in 2017, will require peer-reviewed publications and underlying data to be open immediately on publication [13]. Open government initiatives [14] are increasing public access to government-held data, including data collected in scientific research.
Scientific journals: The proportion of articles indexed in PubMed that were freely available online within about a year of publication nearly doubled from 2006 to 2010, from 26% to 50% [15]. A search of the Scopus database in April 2014 estimated that 71% of biomedical research papers published during 2011-2013 were freely available online (though only about a quarter of these were made available immediately on publication by the publisher or author) [16]. At the same time, several prominent journals now encourage data sharing and require a statement about data accessibility [17]; in 2014, PLOS began requiring authors in its family of journals to make all underlying data publicly available on publication. In 2016, the International Committee of Medical Journal Editors (ICMJE) proposed requiring that authors submitting clinical trial reports to ICMJE member journals make the deidentified individual patient data underlying the study available within 6 months of publication [18].
Scientists: Some individual researchers and institutions have adopted nearly comprehensive openness. For example, to accelerate discovery in neuroscience, the Montreal Neurological Institute and Hospital of McGill University is beginning an unprecedented 5-year experiment in openness during which it will make all data and results freely available and will not seek patents [19].

Second, scientists may doubt that sharing data will advance their scholarly stature as much as publishing primary research. During the Ebola response, some researchers waited weeks or months before releasing Ebola virus genomic data [21]. Their motivations are unknown, but fear of granting a competitive advantage to other scientists is a deterrent to sharing in the usual course of scientific research [31] and likely explains some data-sharing failures during the outbreak [32].
In a national US survey, 28% of life scientists reported intentionally delaying publication by more than 6 months to protect scientific primacy or for other nontechnical reasons. Some of them may have drawn lessons from experience: 25% of those who had shared data, information, or materials reported they had been "scooped" by another scientist [33]. A PLOS Medicine editorial succinctly summarized the challenge, which applies in emergency and nonemergency settings: "as long as authorship of individual published reports is perceived to confer greater reward than generating and sharing the data that underlie them, a disincentive to share data will persist" [34].
Third, scientists may not be able to share data effectively because of inadequate technology, standards, or human capacity. One of the reasons researchers could share genetic sequences effectively during the Ebola and Zika outbreaks, besides longstanding openness norms in the community, was their familiarity with public databases designed for such data (e.g., GenBank). Widely accepted central databases do not exist for other types of research data. Clinical trial data, for example, mostly reside in independent databases and are collected under various standards [35]. Some platforms are little more than "data dumpsters" without the metadata, data dictionaries, or documentation required for responsible analysis [36]. Any data-sharing arrangement faces the challenge of protecting patient privacy while preserving the usefulness of the data shared, a topic of active methodological research.
Obstacles are even more significant in lower-resource settings [37]. A review of the Ebola response found that affected countries lacked integrated standards for data collection and that "data were not aggregated, analyzed, or shared in a timely manner and in some cases not at all" [38]. In Sierra Leone, for example, inadequate standards allowed a date to refer ambiguously to when data were collected, submitted, or edited [39]. Sharing data in a useful way requires staff time, technical infrastructure, and human capacities that may not be available in low-resource settings. These essential elements of effective data sharing cannot be expected to materialize during a crisis.
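To make the Sierra Leone example concrete, the following is a minimal Python sketch of the kind of data dictionary whose absence is described above: each date field names exactly one event, and a record carrying a bare "date" field is rejected rather than guessed at. The field names, the consistency checks, and the `parse_record` helper are hypothetical illustrations chosen for this sketch, not a standard used during the Ebola response.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical data dictionary: each date field refers to exactly one event.
DATA_DICTIONARY = {
    "date_collected": "Date the case data were collected in the field",
    "date_submitted": "Date the record was submitted to the central database",
    "date_edited": "Date the record was last modified",
}


@dataclass
class CaseRecord:
    case_id: str
    date_collected: date
    date_submitted: date
    date_edited: date


def parse_record(raw: dict) -> CaseRecord:
    """Build a CaseRecord, rejecting ambiguous or inconsistent dates."""
    # A generic "date" column is exactly the ambiguity the sketch guards against.
    if "date" in raw:
        raise ValueError("ambiguous field 'date': use a field from DATA_DICTIONARY")
    missing = [name for name in DATA_DICTIONARY if name not in raw]
    if missing:
        raise ValueError(f"missing date fields: {missing}")
    # Expect ISO 8601 strings (YYYY-MM-DD); fromisoformat raises ValueError otherwise.
    dates = {name: date.fromisoformat(raw[name]) for name in DATA_DICTIONARY}
    collected = dates["date_collected"]
    # A record cannot be submitted or edited before its data were collected.
    if collected > dates["date_submitted"] or collected > dates["date_edited"]:
        raise ValueError("record submitted or edited before data were collected")
    return CaseRecord(case_id=raw["case_id"], **dates)
```

Because every field is defined once in a shared dictionary, an analyst receiving the data elsewhere can tell which event a date describes without contacting the data provider, which is the documentation that "data dumpsters" lack.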

Preparing for the Next Surprise
Open data deserves recognition and support as a key component of emergency preparedness. Initiatives to facilitate discovery of datasets and track their use [40][41][42]; provide measures of academic contribution, including data sharing that enables secondary analysis [43]; establish common platforms for sharing and integrating research data [44]; and improve data-sharing capacity in resource-limited areas [45] are critical to improving preparedness and response.
Research sponsors, scholarly journals, and collaborative research networks can leverage these new opportunities with enhanced data-sharing requirements for both nonemergency and emergency settings. A proposal to amend the International Health Regulations with clear codes of practice for data sharing warrants serious consideration [46]. Any new requirements should allow scientists to conduct and communicate the results of secondary analyses, broadening the scope of inquiry and catalyzing discovery. Publication embargo periods, such as one under consideration for genetic sequences of pandemic-potential influenza viruses [47], may lower barriers to data sharing but may also slow the timely use of data for public health.
Integrating open science approaches into routine research should make data sharing more effective during emergencies, but this evolution is more than just practice for emergencies. The cause and context of the next outbreak are unknowable; research that seems routine now may be critical tomorrow. Establishing openness as the standard will help build the scientific foundation needed to contain the next outbreak.
Recent epidemics have been surprises: Zika and chikungunya sweeping through the Americas; an Ebola epidemic with more than 10,000 deaths; the emergence of severe acute respiratory syndrome and Middle East respiratory syndrome; and an influenza pandemic (influenza A [H1N1]pdm09) originating in Mexico. We can be sure there are more surprises to come. Opening all research provides the best chance to accelerate the discovery and development that will help during the next surprise.