Citation: Buckee C, Balsari S, Schroeder A (2022) Making data for good better. PLOS Digit Health 1(1): e0000010. https://doi.org/10.1371/journal.pdig.0000010
Editor: Leo Anthony Celi, Massachusetts Institute of Technology, UNITED STATES
Published: January 18, 2022
Copyright: © 2022 Buckee et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: The authors’ related research is supported by grants from the Harvard Data Science Initiative (CB, SB), Schmidt Futures (CB, SB), Google.org (CB, SB, AS) and Facebook Data for Good (AS). The funders had no role in the decision to publish, or in the preparation of the manuscript.
Competing interests: The authors are co-directors of CrisisReady.io, a research-response platform at Harvard and Direct Relief advancing data-driven decision-making in crises. CrisisReady funders had no role in the decision to publish, or in the preparation of the manuscript.
Today’s societies produce vast and growing amounts of digital data “exhaust” from everyday activities such as using mobile devices, wearables, and home sensors; making store purchases; and engaging on social media. Corporations have historically used such data to sell products and make life more convenient (even if unevenly), and, in limited academic circles, researchers have used them to address public health challenges. During the COVID-19 pandemic, however, technology companies began making aggregated human mobility datasets widely available as part of their corporate social responsibility efforts or “data for good” programs.[1,2] Many companies, researchers, and policy makers unfamiliar with the academic literature realized, for the first time, that digital data from mobile phones could be used to monitor social distancing and other emergency public health measures. An avalanche of social distancing dashboards, prediction tools, heat maps, digital contact tracing programs, and symptom-based COVID-19 prediction apps followed. Despite long-standing excitement about the potential for digital tools, Big Data, and AI to transform our lives, these innovations have so far, with some exceptions, had little impact on the greatest public health emergency of our time.[4,5]
Attempts to use digital data streams to rapidly produce public health insights that were both relevant to local contexts in cities and countries around the world and available to the decision makers who needed them exposed enormous gaps across the translational pipeline. Insights from novel data streams, which could help drive precise, impactful health programs and bring effective aid to communities, found limited use in public health and emergency response systems. We share here our experience from the COVID-19 Mobility Data Network (CMDN), now CrisisReady (crisisready.io), a global collaboration of researchers, mostly infectious disease epidemiologists and data scientists, who served as trusted intermediaries between technology companies willing to share vast amounts of digital data and policy makers struggling to incorporate insights from these novel data streams into their decision making. Drawing on our experience with the Network, and using human mobility data as an illustrative example, we identify three sets of barriers to the successful application of large digital datasets for the public good.
First, in the absence of pre-established working relationships with technology companies and data brokers, the data remain primarily confined within private circuits of ownership and control. During the pandemic, data sharing agreements between large technology companies and researchers were hastily cobbled together, often without the right domain expertise in the mix. Second, the lack of standardization and interoperability, together with the lack of information on the uncertainty and biases associated with these data, meant that the data required complex analytical processing by highly specialized domain experts.[9,10] Finally, local public health departments, understandably unfamiliar with these novel data streams, had neither the bandwidth nor the expertise to separate signal from noise. Ultimately, most efforts did not yield consistently useful information for decision making, particularly in low-resource settings, where capacity limitations in the public sector are most acute.[11,12]
We remain hopeful that the vast datasets people generate every day will be extraordinarily useful for crisis response. For example, data on population mobility provide critical information about population displacement and travel patterns. Satellite imagery and power outage data can help estimate infrastructure disruption in near real time. Data from electronic medical records, pharmacies, and insurance companies can help map where medically vulnerable people are, how much bed capacity is available in real time, and what evacuating communities need, so that host communities and health systems can better prepare to receive populations in distress. While we contend that many of the efforts to harness digital data for COVID-19 response did not meet their goals, we also believe that the pandemic expanded awareness of, and access to, novel data streams among a broad range of researchers and policy makers, and that these data streams are likely to become routinely used in public health in the future.
What would it take for the “Data for Good” agenda to deliver its promised benefits to communities affected by crises? Substantial work is needed along the entire translational pipeline to understand which data streams and methodologies are most helpful for different response efforts. Most digital data, unless collected through an app designed for the purpose, are not generated for public health use. To use them for disaster response, data sharing arrangements between data providers, academic partners, and public health agencies must be in place before the disaster occurs. The important privacy concerns raised by the use of individually identifiable digital data necessitate aggregation and anonymization, proprietary processes that often dilute the epidemiological or clinical utility of the data. There is therefore a need to standardize these approaches across industries, or at least for end users to be familiar with the methodology before acute disasters strike, so that they can efficiently combine data from multiple sources. In the absence of regulatory frameworks supporting the ethical use of personal data, corporations, researchers, and policy makers need incentives to use the data responsibly to derive meaningful insights. Finally, the processed data and analyses must be shared in near real time, in the right format, with the right people. Most of these requirements are currently lacking in public health contexts.
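To illustrate why aggregation can dilute the utility of the data, the following is a minimal Python sketch. It is not any data provider's actual method; those pipelines are proprietary and may also add random noise. The sketch assumes hypothetical device-level trip records, a hypothetical function name (`aggregate_trips`), and a hypothetical minimum cell size of 10. It shows how suppressing small origin-destination counts protects individuals while also removing sparsely populated places from the published data.

```python
from collections import Counter

# Hypothetical minimum cell size. Real providers set their own
# (often proprietary) thresholds and may add noise as well.
MIN_COUNT = 10


def aggregate_trips(trips, min_count=MIN_COUNT):
    """Aggregate device-level trips into origin-destination counts.

    trips: iterable of (device_id, origin, destination) tuples
           (a hypothetical record format used only for illustration).
    Returns (published, suppressed): a dict mapping (origin, destination)
    to trip counts at or above min_count, and the set of pairs withheld
    because their counts fell below it.
    """
    # Aggregation: discard device identifiers and keep only counts.
    counts = Counter((origin, dest) for _device, origin, dest in trips)
    # Small-cell suppression: withhold counts that could expose individuals.
    published = {pair: n for pair, n in counts.items() if n >= min_count}
    suppressed = {pair for pair, n in counts.items() if n < min_count}
    return published, suppressed


# Synthetic example: 50 trips within a city, 4 trips from a village.
trips = [(f"u{i}", "city", "city") for i in range(50)]
trips += [(f"v{i}", "village", "city") for i in range(4)]

published, suppressed = aggregate_trips(trips)
print(published)   # {('city', 'city'): 50}
print(suppressed)  # {('village', 'city')}
```

In this toy example, the four trips from the village fall below the threshold and disappear from the output entirely. This is the kind of sparse signal from small or rural communities that responders may most need. An end user who knows the threshold in advance can at least recognize that a missing flow is not evidence of zero movement.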
The CMDN focused on using digital human mobility data to provide insights into the impact of social distancing, lockdown, and travel restriction policies. Every evening for several months, our researchers shared daily updates from analyses of aggregated population mobility data from social media platforms and telecom companies with local public health partners, explaining the utility and limits of the analyses and learning what decision makers needed. Perhaps the most important lesson we learned from this experience was that translational impact relies on distributed capacity and trusted partnerships.
Currently, the possibilities for improving disaster response with new data and advanced analytics vastly outstrip the ability of most disaster response agencies to use them. In the wake of 2020, there have been calls to increase the number of people around the world who are trained in epidemiology and can respond to epidemics. We are calling for a similar investment in global technical capacity to translate new data streams into better public health responses during crises. Data “bilinguals”, individuals able to apply novel digital data to generate specific, contextually relevant, policy-relevant insights during disasters, will be essential. These individuals may be academics or people working in government agencies or NGOs who have a technical understanding of the appropriate use of digital data streams and who can work with decision makers on a specific problem. On the policy side, raising awareness of the potential (and pitfalls) of these data at all levels of government will facilitate the integration of new data streams into decision making. Data preparedness must become an integral component of disaster preparedness exercises, so that data access, methodology, and applications are negotiated before a disaster strikes.
We conclude that regional collaboration among scientists embedded in or working closely with local response agencies, but supported by a global network of peers, will accelerate the use of these data to help our communities. Without this type of sustained, broad-based and equitably distributed capacity investment, no amount of additional data or improved methodology is likely to result in substantial gains for disaster-affected communities.
The authors thank the scientists, policy makers and technology partners who participated in the COVID-19 Mobility Data Network for their collaboration.
- 1. Facebook Data for Good Mobility Dashboard. Covid-19 Mobility Data Network. [Cited 2021 Nov 10]. Available from: https://visualization.covid19mobility.org/?region=WORLD
- 2. Google COVID-19 Community Mobility Reports. [Cited 2021 Nov 10]. Available from: https://www.google.com/covid19/mobility/
- 3. Unacast Social Distancing Scoreboard. [Cited 2021 Nov 10]. Available from: https://www.unacast.com/post/the-unacast-social-distancing-scoreboard
- 4. Ienca M, Vayena E. On the responsible use of digital data to tackle the COVID-19 pandemic. Nat Med. 2020;26:463–464. pmid:32284619
- 5. Budd J, Miller BS, Manning EM, et al. Digital technologies in the public-health response to COVID-19. Nat Med. 2020;26:1183–1192. pmid:32770165
- 6. Balsari S, Kiang MV, Buckee CO. Data in Crisis—Rethinking Disaster Preparedness in the United States. N Engl J Med. 2021;385:1526–1530. pmid:34469643
- 7. Covid-19 Mobility Data Network. [Cited 2021 Nov 23]. Available from: https://covid19mobility.org
- 8. Balsari S, Buckee CO, Khanna T. Which Covid Data Can You Trust? Harvard Business Review. May 8, 2020. [Cited 2021 Nov 27]. Available from: https://hbr.org/2020/05/which-covid-19-data-can-you-trust
- 9. Grantz KH, Meredith HR, Cummings DAT, Metcalf CJE, Grenfell BT, Giles JT, et al. The use of mobile phone data to inform analysis of COVID-19 pandemic epidemiology. Nat Commun 2020 Sep 30;11(1):4961. pmid:32999287
- 10. Kishore N, Taylor AR, Jacob PE, Vembar V, Cohen T, Buckee CO, et al. Evaluating the reliability of mobility metrics from aggregated mobile phone data as proxies for SARS-CoV-2 transmission in the USA: a population-based study. Lancet Digit Health. 2021 Nov 2;S2589-7500(21)00214-4. pmid:34740555
- 11. Holmdahl I, Buckee C. Wrong but Useful—What Covid-19 Epidemiologic Models Can and Cannot Tell Us. N Engl J Med. 2020;383:303–305. pmid:32412711
- 12. Hogan K, Macedo B, Macha V, Barman A, Jiang X. Contact Tracing Apps: Lessons Learned on Privacy, Autonomy, and the Need for Detailed and Thoughtful Implementation. JMIR Med Inform. 2021 Jul 19;9(7):e27449. pmid:34254937
- 13. Mobility Data Collaborative. Future of Privacy Forum. Mobility Data Sharing Assessment Operator’s Manual. August 2021. [Cited 2021 Nov 10]. Available from: https://fpf.org/wp-content/uploads/2021/08/2-MDSA-Operators-Manual.pdf
- 14. Contracts for Data Collaboration. [Cited 2021 Nov 10]. Available from: https://contractsfordatacollaboration.org/library/
- 15. Chan J, Bateman L, Bhatia A, Bruni K. CMDN User Feedback Briefs. CMDN. May 5, 2021. [Cited 2021 Nov 10]. Available from: https://www.crisisready.io/publications/cmdn-user-feedback-briefs/
- 16. O’Carroll PW, Kirk MD, Reddy C, Morgan OW, Baggett HC. The Global Field Epidemiology Roadmap: Enhancing Global Health Security by Accelerating the Development of Field Epidemiology Capacity Worldwide. Health Secur. 2021 May-Jun;19(3):349–351. Epub 2021 Apr 30. PMCID: PMC8217588. pmid:33944584
- 17. Winowatan M. Bilingual. The Living Library, GovLab, NYU Tandon School of Engineering. [Cited 2021 Nov 23]. Available from: https://thelivinglib.org/bilingual/
- 18. CrisisReady Rapid Response Grants. [Cited 2021 Nov 27]. Available from: https://www.crisisready.io/work/rapid-response-grants/