
Twitter and Facebook posts about COVID-19 are less likely to spread misinformation compared to other health topics

  • David A. Broniatowski ,

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliations Department of Engineering Management and Systems Engineering, School of Engineering and Applied Science, The George Washington University, Washington, DC, United States of America, Institute for Data, Democracy and Politics, The George Washington University, Washington, DC, United States of America

  • Daniel Kerchner,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Writing – review & editing

    Affiliation George Washington University Libraries, The George Washington University, Washington, DC, United States of America

  • Fouzia Farooq,

    Roles Data curation, Formal analysis, Investigation, Software, Validation, Writing – original draft, Writing – review & editing

    Affiliation Department of Epidemiology, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, United States of America

  • Xiaolei Huang,

    Roles Data curation, Investigation, Software, Writing – review & editing

    Affiliation Department of Computer Science, University of Memphis, Memphis, TN, United States of America

  • Amelia M. Jamison,

    Roles Formal analysis, Investigation, Validation, Writing – review & editing

    Current address: Department of Health, Behavior and Society, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, United States of America

    Affiliation Department of Family Science, Center for Health Equity, School of Public Health, University of Maryland, College Park, MD, United States of America

  • Mark Dredze,

    Roles Data curation, Investigation, Project administration, Supervision, Writing – review & editing

    Affiliation Department of Computer Science, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, United States of America

  • Sandra Crouse Quinn,

    Roles Funding acquisition, Project administration, Supervision, Writing – review & editing

    Affiliation Department of Family Science, Center for Health Equity, School of Public Health, University of Maryland, College Park, MD, United States of America

  • John W. Ayers

    Roles Methodology, Visualization, Writing – review & editing

    Affiliation Division of Infectious Diseases and Global Public Health, University of California San Diego, La Jolla, CA, United States of America


12 Feb 2024: Broniatowski DA, Kerchner D, Farooq F, Huang X, Jamison AM, et al. (2024) Correction: Twitter and Facebook posts about COVID-19 are less likely to spread misinformation compared to other health topics. PLOS ONE 19(2): e0298907. View correction


Abstract

The COVID-19 pandemic brought widespread attention to an “infodemic” of potential health misinformation, but this claim has not been assessed based on evidence. We evaluated whether health misinformation became more common during the pandemic. We gathered about 325 million posts sharing URLs from Twitter and Facebook during the beginning of the pandemic (March 8–May 1, 2020) and compared them to posts from the same period in 2019. We relied on source credibility as an accepted proxy for misinformation across this database. Human annotators also coded a subsample of 3,000 posts with URLs for misinformation. Posts about COVID-19 were 0.37 times as likely to link to “not credible” sources and 1.13 times more likely to link to “more credible” sources than posts prior to the pandemic. Posts linking to “not credible” sources were 3.67 times more likely to include misinformation than posts from “more credible” sources. Thus, during the earliest stages of the pandemic, when claims of an infodemic emerged, social media contained proportionally less misinformation than expected based on the prior year. Our results suggest that widespread health misinformation is not unique to COVID-19. Rather, it is a systemic feature of online health communication that can adversely impact public health behaviors and must therefore be addressed.


Introduction

On February 15, 2020, the Director General of the World Health Organization declared that the coronavirus disease 2019 (COVID-19) pandemic had spurred an “infodemic” of misinformation [1]. This claim quickly became accepted as a matter of fact among government agencies, allied health groups, and the public at large [2–10]. For instance, during the past year over 15,000 news reports archived on Google News refer to a COVID-19 “infodemic” in their title, and about 5,000 scholarly research reports on Google Scholar refer to an infodemic in the title and/or abstract. Despite this widespread attention, the claim that online content about COVID-19 is more likely to be false than other topics has not been tested.

We seek to characterize the COVID-19 infodemic’s scale and scope in comparison to other health topics. In particular, we focus on the opening stages of the infodemic, March through May 2020, when case counts began to increase worldwide, vaccines were not yet available, and concerted collective action (such as social distancing, mask-wearing, and compliance with government lockdowns) was necessary to reduce the rate at which COVID-19 spread. Misinformation during this time period was especially problematic because of its potential to undermine these collective efforts. Our study therefore aims to answer the following question:

  • Were posts about COVID-19 more likely to contain links to misinformation when compared to other health topics?

Beyond the sheer volume of links shared, one might define an “infodemic” by the likelihood that a particular type of post might go viral. Thus, our second question:

  • When it comes to COVID-19, were links containing misinformation more likely to go viral?

To answer these questions, we must rely on a scalable method. One commonly used proxy for misinformation is source credibility. If the infodemic was indeed characterized by false content, one might expect a higher proportion of this content to come from low credibility sources that “lack the news media’s editorial norms and processes for ensuring the accuracy and credibility of information” [11]. Thus, our third question:

  • Does content from less credible sources include more misinformation?

Evidence before this study

Prior studies [12] found that low-credibility content was, in fact, rare on Twitter, albeit shared widely within concentrated networks [13]. We only found two studies comparing across multiple social media platforms [13, 14], with both studies concluding that the prevalence of low-credibility content varied significantly between platforms. None of these studies compared COVID-19 content to other health topics.

To our knowledge, this study is the first to evaluate the claim of an infodemic by comparing COVID-19 content to other health topics. We analyzed hundreds of millions of social media posts to determine if COVID-19 posts pointed to lower-credibility sources compared to other health content.

Materials and methods

Data collection

Data comprised all public posts made to Twitter, together with public posts made to Facebook pages (intended to represent brands and celebrities) with more than 100,000 likes and to Facebook groups (intended as venues for public conversation) with at least 95,000 members, or US-based groups with at least 2,000 members.

COVID-19 tweets.

First, we collected English language tweets from Twitter matching keywords pertaining to COVID-19 [15] between March 8, 2020 and May 1, 2020. Next, we compared these to tweets containing keywords pertaining to other health topics [16] for the same dates in 2019.

We obtained COVID-19 tweets using the Social Feed Manager software [17], which collected English-language tweets from the Twitter API’s statuses/filter streaming endpoint matching the keywords “#Coronavirus”, “#CoronaOutbreak”, and “#COVID19”, posted between March 8, 2020 and May 1, 2020 [15].

Health tweets.

We obtained tweets about other health topics using the Twitter Streaming API to collect English-language tweets containing keywords pertaining to generalized health topics posted between March 8, 2019 and May 1, 2019 (keywords are listed in reference [16]).

Facebook data.

Next, we collected comparable data from Facebook for the same dates using CrowdTangle [18], a public insights tool owned and operated by Facebook. Specifically, we collected English-language posts from Facebook Pages matching keywords of:

  • “coronavirus”, “coronaoutbreak”, and “covid19” posted between March 8, 2020 and May 1, 2020, downloaded on June 2–3, 2020.
  • the same health-related keywords used in the health stream posted between March 8, 2019 and May 1, 2019, downloaded on July 13–14, 2020.


The data used in this article are from publicly available online sources, the uses of which are deemed exempt by the George Washington University institutional review board (180804).

Credibility categorization

Our analysis draws upon an assumption that is widespread in prior work [11–14, 19, 20]: that “the attribution of ‘fakeness’ is … not at the level of the story but at that of the publisher” [19]. This assumption is attractive because it is scalable, allowing researchers to analyze vast quantities of posts by characterizing their source URLs. We therefore extracted all Uniform Resource Locators (URLs) in each post. We used the “tldextract” Python module [21] to identify each URL’s top-level domain, unshortening links if necessary (see Appendix A in S1 File). We grouped these top-level domains into three categories reflecting their overall credibility using a combination of credibility scores from two independent sources, NewsGuard and MediaBiasFactCheck, as follows (see Appendix B in S2 File for details):
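The domain-extraction step above can be sketched in a few lines. The authors used the tldextract module; the stand-in below only approximates its behavior with the standard library (it will mishandle multi-part suffixes such as .co.uk, which tldextract handles via the Public Suffix List), so it is illustrative rather than a substitute:

```python
from urllib.parse import urlparse

def registered_domain(url: str) -> str:
    """Rough stand-in for tldextract: return the last two labels of
    the hostname (e.g. 'cdc.gov'). Real pipelines should use
    tldextract, which handles suffixes like '.co.uk' correctly."""
    host = urlparse(url).netloc.lower()
    host = host.split(":")[0]          # drop any port number
    if host.startswith("www."):
        host = host[4:]
    parts = host.split(".")
    return ".".join(parts[-2:]) if len(parts) >= 2 else host

# e.g. registered_domain("https://www.cdc.gov/coronavirus/index.html") -> "cdc.gov"
```

Grouping posts by this key is what makes publisher-level credibility scoring scale to tens of millions of URLs.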

More credible.

This category contained the most credible sources. Top-level domains were considered “more credible” if they fit into one of the following two categories:

  • Government and Academic Sources, defined by Singh et al. [12] as “high quality health sources”, included official government sources, such as public health agencies, and academic journals and institutions of higher education (see Appendix B in S2 File).
  • Other More Credible Sources, defined by Singh et al. [12] as “traditional media sources”, were given a credibility rating of at least 67% by NewsGuard, or rated as “very high” (coded as 100%) or “high” (80%) on the MediaBiasFactCheck factual reporting scale. (NewsGuard and MediaBiasFactCheck scores are strongly correlated, r = 0.81, so we averaged these scores when both were available.)

Less credible.

Top-level domains were considered “less credible” if they were given a credibility rating between 33% and 67% by NewsGuard, or rated as “mostly factual” (60%) or “mixed” (40%) on the MediaBiasFactCheck factual reporting scale (averaging these when both were available).

Not credible.

This category contained the least credible sources, such as conspiracy-oriented sites, but also government-sponsored sites that are generally considered propagandistic. Top-level domains were considered “not credible” if they:

  • Were given a credibility rating of 33% or less by NewsGuard or rated as “low” (20%) or “very low” (0%) on the MediaBiasFactCheck factual reporting scale.
  • Were rated as a “questionable source” by MediaBiasFactCheck.
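The threshold logic above can be condensed into a small function. This is an illustrative sketch of the rules as stated in the text (scores on a 0–100 scale, averaged when both raters cover a domain), not the authors' actual code; the "questionable source" override is omitted for brevity:

```python
from typing import Optional

def credibility_category(newsguard: Optional[float],
                         mbfc: Optional[float]) -> str:
    """Map NewsGuard (0-100) and MediaBiasFactCheck (mapped to 0-100)
    scores to the paper's categories, averaging when both exist.
    Thresholds per the text: >= 67 more credible, <= 33 not credible,
    in between less credible. Illustrative only."""
    scores = [s for s in (newsguard, mbfc) if s is not None]
    if not scores:
        return "unrated"
    score = sum(scores) / len(scores)
    if score >= 67:
        return "more credible"
    if score > 33:
        return "less credible"
    return "not credible"
```

For example, a domain rated 80 by NewsGuard with no MediaBiasFactCheck rating would fall in the “more credible” bin, while one rated “mixed” (40) only by MediaBiasFactCheck would be “less credible”.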

As noted above, our analysis relies on a publisher-level simplifying assumption shared with prior work [12, 13, 19, 20, 22]. In the interest of evaluating this assumption for health topics, we performed an additional validity check. To determine the content of each credibility category, we developed a codebook (Table 1) to assess the presence of false claims. We generated a stratified sample of 3,000 posts by randomly selecting 200 posts from each COVID-19 dataset for each credibility category (More, Less, Not Credible, Unrated), plus a set of 200 “in platform” posts (i.e., those linking to Twitter from Twitter or Facebook from Facebook). Three authors (DK, FF, and AMJ) manually labeled batches of 100 posts from each platform until annotators achieved high interrater reliability (Krippendorff’s α > 0.80), which we obtained on the second round (α = 0.81). Disagreements were resolved by majority, and ties were adjudicated by a fourth author (DAB). The remaining 2,400 posts were then split equally among the three annotators. We also generated qualitative descriptions for each credibility category.
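The label-resolution rule described above (majority vote among three annotators, with ties sent to a fourth author) can be sketched as follows. The function name and the adjudicator callable are hypothetical illustrations of the procedure, not the study's code:

```python
from collections import Counter

def resolve_label(labels, adjudicate):
    """Return the majority label among annotators; when no strict
    majority exists (e.g. all three annotators disagree), defer to
    the adjudicator, modeled here as a callable."""
    counts = Counter(labels).most_common()
    if len(counts) == 1 or counts[0][1] > counts[1][1]:
        return counts[0][0]          # unanimous or strict majority
    return adjudicate(labels)        # tie: fourth-author decision
```

With a binary misinformation label, three annotators always produce a majority; ties arise for multi-class codebook categories where all three annotators can disagree.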

Virality analysis.

We conducted negative binomial regressions for each COVID dataset to predict the number of shares or retweets for each original post (Facebook and Twitter share counts were current as of June 2, 2020, and May 31, 2020, respectively). Following Singh et al. [12], we analyzed high-quality health sources separately from traditional media sources, separating the “more credible” category into two subcategories: “academic and government” and “other more credible” sources. For tweets with multiple URLs, we assigned each tweet with a lower-credibility URL (“not credible” or “less credible”) to its least credible category (see Appendix C in S3 File).


Results

We identified 305,129,859 posts on Twitter, 13,437,700 posts to Facebook Pages, and 6,577,307 posts to Facebook Groups containing keywords pertaining to COVID-19 and other health conditions. These posts contained 41,134,540 URLs (excluding in-platform links such as retweets and shares), including 554,378 unique top-level domains. Of these unique top-level domains, 14,609 (2.6%) were assigned a credibility rating; these top-level domains accounted for 19,294,621 (47%) of all URLs shared. The remaining URLs were unrated (see S1 Fig for raw counts).

Content of credibility categories

We conducted an inductive analysis of each credibility category to validate the use of credibility as a proxy for misinformation (see Table 2 for examples of URLs from each category).

“Not credible” sources contained more misinformation than “more credible” sources.

In our stratified random sample of 3,000 posts, those with URLs rated as “not credible” were 3.67 (95% CI: 3.50–3.71) times more likely to contain false claims than “more credible” sources (Fig 1). Results were comparable when comparing only those posts labeled as containing news or information (see S2 Fig), and we did not detect a significant difference between high-quality health sources (5.33% misinformation, 95% CI: 0.00–10.42, n = 75) and more credible traditional media sources (5.33% misinformation, 95% CI: 3.41–7.26, n = 525). None of the intermediate “less credible” sources (8.50% misinformation, 95% CI: 6.11–10.89), “unrated” sources (7.33% misinformation, 95% CI: 5.10–9.56), or “in platform” sources (5.17% misinformation, 95% CI: 4.26–6.07) were statistically significantly more likely to contain misinformation when compared to “more credible” sources (5.33% misinformation, 95% CI: 3.41–7.26, n = 600).

Fig 1. Proportions of misinformation for each credibility category.

Error bars represent 95% confidence intervals.
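Intervals of the kind reported above can be approximated with a standard Wald confidence interval for a proportion. The sketch below is illustrative only; the paper does not state which interval method it used, so exact bounds may differ slightly:

```python
import math

def prop_ci(k: int, n: int, z: float = 1.96):
    """Wald 95% CI for a proportion k/n: p +/- z * sqrt(p(1-p)/n),
    clipped at zero. Illustrative of how per-category misinformation
    intervals can be computed from annotated counts."""
    p = k / n
    se = math.sqrt(p * (1 - p) / n)
    return p, max(p - z * se, 0.0), p + z * se
```

For instance, 32 misinformation labels out of 600 annotated posts (about 5.3%) yields an interval in the neighborhood of 3.5–7.1 percentage points, comparable in width to those reported.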

Beyond these misinformation ratings, we calculated the proportions of each content type in our codebook, for each credibility category (S2 Fig). A qualitative description of each category follows.

More credible.

These sources primarily shared news and government announcements. Content was rarely political, although users sometimes editorialized, often with a liberal bias. Here, misinformation typically reported on, and potentially amplified, questionable content, such as explaining conspiracy theories or reporting on claims that bleach cures COVID-19. Some content also expressed uncertainty around COVID-19 science, pointing out limitations of data and models and acknowledging that major questions could not yet be answered.

Less credible.

These sources contained a wide variety of content. Non-US politics were common, especially from Indian, Chinese, and European sources. Misinformation in this category included some political conspiracy theories, but also more subtle falsehoods, such as suggesting COVID-19 is less severe than the flu, promoting hydroxychloroquine as a cure, or claiming that “lockdowns” were an overreaction. This category also included content that inadvertently amplified questionable content while attempting to debunk it.

Not credible.

Misinformation was more common in this category. Common themes included: blaming China for the virus, questioning its origins, rejecting vaccines, and framing COVID as undermining U.S. President Trump. These sources also tended to have a conservative political bias. Content emphasizing scientific uncertainty suggested that response measures were unjustified or that science was distorted for political ends. This category also included propaganda narratives, often extolling Russian and Chinese COVID responses.

Comparison to other health topics prior to the pandemic

Posts about COVID-19 were less likely to contain links to “not credible” sources and more likely to contain links to “more credible” sources when compared to other health topics prior to the pandemic. On average, URLs shared were more likely to be credible than non-credible during the pandemic (Fig 2). Among rated links, the proportion of “not credible” links shared during the pandemic in posts containing COVID-19 keywords was lower on Twitter (RR = 0.37; 95% CI: 0.37–0.37), Facebook Pages (RR = 0.41; 95% CI: 0.40–0.42), and Facebook Groups (RR = 0.37; 95% CI: 0.37–0.38). Additionally, the proportion of “more credible” links in posts containing COVID-19 keywords was higher on Twitter (RR = 1.13; 95% CI: 1.13–1.13), Facebook Pages (RR = 1.07; 95% CI: 1.07–1.07), and Facebook Groups (RR = 1.03; 95% CI: 1.02–1.03). These results replicated when focusing only on “high-quality health sources”—academic and government sources—for all three platforms: Twitter (RR = 3.52; 95% CI: 3.50–3.54), Facebook Pages (RR = 1.15; 95% CI: 1.14–1.17), and Facebook Groups (RR = 1.09; 95% CI: 1.06–1.11). URLs were also less likely to be unrated during the pandemic: Twitter RR = 0.67 (95% CI: 0.67 to 0.67), Facebook Pages RR = 0.74 (95% CI: 0.74 to 0.74), and Facebook Groups RR = 0.58 (95% CI: 0.58 to 0.58) (see Supplementary Material).
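The risk ratios above compare the proportion of links in a credibility category during the pandemic to the same proportion the year before, and their confidence intervals follow the standard log-scale normal approximation for a ratio of two proportions. The sketch below uses made-up counts, not the paper's data; note that with denominators in the millions the interval collapses onto the point estimate, which is why bounds such as 0.37–0.37 appear:

```python
import math

def risk_ratio(a: int, n1: int, b: int, n2: int):
    """Risk ratio (a/n1) / (b/n2) with a 95% CI computed on the log
    scale (Katz method): SE = sqrt(1/a - 1/n1 + 1/b - 1/n2).
    Counts are illustrative, not the study's data."""
    rr = (a / n1) / (b / n2)
    se = math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n2)
    lo = math.exp(math.log(rr) - 1.96 * se)
    hi = math.exp(math.log(rr) + 1.96 * se)
    return rr, lo, hi
```

For example, 50 “not credible” links out of 1,000 during the pandemic versus 100 out of 1,000 beforehand gives RR = 0.5 with a visibly wide interval; scale both denominators by four orders of magnitude and the interval narrows to the point estimate at two decimal places.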

Fig 2. Proportions of COVID-19 and health URLs for each credibility category and social media platform.

The least credible posts are not the most viral

Even if low credibility content is less widespread on Twitter and Facebook, it can still be harmful if it garners more engagement. We therefore compared the average number of shares for each credibility category. We did not find that the least credible content was the most widely shared. Rather, on Twitter and Facebook Pages, the most viral posts contained links to government and academic sources, whereas intermediate “less credible” sources were the most viral in Facebook Groups (Fig 3).

Fig 3. Average number of shares for each credibility category by platform, estimated using negative binomial regression.


Discussion

Like prior studies [12, 14, 22], we find that there is indeed an overwhelming amount of content pertaining to COVID-19 online, making it difficult to discern truth from falsehood. Furthermore, we found that posts with URLs rated as “not credible” were indeed more likely to contain falsehoods than posts in other categories.

We are the first to compare this content to other health topics across platforms, adding much needed context. Upon comparison, we found that social media posts about COVID-19 were more likely to come from credible sources, and less likely to come from non-credible sources. Thus, available evidence suggests that misinformation about COVID-19 is proportionally quite rare, especially when compared to misinformation about other health topics.

Although sources rated as “not credible” were roughly 3.67 times more likely to share misinformation, Fig 2 shows that misinformation (i.e., explicitly false claims about COVID-19) was only present in a minority of posts. Thus, prior studies which used credibility as a proxy for misinformation may have overestimated the prevalence of explicitly false claims. Explicit falsehoods, although harmful, appear to be rare. To the extent that “more credible” sources shared misinformation, they did so to report on or, in some cases, attempt to debunk it. Thus, contrary to the claim of an “infodemic” of misinformation, posts about COVID-19 included less misinformation than other health-related posts prior to the pandemic.

Our results demonstrate that the volume of low-credibility content is much lower than the volume of high-credibility content on Twitter and Facebook. However, small volumes of harmful content could still be problematic if they garner a disproportionately large number of engagements. We found that this was not the case. To the contrary, content from the highest-quality sources–government and academic websites–was shared more often, on average, on both Twitter and Facebook. In Facebook Groups, where links to “not credible” sources were shared more often than links to high-quality sources, intermediate “less credible” sources were most frequently shared. However, we did not find that misinformation was significantly more prevalent in this category than in the “more credible” category.

Taken as a whole, these results suggest that misinformation about COVID-19 may largely be concentrated within specific online communities with limited spread overall. Online misinformation about COVID-19 remains problematic. However, our results suggest that the widespread reporting of false claims pertaining to COVID-19 may have been overstated at the start of the pandemic, whereas other health topics may be more prone to misinformation.


Limitations

Our inclusion criteria for social media data are based on keywords associated with COVID-19, vaccine-preventable illnesses, and other health conditions. This collection procedure might introduce some noise into our dataset, for example if online actors exploited the virality of COVID-19 hashtags and keywords to promote their own content. If so, we would expect to find more misinformation during the pandemic; in fact, we found less (see S2 Fig, where we quantified proportions of “opportunistic” content). Furthermore, we used inclusion criteria that are comparable to prior studies, including those upon which the initial claim of an infodemic was based: a WHO/PAHO fact sheet from May 1, 2020 defines the “infodemic” using keyword search terms that are similar to ours. Other studies of the “infodemic” have taken the same approach [12–14]. Thus, our findings contextualize previous work in this area, which has primarily focused on low-credibility sources rather than a more holistic picture.

Our inclusion criteria yielded many unrated URLs, comprising roughly half our sample. These URLs did not primarily contain misinformation (see S3 Fig). However, even if unrated URLs did contain large quantities of misinformation, COVID-19 posts were statistically significantly less likely to contain unrated content on all social media platforms studied, compared to what would be expected prior to the pandemic.


Conclusions

Taken together, our findings suggest that the “infodemic” is, in fact, a general feature of health information online that is not restricted to COVID-19. Indeed, COVID-19 content appears less likely to contain explicitly false claims. This does not mean that misinformation about COVID-19 is absent; however, it does suggest that attempts to combat it would be better informed by comparison to the broader health misinformation ecosystem. Such a comparison would potentially motivate a stronger response.

Health leaders who have focused on COVID-19 misinformation should acknowledge that this problem affects other areas of health even more. Beyond the COVID-19 infodemic, calls to action addressing medical misinformation more broadly should be given higher priority.

Supporting information

S2 File. Appendix B.

Measuring source credibility.


S3 File. Appendix C.

Categorizing tweets with multiple URLs.


S1 Fig. Raw counts of posts and URLs in each dataset.

URLs are segmented by whether they were rated, unrated, or “in platform” (e.g., pointing from Facebook to Facebook or from Twitter to Twitter).


S2 Fig. Content proportions in each dataset (n = 600 for each credibility category).


S3 Fig. Proportion of posts sharing information and also containing falsehoods (“misinformation”) broken down by credibility category.

Error bars reflect one standard error.



  1. World Health Organization. Novel Coronavirus (2019-nCoV): situation report, 3. 2020.
  2. Galvão J. COVID-19: the deadly threat of misinformation. The Lancet Infectious Diseases. 2020. pmid:33031753
  3. The Lancet Infectious Diseases. The COVID-19 infodemic. The Lancet Infectious Diseases. 2020;20: 875. pmid:32687807
  4. Ball P, Maxmen A. The epic battle against coronavirus misinformation and conspiracy theories. Nature. 2020;581: 371–374. pmid:32461658
  5. Chou W-YS, Gaysynsky A, Vanderpool RC. The COVID-19 Misinfodemic: Moving Beyond Fact-Checking. Health Education & Behavior. 2020; 1090198120980675. pmid:33322939
  6. McGinty M, Gyenes N. A dangerous misinfodemic spreads alongside the SARS-COV-2 pandemic. Harvard Kennedy School Misinformation Review. 2020;1.
  7. Smith M, McAweeney E, Ronzaud L. The COVID-19 “Infodemic.” 2020 Apr. pmid:34368805
  8. Allyn B. Researchers: Nearly Half Of Accounts Tweeting About Coronavirus Are Likely Bots. 20 May 2020 [cited 17 Jul 2020].
  9. Publications Office of the European Union. Tackling COVID-19 disinformation: getting the facts right. Publications Office of the European Union; 10 Jun 2020 [cited 10 Dec 2020].
  10. Gabrielle L. Briefing With Special Envoy Lea Gabrielle, Global Engagement Center On Disinformation and Propaganda Related to COVID-19. United States Department of State. 27 Mar 2020 [cited 10 Dec 2020].
  11. Lazer DM, Baum MA, Benkler Y, Berinsky AJ, Greenhill KM, Menczer F, et al. The science of fake news. Science. 2018;359: 1094–1096. pmid:29590025
  12. Singh L, Bode L, Budak C, Kawintiranon K, Padden C, Vraga E. Understanding high- and low-quality URL sharing on COVID-19 Twitter streams. Journal of Computational Social Science. 2020; 1–24. pmid:33263092
  13. Yang K-C, Torres-Lugo C, Menczer F. Prevalence of low-credibility information on Twitter during the COVID-19 outbreak. arXiv preprint arXiv:2004.14484. 2020.
  14. Cinelli M, Quattrociocchi W, Galeazzi A, Valensise CM, Brugnoli E, Schmidt AL, et al. The COVID-19 social media infodemic. Sci Rep. 2020;10: 16598. pmid:33024152
  15. Kerchner D, Wrubel L. Coronavirus Tweet Ids. Harvard Dataverse; 2020.
  16. Paul MJ, Dredze M. Discovering health topics in social media using topic models. PLoS ONE. 2014;9: e103408. pmid:25084530
  17. George Washington University Libraries. Social Feed Manager, Version 2.3.0. Zenodo; 2020.
  18. CrowdTangle Team. CrowdTangle. Menlo Park, California, United States; 2020. List IDs: 1418322, 1418323, 1418326, 1418327, 1418328, 1418329.
  19. Grinberg N, Joseph K, Friedland L, Swire-Thompson B, Lazer D. Fake news on Twitter during the 2016 U.S. presidential election. Science. 2019;363: 374–378. pmid:30679368
  20. Pennycook G, Rand DG. Fighting misinformation on social media using crowdsourced judgments of news source quality. Proc Natl Acad Sci USA. 2019;116: 2521–2526. pmid:30692252
  21. Kurkowski J. john-kurkowski/tldextract. 2020.
  22. Pulido CM, Villarejo-Carballido B, Redondo-Sama G, Gómez A. COVID-19 infodemic: More retweets for science-based information on coronavirus than for false information. International Sociology. 2020; 0268580920914755.