Anatomy of an online misinformation network

Massive amounts of fake news and conspiratorial content have spread over social media before and after the 2016 US Presidential Elections despite intense fact-checking efforts. How do the spread of misinformation and fact-checking compete? What are the structural and dynamic characteristics of the core of the misinformation diffusion network, and who are its main purveyors? How to reduce the overall amount of misinformation? To explore these questions we built Hoaxy, an open platform that enables large-scale, systematic studies of how misinformation and fact-checking spread and compete on Twitter. Hoaxy captures public tweets that include links to articles from low-credibility and fact-checking sources. We perform k-core decomposition on a diffusion network obtained from two million retweets produced by several hundred thousand accounts over the six months before the election. As we move from the periphery to the core of the network, fact-checking nearly disappears, while social bots proliferate. The number of users in the main core reaches equilibrium around the time of the election, with limited churn and increasingly dense connections. We conclude by quantifying how effectively the network can be disrupted by penalizing the most central nodes. These findings provide a first look at the anatomy of a massive online misinformation diffusion network.


Introduction
The viral spread of online misinformation is emerging as a major threat to the free exchange of opinions, and consequently to democracy. Recent Pew Research Center surveys found that 63% of Americans do not trust the news coming from social media, even though an increasing majority of respondents uses social media to get the news on a regular basis (67% in 2017, up from 62% in 2016). Even more disturbing, 64% of Americans say that fake news have left them with a great deal of confusion about current events, and 23% also admit to passing on fake news stories to their social media contacts, either intentionally or unintentionally [1,2,3]. PLOS  Misinformation is an instance of the broader issue of abuse of social media platforms, which has received a lot of attention in the recent literature [4,5,6,7,8,9,10,11,12,13,14,15]. The traditional method to cope with misinformation is to fact-check claims. Even though some are pessimistic about the effectiveness of fact-checking, the evidence is still conflicting on the issue [16,17]. In experimental settings, perceived social presence reduces the propensity to fact-check [18]. An open question is whether this finding translates to the online setting, which would affect the competition between low-and high-quality information. This question is especially pressing. Even though algorithmic recommendation may promote quality under certain conditions [19], models and empirical data show that high-quality information does not have a significant advantage over low-quality information in online social networks [20,15].
Technology platforms, journalists, fact checkers, and policymakers are debating how to combat the threat of misinformation [21]. A number of systems, tools, and datasets have been proposed to support research efforts about misinformation. Mitra and Gilbert, for example, proposed CREDBANK, a dataset of tweets with associated credibility annotations [22]. Hassan et al. [23] built a corpus of political statements worthy of fact-checking using a machine learning approach. Some systems let users visualize the spread of rumors online. The most notable are TwitterTrails [24] and RumorLens [25]. These systems, however, lack monitoring capabilities. The Emergent site [26] detected unverified claims on the Web, tracking whether they were subsequently verified, and how much they were shared. The approach was based on manual curation, and thus did not scale.
The development of effective countermeasures requires an accurate understanding of the problem, as well as an assessment of its magnitude [27,28]. To date, the debate on these issues has been informed by limited evidence. Online social network data provides a way to investigate how human behaviors, and in particular patterns of social interaction, are influenced by newsworthy events [29]. Studies of news consumption on Facebook reveal that users tend to confine their attention on a limited set of pages [30,31]. Starbird demonstrates how alternative news sites propagate and shape narratives around mass-shooting events [32].
Articles in the press have been among the earliest reports to raise the issue of fake news [33]. Many of these analyses, however, are hampered by the quality of available data-subjective, anecdotal, or narrow in scope. In comparison, the internal investigations conducted by the platforms themselves appear to be based on comprehensive disaggregated datasets [34,35], but lack transparency, owing to the two-fold risk of jeopardizing the privacy of users and of disclosing internal information that could be potentially exploited for malicious purposes [36].
Motivated by these limitations, in previous work we presented a prototype of Hoaxy, an open platform for the study of the diffusion of misinformation and its competition with factchecking [37]. Here we build upon this prior effort, contributing to the debate on how to combat digital misinformation in two ways: • We describe the implementation and deployment of the Hoaxy system, which was first introduced in a 2016 demo [37]. The system has been collecting data on the spread of misinformation and fact checking from the public Twitter stream since June of 2016. It is now publicly available (hoaxy.iuni.iu.edu). Users can query the tool to search instances of claims and relative fact checking about any topic and visualize how these two types of content spread on Twitter.
• We leverage the data collected by Hoaxy to analyze the diffusion of articles from low-credibility sources and fact-checks on Twitter in the run up to and wake of the 2016 US Presidential Election. This analysis provides a first characterization of the anatomy of a large-scale online misinformation diffusion network.
When studying misinformation, the first challenge is to assess the truthfulness of a claim. This presents several difficulties. The most important is scalability: it is impossible to manually evaluate a very large number of claims, even for professional fact-checking organizations with dedicated staff. Here we mitigate these issues by relying on a list of low-credibility sources compiled by trusted third-party organizations. In the run-up to and wake of the 2016 US Presidential Elections, several reputable media and fact checking organizations have compiled lists of popular sources that routinely publish unverified content such as hoaxes, conspiracy theories, fabricated news, click bait, and biased, misleading content. We manually assess that the great majority of the articles published by these sources, considered here, contain some form of misinformation or cannot be verified (see Methods). For brevity, in the remainder of the paper we refer to articles from low-credibility sources simply as "articles." Hoaxy retrieves the full and comprehensive set of tweets that share (i.e., include a link to) articles and fact-checks. These tweets are important because, by tracking them, we can observe how a particular piece of content spreads over the social network. It is important to note that Hoaxy collects 100% of these tweets, not a sample. This lets us obtain, for any given piece of misinformation in our corpus, the full picture of how it spreads and competes with subsequent fact-checking, if any.
In this paper we address three research questions: • RQ1: How do the spread of misinformation and fact-checking compete?
• RQ2: What are the structural and dynamic characteristics of the core of the misinformation diffusion network, and who are its main purveyors?
• RQ3: How to reduce the overall amount of misinformation?
We pose our first question (RQ1) to investigate whether those who are responsible for spreading articles are also exposed to corrections of those articles. Regretfully, only 5.8% of the tweets in our dataset share links to fact-checking content-a 1:17 ratio with misinformation tweets. We analyze the diffusion network in the run up to the election, and find a strong coreperiphery structure. Fact-checking almost disappears as we move closer to the inner core of the network, but surprisingly we find that some fact-checking content is being shared even inside the main core. Unfortunately, we discover that these instances are not associated with interest in accurate information. Rather, links to Snopes or Politifact are shared either to mock said publications, or to mislead other users (e.g., by falsely claiming that the fact-checkers found a claim to be true). This finding is consistent with surveys on the trust of fact-checking organizations, which find strong polarization of opinions [38].
Our second question (RQ2) is about characterizing the core of the article diffusion network. We find the main core to grow in size initially and then become stable in both size and membership, while its density continues to increase. We analyze the accounts in the core of the network to identify those users who play an important role in the diffusion of misinformation. The use of Botometer, a state-of-the-art social bot detection tool [12], reveals a higher presence of social bots in the main core. We also consider a host of centrality measures (in-strength, out-strength, betweenness, and PageRank) to characterize and rank the accounts that belong in the main core. Each metric emphasizes different subsets of core users, but interestingly the most central nodes according to different metrics are found to be similar in their partisan slant.
Our last question (RQ3) addresses possible countermeasures. Specifically we ask what actions platforms could take to reduce the overall exposure to misinformation. Platforms have already taken some steps this direction, by prioritizing high-quality over low-quality content [35,39]. Here we consider a further step by investigating whether penalizing the main purveyors of misinformation, as identified by RQ2, yields an effective mitigation strategy. We find that a simple greedy solution would reduce the overall amount of misinformation significantly.

Network core analysis
The k-core of a graph is formally defined as the maximal subgraph with nodes of at least degree k. In practice, k-core decomposition uses a recursive procedure, given the k-core, for extracting the (k + 1)-core by recursively removing all nodes with degree k. The nodes that have been removed constitute the k-shell. The k-core decomposition is the sequence of k-cores of increasing values of k. Finally, the non-empty graph with maximum value of k is called the main core. Prior work has used k-core decomposition to probe the structure of complex networks [40,41]. In the case of social networks, k-cores can be used to identify influential users [42], and to characterize the efficiency of information spreading [43].
We convert the weighted, directed diffusion networks analyzed in this paper into unweighted, undirected networks when applying k-core decomposition. We also used s-core decomposition, which takes weights into account by using the strength s (a.k.a. weighted degree) of a node in place of the degree k [44]. The results are qualitatively similar, so we omit them here and present the analysis based on k-core decomposition.

Bot detection
Social bots play an important role in the spread of misinformation [15]. Researchers have built supervised learning tools to detect such automated accounts with high accuracy. We leverage such a tool, called Botometer [12], to evaluate Twitter accounts.
Botometer performs classification over a large set of features that include temporal, network, language, and sentiment signals. The classifier is trained in supervised fashion from a set of labeled examples. The set includes examples discovered with a honeypot and by human raters. Two classifiers are available, a standard one, which includes English-based language features, and a 'universal' one, which does not include language features, and is thus applicable beyond English-speaking contexts. We use the standard classifier through a public API [45,46].

Article verification
Our analysis considers content published by a set of websites flagged as sources of misinformation by third-party journalistic and fact-checking organizations. We merged several lists of low-credibility sources compiled by such organizations. It should be noted that these lists were compiled independently of each other, and as a result they have uneven coverage. However, there is some overlap between them. The full list is available online [47].
The source-based approach relies on the assumption that most of the articles published by our compilation of low-credibility sources are some type of misinformation, as we cannot factcheck each individual claim. To validate this assumption, we manually verified a sample of 50 articles drawn uniformly at random from our corpus of articles-all those tweeted at least once by any of the sources in our compilation, during the period of interest. Each article was evaluated independently by two reviewers, with ties broken by a third reviewer. We applied a broadly used rubric based on seven types of misinformation: fabricated content, manipulated content, imposter content, false context, misleading content, false connection, and satire [21]. We also added articles that could not be verified (inconclusive). Satire was not excluded because fake-news sites often label their content as satirical, and viral satire is often mistaken for real news. Further details about the verification procedure can be found in a technical report [15]. Fig 1 shows that only a minority of articles in the collection (27 ± 7%) can be verified. The sampling method biases the analysis toward more prolific sources, some of which simply copy and paste large numbers of articles from other sources. The fraction of verifiable articles is cut in half when sampling articles by tweets, thus biasing the sample toward popular rather than prolific sources [15].
We also tracked the websites of several independent fact-checking organizations: politifact. com, snopes.com, factcheck.org, badsatiretoday.com, hoax-slayer.com, opensecrets.org, and truthorfiction.com. In April 2017 we added climatefeedback.org, which does not affect the present analysis. Fig 2 shows the architecture of the Hoaxy system. The system is composed of a back-end and a front-end. Next we describe some of the technical aspects that went into the design and implementation of these components.

Hoaxy system
Back-end. The back-end provides data collection, processing, storage, and indexing capabilities. We start from the list of sources discussed earlier. Data are collected from two realms: social media (i.e., Twitter) and the news source sites in the list. To collect data from Twitter, Hoaxy filters the real-time stream for tweets matching our list of domain keywords [48]. Matches are performed server-side against the complete text of the tweet. This means that for each delivered tweet we further make sure that the match is actually a hyperlink. Tweets that simply mention our sources but do not link to them (e.g., "I read this on snopes.com!") are discarded.
All matching link URLs are then extracted from the tweet and fetched directly from the source. To get a complete snapshot of all content produced by the sources, Hoaxy also regularly crawls their websites in a separate process. We use a mix of RSS and direct crawling to do so. Regardless of the way it is collected, from each fetched document Hoaxy extracts title, metadata, and body information.
All collected data (tweets and fetched documents) are saved in a relational database. Documents are further indexed using Lucene [49], to enable full-text search from the front-end.
Content duplication and document text extraction are two critical aspects of this data collection pipeline. Because we are crawling data from the Web, we expect to observe several different variants of the same URLs. This is especially true for the resources obtained from the social media stream, for which duplication may occur due to marketing campaign and other tracking parameters, shortening (e.g., bit.ly) and snapshotting (e.g., archive.is) services, and domain aliasing (e.g., dcgazette.com and thedcgazette.com).
While acknowledging that principled solutions to deal with the problem of Web content duplication have been around for decades [50], we found that a set of few, simple heuristics gave satisfactory results. For example, we found that focusing on the most common tracking parameters (i.e. UTM parameters) we can canonicalize about 30% of all URLs. Similarly, by following all types of HTTP redirect responses, we resolve shortened URLs for about 45% of the URLs extracted from tweets. Snapshotting and domain aliases account instead for only five duplicates, and we simply ignore them. We also had the problem of extracting the actual text of the fetched documents. There is a lot of extraneous content in the body of documents due to the presence of ads, comment threads, and personalization. All this 'noise' poses a problem for indexing the corpus efficiently. Algorithms for document text extraction have been around for several years [51]. We tested several implementations and eventually settled for the one offered by a third-party API [52].
Having collected, processed, stored, and indexed all the data, the final component of the back-end is the API, a small piece of middleware that enables programmatic access to both the relational database and the full-text Lucene index for the purposes of search and visualization.
Front-end. Hoaxy provides an intuitive Web interface to search and visualize the spread of articles contained in our database, and the competition with subsequent fact-checking verifications (see Fig 3). The user first specifies a query (Fig 3(a)). Users can choose to retrieve either the most relevant or the most recent results. To do so, we first send the query to Lucene, which returns a list of most relevant/recent articles from low-credibility and fact-checking sources. In practice, because there are many more articles from low-credibility sources than fact-checking sources, we rank these articles separately, and then merge the top results from the two rankings into a single list. Finally we re-rank the results based on the number of tweets in the database.
After selecting the results that match their query (Fig 3(b)), the user can finally visualize the results. Hoaxy provides two types of visualization: a timeline plot (not shown in the figure) that displays the growth in the number of tweets for articles from both low-credibility and fact-checking sources, and an interactive visualization of the diffusion network (Fig 3(c)). In the network, nodes represent Twitter accounts and edges connect any two users that exchanged information by retweet, reply, mention, or quoted retweet. Edge directionality represents the flow of information, e.g., from the retweeted to the retweeter account or from the mentioning to the mentioned account.
Deployment. We started collecting data with Hoaxy from 69 low-credibility and 7 factchecking sources-76 in total-in June 2016. In December 2016, 50 more low-credibility sources were added. The system has collected data continuously ever since. As of October 2017, Hoaxy has collected a total of 29,351,187 tweets-27,648,423 with links to articles from low-credibility sources and 1,705,576 with links to fact-checking sources. The total number of documents collected so far is 653,911-628,350 by low-credibility sources and 25,561 by factchecking ones.
The public Web interface of Hoaxy was launched on December 20, 2016. Fig 4 plots the daily query volume and some of the most popular topics queried by users over the course of the first 6 months of operation. Unsurprisingly, the term 'Trump' is among the most popular search terms, but we also see substantial churn in user interest, with topics following closely the most popular pieces of controversial information of the moment, e.g., 'vaccines,' 'pizzagate,' 'voter fraud' and 'Trump Russia.'

Datasets
To explore our research questions, we focus on the retweet network (including quoted retweets) for links to articles from either low-credibility or fact-checking sources. A retweet provides information about the primary spreader (retweeted account) and secondary spreader (retweeting account). To be sure, Hoaxy does collect any kind of tweet, as long as a URL, whose Web domain matches our list of sources, is included in the tweet. To give an idea of the full scope of the Hoaxy dataset, retweets and quoted retweets occur 66.9% of the times; approximately 1 in 10 retweets is a quoted retweet. Of the remaining types of tweets, replies (i.e. tweets forming a conversation thread and including an @-mention of another users) account for 2.1% of the total. The remaining tweets are neither retweets nor replies; they are original tweets.
The network is a graph defined as follows: we include a node for each Twitter user account in the database. Edges are directed (as explained earlier) and weighted. The weight of an edge represents the number of retweets from one account to another. That is, we increase by one the weight on a directed edge from user a to user b every time we observe that b retweets a. Edges are labeled by the type of content being retweeted. To do so, we split the total weight w (e) of edge e in two separate counts, one that keeps track of retweets of articles (w c ) and one of fact-checks (w f ), respectively. That is, w(e) = w c (e) + w f (e) for all e 2 E. We observe w c (e) Á w f (e) > 0 in only a small minority of edges, meaning that we can easily label each edge as an 'article' or 'fact-check' edge with a simple majority rule (ties are broken at random).
Because prior work shows that collective attention patterns change dramatically in the wake of highly anticipated events, like elections [53], we split our analysis in two periods, pre-and post-Election Day (Nov. 8,2016). Table 1 provides a summary of the three networks analyzed in this paper. We explore the overall spread of content on the full network spanning six months before Election Day, including articles from both low-credibility and fact-checking sources. We decompose this network into its k-core shells to uncover the functional roles of the most densely connected sub-graph. Row 1 of Table 1 shows summary statistics for the network used at this stage.
The second dataset is used to study the diffusion of the sole misinformation, therefore we ignore all edges labeled as 'fact-check.' To characterize the long-term evolution of the main core, we extend the period of observation to October 9, 2017. Recall that 50 additional lowcredibility sources were added to Hoaxy in December 2016. To keep our analysis consistent across the pre-and post-Election Day periods, we do not include data from these sites in the present work. The network in row 2 of Table 1 is considered at this stage.
The third dataset (row 3 of Table 1) includes only articles from low-credibility sources but goes back to the pre-Election-Day period to characterize the most central users in the core and the robustness of the network.
All the analyses presented in this paper can be replicated by collecting data through the Hoaxy API [54] or downloading the network datasets at doi.org/10.5281/zenodo.1154840.

Ethics statement
Hoaxy collects data about public tweets obtained through the Twitter API, in full compliance with Twitter's Terms of Service. The Indiana University Human Subjects Committee has deemed the protocol exempt from review on the basis that the data is public and poses no risk for the subjects. The reviewers who performed article verification and annotated Twitter profiles were researchers in our lab who were recruited and provided informed consent via email.

Results
Having described in the prior section how Hoaxy collects data, let us now analyze the misinformation diffusion networks. To the best of our knowledge, the following is the first in-depth analysis of the diffusion network of online misinformation and fact-checking in the period of the 2016 US Presidential Election.

Low-credibility articles vs. fact-checking
We performed k-core decomposition of the entire network (row 1 of Table 1). The main core is obtained at max{k} = 50. We start with a simple exercise of visualizing the fraction of factchecking retweets among all retweets that forms each core in Fig 5. If nodes contributed to articles and fact-checking in the same proportions irrespective of core number, one would expect this fraction to remain constant as peripheral nodes are removed. On the contrary, we see a drastic decrease as k increases. Fact-checking activity almost disappears as we move toward the innermost, densest portion of the network.
To understand the details, Fig 6 visualizes different k-core networks for increasing values of k. We can draw several insights from these visualizations. The force-directed layout algorithm splits the network in two communities. There is substantial content segregation across these two communities, which we denote as the 'fact-checkers' and (misinformation) 'spreaders,' respectively. The edges across the two groups appear to be mostly colored in orange, suggesting some exposure of misinformation spreaders to fact-checks. However, it is still possible to appreciate some retweeting of fact-checking content involving spreaders even in the main core (Fig 6(d)). To understand in more quantitative terms the role of fact-checking in the spread of information in the core, we characterize users according to  Anatomy of an online misinformation network two simple metrics. Recall that in a weighted, undirected network the strength of a node is the sum of all the weights of all its incident edges, s(v) = ∑ e2v w(e). In a directed network one can likewise define the in-strength s in and the out-strength s out , by taking the sum only on the incoming and outgoing edges, respectively. We further consider edge labels and distinguish between article (s c ) and fact-check (s f ) strength. For each node v 2 V let us define two ratios, the fact-checking ratio ρ f and the retweet ratio ρ in : Intuitively, a user with a value of ρ f close to unity is going to be a fact-checker (as opposed to misinformation spreader), whereas an account with a value of ρ in close to unity is going to be a secondary spreader of information, i.e., to amplify messages through retweets rather than post original messages. The right-hand side of Fig 6 shows the joint distributions of (ρ f , ρ in ) for different values of k. We observe that for small values of k, most users fall close to the four corners of the space, meaning that they take on exactly one of the four possible combinations of roles ('primary spreader of articles from low-credibility sources', 'secondary spreader of articles from fact-checking sources', etc.).
For larger values of k (Fig 6(c) and 6(d)), we observe a shift away from secondary spreaders of fact-checking. In other words, the fact-checking links in the network core are retweeted by accounts who mainly spread misinformation. Fig 7 confirms that the drop in the spread of fact-checking is precipitous. For k > 20 there is only a small, stable residual activity.
The fact that fact-checking still spreads in the main core is a somewhat surprising observation. Therefore we search for patterns that explain how misinformation spreaders interact with fact-checking. Manual inspection of the data let us identify three key characteristics of these retweets of fact-checking content made by spreaders in the main core: (1) they link to fact-checking articles with biased, misleading wording; (2) they attack fact-checking sites; or (3) they use language that is inconsistent with the stance of a fact-checking article, for example implying that a claim is true even though the linked fact-checking article states that it is false. A sample of tweets with each of the aforementioned characteristics is shown in Table 2. Similar patterns of citing mainstream media to challenge them have been observed by Starbird [32].

Anatomy of the main core
Core dynamics. Although we observe that the main core is dominated by misinformation spreaders, it is unclear if this has always been the case. From this point and in the subsequent analysis we discard all edges labeled as 'fact-check' and focus only on the spread of misinformation. We start by investigating the long-term growth of the network. To do so we consider a network based on the Hoaxy dataset that extends post-Election Day; see row 2 of Table 1.
In particular we consider all retweets in our dataset in chronological order. At any given point in time, we consider a snapshot of the network formed by all retweets up to that point. We perform k-core decomposition on this cumulative network. We extract two pieces of information: the maximum value of k and the size of the main core. The left panel of Fig 8 shows how these two quantities change over time. We observe that the core gets both larger and denser. To characterize the extent to which the increasing density is just a byproduct of considering snapshots of growing size, we also plot the k of the main core for a shuffled version of each snapshot (i.e., the configuration model of each snapshot). In rewiring edges, we preserve the degree sequence of the network. While both the actual and the rewired network grow denser, the actual network does so at a higher rate, and the difference is statistically significant.
The main core reaches an equilibrium size of approximately 800 accounts around Election Day (see inset of left panel of Fig 8). This observations prompts the question of who are the users in the main core, and whether there is substantial churn in this group over time. By considering the intersection between two consecutive main cores in the sequence, we find a low churn rate after a peak in August (right panel of Fig 8), implying a stable set of users who  consistently drive network activities. In fact, there are 321 users who remain in the main core for the whole duration of the observation period. The retweet network of a subset of them is shown in Fig 9. Core membership. We now return to the period before the election. For this analysis we consider the pre-Election network, whose statistics are described in row 3 of Table 1. We use this network to characterize several aspects of how misinformation spreads before the election, starting with the presence of automated accounts. We do expect to find evidence of automation, given recent findings that show that a sizable fraction of the broader Twitter conversation about the elections was due to bots [55]. Open questions are whether bots were successful at spreading misinformation, which would be reflected by their tendency to locate in the core of the network, as opposed to the periphery; and whether bots interact with humans.
To determine the likelihood that the observed patterns of activity in the core of the network are the result of the deployment of social bots, we perform bot detection on a sample of accounts. After k-core decomposition, we sample 2000 accounts at random from each k-shell. If the size of a shell is smaller, we include the whole shell. To estimate the likelihood that each of these account is automated we compute a bot score by querying Botometer, a state-of-theart bot detection tool (see Methods). Fig 10 shows a sharp increase in bot scores as we move toward the core of the network. We observe many human-like accounts in the main core as well, indicated by the bot scores of accounts in the main core. If we define bot accounts as those having bot scores above 0.5, about 75% of the accounts in the main core are classified as human. In addition, about 25% of the retweets in the main core are classified as humans retweeting bots. This suggests a significant impact of bots on human perception in information diffusion.
For at least some of the most important individuals in the network core, it would be useful to have a behavioral description at a more granular level than just group averages. To this end, we need first to define a subset of important accounts. The network science toolbox provides us with several methods to identify salient nodes. We consider four centrality metrics: instrength s in , out-strength s out , PageRank [56], and betweenness [57]. The first two are local measures of primary (s out ) and secondary (s in ) spreading activity. Intuitively, s out captures influence, as measured by the number of times that an account is retweeted. On the other hand, s in captures prolific accounts that retweet a lot. The distribution of s in and s out is shown The shuffled version is obtained by sampling from the configuration model. This is repeated many times to obtain the 95% confidence interval shown in orange. The inset shows the size of the main core over time. Right: Churn rate (relative monthly change) of accounts in the main core. https://doi.org/10.1371/journal.pone.0196087.g008 in Fig 11 (left panel). Both distributions are broad, and the range of out-strength is broader than that of in-strength, due to the simple fact that, in networks of this size, the rate at which one can be retweeted is generally larger than that at which one can retweet others. The third and fourth measures are instead global notions of centrality, based on random walks and shortest paths, respectively. Given that these metrics capture fundamentally different notions of centrality, we expect them to produce different rankings of the nodes in the network. The right panel of Fig 11 shows strong variation in the average rank of accounts in the main core, confirming this intuition. The metric that seems to best capture the main core is the instrength, indicating that a majority of core accounts are secondary spreaders (prolific accounts).
For each measure, we rank the accounts in the main core and consider the top ten users. This exercise yields four lists of accounts, whose screen names are shown in Table 3. There is little or no overlap between these lists, and their union yields 34 unique accounts. Having identified a subset of moderate size that includes main core members of potential interest, we performed a qualitative analysis of these accounts. Three human raters were asked to inspect the Twitter profile of each user independently, and to provide answers to the following questions: 1. Bot or Human?

Partisanship?
3. Personal or organizational account? 4. How often does it share articles from low-credibility sources?
For questions 1-3, whose answer is a categorical variable, the raters could also choose 'neither' or 'not sure'. After the annotators coded each account, for each question we applied a Anatomy of an online misinformation network majority rule to identify the consensus label. The few cases in which a consensus could not be reached were broken by a fourth rater (one of the authors). The results are shown in Table 4. We report results for 32 of the 34 original accounts, since two accounts had been suspended by Twitter, and thus could not be annotated. Many of the central accounts appear to be automated and display a high degree of partisanship, all in support of the same candidate.

Network robustness
Our last question is about the overall robustness of the network (row 3 of Table 1). We ask: How much does the efficient spread of articles from low-credibility sources rely on the activity of the most central nodes? To explore this question we apply node disconnection, a standard procedure for estimating robustness in network science [58]. The idea is to remove one node at a time, and analyze how two simple metrics are curtailed as a result: total volume of article retweets, and total number of unique article links. The more these quantities can be reduced by removing a small number of nodes, the more the efficiency of the misinformation network is disrupted.
We wish to measure the fraction of retweets remaining after simulating the scenario in which a certain number of accounts are disconnected, by removing all edges to and from those accounts. There is one caveat. The retweet metadata from Twitter does not allow to reconstruct Anatomy of an online misinformation network the full retweet cascades. Instead, a retweet connects directly to the root of the cascade tree, so that the cascade tree is projected onto a star network. This means that when we disconnect a retweeting node, only the single leaf node is removed from the star, rather that the subtree rooted at that node in the actual cascade tree. We prioritize accounts to disconnect based on the four centrality metrics discussed before (s in , s out , betweenness, and PageRank). Fig 12 shows the result of the simulation. The greedy strategy that ranks users by decreasing out-strength achieves the best reduction of both metrics. The efficiency of the network is greatly impaired even after disconnecting as few as 10 most influential accounts (i.e., with greatest s out ). Surprisingly, disconnecting nodes with the highest s in is not as efficient a strategy for reducing misinformation; the network is robust with respect to the removal prolific accounts in the core. Betweenness, in comparison, seems to give good results on the total number of retweets (left panel of Fig 12), but does not produce better results than PageRank and in-strength when considering unique links (right panel).
From a policy perspective, we are not proposing that a social media platform should suspend accounts whose posts are highly retweeted. Of course, platforms must take great care in minimizing the chances that a legitimate account is suspended. However, platforms do use various signals to identify and penalize low-quality information [35,39]. The present analysis suggests that the use of s out in the article spreading network might provide a useful signal to prioritize further review, with the goal of mitigating the spread of misinformation. Such an approach assumes the availability of a list of low-quality sources, which can be readily compiled.

Discussion
The rise of digital misinformation is calling into question the integrity of our information ecosystem. Here we made two contributions to the ongoing debate on how to best combat this threat. First, we presented Hoaxy, an open platform that enables large-scale, systematic studies of how misinformation and fact-checking spread and compete on Twitter. We described key aspects of its design and implementation. All Hoaxy data is available through an open API. Second, using data from Hoaxy, we presented an in-depth analysis of the misinformation diffusion network in the run up to and wake of the 2016 US Presidential Election. We found that the network is strongly segregated along the two types of information circulating in it, and that a dense, stable core emerged after the election. We characterized the main core in terms of multiple centrality measures and proposed an efficient strategies to reduce the circulation of information by penalizing key nodes in this network. The networks used in the present analysis are available on an institutional repository (see Methods).
Recall that Hoaxy collects 100% of the tweets carrying each piece of misinformation in our collection, not a sample. As a result, our analysis provides a complete picture of the anatomy of the misinformation network. Of course, our methodology has some unavoidable limitations. First of all, Hoaxy only tracks a fixed, limited set of sources, due to data volume restrictions in the public Twitter API. Of these sources, it only tracks how their content spreads on Twitter, ignoring other social media platforms. Facebook, by far the largest social media platform, does not provide access to data on shares, ostensibly for privacy reasons, even though a significant fraction of misinformation spreads via its pages [30], which are understood to be public. Thus we acknowledge that coverage of our corpus of misinformation is incomplete. Nonetheless, by focusing on low-credibility sources that have come to the attention of large media and factchecking organizations, and that have been flagged as the most popular purveyors of unverified claims, Hoaxy captures a broad snapshot of misinformation circulating online.
Second, Hoaxy does not track the spread of unsubstantiated claims in the professional mainstream press. News websites do report unverified claims, in a manner and with a frequency dictated by their own editorial standards. For example, hedging language is often used to express degrees of uncertainty [59]. While most claims reported in the mainstream media are eventually verified, many remain unverified, and some even turn out to be false. Some instances of misinformation may see their spread boosted as a result of additional exposure on mainstream news outlets. Understanding the dynamics of the broader media and information ecosystem is therefore needed to fully comprehend the phenomenon of digital misinformation, but it is outside the scope of the present work.
Third, we consider only US-based sources publishing English content. This is an unavoidable consequence of our reliance on lists produced by US-based media organizations. Different sources will be of course active in different countries. Worrisome amounts of misinformation, for example, have been observed in the run-up to the general elections in France [14]. To foster the study of misinformation in non-US contexts, we have released the code of Hoaxy under an open-source license, so that other groups can build upon our work [60,61].
Last but not least, it is important to reiterate that the articles collected by Hoaxy are in general not verified. Inspection of our corpus confirms that not all articles collected by Hoaxy are completely inaccurate. As far as the present analysis is concerned, we provide an assessment of the rate of confirmed articles in our dataset (see Methods). When used as a search engine for misinformation, Hoaxy addresses this limitation by showing the most relevant fact-checking articles matching the input query, thereby facilitating claim verification. We hope that the data, software, and visualizations offered by the Hoaxy platform will be useful to researchers, reporters, policymakers, and, last but not least, ordinary Internet users as they learn to cope with online misinformation. who assisted in verification of articles and annotation of central accounts. We are also indebted