Debunking in a World of Tribes

Recently, a simple military exercise discussed on the Internet was perceived as the beginning of a new civil war in the US. Social media aggregate people around common interests, eliciting a collective framing of narratives and worldviews. However, the wide availability of user-provided content and the direct path between producers and consumers of information often foster confusion about causation, encouraging mistrust, rumors, and even conspiracy thinking. To counter this trend, attempts to debunk are often undertaken. Here, we examine the effectiveness of debunking through a quantitative analysis of 54 million users over a time span of five years (Jan 2010 to Dec 2014). In particular, we compare how users interact with proven (scientific) and unsubstantiated (conspiracy-like) information on Facebook in the US. Our findings confirm the existence of echo chambers in which users interact primarily with either conspiracy-like or scientific pages. Both groups interact in a similar way with the information within their echo chamber. We examine 47,780 debunking posts and find that attempts at debunking are largely ineffective. For one, only a small fraction of the usual consumers of unsubstantiated information interact with the posts. Furthermore, we show that those few are often the most committed conspiracy users and, rather than internalizing the debunking information, they often react to it negatively. Indeed, after interacting with debunking posts, users retain, or even increase, their engagement within the conspiracy echo chamber.


Introduction
Socio-technical systems and microblogging platforms such as Facebook and Twitter have created a direct path from producers to consumers of content, changing the way users get informed, debate ideas, and shape their worldviews [1][2][3][4][5][6]. Misinformation on online social media is pervasive and, according to the World Economic Forum, represents one of the main threats to our society [7,8]. The diffusion of false rumors affects the public perception of reality as well as the political debate [9]. Indeed, the supposed link between vaccines and autism, the belief that 9/11 was an inside job, and the more recent case of Jade Helm 15 (a simple military exercise perceived as the imminent threat of a civil war in the US) are just a few examples of a substantial body of collective narratives grounded in unsubstantiated information.
Confirmation bias plays a pivotal role in cascade dynamics and facilitates the emergence of echo chambers [10]. Indeed, users online show a tendency a) to select information that adheres to their system of beliefs, even when it contains parodic jokes, and b) to join polarized groups [11]. Recent research [12][13][14][15][16][17] has examined the effects of continued exposure to such content.

Results and discussion
The aim of this work is to test the effectiveness of debunking campaigns on online social media. More generally, we want to characterize and compare users' attention with respect to a) their preferred narrative and b) information dissenting from that narrative. Specifically, we want to understand how users who are usually exposed to unverified information, such as conspiracy theories, respond to debunking attempts.

Echo chambers
As a first step we characterize how distinct types of information, belonging to the two different narratives, are consumed on Facebook. In particular we focus on the users' actions allowed by Facebook's interaction paradigm, i.e., likes, shares, and comments. Each action has a particular meaning [28]. A like represents positive feedback on a post; a share expresses the desire to increase the visibility of a given piece of information; and a comment is the way in which online collective debates take shape around the topic of the post. Comments may therefore contain negative or positive feedback with respect to the post.
Assuming that a user u has performed x and y likes on scientific and conspiracy-like posts, respectively, we let ρ(u) = (y − x)/(y + x). Thus, a user u for whom ρ(u) = −1 is polarized towards science, whereas a user for whom ρ(u) = 1 is polarized towards conspiracy. We define the user polarization ρ_likes ∈ [−1, 1] (resp., ρ_comments) as the normalized difference between likes (resp., comments) on conspiracy and science posts. In Fig 1 we show that the probability density function (PDF) of the polarization of all users is sharply bimodal, with most users having either ρ(u) ≈ −1 or ρ(u) ≈ 1. Thus, most users may be divided into two groups: those polarized towards science and those polarized towards conspiracy. The same pattern holds if we look at polarization based on comments rather than on likes.
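The polarization measure can be sketched in a few lines. The following is a minimal illustration (function names are ours, not from the paper), including the 95%-of-activity threshold used later in the text to call a user polarized; note that the threshold on activity shares is equivalent to |ρ(u)| ≥ 0.9:

```python
def polarization(science_likes: int, conspiracy_likes: int) -> float:
    """rho(u) = (y - x) / (y + x): -1 = fully science, +1 = fully conspiracy."""
    x, y = science_likes, conspiracy_likes
    if x + y == 0:
        raise ValueError("user has no likes on either narrative")
    return (y - x) / (y + x)

def is_polarized(science_likes: int, conspiracy_likes: int,
                 threshold: float = 0.95) -> bool:
    """A user is polarized when >= 95% of their likes fall on one narrative,
    which is equivalent to |rho(u)| >= 2 * threshold - 1 = 0.9."""
    total = science_likes + conspiracy_likes
    return total > 0 and max(science_likes, conspiracy_likes) / total >= threshold

print(polarization(10, 0))   # -1.0: fully science-polarized user
print(polarization(0, 10))   # 1.0: fully conspiracy-polarized user
print(is_polarized(1, 19))   # True: 95% of likes on conspiracy
```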
To further understand how these two segregated communities behave, we explore how they interact with their preferred type of information. In the left panel of Fig 2 we show the distributions of the number of likes, comments, and shares on posts belonging to both scientific and conspiracy news. As seen from the plots, all the distributions are heavy-tailed, i.e., all are best fitted by power laws with similar scaling parameters (see the Materials and methods section for further details).
We define the persistence of a post (resp., user) as the Kaplan-Meier estimate of its survival function, computed from the first and last comment on the post (resp., by the user). In the right panel of Fig 2 we plot the Kaplan-Meier estimates of the survival functions of posts grouped by category. To characterize differences between the survival functions, we perform the Peto & Peto [29] test, which detects whether two survival functions differ in a statistically significant way. Since we obtain a p-value of 0.944, we conclude that there is no statistically significant difference between the survival functions of science and conspiracy posts. Thus, post persistence is similar in the two echo chambers.
We continue our analysis by examining users' interaction with different kinds of posts on Facebook. In the left panel of Fig 3 we plot the CCDFs of the number of likes and comments of users on science and conspiracy news. These results show that users consume information in a comparable way, i.e., all distributions are heavy-tailed (for scaling parameters and other details refer to the Materials and methods section). The right panel of Fig 3 shows that the persistence of users, i.e., the Kaplan-Meier estimates of the survival functions, on both types of content is nearly identical. The attention patterns of users in the conspiracy and science echo chambers thus reveal very similar behavior.
In summary, content related to distinct narratives aggregates users into different communities, and consumption patterns are similar in both communities.

Response to debunking posts
Debunking posts on Facebook strive to counter the spread of misinformation by providing fact-checked information on specific topics. However, not much is known about the effectiveness of debunking in countering misinformation. In fact, if confirmation bias plays a pivotal role in selection criteria, then debunking may appear to users usually exposed to unsubstantiated rumors as information dissenting from their narrative. Here, we focus on the scientific and conspiracy echo chambers and analyze the consumption of debunking posts. As a preliminary step we show how debunking posts get liked and commented on according to users' polarization. Notice that we consider a user to be polarized if at least 95% of their liking activity concentrates on one specific narrative. Fig 4 shows how users' activity is distributed over debunking posts: the left (resp., right) panel shows the proportions of likes (resp., comments) left by users polarized towards science, users polarized towards conspiracy, and non-polarized users. We notice that the majority of both likes and comments is left by users polarized towards science (66.95% and 52.12%, respectively), while only a small minority comes from users polarized towards conspiracy (6.54% and 3.88%, respectively). Indeed, the scientific echo chamber is the biggest consumer of debunking posts, and only a few users usually active in the conspiracy echo chamber interact with debunking information. Out of 9,790,906 polarized conspiracy users, just 117,736 interacted with debunking posts, i.e., commented on a debunking post at least once.
To better characterize users' response to debunking attempts, we apply sentiment analysis techniques to the comments on the Facebook posts (see the Materials and methods section for further details). We use a supervised machine learning approach: first, we annotate a sample of comments and then build a Support Vector Machine (SVM) [30] classification model. Finally, we apply the model to associate each comment with a sentiment value: negative, neutral, or positive. The sentiment denotes the emotional attitude of Facebook users when commenting. In Fig 5 we show the fraction of negative, positive, and neutral comments for all users and for the polarized ones. Notice that we consider only posts having at least one like, one comment, and one share. Comments tend to be mainly negative, and this negativity is dominant regardless of users' polarization.
Our findings show that debunking posts remain mainly confined within the scientific echo chamber, and only a few users usually exposed to unsubstantiated claims actively interact with the corrections. Dissenting information is mainly ignored. Furthermore, if we look at the sentiment expressed by users in their comments, we find a rather negative environment.
Interaction with dissenting information. Users tend to focus on a specific narrative and select information adhering to their system of beliefs, while they ignore dissenting information. In our scenario, however, a few users belonging to the conspiracy echo chamber do interact with debunking information. Who are these users? And what is the effect of their interaction with dissenting information? In this section we aim at better characterizing the consumption patterns of the few users who interact with dissenting information. We observe that users who commented on debunking posts are slightly more prone to comment in general. Thus, the users engaging in debates with debunking posts seem to be the few who show a higher overall commenting activity.
To further characterize the effect of the interaction with debunking posts, we perform a comparative analysis of users' behavior before and after they comment on debunking posts. Fig 7 shows the liking and commenting rate, i.e., the average number of likes (or comments) on conspiracy posts per day, before and after the first interaction with debunking. The plot shows that users' liking and commenting rates increase after commenting. To assess the difference between the distributions before and after the interaction with debunking, we perform both the Kolmogorov-Smirnov [31] and the Mann-Whitney-Wilcoxon [32] tests; since the p-values are < 0.01, we reject the null hypothesis of equivalence of the two distributions for both like and comment rates. To further analyze the effects of interaction with debunking posts, we use the Cox proportional hazards model [33] to estimate the hazard of conspiracy users exposed to debunking, i.e., who interacted with it, compared to those not exposed, and we find that users not exposed to debunking are 1.76 times more likely to stop interacting with conspiracy news (see the Materials and methods section for further details).

Fig 6. Top panel: Kaplan-Meier estimates of the survival functions of conspiracy users who interacted (exposed) and did not interact (not exposed) with debunking. Users' persistence is computed both on their likes (left) and comments (right). Bottom panel: Complementary cumulative distribution functions (CCDFs) of the number of likes (left) and comments (right) per user exposed and not exposed to debunking. https://doi.org/10.1371/journal.pone.0181821.g006
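The before/after rate comparison relies on the two-sample Kolmogorov-Smirnov statistic, which is simply the largest gap between the two empirical CDFs. A self-contained sketch (pure Python with illustrative names; a real analysis would use a statistics package to obtain the p-value as well):

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over the pooled values."""
    a, b = sorted(sample_a), sorted(sample_b)

    def ecdf(sorted_sample, x):
        # Fraction of observations <= x in the sorted sample.
        return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)

    pooled = a + b
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in pooled)

# Identical samples give 0; fully separated samples give 1.
print(ks_statistic([1, 2, 3], [1, 2, 3]))  # 0.0
print(ks_statistic([1, 1, 1], [2, 2, 2]))  # 1.0
```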

Conclusions
Users online tend to focus on specific narratives and select information adhering to their system of beliefs. Such a polarized environment might foster the proliferation of false claims. Indeed, misinformation is pervasive and very difficult to correct. To curb the proliferation of unsubstantiated rumors, major corporations such as Facebook and Google are studying specific solutions. Examining the effectiveness of online debunking campaigns is therefore crucial for understanding the processes and mechanisms behind misinformation spreading. In this work we show the existence of social echo chambers around different narratives on Facebook in the US. Two well-formed and highly segregated communities exist around conspiracy and scientific topics, i.e., users are mainly active in only one category. Furthermore, by focusing on users' interactions with their preferred content, we find similarities in the way in which both forms of content are consumed.
Our findings show that debunking posts remain mainly confined within the scientific echo chamber and that only a few users usually exposed to unsubstantiated claims actively interact with the corrections. Dissenting information is mainly ignored and, if we look at the sentiment expressed by users in their comments, we find a rather negative environment. Furthermore, we show that the few users from the conspiracy echo chamber who interact with debunking posts manifest a higher tendency to comment in general. However, if we look at their commenting and liking rates, i.e., the daily number of comments and likes, we find that their activity in the conspiracy echo chamber increases after the interaction. Thus, not only is dissenting information online largely ignored, but interacting with it seems to lead to an increased interest in conspiracy-like content.
In our view, the diffusion of bogus content is related to the increasing mistrust of people towards institutions, to the increasing level of functional illiteracy, i.e., the inability to understand information correctly, affecting western countries, as well as to the combined effect of confirmation bias operating on an enormous basin of information of poor quality. In this setting, current debunking campaigns as well as algorithmic solutions do not seem to be the best options. Our findings suggest that the main problem behind misinformation is conservatism rather than gullibility. Moreover, our results also seem consistent with the so-called inoculation theory [34], according to which exposure to repeated, mild attacks can make people more resistant to changing their ordinary beliefs. Indeed, being repeatedly exposed to relatively weak arguments (the inoculation procedure) could result in greater resistance to a later persuasive attack, even if the latter is stronger and uses arguments different from the ones presented during the inoculation phase. Therefore, when users are faced with untrusted opponents in online discussion, the result is a stronger commitment to their own echo chamber. Thus, a more open and smoother approach, one that promotes a culture of humility aimed at demolishing the walls and barriers between tribes, could represent a first step towards countering the spread and persistence of misinformation online.

Ethics statement
The entire data collection process is performed exclusively by means of the Facebook Graph API [35], which is publicly available and can be used through one's personal Facebook user account. We used only publicly available data (users with privacy restrictions are not included in our dataset). Data was downloaded from public Facebook pages, which are public entities. Users' content contributing to such entities is also public unless the users' privacy settings specify otherwise, in which case it is not available to us. When allowed by users' privacy settings, we accessed public personal information. However, in our study we used fully anonymized and aggregated data. We abided by the terms, conditions, and privacy policies of Facebook.

Data collection
We identified two main categories of pages: conspiracy news, i.e., pages promoting content neglected by mainstream media, and science news. Using an approach based on [12,14], we defined the space of our investigation with the help of Facebook groups very active in debunking conspiracy theses. We categorized pages according to their content and their self-description. The selection of the sources was iterated several times and verified by all the authors. To the best of our knowledge, the final dataset is the complete set of all scientific, conspiracist, and debunking information sources active in the US Facebook scenario. Tables 1-3 show the complete lists of conspiracy, science, and debunking pages, respectively. We collected all the posts of these pages over a time span of five years (Jan 2010 to Dec 2014). The first category includes all pages diffusing conspiracy information, i.e., pages which disseminate controversial information, most often lacking supporting evidence and sometimes contradicting the official news (i.e., conspiracy theories). Indeed, conspiracy pages on Facebook often claim that their mission is to inform people about topics neglected by mainstream media. Pages like I don't trust the government, Awakening America, or Awakened Citizen promote heterogeneous content ranging from aliens and chemtrails to geocentrism and the alleged causal relation between vaccinations and homosexuality. Notice that we do not focus on the truth value of the information but rather on the verifiability of the claims. The second category is that of scientific dissemination, including scientific institutions and scientific press whose main mission is to diffuse scientific knowledge. For example, pages like Science, Science Daily, and Nature are active in diffusing posts about the most recent scientific advances.
The third category contains all pages active in debunking false rumors online. We use this latter set as a testbed for the efficacy of debunking campaigns. The exact breakdown of the data is presented in Table 4.

Sentiment classification
Data annotation consists of assigning predefined labels to each data point. We selected a subset of 24,312 comments from the Facebook dataset (Table 4) and later used it to train a sentiment classifier. We used Goldfinch, a user-friendly annotation platform for the web and mobile devices, kindly provided by Sowa Labs (http://www.sowalabs.com/), and engaged trustworthy English speakers, active on Facebook, for the annotations. The annotation task was to label each Facebook comment, isolated from its context, as negative, neutral, or positive. Each annotator had to estimate the emotional attitude of the user when posting a comment to Facebook. During the annotation process, the annotators' performance was monitored in terms of inter-annotator agreement and self-agreement, based on a subset of the comments which were intentionally duplicated. The annotation process resulted in 24,312 sentiment-labeled comments, 6,555 of them annotated twice. We evaluate the self- and inter-annotator agreements in terms of Krippendorff's Alpha-reliability [36], a reliability coefficient able to measure the agreement of any number of annotators, often used in the literature [37]. Alpha is defined as Alpha = 1 − D_o/D_e, where D_o is the observed disagreement between annotators and D_e is the disagreement one would expect by chance. When annotators agree perfectly, Alpha = 1, and when the level of agreement equals the agreement by chance, Alpha = 0. In our case, 4,009 comments were given twice to two different annotators and are used to assess the inter-annotator agreement, for which Alpha = 0.810, while 2,546 comments were given twice to the same annotator and are used to assess the annotators' self-agreement, for which Alpha = 0.916. We treat sentiment classification as an ordinal classification task with three ordered classes.
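For the two-rating, no-missing-data case used here, Krippendorff's Alpha for nominal labels reduces to a short computation over a coincidence matrix. A minimal sketch (our own illustrative implementation, not the authors' code):

```python
from collections import Counter

def krippendorff_alpha_nominal(units):
    """Alpha = 1 - D_o / D_e for units of two nominal labels each."""
    # Coincidence counts: each unit of two ratings contributes both ordered pairs.
    coincidence = Counter()
    for r1, r2 in units:
        coincidence[(r1, r2)] += 1
        coincidence[(r2, r1)] += 1
    n = sum(coincidence.values())  # total pairable values = 2 * number of units
    totals = Counter()
    for (c, _), cnt in coincidence.items():
        totals[c] += cnt
    # Observed and expected disagreement over non-matching label pairs.
    d_o = sum(cnt for (c, k), cnt in coincidence.items() if c != k) / n
    d_e = sum(totals[c] * totals[k]
              for c in totals for k in totals if c != k) / (n * (n - 1))
    return 1.0 if d_e == 0 else 1 - d_o / d_e

print(krippendorff_alpha_nominal([("neg", "neg"), ("pos", "pos")]))  # 1.0
```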
Recall that ordinal classification is a form of multi-class classification where there is a natural ordering between the classes, but no meaningful numeric difference between them [38]. We apply the wrapper approach described in [39] with two linear-kernel Support Vector Machine (SVM) classifiers [30]. SVM is a state-of-the-art supervised learning algorithm, well suited for large-scale text categorization tasks and robust on large feature spaces. The two SVM classifiers were trained to distinguish each extreme class from the rest: one separates negative from neutral-plus-positive, the other separates positive from neutral-plus-negative. During prediction, if both classifiers agree, they yield the common class; otherwise, the assigned class is neutral.
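The combination rule can be made concrete. A minimal sketch of one plausible reading (each binary SVM either claims its extreme class or the complementary pair of classes; the final label is the class the two outputs have in common, neutral when they conflict):

```python
def combine(neg_side: bool, pos_side: bool) -> str:
    """neg_side: first SVM predicted 'negative' (vs. neutral + positive);
    pos_side: second SVM predicted 'positive' (vs. neutral + negative).
    Return the single class consistent with both outputs, else neutral."""
    if neg_side and not pos_side:
        return "negative"   # {negative} and {neutral, negative} share 'negative'
    if pos_side and not neg_side:
        return "positive"   # {neutral, positive} and {positive} share 'positive'
    return "neutral"        # conflicting extremes, or neither extreme claimed

print(combine(True, False))   # negative
print(combine(False, True))   # positive
print(combine(True, True))    # conflicting extremes -> neutral
print(combine(False, False))  # neutral
```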
The sentiment classifier was trained and tuned on a training set of 19,450 annotated comments. The comments were processed into the standard Bag-of-Words (BoW) representation. The trained sentiment classifier was then evaluated on a disjoint test set of the remaining 4,862 comments. Three measures were used to evaluate the performance of the sentiment classifier: 1. the aforementioned Alpha; 2. the Accuracy, defined as the fraction of correctly classified examples, Accuracy = (⟨−,−⟩ + ⟨0,0⟩ + ⟨+,+⟩)/N, where ⟨i,j⟩ denotes the number of examples of class i classified as class j; 3. F_1(+,−), the macro-averaged F-score of the positive and negative classes, a standard evaluation measure [40] for sentiment classification tasks, where F_1 is, in general, the harmonic mean of Precision and Recall for each class [41]. The averaged evaluation results are the following: Alpha = 0.589 ± 0.017, Accuracy = 0.654 ± 0.012, and F_1(+,−) = 0.685 ± 0.011. The 95% confidence intervals are estimated from 10-fold cross-validation.
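Accuracy and F_1(+,−) follow directly from the 3x3 confusion matrix. A small sketch with made-up counts (the matrix below is illustrative, not the paper's actual confusion matrix):

```python
# confusion[true_label][predicted_label]; labels: "-" negative, "0" neutral, "+" positive
confusion = {
    "-": {"-": 8, "0": 2, "+": 0},
    "0": {"-": 1, "0": 7, "+": 2},
    "+": {"-": 0, "0": 1, "+": 9},
}
labels = ["-", "0", "+"]

n = sum(confusion[t][p] for t in labels for p in labels)
accuracy = sum(confusion[c][c] for c in labels) / n  # diagonal over total

def f1(cls):
    """Harmonic mean of precision and recall for one class."""
    tp = confusion[cls][cls]
    predicted = sum(confusion[t][cls] for t in labels)  # column sum
    actual = sum(confusion[cls][p] for p in labels)     # row sum
    precision, recall = tp / predicted, tp / actual
    return 2 * precision * recall / (precision + recall)

# F_1(+,-): macro-average over the two extreme classes only, ignoring neutral
f1_pm = (f1("+") + f1("-")) / 2
print(round(accuracy, 3), round(f1_pm, 3))  # 0.8 0.85
```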

Statistical tools
Kaplan-Meier estimator. Let us define a random variable T on the interval [0, ∞), indicating the time at which an event takes place. The cumulative distribution function (CDF), F(t) = Pr(T ≤ t), indicates the probability that a subject selected at random will have a survival time less than or equal to some stated value t. The survival function, defined as the complementary CDF (CCDF), S(t) = Pr(T > t), is the probability of observing a survival time greater than some stated value t. Recall that the CCDF of a random variable X is one minus its CDF, i.e., the function f(x) = Pr(X > x). To estimate this probability we use the Kaplan-Meier estimator [42]. Let n_t denote the number of users at risk of stopping commenting at time t, and let d_t denote the number of users that stop commenting precisely at t. Then, the conditional survival probability at time t is defined as (n_t − d_t)/n_t. Thus, if we have N observations at times t_1 ≤ t_2 ≤ ... ≤ t_N, assuming that the events at times t_i are jointly independent, the Kaplan-Meier estimate of the survival function at time t is defined as Ŝ(t) = ∏_{t_i ≤ t} (n_{t_i} − d_{t_i})/n_{t_i}, with the convention that Ŝ(t) = 1 if t < t_1.
Comparison between power law distributions. Comparisons between the power law distributions of two different quantities are usually carried out through the log-likelihood ratio test [43] or the Kolmogorov-Smirnov test [31]. The former relies on the ratio between the likelihood of a model fitted on the pooled quantities and the sum of the likelihoods of the models fitted on the two separate quantities, whereas the latter is based on the comparison between the cumulative distribution functions of the two quantities. However, both of the aforementioned approaches take into account the overall distributions, whereas we are especially interested in the scaling parameter of the distribution, i.e., in how the tail of the distribution behaves.
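In the simplest case with no censoring, the product-limit estimate above can be computed in a few lines. A minimal sketch (illustrative code; it ignores the right-censoring that the Kaplan-Meier estimator is designed to handle, where at-risk counts would also shrink at censoring times):

```python
from collections import Counter

def kaplan_meier(event_times):
    """Return {t: S_hat(t)} at each distinct event time, assuming no censoring."""
    counts = Counter(event_times)
    at_risk = len(event_times)    # n_t: users still active just before time t
    survival, s = {}, 1.0
    for t in sorted(counts):
        d = counts[t]             # d_t: users whose activity ends exactly at t
        s *= (at_risk - d) / at_risk
        survival[t] = s
        at_risk -= d
    return survival

# Survival drops at each event time; with four users it steps down by quarters.
print(kaplan_meier([1, 2, 3, 4]))
```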
Moreover, since the Kolmogorov-Smirnov test was conceived for continuous distributions, its application to discrete data gives biased p-values. For these reasons, in this paper we decided to compare our distributions by assessing significant differences in the scaling parameters by means of a Wald test. The Wald test we conceive is defined by the hypotheses H_0: α̂_1 − α̂_2 = 0 and H_1: α̂_1 − α̂_2 ≠ 0, where α̂_1 and α̂_2 are the estimates of the scaling parameters of the two power-law distributions. The Wald statistic, W = (α̂_1 − α̂_2)^2 / (VAR(α̂_1) + VAR(α̂_2)), where VAR(α̂_i) is the variance of α̂_i, follows a χ2 distribution with 1 degree of freedom. We reject the null hypothesis H_0 and conclude that there is a significant difference between the scaling parameters of the two distributions if the p-value of the Wald statistic is below a given significance level.
Attention patterns. Different fits for the tails of the distributions have been taken into account (lognormal, Poisson, exponential, and power law). As for attention patterns related to posts, goodness-of-fit tests based on the log-likelihood [31] show that the tails are best fitted by a power law distribution for both conspiracy and scientific news (see Tables 5 and 6). Log-likelihoods of the different attention patterns (likes, comments, shares) are computed under the competing distributions; the one with the higher log-likelihood is the better fit [31]. Log-likelihood ratio tests between the power law and the other distributions yield positive ratios, and the p-values computed using Vuong's method [44] are close to zero, indicating that the better fit provided by the power law distribution is not caused by statistical fluctuations. Lower bounds and scaling parameters have been estimated via minimization of the Kolmogorov-Smirnov statistic [31]; the latter have been compared via the Wald test (see Table 7).
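Given the two estimated scaling parameters and their variances, the Wald test is a one-line computation. A minimal sketch with illustrative numbers (pure Python; for one degree of freedom the chi-square survival function is erfc(sqrt(W/2))):

```python
import math

def wald_test(a1, var1, a2, var2):
    """Wald statistic for H0: a1 == a2, and its p-value under chi^2 with 1 d.o.f."""
    w = (a1 - a2) ** 2 / (var1 + var2)
    p_value = math.erfc(math.sqrt(w / 2))  # survival function of chi2(1) at w
    return w, p_value

# Identical estimates: W = 0, p = 1 -> no evidence of different scaling parameters.
print(wald_test(2.5, 0.01, 2.5, 0.01))

# Well-separated estimates: large W, small p -> reject H0.
w, p = wald_test(2.9, 0.01, 2.3, 0.01)
print(round(w, 2), p < 0.01)  # 18.0 True
```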
As for users' activity, Tables 8 and 9 list the fit parameters for various canonical distributions for both conspiracy and scientific news. Table 10 shows the power law fit parameters and summarizes the estimated lower bounds and scaling parameters for each distribution.
Cox proportional hazards model. The hazard function is modeled as h(t) = h_0(t)exp(βx), where h_0(t) is the baseline hazard and x is a dummy variable that takes value 1 when the user has been exposed to debunking and 0 otherwise. The hazards depend multiplicatively on the covariates, and exp(β) is the ratio of the hazards between users exposed and not exposed to debunking. The ratio of the hazards of any two users i and j is exp(β(x_i − x_j)) and is called the hazard ratio. This ratio is assumed to be constant over time, hence the name proportional hazards. When we consider exposure to debunking by means of likes, the estimated β is 0.72742 (s.e. = 0.01991, p < 10^−6) and the corresponding hazard ratio, exp(β), between users exposed and not exposed is 2.07, indicating that users not exposed to debunking are 2.07 times more likely to stop consuming conspiracy news. Goodness of fit for the Cox proportional hazards model has been assessed by means of the likelihood ratio, Wald, and score tests, which provided p-values close to zero. Fig 8 (left) shows the fit of the Cox proportional hazards model when the lifetime is computed on likes. Moreover, if we consider exposure to debunking by means of comments, the estimated β is 0.56748 (s.e. = 0.02711, p < 10^−6) and the corresponding hazard ratio is 1.76, indicating that users not exposed to debunking are 1.76 times more likely to stop consuming conspiracy news. Fig 8 (right) shows the fit of the Cox proportional hazards model when the lifetime is computed on comments.
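The reported hazard ratios follow directly from the estimated coefficients, since exp(β) converts a Cox coefficient into a multiplicative effect on the hazard. A quick check using the β values from the text:

```python
import math

# Estimated Cox coefficients reported above
beta_likes, beta_comments = 0.72742, 0.56748

hr_likes = math.exp(beta_likes)        # hazard ratio when lifetime is computed on likes
hr_comments = math.exp(beta_comments)  # hazard ratio when lifetime is computed on comments

print(round(hr_likes, 2))     # 2.07
print(round(hr_comments, 2))  # 1.76
```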