Abstract
Social media aggregate people around common interests, eliciting a collective framing of narratives and worldviews. In such a disintermediated environment, however, misinformation is pervasive, and debunking attempts are often undertaken to counter this trend. In this work, we examine the effectiveness of debunking on Facebook through a quantitative analysis of 54 million users over a time span of five years (Jan 2010–Dec 2014). In particular, we compare how users who usually consume proven (scientific) and unsubstantiated (conspiracy-like) information on US Facebook interact with specific debunking posts. Our findings confirm the existence of echo chambers where users interact primarily with either conspiracy-like or scientific pages; nevertheless, both groups interact in a similar way with the information inside their own echo chamber. We then measure how users from both echo chambers interacted with 50,220 debunking posts, accounting for both consumption patterns and the sentiment expressed in comments. Sentiment analysis reveals a dominant negativity in the comments on debunking posts. Furthermore, such posts remain mainly confined to the scientific echo chamber: only a few conspiracy users engage with corrections, and their liking and commenting rates on conspiracy posts increase after the interaction.
Citation: Zollo F, Bessi A, Del Vicario M, Scala A, Caldarelli G, Shekhtman L, et al. (2017) Debunking in a world of tribes. PLoS ONE 12(7): e0181821. https://doi.org/10.1371/journal.pone.0181821
Editor: Jose Javier Ramasco, Instituto de Fisica Interdisciplinar y Sistemas Complejos, SPAIN
Received: February 21, 2016; Accepted: May 13, 2017; Published: July 24, 2017
Copyright: © 2017 Zollo et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The entire data collection process has been carried out exclusively by means of the Facebook Graph API, which is publicly available. Further details about data collection are provided in the Methods section of the paper, together with the complete list of pages.
Funding: Funding for this work was provided by EU FET project MULTIPLEX nr. 317532, SIMPOL nr. 610704, DOLFINS nr. 640772, SOBIGDATA 654024, IMT/eXtrapola Srl (P0082). SH and LS were supported by the Israel Ministry of Science and Technology, the Japan Science and Technology Agency, the Italian Ministry of Foreign Affairs and International Cooperation, the Israel Science Foundation, ONR and DTRA. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Socio-technical systems and microblogging platforms such as Facebook and Twitter have created a direct path from producers to consumers of content, changing the way users get informed, debate ideas, and shape their worldviews [1–6]. Misinformation on online social media is pervasive and, according to the World Economic Forum, represents one of the main threats to our society [7, 8]. The diffusion of false rumors affects the public perception of reality as well as the political debate [9]. Indeed, the alleged link between vaccines and autism, the belief that 9/11 was an inside job, or the more recent case of Jade Helm 15—a routine military exercise that was perceived as an imminent threat of civil war in the US—are just a few examples of the large body of collective narratives grounded in unsubstantiated information.
Confirmation bias plays a pivotal role in cascade dynamics and facilitates the emergence of echo chambers [10]. Indeed, users online show the tendency a) to select information that adheres to their system of beliefs, even when it contains parodistic jokes; and b) to join polarized groups [11]. Recently, studies have shown [12–17] that continued exposure to unsubstantiated rumors may be a good proxy to detect gullibility—i.e., jumping the credulity barrier by accepting highly implausible theories—on online social media. Narratives, especially those grounded in conspiracy theories, play an important cognitive and social function in simplifying causation. They are formulated in a way that reduces the complexity of reality and tolerates a certain level of uncertainty [18–20]. However, conspiracy thinking creates or reflects a climate of disengagement from mainstream society and from recommended practices [21].
Several efforts, ranging from algorithmic solutions to tailored communication strategies [22–27], strive to counter the spread of misinformation, but not much is known about their efficacy. In this work we characterize the consumption of debunking posts on Facebook and, more generally, the reaction of users to dissenting information.
We perform a thorough quantitative analysis of 54 million US Facebook users and study how they consume scientific and conspiracy-like content. We identify two main categories of pages: conspiracy news—i.e., pages promoting content neglected by mainstream media—and science news. Using an approach based on [12, 14, 15], we further explore Facebook pages that are active in debunking conspiracy theses (see section Materials and methods for further details about data collection).
Notice that we do not focus on the quality of the information but rather on the possibility of verification. Indeed, for scientific news it is easy to identify the authors of a study, the institution where it was carried out, and whether the paper underwent a peer review process. On the other hand, conspiracy-like content is difficult to verify because it is inherently based upon suspect information, derived from allegations and a belief in secrets kept from the public. The self-description of many conspiracy pages on Facebook, indeed, claims that they inform people about topics neglected by mainstream media and science. Pages like I don’t trust the government, Awakening America, or Awakened Citizen promote wide-ranging content, from aliens and chem-trails to the alleged causal relation between vaccinations and autism or homosexuality. Conversely, science news pages—e.g., Science, Science Daily, Nature—are active in diffusing posts about the most recent scientific advances.
The list of pages was built by taking a census of such pages with the support of very active debunking groups (see section Materials and methods for more details). The final dataset contains pages reporting on scientific and conspiracy-like news. Over a time span of five years (Jan 2010–Dec 2014) we downloaded all public posts (with the related lists of likes and comments) of 83 scientific and 330 conspiracy pages. In addition, we identified 66 Facebook pages aimed at debunking conspiracy theories.
Our analysis shows that two well-formed and highly segregated communities exist around conspiracy and scientific topics—i.e., users are mainly active in only one category. Focusing on users’ interactions with their preferred content, we find similarities in the consumption of posts. Different kinds of content aggregate polarized groups of users (echo chambers). At this stage we want to test the role of confirmation bias with respect to dissenting (resp., confirmatory) information from the conspiracy (resp., science) echo chamber. Focusing on a set of 50,220 debunking posts, we measure the interaction of users from both the conspiracy and science echo chambers. We find that such posts remain mainly confined to the scientific echo chamber. Indeed, the majority of likes on debunking posts is left by users polarized towards science (∼67%), while only a small minority (∼7%) is left by users polarized towards conspiracy. However, independently of the echo chamber, the sentiment expressed by users when commenting on debunking posts is mainly negative.
Results and discussion
The aim of this work is to test the effectiveness of debunking campaigns on online social media. More generally, we want to characterize and compare users’ attention with respect to a) their preferred narrative and b) information dissenting from such a narrative. Specifically, we want to understand how users usually exposed to unverified information, such as conspiracy theories, respond to debunking attempts.
Echo chambers
As a first step we characterize how distinct types of information—belonging to the two different narratives—are consumed on Facebook. In particular we focus on the users’ actions allowed by Facebook’s interaction paradigm—i.e., likes, shares, and comments. Each action has a particular meaning [28]. A like represents positive feedback on a post; a share expresses a desire to increase the visibility of a given piece of information; and a comment is the way in which online collective debates take form around the topic of the post. Therefore, comments may contain negative or positive feedback with respect to the post.
Assuming that a user u has performed x and y likes on scientific and conspiracy-like posts, respectively, we let ρ(u) = (y − x)/(y + x). Thus, a user u for whom ρ(u) = −1 is polarized towards science, whereas a user for whom ρ(u) = 1 is polarized towards conspiracy. We define the user polarization ρ_likes ∈ [−1, 1] (resp., ρ_comments) as the relative difference in likes (resp., comments) on conspiracy and science posts. In Fig 1 we show that the probability density function (PDF) of the polarization of all users is sharply bimodal, with most users having either ρ(u) ≃ −1 or ρ(u) ≃ 1. Thus, most users may be divided into two groups, those polarized towards science and those polarized towards conspiracy. The same pattern holds if we look at polarization based on comments rather than on likes.
Probability density functions (PDFs) of the polarization of all users computed both on likes (left) and comments (right).
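For illustration, the polarization measure can be computed from the raw like data as in the following minimal Python sketch (file and column names are hypothetical); note that the 95% activity threshold used later to define polarized users corresponds to |ρ(u)| ≥ 0.9.

```python
# Minimal sketch (hypothetical file and column names): compute user polarization
# from a table with one row per like.
import pandas as pd

likes = pd.read_csv("likes.csv")  # columns: user_id, category in {"science", "conspiracy"}

counts = pd.crosstab(likes["user_id"], likes["category"])
counts = counts.reindex(columns=["science", "conspiracy"], fill_value=0)

x = counts["science"]       # likes on science posts
y = counts["conspiracy"]    # likes on conspiracy posts
rho = (y - x) / (y + x)     # polarization rho(u) in [-1, 1]

# Requiring at least 95% of a user's likes on one narrative corresponds to |rho| >= 0.9.
science_polarized    = rho[rho <= -0.9].index
conspiracy_polarized = rho[rho >=  0.9].index
```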
To further understand how these two segregated communities behave, we explore how they interact with their preferred type of information. In the left panel of Fig 2 we show the distributions of the number of likes, comments, and shares on posts belonging to both scientific and conspiracy news. As seen from the plots, all the distributions are heavy-tailed—i.e., all the distributions are best fitted by power laws and all possess similar scaling parameters (see Materials and methods section for further details).
Left panel: Complementary cumulative distribution functions (CCDFs) of the number of likes, comments, and shares received by posts belonging to conspiracy (top) and scientific (bottom) news. Right panel: Kaplan-Meier estimates of survival functions of posts belonging to conspiracy and scientific news. Error bars are on the order of the size of the symbols.
We define the persistence of a post (resp., user) as the time elapsed between the first and last comment on the post (resp., by the user), and we estimate the corresponding survival functions with the Kaplan-Meier estimator. In the right panel of Fig 2 we plot the Kaplan-Meier estimates of the survival functions of posts grouped by category. To further characterize differences between the survival functions, we perform the Peto & Peto [29] test to detect whether there is a statistically significant difference between the two. Since we obtain a p-value of 0.944, we conclude that there is no statistically significant difference between the posts’ survival functions for science and conspiracy news. Thus, the posts’ persistence is similar in the two echo chambers.
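As an illustration of this step, the sketch below uses the lifelines Python package (an assumed tool choice, since the paper does not name its software) with synthetic lifetimes; the ordinary log-rank test serves as a readily available stand-in for the Peto & Peto test.

```python
# Sketch: survival curves of conspiracy vs. science posts from their lifetimes
# (days between first and last comment); synthetic data for brevity.
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

rng = np.random.default_rng(0)
conspiracy_lifetimes = rng.exponential(30, size=500)  # placeholder lifetimes (days)
science_lifetimes    = rng.exponential(31, size=500)

kmf = KaplanMeierFitter()
ax = kmf.fit(conspiracy_lifetimes, label="conspiracy").plot_survival_function()
kmf.fit(science_lifetimes, label="science").plot_survival_function(ax=ax)

# Ordinary log-rank test, used here in place of the Peto & Peto test of the paper.
result = logrank_test(conspiracy_lifetimes, science_lifetimes)
print(result.p_value)
```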
We continue our analysis by examining users’ interaction with different kinds of posts on Facebook. In the left panel of Fig 3 we plot the CCDFs of the number of likes and comments of users on science or conspiracy news. These results show that users consume information in a comparable way—i.e., all distributions are heavy-tailed (for scaling parameters and other details refer to the Materials and methods section). The right panel of Fig 3 shows that the persistence of users—i.e., the Kaplan-Meier estimates of the survival functions—on both types of content is nearly identical. Attention patterns of users in the conspiracy and science echo chambers reveal that both behave in a very similar manner.
Left panel: Complementary cumulative distribution functions (CCDFs) of the number of comments (top), and likes (bottom), per each user on the two categories. Right panel: Kaplan-Meier estimates of survival functions for users on conspiracy and scientific news. Error bars are on the order of the size of the symbols.
In summary, contents related to distinct narratives aggregate users into different communities and consumption patterns are similar in both communities.
Response to debunking posts
Debunking posts on Facebook strive to counter the spread of misinformation by providing fact-checked information on specific topics. However, not much is known about the effectiveness of debunking in countering misinformation. In fact, if confirmation bias plays a pivotal role in selection criteria, then debunking might appear, to users usually exposed to unsubstantiated rumors, as something dissenting from their narrative. Here, we focus on the scientific and conspiracy echo chambers and analyze the consumption of debunking posts. As a preliminary step we show how debunking posts get liked and commented on according to users’ polarization. Notice that we consider a user to be polarized if at least 95% of their liking activity is concentrated on one specific narrative. Fig 4 shows how users’ activity is distributed on debunking posts: the left (resp., right) panel shows the proportions of likes (resp., comments) left by users polarized towards science, users polarized towards conspiracy, and non-polarized users. We notice that the majority of both likes and comments is left by users polarized towards science (resp., 66.95% and 52.12%), while only a small minority is made by users polarized towards conspiracy (resp., 6.54% and 3.88%). Indeed, the scientific echo chamber is the biggest consumer of debunking posts, and only a few users usually active in the conspiracy echo chamber interact with debunking information. Out of 9,790,906 polarized conspiracy users, just 117,736 interacted with debunking posts—i.e., commented on a debunking post at least once.
Proportions of likes (left) and comments (right) left by users polarized towards science, users polarized towards conspiracy, and not polarized users.
To better characterize users’ response to debunking attempts, we apply sentiment analysis techniques to the comments on the Facebook posts (see Materials and methods section for further details). We use a supervised machine learning approach: first, we annotate a sample of comments and, then, we build a Support Vector Machine (SVM) [30] classification model. Finally, we apply the model to associate each comment with a sentiment value: negative, neutral, or positive. The sentiment denotes the emotional attitude of Facebook users when commenting. In Fig 5 we show the fraction of negative, positive, and neutral comments for all users and for the polarized ones. Notice that we consider only posts having at least one like, one comment, and one share. Comments tend to be mainly negative, and such negativity is dominant regardless of users’ polarization.
Sentiment of comments made by all users (left), users polarized towards science (center), and users polarized towards conspiracy (right) on debunking posts having at least a like, a comment, and a share.
Our findings show that debunking posts remain mainly confined within the scientific echo chamber and that only a few users usually exposed to unsubstantiated claims actively interact with the corrections. Dissenting information is mainly ignored. Furthermore, if we look at the sentiment expressed by users in their comments, we find a rather negative environment.
Interaction with dissenting information.
Users tend to focus on a specific narrative and select information adhering to their system of beliefs, while they ignore dissenting information. However, in our scenario a few users belonging to the conspiracy echo chamber do interact with debunking information. Who are such users? And, further, what is the effect of their interaction with dissenting information? In this section we aim at better characterizing the consumption patterns of the few users that tend to interact with dissenting information. Focusing on the conspiracy echo chamber, in the top panel of Fig 6 we show the distinct survival functions—i.e., the probability of continuing to like and comment on conspiracy posts over time—of users who did or did not comment on debunking posts. Users interacting with debunking posts are generally more likely to survive—i.e., to persist in their interaction with conspiracy posts. The bottom panel of Fig 6 shows the CCDFs of the number of likes and comments for both types of users. The Spearman’s rank correlation coefficients between the number of likes and comments for the two types of users are very similar: ρ_exp = 0.53 (95% c.i. [0.529, 0.537]); ρ_not_exp = 0.57 (95% c.i. [0.566, 0.573]). However, we observe that users who commented on debunking posts are slightly more prone to comment in general. Thus, users engaging in debates around debunking posts seem to be the few who show a higher commenting activity overall.
Top panel: Kaplan-Meier estimates of survival functions of users who interacted (exposed) and did not (not exposed) with debunking. Users persistence is computed both on their likes (left) and comments (right). Bottom panel: Complementary cumulative distribution functions (CCDFs) of the number of likes (left) and comments (right), per each user exposed and not exposed to debunking.
To further characterize the effect of the interaction with debunking posts, as a secondary step, we perform a comparative analysis of users’ behavior before and after they comment on debunking posts. Fig 7 shows the liking and commenting rate—i.e., the average number of likes (or comments) on conspiracy posts per day—before and after the first interaction with debunking. The plot shows that users’ liking and commenting rates increase after commenting. To assess the difference between the two distributions before and after the interaction with debunking, we perform both the Kolmogorov-Smirnov [31] and Mann-Whitney-Wilcoxon [32] tests; since the p-values are < 0.01, we reject the null hypothesis of equivalence of the two distributions for both like and comment rates. To further analyze the effects of interaction with the debunking posts, we use the Cox proportional hazards model [33] to estimate the hazard of conspiracy users exposed to—i.e., who interacted with—debunking compared to those not exposed, and we find that users not exposed to debunking are 1.76 times more likely to stop interacting with conspiracy news (see Materials and methods section for further details).
Rate—i.e., average number per day—of likes (left) and comments (right) on conspiracy posts of users who interacted with debunking posts, before and after the first interaction.
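For reference, the two tests can be run with SciPy as in the following sketch (synthetic placeholder data stand in for the observed rates).

```python
# Sketch: two-sample tests on per-user daily rates before vs. after the first
# comment on a debunking post (synthetic placeholder data).
import numpy as np
from scipy.stats import ks_2samp, mannwhitneyu

rng = np.random.default_rng(0)
rate_before = rng.exponential(0.5, size=1000)  # likes (or comments) per day, before
rate_after  = rng.exponential(0.8, size=1000)  # likes (or comments) per day, after

print(ks_2samp(rate_before, rate_after))
print(mannwhitneyu(rate_before, rate_after, alternative="two-sided"))
# p-values < 0.01 lead to rejecting equality of the two distributions.
```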
Conclusions
Users online tend to focus on specific narratives and select information adhering to their system of beliefs. Such a polarized environment might foster the proliferation of false claims. Indeed, misinformation is pervasive and very difficult to correct. To curb the proliferation of unsubstantiated rumors, major corporations such as Facebook and Google are studying specific solutions. Indeed, examining the effectiveness of online debunking campaigns is crucial for understanding the processes and mechanisms behind misinformation spreading. In this work we show the existence of social echo chambers around different narratives on Facebook in the US. Two well-formed and highly segregated communities exist around conspiracy and scientific topics—i.e., users are mainly active in only one category. Furthermore, by focusing on users’ interactions with their preferred content, we find similarities in the way in which both forms of content are consumed.
Our findings show that debunking posts remain mainly confined within the scientific echo chamber and that only a few users usually exposed to unsubstantiated claims actively interact with the corrections. Dissenting information is mainly ignored and, if we look at the sentiment expressed by users in their comments, we find a rather negative environment. Furthermore, we show that the few users from the conspiracy echo chamber who interact with the debunking posts manifest a higher tendency to comment in general. However, if we look at their commenting and liking rate—i.e., the daily number of comments and likes—we find that their activity in the conspiracy echo chamber increases after the interaction.
Thus, dissenting information online is ignored. Indeed, our results suggest that debunking information remains confined within the scientific echo chamber and that very few users of the conspiracy echo chamber interact with debunking posts. Moreover, the interaction seems to lead to an increased interest in conspiracy-like content.
In our view, the diffusion of bogus content is related to the growing mistrust of people towards institutions, to the rising level of functional illiteracy—i.e., the inability to correctly understand information—affecting Western countries, as well as to the combined effect of confirmation bias operating on an enormous basin of information whose quality is poor. In this setting, current debunking campaigns as well as algorithmic solutions do not seem to be the best options. Our findings suggest that the main problem behind misinformation is conservatism rather than gullibility. Moreover, our results also seem to be consistent with the so-called inoculation theory [34], according to which exposure to repeated, mild attacks can make people more resistant to changing their established beliefs. Indeed, being repeatedly exposed to relatively weak arguments (the inoculation procedure) can result in greater resistance to a later persuasive attack, even if the latter is stronger and uses arguments different from those presented during the inoculation phase. Therefore, when users are faced with untrusted opponents in online discussions, the confrontation results in a stronger commitment to their own echo chamber. Thus, a more open and smoother approach, one that promotes a culture of humility aimed at tearing down the walls and barriers between tribes, could represent a first step towards countering the spread and persistence of misinformation online.
Materials and methods
Ethics statement
The entire data collection process was performed exclusively by means of the Facebook Graph API [35], which is publicly available and can be used through one’s personal Facebook user account. We used only publicly available data (users with privacy restrictions are not included in our dataset). Data were downloaded from public Facebook pages, which are public entities. Users’ content contributing to such entities is also public unless the users’ privacy settings specify otherwise, and in that case it is not available to us. When allowed by users’ privacy settings, we accessed public personal information. However, in our study we used fully anonymized and aggregated data. We abided by the terms, conditions, and privacy policies of Facebook.
Data collection
We identified two main categories of pages: conspiracy news—i.e., pages promoting content neglected by mainstream media—and science news. Using an approach based on [12, 14], we defined the space of our investigation with the help of Facebook groups very active in debunking conspiracy theses. We categorized pages according to their content and their self-description. The selection of the sources was iterated several times and verified by all the authors. To the best of our knowledge, the final dataset is the complete set of all scientific, conspiracist, and debunking information sources active in the US Facebook scenario.
Tables 1–3 show the complete lists of conspiracy, science, and debunking pages, respectively. We collected all the posts of such pages over a time span of five years (Jan 2010–Dec 2014). The first category includes all pages diffusing conspiracy information—pages which disseminate controversial information, most often lacking supporting evidence and sometimes contradicting official news (i.e., conspiracy theories). Indeed, conspiracy pages on Facebook often claim that their mission is to inform people about topics neglected by mainstream media. Pages like I don’t trust the government, Awakening America, or Awakened Citizen promote heterogeneous content ranging from aliens, chemtrails, and geocentrism up to the alleged causal relation between vaccinations and homosexuality. Notice that we do not focus on the truth value of their information but rather on the possibility of verifying their claims. The second category is that of scientific dissemination, including scientific institutions and scientific press whose main mission is to diffuse scientific knowledge. For example, pages like Science, Science Daily, and Nature are active in diffusing posts about the most recent scientific advances. The third category contains all pages active in debunking false rumors online. We use this latter set as a testbed for the efficacy of debunking campaigns. The exact breakdown of the data is presented in Table 4.
Number of pages, posts, likes, comments, likers, and commenters for science, conspiracy, and debunking pages.
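As a minimal sketch of the collection step, public page posts can be retrieved from the Graph API roughly as follows (the API version, fields, and credentials below are placeholders and depend on the API release available at collection time).

```python
# Sketch: pull public posts of one page via the Graph API (placeholders below;
# endpoint version and fields are illustrative).
import requests

BASE = "https://graph.facebook.com/v2.2"      # illustrative API version
PAGE_ID = "a_public_page_id"                  # placeholder
ACCESS_TOKEN = "your_access_token"            # placeholder

url = f"{BASE}/{PAGE_ID}/posts"
params = {"fields": "id,message,created_time", "access_token": ACCESS_TOKEN}

posts = []
while url:
    resp = requests.get(url, params=params).json()
    posts.extend(resp.get("data", []))
    url = resp.get("paging", {}).get("next")  # follow cursor-based pagination
    params = {}                               # the "next" URL already embeds the query
```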
Sentiment classification
Data annotation consists of assigning predefined labels to each data point. We selected a subset of 24,312 comments from the Facebook dataset (Table 4) and later used it to train a sentiment classifier. We used a user-friendly web and mobile annotation platform, Goldfinch—kindly provided by Sowa Labs (http://www.sowalabs.com/)—and engaged trustworthy English speakers, active on Facebook, for the annotations. The annotation task was to label each Facebook comment—isolated from its context—as negative, neutral, or positive. Each annotator had to estimate the emotional attitude of the user when posting a comment to Facebook. During the annotation process, the annotators’ performance was monitored in terms of the inter-annotator agreement and self-agreement, based on a subset of the comments which were intentionally duplicated. The annotation process resulted in 24,312 sentiment-labeled comments, 6,555 of them annotated twice. We evaluate the self- and inter-annotator agreements in terms of Krippendorff’s Alpha-reliability [36], a reliability coefficient able to measure the agreement of any number of annotators, often used in the literature [37]. Alpha is defined as Alpha = 1 − Do/De, where Do is the observed disagreement between annotators and De is the disagreement one would expect by chance. When annotators agree perfectly, Alpha = 1, and when the level of agreement equals the agreement by chance, Alpha = 0. In our case, 4,009 comments were assigned twice to two different annotators and are used to assess the inter-annotator agreement, for which Alpha = 0.810, while 2,546 comments were assigned twice to the same annotator and are used to assess the annotators’ self-agreement, for which Alpha = 0.916.
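Agreement scores of this kind can be computed, for instance, with the third-party krippendorff Python package (an assumed tool; the paper does not state which implementation was used), as in this toy sketch.

```python
# Toy sketch: Krippendorff's Alpha for nominal labels; rows are annotators,
# columns are comments, np.nan marks comments not labelled by that annotator.
import numpy as np
import krippendorff  # third-party package, assumed here

labels = np.array([
    [-1, 0, 1, 1, np.nan],  # annotator 1: -1 negative, 0 neutral, 1 positive
    [-1, 0, 0, 1, 1],       # annotator 2
])
print(krippendorff.alpha(reliability_data=labels, level_of_measurement="nominal"))
```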
We treat sentiment classification as an ordinal classification task with three ordered classes. Recall that ordinal classification is a form of multi-class classification in which there is a natural ordering between the classes, but no meaningful numeric difference between them [38]. We apply the wrapper approach described in [39], with two linear-kernel Support Vector Machine (SVM) classifiers [30]. SVM is a state-of-the-art supervised learning algorithm, well suited for large-scale text categorization tasks and robust on large feature spaces. The two SVM classifiers were trained to distinguish the extreme classes—negative and positive—from the rest—neutral plus positive, and neutral plus negative. During prediction, if both classifiers agree, they yield the common class; otherwise, the assigned class is neutral.
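A compact sketch of this two-classifier wrapper, using scikit-learn with a Bag-of-Words representation and toy data (the actual feature engineering and tuning of the study are not reproduced here), could look as follows.

```python
# Sketch of the two-classifier wrapper for ordered classes (negative < neutral
# < positive) with Bag-of-Words features and linear SVMs (scikit-learn); toy data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

texts  = ["awful hoax", "interesting read", "great article", "so what"]
labels = [-1, 0, 1, 0]   # -1 negative, 0 neutral, 1 positive

vec = CountVectorizer()
X = vec.fit_transform(texts)

# Classifier A: negative vs. {neutral, positive}; Classifier B: positive vs. the rest.
clf_neg = LinearSVC().fit(X, [int(y == -1) for y in labels])
clf_pos = LinearSVC().fit(X, [int(y == 1) for y in labels])

def predict(comment):
    x = vec.transform([comment])
    is_neg = clf_neg.predict(x)[0] == 1
    is_pos = clf_pos.predict(x)[0] == 1
    if is_neg and not is_pos:
        return -1   # both classifiers point to "negative"
    if is_pos and not is_neg:
        return 1    # both point to "positive"
    return 0        # disagreement or neither extreme -> neutral

print(predict("awful and misleading article"))
```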
The sentiment classifier was trained and tuned on the training set of 19,450 annotated comments. The comments were processed into the standard Bag-of-Words (BoW) representation. The trained sentiment classifier was then evaluated on a disjoint test set of the remaining 4,862 comments. Three measures were used to evaluate the performance of the sentiment classifier:
- The aforementioned Alpha
- The Accuracy, defined as the fraction of correctly classified examples.
- The macro-averaged F-score of the positive and negative classes, F1(pos, neg) = (F1(positive) + F1(negative))/2, a standard evaluation measure [40] for sentiment classification tasks. In general, F1 is the harmonic mean of Precision and Recall for each class [41]: F1(x) = 2 · Precision(x) · Recall(x) / (Precision(x) + Recall(x)), where Precision for class x is the fraction of correctly predicted examples out of all the predictions with class x, and Recall for class x is the fraction of correctly predicted examples out of all the examples with actual class x.
The averaged evaluation results are: Alpha = 0.589 ± 0.017 and Accuracy = 0.654 ± 0.012. The 95% confidence intervals are estimated from 10-fold cross-validation.
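With scikit-learn, Accuracy and the macro-averaged F-score of the two extreme classes can be computed as in this toy sketch (the neutral class is excluded from the F-score average, as described above).

```python
# Toy sketch: Accuracy and the macro-averaged F-score of the two extreme classes.
from sklearn.metrics import accuracy_score, f1_score

y_true = [-1, -1, 0, 1, 1, 0, -1, 1]   # gold labels
y_pred = [-1,  0, 0, 1, 1, 1, -1, 1]   # predicted labels

print("Accuracy:", accuracy_score(y_true, y_pred))
print("F1(pos, neg):", f1_score(y_true, y_pred, labels=[-1, 1], average="macro"))
```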
Statistical tools
Kaplan-Meier estimator.
Let us define a random variable T on the interval [0, ∞), indicating the time an event takes place. The cumulative distribution function (CDF), F(t) = Pr(T ≤ t), indicates the probability that a subject selected at random will have a survival time less than or equal to some stated value t. The survival function, defined as the complementary CDF (CCDF), is the probability of observing a survival time greater than some stated value t. Recall that the CCDF of a random variable X is one minus its CDF, i.e., the function f(x) = Pr(X > x); here we apply it to T. To estimate this probability we use the Kaplan–Meier estimator [42]. Let n_t denote the number of users at risk of stopping commenting at time t, and let d_t denote the number of users that stop commenting precisely at t. Then, the conditional survival probability at time t is defined as (n_t − d_t)/n_t. Thus, if we have N observations at times t_1 ≤ t_2 ≤ ⋯ ≤ t_N, assuming that the events at times t_i are jointly independent, the Kaplan-Meier estimate of the survival function at time t is defined as Ŝ(t) = ∏_{t_i ≤ t} (n_{t_i} − d_{t_i})/n_{t_i}, with the convention that Ŝ(t) = 1 for t < t_1.
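A from-scratch sketch of this product-limit estimate for fully observed lifetimes (censoring, which the estimator also handles, is omitted for brevity):

```python
# Sketch: Kaplan-Meier product-limit estimate for fully observed lifetimes
# (no censoring handled, for clarity).
import numpy as np

def kaplan_meier(lifetimes):
    """Return the distinct event times t_i and the estimated survival S(t_i)."""
    lifetimes = np.asarray(lifetimes)
    times, d = np.unique(lifetimes, return_counts=True)            # d_t: events at t
    n_at_risk = len(lifetimes) - np.concatenate(([0], np.cumsum(d)[:-1]))
    survival = np.cumprod((n_at_risk - d) / n_at_risk)             # product-limit
    return times, survival

t, s = kaplan_meier([2, 3, 3, 5, 8, 8, 8, 13])
print(list(zip(t, s)))
```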
Comparison between power law distributions.
Comparisons between the power-law distributions of two different quantities are usually carried out through a log-likelihood ratio test [43] or a Kolmogorov-Smirnov test [31]. The former method relies on the ratio between the likelihood of a model fitted on the pooled quantities and the sum of the likelihoods of the models fitted on the two separate quantities, whereas the latter is based on the comparison between the cumulative distribution functions of the two quantities. However, both of the aforementioned approaches take into account the overall distributions, whereas we are often especially interested in the scaling parameter of the distribution, i.e., how the tail of the distribution behaves. Moreover, since the Kolmogorov-Smirnov test was conceived for continuous distributions, its application to discrete data gives biased p-values. For these reasons, in this paper we decided to compare our distributions by assessing significant differences in the scaling parameters by means of a Wald test. The Wald test we conceive tests H0: α1 = α2 against H1: α1 ≠ α2, where α̂1 and α̂2 are the estimates of the scaling parameters of the two power-law distributions. The Wald statistic W = (α̂1 − α̂2)²/σ̂², where σ̂² is the variance of α̂1 − α̂2, follows a χ² distribution with 1 degree of freedom. We reject the null hypothesis H0 and conclude that there is a significant difference between the scaling parameters of the two distributions if the p-value of the Wald statistic is below a given significance level.
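A minimal sketch of such a test, assuming the scaling parameters and their standard errors are estimated with the powerlaw Python package (an assumed tool; the variance of the difference is approximated by the sum of the two variances under independence):

```python
# Sketch: Wald test on the scaling parameters of two heavy-tailed samples.
# Exponents and standard errors come from the `powerlaw` package (assumed tool);
# Var(a1 - a2) is approximated by the sum of the two variances (independence).
import numpy as np
import powerlaw
from scipy.stats import chi2

rng = np.random.default_rng(0)
data1 = rng.pareto(1.5, 5000) + 1   # synthetic heavy-tailed samples
data2 = rng.pareto(1.7, 5000) + 1

fit1, fit2 = powerlaw.Fit(data1), powerlaw.Fit(data2)
a1, s1 = fit1.power_law.alpha, fit1.power_law.sigma
a2, s2 = fit2.power_law.alpha, fit2.power_law.sigma

w = (a1 - a2) ** 2 / (s1 ** 2 + s2 ** 2)   # Wald statistic for H0: alpha1 = alpha2
print("W =", w, "p-value =", chi2.sf(w, df=1))
```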
Attention patterns.
Different fits for the tail of the distributions have been taken into account (lognormal, Poisson, exponential, and power law). As for attention patterns related to posts, goodness-of-fit tests based on the log-likelihood [31] show that the tails are best fitted by a power-law distribution for both conspiracy and scientific news (see Tables 5 and 6). Log-likelihoods of different attention patterns (likes, comments, shares) are computed under the competing distributions; the one with the higher log-likelihood is then the better fit [31]. Log-likelihood ratio tests between the power law and the other distributions yield positive ratios, and the p-values computed using Vuong’s method [44] are close to zero, indicating that the better fit provided by the power-law distribution is not caused by statistical fluctuations. Lower bounds and scaling parameters have been estimated via minimization of the Kolmogorov-Smirnov statistic [31]; the scaling parameters have then been compared via the Wald test (see Table 7).
As for users’ activity, Tables 8 and 9 list the fit parameters for various candidate distributions for both conspiracy and scientific news. Table 10 shows the power-law fit parameters and summarizes the estimated lower bounds and scaling parameters for each distribution.
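For reference, this model-selection step can be reproduced with the powerlaw package roughly as follows (synthetic counts stand in for the actual data).

```python
# Sketch: tail model selection for an attention pattern (e.g. likes per post)
# with the `powerlaw` package; synthetic counts stand in for the real data.
import numpy as np
import powerlaw

rng = np.random.default_rng(1)
likes_per_post = np.floor(rng.pareto(1.3, 2000) * 10) + 1   # synthetic like counts

fit = powerlaw.Fit(likes_per_post, discrete=True)
print("xmin:", fit.xmin, "alpha:", fit.power_law.alpha)

for alternative in ("lognormal", "exponential"):
    R, p = fit.distribution_compare("power_law", alternative, normalized_ratio=True)
    # R > 0 favours the power law; p gives the significance of the comparison.
    print(alternative, "R =", R, "p =", p)
```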
Cox-Hazard model.
The hazard function is modeled as h(t) = h0(t)exp(βx), where h0(t) is the baseline hazard and x is a dummy variable that takes value 1 when the user has been exposed to debunking and 0 otherwise. The hazards depend multiplicatively on the covariates, and exp(β) is the ratio of the hazards between users exposed and not exposed to debunking. The ratio of the hazards of any two users i and j is exp(β(xi − xj)) and is called the hazard ratio. This ratio is assumed to be constant over time, hence the name proportional hazards. When we consider exposure to debunking by means of likes, the estimated β is 0.72742 (s.e. = 0.01991, p < 10−6) and the corresponding hazard ratio, exp(β), between users exposed and not exposed is 2.07, indicating that users not exposed to debunking are 2.07 times more likely to stop consuming conspiracy news. Goodness of fit for the Cox proportional hazards model has been assessed by means of the likelihood ratio test, Wald test, and score test, which provided p-values close to zero. Fig 8 (left) shows the fit of the Cox proportional hazards model when the lifetime is computed on likes.
Kaplan-Meier estimates of survival functions of users who interacted (exposed, orange) and did not (not exposed, green) with debunking and fits of the Cox proportional hazard model. Persistence of users is computed both on likes (left) and comments (right).
Moreover, if we consider exposure to debunking by means of comments, the estimated β is 0.56748 (s.e. = 0.02711, p < 10−6) and the corresponding hazard ratio, exp(β), between users exposed and not exposed is 1.76, indicating that users not exposed to debunking are 1.76 times more likely to stop consuming conspiracy news. Goodness of fit for the Cox proportional hazards model has been assessed by means of the likelihood ratio test, Wald test, and score test, which provided p-values close to zero. Fig 8 (right) shows the fit of the Cox proportional hazards model when the lifetime is computed on comments.
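A minimal sketch of such a fit with the lifelines Python package (an assumed tool choice; column names and data below are illustrative):

```python
# Sketch: Cox proportional hazards fit with lifelines; "exposed" is the dummy
# covariate x, "duration" the user's persistence, "event" marks users who stopped.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "duration": [120, 300, 45, 800, 60, 410, 95, 500],   # toy lifetimes in days
    "event":    [1,   1,   1,  0,   1,  0,   1,  1],     # 0 = still active (censored)
    "exposed":  [0,   1,   0,  1,   0,  1,   0,  1],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="duration", event_col="event")
cph.print_summary()   # reports beta for "exposed" and the hazard ratio exp(beta)
```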
Acknowledgments
We would like to thank: Dr. Igor Mozetič for his precious help with the sentiment analysis task; Geoff Hall and “Skepti Forum”, for providing fundamental support in defining the atlas of news sources in the US Facebook; and Francesca Pierri for her valuable advice and suggestions.
Author Contributions
- Conceptualization: WQ FZ.
- Formal analysis: WQ FZ AB MDV.
- Investigation: WQ FZ.
- Methodology: WQ FZ.
- Software: WQ FZ AB MDV.
- Supervision: WQ.
- Validation: WQ FZ AB MDV AS GC LS SH.
- Visualization: WQ FZ.
- Writing – original draft: WQ FZ AB MDV AS GC LS SH.
- Writing – review & editing: WQ FZ.
References
- 1. Brown J, Broderick AJ, Lee N. Word of mouth communication within online communities: Conceptualizing the online social network. Journal of interactive marketing. 2007;21(3):2–20.
- 2. Kahn R, Kellner D. New media and internet activism: from the ‘Battle of Seattle’ to blogging. new media and society. 2004;6(1):87–95.
- 3. Quattrociocchi W, Conte R, Lodi E. Opinions Manipulation: Media, Power and Gossip. Advances in Complex Systems. 2011;14(4):567–586.
- 4. Quattrociocchi W, Caldarelli G, Scala A. Opinion dynamics on interacting networks: media competition and social influence. Scientific Reports. 2014;4. pmid:24861995
- 5. Kumar R, Mahdian M, McGlohon M. Dynamics of Conversations. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD’10. New York, NY, USA: ACM; 2010. p. 553–562.
- 6. Schmidt AL, Zollo F, Del Vicario M, Bessi A, Scala A, Caldarelli G, et al. Anatomy of news consumption on Facebook. Proceedings of the National Academy of Sciences. 2017;114(12).
- 7. Howell WL. Digital Wildfires in a Hyperconnected World. Global Risks 2013. World Economic Forum; 2013.
- 8. Quattrociocchi W. How does misinformation spread online? World Economic Forum; 2015.
- 9. Kuklinski JH, Quirk PJ, Jerit J, Schwieder D, Rich RF. Misinformation and the currency of democratic citizenship. Journal of Politics. 2000;62(3):790–816.
- 10. Del Vicario M, Bessi A, Zollo F, Petroni F, Scala A, Caldarelli G, et al. The spreading of misinformation online. Proceedings of the National Academy of Sciences. 2016;113(3):554–559.
- 11. Quattrociocchi W, Scala A, Sunstein CR. Echo Chambers on Facebook. Available at SSRN; 2016.
- 12. Bessi A, Coletto M, Davidescu GA, Scala A, Caldarelli G, Quattrociocchi W. Science vs Conspiracy: collective narratives in the age of (mis)information. Plos ONE. 2015;10(2).
- 13. Takayasu M, Sato K, Sano Y, Yamada K, Miura W, Takayasu H. Rumor diffusion and convergence during the 3.11 earthquake: a Twitter case study. PLoS one. 2015;10(4):e0121443. pmid:25831122
- 14. Mocanu D, Rossi L, Zhang Q, Karsai M, Quattrociocchi W. Collective attention in the age of (mis)information. Computers in Human Behavior. 2015;51, Part B:1198–1204.
- 15. Bessi A, Scala A, Rossi L, Zhang Q, Quattrociocchi W. The economy of attention in the age of (mis) information. Journal of Trust Management. 2014;1(1):1–13.
- 16. Bessi A, Zollo F, Del Vicario M, Scala A, Caldarelli G, Quattrociocchi W. Trend of Narratives in the Age of Misinformation. PLoS One. 2015;10(8).
- 17. Zollo F, Novak PK, Del Vicario M, Bessi A, Mozetič I, Scala A, et al. Emotional Dynamics in the Age of Misinformation. PLoS One. 2015;10(9).
- 18. Byford J. Conspiracy Theories: A Critical Introduction. Palgrave Macmillan; 2011. Available from: http://books.google.it/books?id=vV-UhrQaoecC.
- 19. Fine GA, Campion-Vincent V, Heath C. Rumor Mills: The Social Impact of Rumor and Legend. Social problems and social issues. Transaction Publishers. Available from: http://books.google.it/books?id=dADxBwgCF5MC.
- 20. Hogg MA, Blaylock DL. Extremism and the Psychology of Uncertainty. Blackwell/Claremont Applied Social Psychology Series. Wiley; 2011. Available from: http://books.google.it/books?id=GTgBQ3TPwpAC.
- 21. Betsch C, Sachse K. Debunking vaccination myths: Strong risk negations can increase perceived vaccination risks. Health psychology. 2013;32(2):146. pmid:22409264
- 22. Qazvinian V, Rosengren E, Radev DR, Mei Q. Rumor has it: Identifying misinformation in microblogs. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics; 2011. p. 1589–1599.
- 23. Ciampaglia GL, Shiralkar P, Rocha LM, Bollen J, Menczer F, Flammini A. Computational fact checking from knowledge networks. arXiv preprint arXiv:1501.03471. 2015.
- 24. Resnick P, Carton S, Park S, Shen Y, Zeffer N. RumorLens: A System for Analyzing the Impact of Rumors and Corrections in Social Media. In: Proc. Computational Journalism Conference; 2014.
- 25. Gupta A, Kumaraguru P, Castillo C, Meier P. TweetCred: Real-time credibility assessment of content on Twitter. In: Social Informatics. Springer; 2014. p. 228–243.
- 26. AlMansour AA, Brankovic L, Iliopoulos CS. A Model for Recalibrating Credibility in Different Contexts and Languages-A Twitter Case Study. International Journal of Digital Information and Wireless Communications (IJDIWC). 2014;4(1):53–62.
- 27. Ratkiewicz J, Conover M, Meiss M, Gonçalves B, Flammini A, Menczer F. Detecting and Tracking Political Abuse in Social Media. In: ICWSM; 2011.
- 28. Ellison NB, Steinfield C, Lampe C. The Benefits of Facebook “Friends:” Social Capital and College Students’ Use of Online Social Network Sites. Journal of Computer-Mediated Communication. 2007;12(4):1143–1168.
- 29. Peto R, Peto J. Asymptotically efficient rank invariant test procedures. J Royal Statistical Society Ser A. 1972;(135):185–207.
- 30. Vapnik VN. The Nature of Statistical Learning Theory. New York, NY, USA: Springer-Verlag New York, Inc.; 1995.
- 31. Clauset A, Shalizi CR, Newman MEJ. Power-Law Distributions in Empirical Data. SIAM Review. 2009;51(4):661–703.
- 32. Mann HB, Whitney DR. On a test of whether one of two random variables is stochastically larger than the other. The annals of mathematical statistics. 1947; p. 50–60.
- 33. Cox DR. Regression Models and Life-Tables. Journal of the Royal Statistical Society, Series B. 1972;34:187–220.
- 34. McGuire WJ. The effectiveness of supportive and refutational defenses in immunizing and restoring beliefs against persuasion. Sociometry. 1961;24(2):184–197.
- 35. Facebook. Using the Graph API; 2013. Website. Available from: https://developers.facebook.com/docs/graph-api/using-graph-api/.
- 36. Krippendorff K. Content Analysis, An Introduction to Its Methodology. 3rd ed. Thousand Oaks, CA: Sage Publications; 2012.
- 37. Mozetič I, Grčar M, Smailović J. Multilingual Twitter Sentiment Classification: The Role of Human Annotators. PLoS ONE. 2016;11(5):1–26.
- 38. Gaudette L, Japkowicz N. Evaluation methods for ordinal classification. In: Advances in Artificial Intelligence. Springer; 2009. p. 207–210.
- 39. Frank E, Hall M. A simple approach to ordinal classification. Springer; 2001.
- 40. Kiritchenko S, Zhu X, Mohammad SM. Sentiment analysis of short informal texts. Journal of Artificial Intelligence Research. 2014; p. 723–762.
- 41. Sokolova M, Lapalme G. A systematic analysis of performance measures for classification tasks. Information Processing & Management. 2009;45(4):427–437.
- 42. Kaplan EL, Meier P. Nonparametric Estimation from Incomplete Observations. Journal of the American Statistical Association. 1958;53(282):457–481.
- 43. Bessi A. Two samples test for discrete power-law distributions; 2015. Available from: http://arxiv.org/abs/1503.00643.
- 44. Vuong QH. Likelihood Ratio Tests for Model Selection and non-nested Hypotheses. Econometrica. 1989;57.