Users Polarization on Facebook and Youtube

On social media, algorithms for content promotion, which account for users' preferences, might limit exposure to unsolicited content. In this work, we study how the same contents (videos) are consumed on different platforms, i.e. Facebook and YouTube, over a sample of 12M users. Our findings show that the same content leads to the formation of echo chambers, irrespective of the online social network and thus of the algorithm for content promotion. Finally, we show that users' commenting patterns are accurate early predictors of the formation of echo chambers.


Introduction
The way people attempt to make sense of relevant issues changed with the shift from an era of mediated mass communication to one of disintermediated echo chambers [1][2][3][4][5][6]. On online social media, polarized communities emerge around diverse narratives. Some of these narratives reflect the extreme disagreement of public opinion on global and social issues. The emergence of polarization in online environments might reduce viewpoint heterogeneity, which has long been viewed as an important component of strong democratic societies [7,8].
Confirmation bias has been shown to play a pivotal role in the diffusion of rumors online [9]. However, on online social media, different algorithms foster personalized contents according to user tastes, i.e. they show users viewpoints that they already agree with, hence leading to the so-called filter bubbles. Little is known about the factors affecting the algorithms' outcomes. Facebook promotes posts according to the News Feed algorithm, which helps users see more stories from the friends they interact with the most; the number of comments and likes a post receives and the kind of story it is (e.g. photo, video, status update) can also make a post more likely to appear [10]. Conversely, YouTube promotes videos through Watch Time, which prioritizes videos that lead to a longer overall viewing session over those that receive more clicks [11]. One hypothesis is that these algorithms might have a role in the emergence of echo chambers. However, not much is known about the role of cognitive factors in driving users to aggregate in echo chambers supporting their preferred narrative. Recent studies suggest confirmation bias as one of the driving forces of content selection, which eventually leads to the emergence of polarized communities [12][13][14][15][16].
In this work, we aim at characterizing the behavior of users dealing with the same contents, but different mechanisms of content promotion. By focusing on all YouTube videos posted by scientific and conspiracy-like Facebook pages, we want to understand whether different mechanisms regulating content promotion in Facebook and YouTube lead to the emergence of homogeneous echo chambers.
We choose to analyze such specific narratives for two main reasons: a) scientific news and conspiracy-like news are two very distinct and conflicting narratives; b) scientific pages share the main mission to diffuse scientific knowledge and rational thinking, while the alternative ones resort to unsubstantiated rumors.
Indeed, conspiracy-like pages disseminate myth narratives and controversial information, usually lacking supporting evidence and most often contradictory of the official news. Moreover, the spreading of misinformation on online social media has become a widespread phenomenon to an extent that the World Economic Forum listed massive digital misinformation as one of the main threats for the modern society [16,17].
In spite of different debunking strategies, unsubstantiated rumors (e.g. those supporting anti-vaccine claims, climate change denial, and alternative medicine myths) keep proliferating in polarized communities emerging on online environments [9,14], leading to a climate of disengagement from mainstream society and recommended practices. A recent study [18] pointed out the inefficacy of debunking and the concrete risk of a backfire effect [19,20] from the usual and most committed consumers of conspiracy-like narratives.
We believe that additional insights about cognitive factors and behavioral patterns driving the emergence of polarized environments are crucial to understand and develop strategies to mitigate the spreading of online misinformation.
In this paper, using a quantitative analysis on a massive dataset (12M users), we compare consumption patterns of videos supporting scientific and conspiracy-like news on Facebook and YouTube.
We extend our analysis by investigating the polarization dynamics, i.e. how users become polarized comment after comment. On both platforms, we observe that some users interact only with a specific kind of content since the beginning, whereas others start their commenting activity by switching between contents supporting different narratives. The vast majority of the latter, after the initial switching phase, start consuming mainly one type of information, becoming polarized towards one of the two conflicting narratives.
Finally, by means of a multinomial logistic model, we are able to predict with a good precision the probability of whether a user will become polarized towards a given narrative or she will continue to switch between information supporting competing narratives. The observed evolution of polarization is similar between Facebook and YouTube to an extent that the statistical learning model trained on Facebook is able to predict with a good precision the polarization of YouTube users, and vice versa. Our findings show that conflicting narratives lead to the aggregation of users in different echo chambers, irrespective of the online social network and the algorithm of content promotion.

Results
We start our analysis by focusing on the statistical signatures of content consumption on Facebook and YouTube videos. The focus is on all videos posted by conspiracy-like and scientific pages on Facebook. We compare the consumption patterns of the same video on both Facebook and YouTube. On Facebook a like stands for a positive feedback to the post; a share expresses the will to increase the visibility of a given information; and a comment is the way in which online collective debates take form around the topic promoted by posts. Similarly, on YouTube a like stands for a positive feedback to the video; and a comment is the way in which online collective debates grow around the topic promoted by videos.

Contents Consumption across Facebook and YouTube.
Focusing on the consumption patterns of YouTube videos posted on Facebook pages, we compute the Spearman's rank correlation coefficients between users' actions on Facebook posts and the related YouTube videos (see Figure 1). By means of the Mantel test [21] we find a statistically significant (simulated p-value < 0.01, based on $10^4$ Monte Carlo replicates), high, and positive (r = 0.987) correlation between the correlation matrices of Science and Conspiracy. In particular, we find positive and high correlations between users' actions on YouTube videos for both Science and Conspiracy, indicating a similar strong monotone increasing relationship between views, likes, and comments. Furthermore, we observe positive and mild correlations between users' actions on Facebook posts linking YouTube videos for both Science and Conspiracy, suggesting a monotone increasing relationship between likes, comments, and shares. Conversely, we find positive yet low correlations between users' actions across YouTube videos and the Facebook posts linking the videos for both Science and Conspiracy, implying that the success of videos posted on YouTube, in terms of received attention, does not ensure a comparable success on Facebook, and vice versa. Such results provide the first evidence of a similar consumption behavior among users consuming conflicting narratives in different online social networks.
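As an illustration of the rank correlation used here, the following is a minimal sketch of Spearman's coefficient computed as the Pearson correlation of average ranks. The function names and the per-video counts are hypothetical and are not taken from the dataset; the Mantel test itself is not reproduced.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical counts for five videos (illustrative only).
views = [120, 4500, 980, 15000, 310]
likes = [3, 200, 35, 410, 12]
rho_views_likes = spearman(views, likes)
```

Because the coefficient depends only on ranks, it captures any monotone increasing relationship between two kinds of user actions, not just a linear one.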
To further investigate users' consumption patterns, in Figure

Polarized and Homogeneous Communities.
We broaden our analysis by looking at how Facebook and YouTube users are polarized towards scientific or conspiracy-like contents. Figure 3 shows the Probability Density Functions (PDFs) of about 12M users' polarization computed on Facebook and YouTube. We observe two bimodal distributions, indicating that most of the users are strongly polarized towards one of the two conflicting narratives in both online social networks. To quantify the degree of polarization we use the Bimodality Coefficient (BC), and we find that the BC is very high for both Facebook and YouTube. In particular, $BC_{FB} = 0.964$ and $BC_{YT} = 0.928$. Moreover, we observe that the percentage of polarized users (users with $\rho < 0.05$ or $\rho > 0.95$) is 93.6% on Facebook and 87.8% on YouTube; therefore, two well separated communities support competing narratives in both online social networks.
Such a result shows that conflicting narratives lead users to aggregate in well separated echo chambers, independently from the online social network and the specific algorithm of content promotion.
To further characterize such a polarized environment, we analyze the consumption patterns of polarized users. The aggregation of users around conflicting narratives leads to the emergence of echo chambers. Once inside such homogeneous and polarized communities, users supporting both narratives behave in a similar way, irrespective of the platform and the algorithm of content promotion.

Prediction of Users Polarization.
We further extend our analysis by investigating the polarization dynamics, i.e. how users' polarization evolves comment after comment. We consider random samples of 400 users who left at least 100 comments, and we compute the mobility of a user across different contents along time. On both Facebook and YouTube, we observe that some users interact with a specific kind of content, whereas others start their commenting activity by switching between contents supporting different narratives. The vast majority of the latter, after the initial switching phase, start consuming one type of information, becoming polarized towards one of the two conflicting narratives.
We exploit such a feature to derive a data-driven model to forecast users' polarizations. Indeed, by means of a multinomial logistic model, we are able to predict the probability of whether a user will become polarized towards a given narrative or she will continue to switch between information supporting competing narratives.
In particular, we consider the users' polarization after $n$ comments, $\rho_n$ with $n = 1, \dots, 100$, as a predictor to classify users in three different classes: Polarized in Science (N = 400), Not Polarized (N = 400), Polarized in Conspiracy (N = 400). Figure 6 shows precision, recall, and accuracy of the classification tasks on Facebook and YouTube as a function of $n$. On both online social networks, we find that the model's performances monotonically increase as a function of $n$ for each class. Focusing on accuracy, significant results (greater than 0.70) are obtained for low values of $n$. A suitable compromise between classification performances and required number of comments seems to be $n = 50$, which provides an accuracy greater than 0.80 for each class on both YouTube and Facebook.
To assess how the results generalize to independent datasets and to limit problems like overfitting, we split YouTube and Facebook users datasets in training sets (N = 1000) and test sets (N = 200), and we perform Monte Carlo cross validations with $10^3$ iterations. Results of Monte Carlo validations are shown in Table 1 and confirm the goodness of the model.
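The validation scheme described above (repeated random splits into training and test sets) can be sketched as follows. This is only an illustration under stated assumptions: a nearest-class-mean rule on $\rho_n$ stands in for the multinomial logistic model, overall accuracy replaces the per-class measures, and the toy data and all function names are hypothetical.

```python
import random


def fit_class_means(rhos, labels):
    """Stand-in model: mean polarization of each class in the training set."""
    sums, counts = {}, {}
    for rho, label in zip(rhos, labels):
        sums[label] = sums.get(label, 0.0) + rho
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict_nearest_mean(class_means, rho):
    """Assign the class whose training mean is closest to rho."""
    return min(class_means, key=lambda label: abs(class_means[label] - rho))


def monte_carlo_cv(rhos, labels, n_train, n_iter, seed=0):
    """Average test accuracy over n_iter random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(rhos)))
    accuracies = []
    for _ in range(n_iter):
        rng.shuffle(indices)
        train, test = indices[:n_train], indices[n_train:]
        model = fit_class_means([rhos[i] for i in train],
                                [labels[i] for i in train])
        hits = sum(predict_nearest_mean(model, rhos[i]) == labels[i]
                   for i in test)
        accuracies.append(hits / len(test))
    return sum(accuracies) / len(accuracies)


# Hypothetical, well separated toy data (20 users per class).
toy_rhos = ([i / 1000 for i in range(20)]
            + [0.40 + i / 100 for i in range(20)]
            + [0.98 + i / 1000 for i in range(20)])
toy_labels = ["science"] * 20 + ["not polarized"] * 20 + ["conspiracy"] * 20
```

With the sizes used in the text, the call would take `n_train=1000` on 1200 users and `n_iter=1000`; the toy data here is only large enough to exercise the splitting logic.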
We conclude that the early mobility on commenting is an accurate predictor of the preferential attachment of users to a specific echo chamber.
Moreover, in Table 2, we show that the evolution of the polarization on Facebook and YouTube is so alike that the same model (with n = 50), when trained with Facebook users (N = 1200) to classify YouTube users (N = 1200), leads to an accuracy in the classification task greater than 0.80 for each class. Similarly, using YouTube users as training set to classify Facebook users leads to similar performances. Such results highlight a strong similarity in behavioral patterns of users interacting in different online social networks.
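The per-class precision, recall, and accuracy used throughout this section (see Classification Performance Measures in Methods) treat each class one-vs-rest. A minimal sketch of that computation, with hypothetical labels and function name, could look like this:

```python
def per_class_metrics(true_labels, predicted_labels, target):
    """One-vs-rest precision, recall and accuracy for the class `target`."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if guess == target and truth == target:
            tp += 1  # correctly assigned to the target class
        elif guess == target:
            fp += 1  # assigned to the target class by mistake
        elif truth == target:
            fn += 1  # target-class user assigned elsewhere
        else:
            tn += 1  # correctly kept out of the target class
    # Guard against empty denominators (an assumption of this sketch).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, accuracy


# Hypothetical labels: S = Science, N = Not Polarized, C = Conspiracy.
truth = ["S", "S", "N", "C", "C", "N"]
guess = ["S", "N", "N", "C", "S", "N"]
science_scores = per_class_metrics(truth, guess, "S")
```

Note that a user who is neither truly in the target class nor predicted to be in it counts as a true negative, even if the predicted class is wrong among the remaining two; this is why per-class accuracy can exceed the overall fraction of correctly classified users.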

Discussion
Algorithms for content promotion are supposed to be the main determinants of the polarization effect arising out of online social media. Still, not much is known about the role of cognitive factors in driving users to aggregate in echo chambers supporting their favorite narrative. Recent studies suggest confirmation bias as one of the driving forces of content selection, which eventually leads to the emergence of polarized communities [12][13][14][15].
Our findings show that conflicting narratives lead to the aggregation of users in homogeneous echo chambers, irrespective of the online social network and the algorithm of content promotion.
Indeed, in this work, we characterize the behavioral patterns of users dealing with the same contents, but different mechanisms of content promotion. In particular, we investigate whether different mechanisms regulating content promotion in Facebook and Youtube lead to the emergence of homogeneous echo chambers.
We study how users interact with two very distinct and conflicting narratives, i.e. conspiracy-like and scientific news, on Facebook and YouTube. Using extensive quantitative analysis, we find the emergence of polarized and homogeneous communities supporting competing narratives that behave similarly on both online social networks. Moreover, we analyze the evolution of polarization, i.e. how users become polarized towards a narrative. Again, we observe strong similarities between behavioral patterns of users supporting conflicting narratives on different online social networks. Such a common behavior allows us to derive a statistical learning model to predict with a good precision whether a user will become polarized towards a certain narrative or she will continue to switch between contents supporting different narratives. Finally, we observe that the behavioral patterns are so similar in Facebook and YouTube that we are able to predict with a good precision the polarization of Facebook users by training the model with YouTube users, and vice versa.

Methods
Ethics Statement.
The entire data collection process has been carried out exclusively through the Facebook Graph API [22] and the YouTube Data API [23], which are both publicly available, and for the analysis we used only publicly available data (users with privacy restrictions are not included in the dataset). The pages from which we download data are public Facebook and YouTube entities. User content contributing to such entities is also public unless the user's privacy settings specify otherwise, and in that case it is not available to us.
The Facebook dataset is composed of 413 US public pages divided into Conspiracy and Science news. The first category (Conspiracy) includes pages diffusing alternative information sources and myth narratives, i.e. pages which disseminate controversial information, usually lacking supporting evidence and most often contradictory of the official news. The second category (Science) includes scientific institutions and scientific press having the main mission of diffusing scientific knowledge. Such a space of investigation is defined with the same approach as in [18], with the support of different  Table 3.

Preliminaries and Definitions.
Polarization of Users. Polarization of users, $\rho_u \in [0, 1]$, is defined as the fraction of comments that a user $u$ left on posts (videos) supporting conspiracy-like narratives on Facebook (YouTube). In mathematical terms, given $s_u$, the number of comments left on Science posts by user $u$, and $c_u$, the number of comments left on Conspiracy posts by user $u$, the polarization of $u$ is defined as
$$\rho_u = \frac{c_u}{s_u + c_u}.$$
We then consider users with $\rho_u > 0.95$ as users polarized towards Conspiracy, and users with $\rho_u < 0.05$ as users polarized towards Science.
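As a minimal sketch of this definition, the polarization and the resulting class of a user can be computed from the two comment counts. The function names and class labels below are illustrative; the 0.05 and 0.95 thresholds are those stated in the text.

```python
def polarization(science_comments, conspiracy_comments):
    """Fraction of a user's comments left on Conspiracy posts (rho_u)."""
    total = science_comments + conspiracy_comments
    if total == 0:
        raise ValueError("polarization is undefined for a user with no comments")
    return conspiracy_comments / total


def polarization_class(rho, lower=0.05, upper=0.95):
    """Map rho_u to one of the three classes used in the classification task."""
    if rho > upper:
        return "Polarized in Conspiracy"
    if rho < lower:
        return "Polarized in Science"
    return "Not Polarized"


# Example: 2 comments on Science posts and 98 on Conspiracy posts.
rho_example = polarization(2, 98)
class_example = polarization_class(rho_example)
```

Applying the same function to only the first $n$ comments of a user gives the running value $\rho_n$ used as a predictor in the classification task.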
Bimodality Coefficient. The Bimodality Coefficient (BC) [24] is defined as
$$BC = \frac{\mu_3^2 + 1}{\mu_4 + 3 \frac{(n-1)^2}{(n-2)(n-3)}},$$
with $\mu_3$ referring to the skewness of the distribution and $\mu_4$ referring to its excess kurtosis, with both moments being corrected for sample bias using the sample size $n$.
The BC of a given empirical distribution is then compared to a benchmark value of $BC_{crit} = 5/9 \approx 0.555$ that would be expected for a uniform distribution; higher values point towards bimodality, whereas lower values point toward unimodality.
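The coefficient can be computed directly from a sample, as in the sketch below. It assumes that the bias-corrected moments are the usual adjusted sample skewness and sample excess kurtosis; the text only states that both are corrected using the sample size $n$, so this choice of estimator is an assumption, and the function name is illustrative.

```python
import math


def bimodality_coefficient(sample):
    """BC = (skew^2 + 1) / (excess_kurtosis + 3 (n-1)^2 / ((n-2)(n-3)))."""
    n = len(sample)
    if n < 4:
        raise ValueError("at least 4 observations are required")
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    if m2 == 0:
        raise ValueError("sample has zero variance")
    g1 = m3 / m2 ** 1.5        # biased skewness
    g2 = m4 / m2 ** 2 - 3.0    # biased excess kurtosis
    # Sample-size corrections (assumed estimator, see lead-in).
    skew = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    return (skew ** 2 + 1) / (kurt + 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


BC_CRIT = 5 / 9  # benchmark value for a uniform distribution

# Toy samples: two point masses (bimodal) and a binomial-shaped one (unimodal).
bimodal = [0.0] * 50 + [1.0] * 50
unimodal = [float(k) for k in range(11) for _ in range(math.comb(10, k))]
```

On these toy samples the two-point sample lies well above `BC_CRIT` and the bell-shaped sample well below it, which is the comparison the benchmark is meant to support.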
Multinomial Logistic Model. Multinomial logistic regression is a classification method that generalizes logistic regression to multi-class problems, i.e. with more than two possible discrete outcomes [25]. Such a model is used to predict the probabilities of the different possible outcomes of a categorically distributed dependent variable, given a set of independent variables. In the multinomial logistic model we assume that the log-odds of each response follow a linear model
$$\eta_{ij} = \log \frac{\pi_{ij}}{\pi_{iJ}} = \alpha_j + x_i^{T} \beta_j,$$
where $\pi_{ij}$ is the probability that observation $i$ falls in category $j$, $x_i$ is the vector of predictors of observation $i$, $\alpha_j$ is a constant and $\beta_j$ is a vector of regression coefficients, for $j = 1, 2, \dots, J - 1$. Such a model is analogous to a logistic regression model, except that the probability distribution of the response is multinomial instead of binomial, and we have $J - 1$ equations instead of one. The $J - 1$ multinomial logistic equations contrast each of the categories $j = 1, 2, \dots, J - 1$ with the baseline category $J$. If $J = 2$ the multinomial logistic model reduces to the simple logistic regression model. The multinomial logistic model may also be written in terms of the original probabilities $\pi_{ij}$ rather than the log-odds. Indeed, assuming that $\eta_{iJ} = 0$, we can write
$$\pi_{ij} = \frac{\exp(\eta_{ij})}{\sum_{k=1}^{J} \exp(\eta_{ik})}.$$
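To make the link between log-odds and probabilities concrete, the sketch below evaluates the class probabilities of a single observation from given coefficients, with the last class acting as the baseline whose linear predictor is fixed at zero. It does not fit the model; the coefficient values and the function name are hypothetical.

```python
import math


def class_probabilities(x, alphas, betas):
    """Probabilities of J classes from J-1 intercepts and coefficient vectors.

    The last returned entry belongs to the baseline class J, whose linear
    predictor is fixed at zero.
    """
    etas = [alpha + sum(b_k * x_k for b_k, x_k in zip(beta, x))
            for alpha, beta in zip(alphas, betas)]
    etas.append(0.0)  # baseline category
    shift = max(etas)  # subtract the maximum for numerical stability
    weights = [math.exp(eta - shift) for eta in etas]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical coefficients for J = 3 classes and one predictor (rho_n).
example_alphas = [-2.0, 0.5]
example_betas = [[5.0], [0.0]]
example_probs = class_probabilities([0.9], example_alphas, example_betas)
```

By construction, the logarithm of the ratio between the probability of class $j$ and that of the baseline recovers the linear predictor of class $j$, and the probabilities sum to one.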
Classification Performance Measures. To assess the goodness of our model we use three different measures of classification performance: precision, recall, and accuracy. For each class $i$, we compute the number of true positive cases $TP_i$, true negative cases $TN_i$, false positive cases $FP_i$, and false negative cases $FN_i$. Then, for each class $i$ the precision of the classification is defined as
$$\mathrm{precision}_i = \frac{TP_i}{TP_i + FP_i},$$
the recall is defined as
$$\mathrm{recall}_i = \frac{TP_i}{TP_i + FN_i},$$
and the accuracy is defined as
$$\mathrm{accuracy}_i = \frac{TP_i + TN_i}{TP_i + TN_i + FP_i + FN_i}.$$
Power law distributions. Scaling exponents of power law distributions are estimated via maximum likelihood (ML) as shown in [26]. To provide a full probabilistic assessment about whether two distributions are similar, we estimate the posterior distribution of the difference between the scaling exponents through an Empirical Bayes method.

Suppose we have two samples of observations, A and B, following power law distributions. For the sample A, we use the ML estimate of the scaling parameter, $\hat{\theta}^{ML}_A$, as location hyper-parameter of a Normal distribution with scale hyper-parameter $\hat{\sigma}^{ML}_A$. Such a Normal distribution represents the prior distribution, $p(\theta_A) \sim N(\hat{\theta}^{ML}_A, \hat{\sigma}^{ML}_A)$, of the scaling exponent $\theta_A$. Then, according to the Bayesian paradigm, the prior distribution, $p(\theta_A)$, is updated into a posterior distribution, $p(\theta_A | x_A)$:
$$p(\theta_A | x_A) = \frac{p(x_A | \theta_A) \, p(\theta_A)}{p(x_A)},$$
where $p(x_A | \theta_A)$ is the likelihood and $p(x_A)$ is the normalizing constant. The posterior distribution is obtained via the Metropolis-Hastings algorithm, i.e. a Markov Chain Monte Carlo (MCMC) method used to obtain a sequence of random samples from a probability distribution for which direct sampling is difficult [27][28][29]. To obtain reliable posterior distributions, we run 50,000 iterations (5,000 discarded as burn-in), which proved to ensure the convergence of the MCMC algorithm.
The posterior distribution of $\theta_B$ can be computed following the same steps. Once both posterior distributions, $p(\theta_A | x_A)$ and $p(\theta_B | x_B)$, are derived, we compute the distribution of the difference between the scaling exponents by subtracting the posteriors, i.e.
$$p(\theta_A - \theta_B) = p(\theta_A | x_A) - p(\theta_B | x_B).$$
Then, by observing the 90% High Density Interval (HDI90) of p(θ A − θ B ), we can draw a full probabilistic assessment of the similarity between the two distributions.
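The whole procedure can be sketched as follows. Several choices here are assumptions rather than details given in the text: the power law is taken to be continuous with a known lower bound `x_min`, the ML standard error $(\hat{\theta} - 1)/\sqrt{n}$ is used as the prior scale, the proposal is a Gaussian random walk with that same scale, and the posteriors are subtracted draw by draw. All function names are illustrative.

```python
import math
import random


def fit_power_law(sample, x_min):
    """ML exponent and standard error of a continuous power law (assumed form)."""
    n = len(sample)
    log_sum = sum(math.log(x / x_min) for x in sample)
    theta_hat = 1.0 + n / log_sum
    sigma_hat = (theta_hat - 1.0) / math.sqrt(n)
    return theta_hat, sigma_hat, n, log_sum


def log_posterior(theta, theta_hat, sigma_hat, n, log_sum):
    """Unnormalized log posterior: power law likelihood times Normal prior."""
    if theta <= 1.0:
        return -math.inf
    log_likelihood = n * math.log(theta - 1.0) - theta * log_sum
    log_prior = -0.5 * ((theta - theta_hat) / sigma_hat) ** 2
    return log_likelihood + log_prior


def metropolis_hastings(sample, x_min, n_iter=50_000, burn_in=5_000, seed=0):
    """Random-walk Metropolis-Hastings draws of the scaling exponent."""
    rng = random.Random(seed)
    theta_hat, sigma_hat, n, log_sum = fit_power_law(sample, x_min)
    theta = theta_hat
    current = log_posterior(theta, theta_hat, sigma_hat, n, log_sum)
    draws = []
    for step in range(n_iter):
        proposal = rng.gauss(theta, sigma_hat)
        candidate = log_posterior(proposal, theta_hat, sigma_hat, n, log_sum)
        # Symmetric proposal: accept with probability min(1, posterior ratio).
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
        if step >= burn_in:
            draws.append(theta)
    return draws


def hdi(draws, mass=0.90):
    """Shortest interval containing the requested share of the draws."""
    ordered = sorted(draws)
    window = max(1, math.ceil(mass * len(ordered)))
    start = min(range(len(ordered) - window + 1),
                key=lambda i: ordered[i + window - 1] - ordered[i])
    return ordered[start], ordered[start + window - 1]


def exponent_difference_hdi(sample_a, sample_b, x_min,
                            n_iter=50_000, burn_in=5_000):
    """HDI90 of theta_A - theta_B from two independent chains."""
    draws_a = metropolis_hastings(sample_a, x_min, n_iter, burn_in, seed=1)
    draws_b = metropolis_hastings(sample_b, x_min, n_iter, burn_in, seed=2)
    return hdi([a - b for a, b in zip(draws_a, draws_b)], mass=0.90)


def draw_power_law(theta, x_min, size, rng):
    """Synthetic power law data by inverse-transform sampling (testing only)."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (theta - 1.0))
            for _ in range(size)]
```

An interval that contains zero is compatible with the two samples sharing the same scaling exponent, whereas an interval lying entirely on one side of zero indicates different exponents.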