Understanding and coping with extremism in an online collaborative environment: A data-driven modeling

The Internet has provided us with great opportunities for large scale collaborative public good projects. Wikipedia is a predominant example of such projects where conflicts emerge and get resolved through bottom-up mechanisms leading to the emergence of the largest encyclopedia in human history. Disaccord arises whenever editors with different opinions try to produce an article reflecting a consensual view. The debates are mainly heated by editors with extreme views. Using a model of common value production, we show that the consensus can only be reached if groups with extreme views can actively take part in the discussion and if their views are also represented in the common outcome, at least temporarily. We show that banning problematic editors mostly hinders the consensus as it delays discussion and thus the whole consensus building process. To validate the model, relevant quantities are measured both in simulations and Wikipedia, which show satisfactory agreement. We also consider the role of direct communication between editors both in the model and in Wikipedia data (by analyzing the Wikipedia talk pages). While the model suggests that in certain conditions there is an optimal rate of “talking” vs “editing”, it correctly predicts that in the current settings of Wikipedia, more activity in talk pages is associated with more controversy.


Introduction
Large scale collaboration has been a central concept in the development of both the Internet and the WWW [1][2][3]. With the ever increasing penetration of information and communication technologies across the globe, and the emergence of the user generated web (sometimes called "Web 2.0"), collaboration of individuals from all around the world to generate public good products is more ubiquitous than ever. A wide range of platforms and protocols facilitate such collaborations between humans and machines at different scales and with different goals [4], e.g., Wikipedia. However, in the increasingly globalized world of such social systems, conflicts may arise due to differences of opinion. This is even more important in systems where common value production is the goal of the service. In most of the above mentioned systems, due to the bottom-up management in place [5], the conventional tools of conflict resolution are inapplicable. Hence, it is astonishing that in spite of the magnitude of the opinion differences in the world, even on sensitive issues, quality articles are produced on Wikipedia, comparable to the ones in expert-written encyclopedias [6]. Therefore, the remaining big puzzle about Wikipedia is that "it only works in practice; in theory, it can never work".
The approach of complex systems science has become more and more relevant to study collective social behavior. The availability of large scale data on our personal and societal activities has transformed the methods and scopes of social sciences considerably, leading to the emergence of the new field of computational social science [7,8]. In this paper, we take such an approach and use agent-based modeling to shed light on some aspects of Wikipedia opinion and content dynamics.
Wikipedia has been studied by various researchers and from different angles. When it comes to conflicts on Wikipedia, a good amount of research has focused on vandalism and how to detect it [9][10][11]. Even though vandalism is very much related to opinion clashes among users, here we are more interested in conflicts between editors who have faith in the whole project and have no negative incentives. Such cases have been studied empirically by various groups. The bursty nature of editorial wars and the separation between peace and war phases in a dynamical framework are studied in [12] and [13]. A more detailed analysis of dyadic interactions between editors and the role of social status is presented in [14]. Wikipedia articles have been ranked based on their controversy scores, and the controversial topics have been analyzed in [15][16][17]. Finally, tools for visualizing and detecting Wikipedia conflicts are developed in [18]. However, most of the empirical work on Wikipedia conflicts fails to explain the mechanistic scenarios driving the emergence and resolution of conflicts among editors.
One of the under-researched aspects of Wikipedia edit wars is the role of the "talk pages". Talk pages are forums in which editors can discuss their opinions on the content of the article and try to reach a consensus before overriding each other's edits directly on the article [19]. Even though it has been argued that the presence of such facilities would hinder the emergence of edit wars [20], there is little theoretical work to explain this observation. In other related work, the content of the talk pages is analysed using natural language processing tools to explain their functionality; however, again, not much mechanistic modeling is provided [21][22][23].
Modeling opinion dynamics in an agent-based framework has an extensive literature (for a review see [24]). A successful class of such models is known as "bounded confidence" models, which allow agents to accept opinion alterations within a tolerance threshold [25]. In more recent work, bounded confidence models are generalized to account for emotion dynamics parallel to opinion dynamics [26]. Different directions of generalization of such models have been taken to explain the user dynamics and opinion dynamics on Wikipedia [27,28]. We make use of one of these generalizations, which accounts for the common product (the article) among the agents as well as the indirect interactions between agents through this common product [29].
Previous work has shown that in many cases a consensus can be reached even if the original pool of opinions was very mixed. The major obstacle in building a consensus is the presence of opinion outliers whose views are very different from those of the majority. We devote this paper to the question of to what extent people with extreme opinions impede consensus building, and what measures can be applied to decrease the chance of a frozen conflict. We also implement the process of banning editors who create conflict, a well-known procedure in Wikipedia, and show its effect on the evolution of the conflict. The results of the model are compared with empirical data generated from Wikipedia activity logs.

Methods
In this paper we combine the editorial activity data collected from Wikipedia with a generalized version of the computational model of opinion dynamics that we have developed earlier [27,29].

Data
The data collection is carried out using Wikimedia Tool Labs https://tools.wmflabs.org, which provide live access to the Wikipedia database containing logs of all the editorial activities. For more details on data collection see [30]. Data Dumps that we generated and used in this work are available at http://wwm.phy.bme.hu/.
In obtaining statistics of the editors' activities, we explicitly excluded Wikibots (semiautomated computer codes that carry out large scale simple tasks, e.g., correcting typos or creating inter-language links). In order to do so, we excluded all the users with "bot flag" (an identifier that has to be used by bot-runners to distinguish between bot and human edits). We collected data from different language editions. These editions represent a large range of language editions in size and number of articles, as well as large variety in their local rules and conventions. However, as reported below, most of the observed statistical features are language independent. In particular we selected 13 language editions to have diversity in size, rules, and cultures. The current statistics of these language editions are reported in Table 1.
We counted the number of edits to article pages and to Wikipedia talk pages, and compared their ratio to its equivalent in the model. To be able to study the consensus reaching process, we also consider reverts: edits that undo a previous edit. For more details see [31,32].
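As an illustration of how reverts can be counted in a page history, the following minimal sketch flags a revision as a revert when its text is identical to that of an earlier revision with at least one other revision in between. This hash-matching criterion is an assumption made for the example; the exact definition used in this work is the one given in [31,32]. The function names `find_reverts` and `revert_ratio` are hypothetical.

```python
import hashlib


def find_reverts(revision_texts):
    """Return indices of revisions that restore an earlier version of the page.

    Assumption: a revert is a revision whose full text equals that of an
    earlier revision, with at least one other revision in between.
    """
    last_seen = {}  # text hash -> index of the latest revision with that text
    reverts = []
    for idx, text in enumerate(revision_texts):
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        prev = last_seen.get(digest)
        # prev == idx - 1 would be a null edit, not an undo of someone's change
        if prev is not None and prev < idx - 1:
            reverts.append(idx)
        last_seen[digest] = idx
    return reverts


def revert_ratio(revision_texts):
    """Fraction of revisions in the history that are reverts."""
    if not revision_texts:
        return 0.0
    return len(find_reverts(revision_texts)) / len(revision_texts)
```

For a history given as a list of revision texts in chronological order, `revert_ratio` returns the share of revisions that undo earlier changes under this criterion.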

Computational model
We use the model that we introduced in [27] and further developed in [29]. We consider the case with a fixed agent pool: N agents try to edit and eventually agree on a descriptive article about a subject. Each agent has an opinion about the article, represented by a time-varying scalar variable $x_i(t) \in [0, 1]$. The article can also be biased towards an opinion value at any time, represented on the same scalar interval, $A(t) \in [0, 1]$. At each timestep, a randomly chosen agent either tries to communicate with another randomly chosen agent (with probability r) or tries to edit the article (with probability 1−r). The model for agent-agent interaction is known as the Deffuant model for opinion dynamics in mixed populations [25]. We use a specific case of this model, described below.
Two agents can only communicate if their opinions differ by less than $\epsilon_T$, the tolerance parameter of the agent-agent interaction, in which case they modify their view on the subject and both adopt a joint opinion halfway between their original ones:

$$x_i,\, x_j \to \frac{x_i + x_j}{2} \quad \text{if } |x_i - x_j| < \epsilon_T. \qquad (1)$$

The original opinion models [25] do not take article editing into consideration; the only interaction is the agent-agent communication, which drives the system into a stable configuration characterized by opinion groups which do not interact with each other, and the average number of which is determined by $\epsilon_T$.

Agents have a separate tolerance ($\epsilon_A$) towards the opinion reflected in the article. If they try an editing action and find that the position of the article differs by less than $\epsilon_A$ from their own opinion, they do not change it; instead they adapt their opinion towards it by an amount proportional to a convergence parameter $\mu_A$ and the opinion difference. In the opposite case, when the article is intolerable for the agent, the agent will modify the article accordingly:

$$x_i \to x_i + \mu_A (A - x_i) \quad \text{if } |x_i - A| < \epsilon_A, \qquad A \to A + \mu_A (x_i - A) \quad \text{otherwise}. \qquad (2)$$

The simulation procedure is thus the following: The system is prepared first by running the agent-agent interaction Eq (1) to create the opinion groups; then each Monte-Carlo step is composed of N actions in which a randomly chosen agent i either talks to another agent or interacts with the article, depending on the parameter r as described above.

The relaxation time τ, in general, is defined as the average number of Monte-Carlo steps needed to reach a consensus where all agents are satisfied with the article, i.e. they are in the tolerance range of the article described by $\epsilon_A$. To cope with the enormous statistical fluctuations which prevent the calculation of a sensible mean, we measured τ by the position of the maximum of the relaxation time density function, which is equivalent to the most probable relaxation time.
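The simulation procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the variable names (`eps_t` and `eps_a` for the two tolerances, `mu_a` for the convergence parameter), the number of preparation sweeps, and the step cap `max_steps` are all choices made for the example, and the article update is assumed to mirror the opinion update (the article moves towards the editing agent by a fraction `mu_a` of their distance).

```python
import random


def prepare_groups(opinions, eps_t, rng, sweeps=200):
    """Run only the agent-agent rule so that opinion groups form (needs >= 2 agents)."""
    n = len(opinions)
    for _ in range(sweeps * n):
        i, j = rng.sample(range(n), 2)
        if abs(opinions[i] - opinions[j]) < eps_t:
            opinions[i] = opinions[j] = 0.5 * (opinions[i] + opinions[j])


def monte_carlo_step(opinions, article, eps_t, eps_a, mu_a, r, rng):
    """Perform N actions (talk with probability r, else edit); return the new article position."""
    n = len(opinions)
    for _ in range(n):
        i = rng.randrange(n)
        if rng.random() < r:
            # Talk: meet halfway if the partner is within the agent-agent tolerance.
            j = rng.randrange(n)
            if j != i and abs(opinions[i] - opinions[j]) < eps_t:
                opinions[i] = opinions[j] = 0.5 * (opinions[i] + opinions[j])
        elif abs(opinions[i] - article) < eps_a:
            # Tolerable article: the agent drifts towards it.
            opinions[i] += mu_a * (article - opinions[i])
        else:
            # Intolerable article: the agent pulls it towards its own opinion.
            article += mu_a * (opinions[i] - article)
    return article


def in_consensus(opinions, article, eps_a):
    """True when every agent is within its tolerance of the article."""
    return all(abs(x - article) < eps_a for x in opinions)


def relaxation_time(opinions, article, eps_t, eps_a, mu_a, r, rng, max_steps=100_000):
    """Number of Monte-Carlo steps until consensus, or None if max_steps is exceeded."""
    for step in range(max_steps):
        if in_consensus(opinions, article, eps_a):
            return step
        article = monte_carlo_step(opinions, article, eps_t, eps_a, mu_a, r, rng)
    return max_steps if in_consensus(opinions, article, eps_a) else None
```

A run would start from a list of opinions in [0, 1], call `prepare_groups` once, and then call `relaxation_time` with a seeded `random.Random` instance so that results are reproducible.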
If the simulations are started with random initial agent opinion distributions, then the number of opinion groups will vary due to the intrinsic randomness in the model. The relaxation time measured for a specific set of parameters is then an average which includes qualitatively different scenarios with different numbers of opinion groups. This phenomenon is illustrated in Fig 1, where the relaxation time distribution is shown for simulations with exactly the same parameters but with different random seeds creating different initial opinions $x_i(t = 0)$. The distribution of the relaxation time is plotted separately for different numbers of initial opinion groups (n). Interestingly, if initially there are four opinion groups (n = 4), then the system behaves similarly to either a 3 or a 5 group setup. We categorize the agents as mainstream or extremist depending on their opinion with respect to the others.
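Because relaxation times from repeated runs spread over orders of magnitude, the most probable value is a more robust summary than the mean. A minimal sketch of one way to estimate it is given below. It assumes that the density is taken over the logarithm of the relaxation time and uses equal-width bins in log10; both the binning and the function name `most_probable_time` are choices made for this example.

```python
import math
from collections import Counter


def most_probable_time(times, bins_per_decade=5):
    """Estimate the mode of a relaxation-time sample using logarithmic bins.

    Assumption: the density is taken over log10(time), so the mode is the
    most populated logarithmic bin; its geometric centre is returned.
    """
    positive = [t for t in times if t > 0]
    if not positive:
        raise ValueError("need at least one positive relaxation time")
    counts = Counter(math.floor(math.log10(t) * bins_per_decade) for t in positive)
    best_bin = max(counts, key=counts.get)
    return 10 ** ((best_bin + 0.5) / bins_per_decade)
```

The bin resolution trades noise against precision: fewer bins per decade give a more stable but coarser estimate.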
To avoid the above artifacts emerging from this sensitivity, we fix the initial conditions throughout the analysis. A new parameter RoE is introduced as the ratio of the agents in the extremist groups. Naturally it is relevant only for 3 and 4 groups.
It was shown in [27] that the above defined model has three different modes of convergence, which can be identified by regions in the phase diagram of ($\mu_A$, $\epsilon_A$). We reiterate here the main findings: Regime I was observed for low values of $\mu_A$ and $\epsilon_A$ and was characterized by an astronomical relaxation time (which prohibits its study for a reasonable system of size N > 100) and an everlasting stable conflict in which a large mainstream group fights an endless war against two small extremist groups. Therefore, we omit the study of this regime here. Regime II was characterized by an oscillatory behavior of the opinion of the article, and the convergence was reasonably fast. Regime III showed the behavior most similar to Wikipedia, with very volatile article behavior and, in parallel, extremists gradually converted to the mainstream opinion. In this study we will focus on Regimes II and III: Regime II is characterized by a very controversial topic ($\epsilon_A$ small) and a moderately volatile article ($\mu_A$ intermediate), while Regime III is less controversial ($\epsilon_A$ is larger) and more volatile ($\mu_A$ is large). We also note here that if r is not too small, then the opinion groups stay compact during the simulations [27].

Role of the extremists
In order to understand the dynamics of systems with multiple groups, we start with a two group scenario. The convergence of a two group system is always fast (see Fig 2) and can be understood by the following reasoning: In the case of two groups, the article can only be found between the two groups. If an agent from either side is chosen to edit the article, two scenarios may occur. If the article is outside the tolerance of the agent, the agent will pull the article towards its group. Otherwise, the agent will move towards the article, i.e. towards the center of the opinion pool. The inter-agent talks keep the opinion groups together; thus if any member moves, the whole group follows it, though the distance the group travels will be $n_g$ times smaller ($n_g$ being the number of agents in the group). Thus the article makes a random walk between the groups while the groups gradually shift towards each other.
However, the article does not make a regular random walk since the step size depends on its position relative to the groups: the farther it is from the group the larger the step size it takes in the direction of it. Therefore the random walk of the article is biased towards the center.
The number of steps the article can make from the center at $x_c$ to the group at opinion $x_e$ can be calculated as a function $n_s(x)$, where $x = |x_e - x_c|$ is the distance of the two groups (the number of steps is the integer part of $n_s(x)$, but we approximate it with a continuous variable).
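One way to arrive at such a step count is sketched below. It is offered as an illustration under explicit assumptions and not as the expression used in the paper: every edit by a member of the group is assumed to move the article by a fraction $\mu_A$ of its current distance to the group, the group is assumed to stay fixed during the excursion, and the group stops editing once the article is within the article tolerance (written $\epsilon_A$ here). The symbol $d_k$ is introduced only for this sketch.

```latex
% d_k: distance between the article and the group after k consecutive edits by that group
d_0 = x, \qquad d_{k+1} = (1-\mu_A)\, d_k
\quad\Longrightarrow\quad d_k = x\,(1-\mu_A)^{k}

% the group stops editing once the article is tolerable, d_k < \epsilon_A
x\,(1-\mu_A)^{n_s} = \epsilon_A
\quad\Longrightarrow\quad
n_s(x) = \frac{\ln(\epsilon_A / x)}{\ln(1-\mu_A)}, \qquad x > \epsilon_A,\; 0 < \mu_A < 1
```

As a check of the arithmetic, with $\mu_A = 0.5$, $\epsilon_A = 0.1$ and $x = 0.4$ this gives $n_s = \ln(0.25)/\ln(0.5) = 2$: the distance shrinks from 0.4 to 0.2 to 0.1 in two edits.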
If the probability of choosing an agent from this group with respect to the other group is $p_g$ ($p_g = 1/2$ for the two group case, but the general scenario will be useful later), then the article is within the tolerance level of the group if at least $n_s$ steps were made in its direction.

The velocity $v_g$ of a group is inversely proportional to the number of agents $n_g$ it contains. Since we only know $v_g(x)$, we integrate its inverse to get the time as a function of the group position. The integral can be calculated for fixed values of $\mu_A$ and $\epsilon_A$. Fig 2 compares the analytical result with the simulation data. The result of Eq (6) fits the numerical results well and deviates only at the very end, where the number of steps the article can make is small and neglecting the integer part introduces a significant error.
In the following, we continue with the analysis of the cases with 3 and 4 groups, where we can test the effect of the ratio of extremists RoE and of the probability of the communication action r. In the four group case, if the middle groups are more numerous than the extreme groups, they merge fast, as in a two group system, and the system is converted to a scenario with one big middle group and two small extremist groups, which is very stable with a long relaxation time. On the other hand, if there are more extremists, they first merge with the middle group on their side, thus reducing the four group system to a two group system which converges fast. The difference in the relaxation times between the two cases is orders of magnitude, as indicated by the logarithmic scale in Fig 3. If the mainstream groups merge with the respective extremists, then increasing the size of the extremist groups will prolong the debate (see Fig 3) because the merged groups will be farther from the middle opinion.
We identify $RoE_c$ as the transition point between the above two scenarios, where all four groups merge together simultaneously. We consider the movement of a mainstream group as the result of independent two group scenarios. Unfortunately, the integral in Eq (6) is impossible to evaluate for the general case; therefore we evaluate only the initial speed of the group. Since according to Eq (5) the smaller groups accelerate faster, the transition point should lie between two limiting cases: (i) the middle groups start with zero velocity, and (ii) the middle groups start with a third of the extremist group velocities. These two conditions give two implicit equations, which can be solved numerically. The numerical solutions give the range $0.37 < RoE_c < 0.43$, which corresponds well to the transition point seen in the relaxation times (see Fig 3).
Thus, if the mainstream views are divided between two options, as is often the case, then there is an optimal ratio of extremists. Of course having no extremists would be the best case, but as soon as there are a few of them they stabilize the conflict. Another sharp transition occurs when they become numerous enough that the extremist groups merge with the respective mainstream groups, leading to a two group system. The key difference here is that the extremists take an active part in the debate.
The talk to edit ratio r also has an effect on the relaxation time. In regime II, for both the 3 and 4 group cases, there is an intermediate (~0.6) optimum talk/edit ratio leading to the shortest relaxation time. On the contrary, in regime III, low values of r are more favorable for a short relaxation time, which means a lot of editing and little discussion. In summary, in the case of oscillations we need agent-agent discussion for fast convergence, but in the converging extremes case the major element of the convergence process is the relatively volatile article and extremists adopting a more mainstream opinion due to a semi-extremist article. For this we only need editing.
To be able to compare these results with the empirical data, we calculated two ratios for each language edition: (i) the ratio of talk page edits to article page edits and (ii) the ratio of reverts on the articles to article page edits. One should keep in mind that the first ratio is not exactly the same as in the model. In reality, edits to the articles are finer-grained than edits to talk pages, which generally add a larger piece of text to the discussion. Hence the talk/edit ratio in the model overestimates the same quantity measured in Wikipedia data. However, we expect to see the same trends, namely that with increasing talk/edit ratio the consensus reaching time should increase. As a proxy for the consensus reaching time, we measured the ratio of reverts to all the edits in each language edition. Fig 6 shows the relationship between these two ratios. We observe that in the language editions for which the talk/edit ratio is higher, there are generally more reverts. A similar relationship has also been reported at the article level [13]. This behavior is more similar to the case of 3 opinion groups in regime III, where the optimal value of r is very small.
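The two ratios and the trend between them can be computed along the following lines. This is a hedged sketch: the function names are hypothetical, the revert ratio is assumed here to be normalized by article edits, and the Pearson correlation coefficient is only one possible way to quantify the trend; it is not a statistic reported in the text.

```python
import math


def edition_ratios(article_edits, talk_edits, reverts):
    """Return (talk/edit ratio, revert ratio) for one language edition.

    Assumption: both ratios are normalized by the number of article edits.
    """
    if article_edits <= 0:
        raise ValueError("article_edits must be positive")
    return talk_edits / article_edits, reverts / article_edits


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)
```

Applying `edition_ratios` to the edit counts of each language edition gives one point per edition; `pearson` on the two resulting lists then summarizes whether editions with more talk activity also show more reverts.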

Banning of extremists
In Wikipedia, many different tools are used to control and eventually settle conflicts, for example freezing controversial articles, or temporarily or permanently banning users who do not obey community conventions. The aim of these measures is mainly to calm down editors, an aspect which is not included in our model. It has been shown that editors who are banned more often focus on a smaller number of articles and contribute directly to the editorial wars [30].
In our model, we have shown that oscillations have a key role in reaching consensus. Here we investigate how the introduction of banning can affect the consensus reaching process. Our expectation is that by banning agents, the fluctuations (e.g. in the number of active agents in extremist groups) speed up the oscillations and the relaxation time will be reduced. We implement banning in our simulations as follows: If an agent changes the article in an edit process, it may get banned with a probability proportional to the square of the distance between the opinion of the agent and that of the article. This reflects the fact that agents who find an article completely opposing their views are more likely to take action resulting in a ban. The probability of banning can be formulated as

$$p_{\mathrm{ban}} \propto (x_i - A)^2. \qquad (9)$$

Banned agents, when selected for an editing action, do nothing but regain their normal status, and will later be able to edit the article. Agents are thus banned from one editing action while they still participate in the talking. In the simulation, an agent opinion distribution with RoE = 0.5 was used.

We can assume that as soon as the middle group and the article get in the vicinity of an extremist group, the members of the opposite extremist group get banned more easily, thus enhancing the oscillatory process of the article. To test this, we measure the relaxation time with banning ($\tau_b$) and compare it to the case without banning (τ). Fig 7 shows the logarithm of the relaxation time of the model without banning, and the ratio of the relaxation times with/without banning (normal scale). In the presence of banning, in regime III, patches of relaxation time decrease can be observed, while a more pronounced ~10% increase is observed in regimes I and II. Moreover, the increase of the relaxation time is larger where the relaxation time of the original model was already large.
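The banning rule described above can be sketched as a single editing action. The sketch rests on several assumptions that are not taken from the text: the proportionality constant `scale`, the capping of the probability at 1, the use of the distance before the edit to decide the ban, and all function and variable names. The article update is assumed to move the article towards the editing agent by a fraction `mu_a` of their distance.

```python
import random


def ban_probability(opinion, article, scale=1.0, exponent=2.0):
    """Ban probability growing with the agent-article distance (capped at 1).

    `scale` is a hypothetical proportionality constant; exponent 2 follows
    the squared-distance rule described in the text.
    """
    return min(1.0, scale * abs(opinion - article) ** exponent)


def edit_with_banning(i, opinions, article, banned, eps_a, mu_a, rng, scale=1.0):
    """One editing action of agent i; returns the new article position."""
    if banned[i]:
        # A banned agent skips this one editing action and regains normal status.
        banned[i] = False
        return article
    if abs(opinions[i] - article) < eps_a:
        # Tolerable article: the agent adapts and cannot be banned.
        opinions[i] += mu_a * (article - opinions[i])
        return article
    # Intolerable article: the agent changes it and risks a ban.
    new_article = article + mu_a * (opinions[i] - article)
    if rng.random() < ban_probability(opinions[i], article, scale):
        banned[i] = True
    return new_article
```

Here `banned` is a list of booleans, one per agent. Talking actions are left untouched, so banned agents still take part in agent-agent communication.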
Therefore, it seems that this type of banning may only help to make consensus reaching faster when it was already fast without it. A possible explanation for the increase of the relaxation time is that when agents are banned, the debate is just delayed, because with banning it is less likely that a chosen agent is able to edit. So by banning agents, all we do is slow down the editing process. It seems that the effect of positive feedback is too small, and the fluctuations are not large enough to compensate for the loss of actions.
We found that banning users hinders consensus building in most cases. As already mentioned, in reality in some cases banning may still help the process of consensus building but for other reasons not included in the model, for example: banned editors leave the pool, or with cooler heads after the banning period, editors can be more constructive. Thus further improvements of the model could be the implementation of these effects, for example: changing A after banning. But the main message is similar to the previous section: one needs active interaction of all participants to achieve a consensus.
It is possible to verify our model by real Wikipedia data as there are two quantities which can be measured both in the model and on Wikipedia. First we show the distribution of the number of times users were banned b. The results are shown in Fig 8. The results for regime III look the same as the empirical data except for users banned once. This small deviation could be explained by the vandal editors on the Wikipedia, who deliberately delete articles or replace entire pages with nonsense. These editors are only banned once permanently, a feature (deliberately) missing from our model. Now we turn to the relationship between the number of edits versus the number of bans. The simulation results are compared with actual Wikipedia data in Fig 9. The positive correlation between the number of edits and the number of times each user is banned reflects the old proverb that "it's only those who do nothing that make no mistakes". The correlation is very similar to the empirical observations. The only deviation is for large numbers of edits, where the model overestimates the number of bans. This may be the result of the ability of human editors to learn how to avoid mistakes resulting in banning, something that agents do not do in our model.
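Both quantities compared here can be extracted from per-user counts in the same way for simulated agents and for Wikipedia editors. The following is a minimal sketch with hypothetical function names; the choice of power-of-two bins for the number of edits is an assumption made for the example.

```python
from collections import Counter


def ban_count_distribution(bans_per_user):
    """Map b -> fraction of banned users who were banned exactly b times."""
    counts = Counter(b for b in bans_per_user.values() if b > 0)
    total = sum(counts.values())
    return {b: c / total for b, c in sorted(counts.items())}


def mean_bans_by_edit_bin(edits_per_user, bans_per_user):
    """Mean number of bans for users grouped by floor(log2(number of edits)).

    Keys of the result are the lower edges of the bins (1, 2, 4, 8, ...).
    """
    sums = {}
    for user, n_edits in edits_per_user.items():
        if n_edits < 1:
            continue
        bin_index = n_edits.bit_length() - 1  # floor(log2(n_edits)) for ints
        total, count = sums.get(bin_index, (0, 0))
        sums[bin_index] = (total + bans_per_user.get(user, 0), count + 1)
    return {2 ** b: total / count for b, (total, count) in sorted(sums.items())}
```

Both functions take dictionaries keyed by a user (or agent) identifier, so the same code can be applied to the simulation output and to the empirical logs.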
Regime III is clearly the most similar to the Wikipedia data, which is important since the relaxation time is the smallest in this regime, allowing editors to reach a consensus within a reasonable time limit. This could be one reason behind the sustainability of Wikipedia. We note here that the apparent similarity between regime III of the model and the empirical data is not a result of fitting. We have varied model parameters, e.g. the exponent 2 in Eq (9), and found similar results. Of course one can distort the definition of the model such that even the regime III results do not match the Wikipedia data; however, the robustness of the results suggests that in Wikipedia the primary way to consensus is similar to the converging extremes behavior.

Conclusions
In this paper we investigated the role of people with extremist views in a value production environment like Wikipedia, using both modeling and data analysis. Two questions were studied: the influence of the ratio of extremists on the characteristic time to reach consensus, and the effect of temporary banning.
We found that, in order to achieve fast consensus in our model, all participants need constructive access to the common product. The worst scenario is when there is a strong mainstream group which punishes all moves of the extremist groups (a very general scenario in everyday life). We have even found that there is a phase-transition-like abrupt change in the relaxation time in the four opinion group system as a function of the ratio of extremists and, unexpectedly, the high relaxation time regime occurs for a low ratio of extremists.
We have also found that, in general, there is an optimal talk/edit ratio; it is never zero, but in Regime III in particular it is very small. This is again a counterintuitive result, meaning that more discussion does not necessarily help consensus; very often it just freezes the front lines. We have shown that in real Wikipedia, the articles with higher talk/edit ratios have relatively more conflict.
Translating it to real life would mean that if debates are mediated by people with average (politically correct) views, then the debate will not lead to consensus, as the extremist groups will remain frustrated forever. If the extremists have a chance to see their opinion reflected in the medium, for at least a small amount of time, they are more inclined to change their views, which is necessary for the consensus. Thus, the active participation of the extremists is needed for a consensus. Furthermore, too much discussion just strengthens the position of the people in their opinion group and does not allow them to leave it. We believe these observations are well reflected in other fields of opinion difference, e.g. politics.
We also included banning, a general procedure of Wikipedia, and we found it counterproductive in most cases, as it only delays the consensus building. The probability distribution of the number of times a user was banned in Regime III matches the Wikipedia data very well, suggesting that converging extremes is the most general convergence mechanism on Wikipedia. This means that the point of view of the article is volatile, and extremists become satisfied with a temporarily biased article while they also alter their views on the subject, until in the end they accept a more mainstream version of the article.