Modelling influence and opinion evolution in online collective behaviour

Opinion evolution and judgment revision are mediated through social influence. Based on a large crowdsourced in vitro experiment (n=861), it is shown how a consensus model can be used to predict opinion evolution in online collective behaviour. It is the first time the predictive power of a quantitative model of opinion dynamics is tested against a real dataset. Unlike previous research on the topic, the model was validated on data which did not serve to calibrate it. This avoids favoring more complex models over simpler ones and prevents overfitting. The model is parametrized by the influenceability of each individual, a factor representing to what extent individuals incorporate external judgments. The prediction accuracy depends on prior knowledge of the participants' past behaviour. Several situations reflecting data availability are compared. When the data is scarce, the data from previous participants is used to predict how a new participant will behave. Judgment revision includes unpredictable variations which limit the potential for prediction. A first measure of unpredictability is proposed. The measure is based on a specific control experiment. More than two thirds of the prediction errors are found to occur due to unpredictability of the human judgment revision process rather than to model imperfection.


A.3 Discussion on the assumptions on $\eta$ and $\eta'$
The only assumption used to derive equation (6) is that $\eta$ and $\eta'$ have the same variance and are independent for each participant. Since the function $f_i^1$ is unknown, it is not possible to test these assumptions directly. However, since pairs of replicates in the control experiment relate to the same picture, it is unlikely that the covariance between $\eta$ and $\eta'$ would be negative. If the covariance were positive, the quantity given in equation (6) would become a lower bound on the unpredictability threshold, as shown through equation (7). Finally, if $\eta$ and $\eta'$ did not satisfy the assumption of equal variance, the quantity in equation (6) would still correspond to the average variance $\frac{1}{2}\left(E(\eta^2) + E(\eta'^2)\right)$, which also represents the average intrinsic unpredictability, as seen in equation (7).
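The robustness argument above can be illustrated numerically. The sketch below assumes, without confirmation from this section, that the quantity in equation (6) is half the mean squared difference between the two replicates of a pair; the function names, the constant 50.0 and the shared noise term are all hypothetical and serve only to illustrate the argument.

```python
import random


def half_mean_squared_difference(pairs):
    """Return 0.5 * mean((x - x_prime)**2) over replicate pairs."""
    return 0.5 * sum((x - xp) ** 2 for x, xp in pairs) / len(pairs)


def simulate_pairs(n, sd_eta, sd_eta_prime, shared_sd=0.0, seed=0):
    """Simulate n replicate pairs of judgments about the same picture.

    Each replicate is f + c + noise, where f = 50.0 is an arbitrary
    (hypothetical) deterministic part and c is a shared term that makes the
    two total noises positively correlated when shared_sd > 0.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        c = rng.gauss(0.0, shared_sd)
        x = 50.0 + c + rng.gauss(0.0, sd_eta)
        x_prime = 50.0 + c + rng.gauss(0.0, sd_eta_prime)
        pairs.append((x, x_prime))
    return pairs


if __name__ == "__main__":
    # Independent noises with unequal variances 9 and 25: the estimate
    # should be close to their average, 17.
    print(half_mean_squared_difference(simulate_pairs(100_000, 3.0, 5.0)))
    # Positively correlated noises (total variances 25 and 41, average 33):
    # the estimate stays near 17, hence below the average variance.
    print(half_mean_squared_difference(
        simulate_pairs(100_000, 3.0, 5.0, shared_sd=4.0)))
```

Under these assumptions, unequal variances leave the estimate at the average variance, while a positive covariance makes it an underestimate, in line with the lower-bound statement.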

B Circumstances of the wisdom of the crowd
The wisdom of the crowd may not always occur. The present section recalls one important hypothesis underlying the wisdom of the crowd. The hypothesis is then tested against the empirical data from the study. In the context of the present study, the wisdom of the crowd corresponds to the following fact: the mean opinion is most often much closer to the true answer than the individual opinions are. Denoting $\bar{x}$ the mean of $n$ opinions $x_i$ and $T$ the corresponding true answer, this is formally expressed as

$$|\bar{x} - T| \ll \frac{1}{n} \sum_{i=1}^{n} |x_i - T|, \qquad (8)$$

where $\ll$ stands for significantly smaller than. The wisdom of the crowd given by equation 8 does not always take place. It only occurs if the opinions $x_i$ are distributed sufficiently symmetrically around the true answer. When the distribution is largely biased above or below the true answer, equation 8 fails to hold. To understand this fact, the group of individuals is split in two: $i \in N^+$ if $x_i > T$ and $i \in N^-$ otherwise. Then, the distance of the mean opinion to truth rewrites as

$$|\bar{x} - T| = \frac{1}{n} \left| D^+ - D^- \right|,$$

where $D^+ = \sum_{i \in N^+} |x_i - T| \geq 0$ is the contribution from opinions above truth and $D^- = \sum_{i \in N^-} |x_i - T| \geq 0$ is the contribution from opinions below truth. Using this notation, the average distance to truth is $\frac{1}{n} \sum_{i=1}^{n} |x_i - T| = \frac{1}{n} \left( D^+ + D^- \right)$. As a consequence, the wisdom of the crowd described in equation 8 translates to

$$\left| D^+ - D^- \right| \ll D^+ + D^-.$$

Two extreme cases are possible:
• Perfect wisdom of the crowd: opinions are homogeneously distributed around the true answer and $D^+ = D^-$, so that $|\bar{x} - T| = 0$.
• No wisdom of the crowd: opinions either totally overestimate or totally underestimate the correct answer and either $D^- = 0$ or $D^+ = 0$, so that $|\bar{x} - T| = \frac{1}{n} \sum_{i=1}^{n} |x_i - T|$.

We now turn to the empirical data. Only the first round is discussed here because, in the subsequent rounds, the opinions are no longer independent, a criterion required for the wisdom of the crowd to occur. Fig A displays how opinions are distributed around the true value for the gauging game (A) and the counting game (B). Both distributions fall between the two extreme cases, with most opinions underestimating the true value. However, the bias is larger in the counting game, which explains why the wisdom of the crowd is more prominent in the gauging game in the first round. This explains the differences between mean opinion errors and individual errors observed in Fig 5.
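The split of the crowd into opinions above and below the truth can be checked on small numbers. The following is a minimal sketch with made-up opinions (not data from the study); `crowd_decomposition` is a hypothetical helper name, and opinions equal to the truth are placed in the lower group, where they contribute zero.

```python
def crowd_decomposition(opinions, truth):
    """Return (D_plus, D_minus, mean_error, avg_individual_error).

    D_plus sums the distances to truth of opinions above it, D_minus those
    of opinions at or below it. The error of the mean opinion equals
    |D_plus - D_minus| / n and the average individual error equals
    (D_plus + D_minus) / n.
    """
    n = len(opinions)
    d_plus = sum(x - truth for x in opinions if x > truth)
    d_minus = sum(truth - x for x in opinions if x <= truth)
    mean_error = abs(sum(opinions) / n - truth)
    avg_individual_error = (d_plus + d_minus) / n
    return d_plus, d_minus, mean_error, avg_individual_error


if __name__ == "__main__":
    # Opinions spread on both sides of the truth: errors partly cancel.
    print(crowd_decomposition([40, 45, 55, 70], 50))  # (25, 15, 2.5, 10.0)
    # All opinions below the truth: no cancellation at all.
    print(crowd_decomposition([30, 35, 40, 45], 50))  # (0, 50, 12.5, 12.5)
```

In the first made-up crowd the mean opinion is four times closer to the truth than the individuals are on average; in the second, fully biased crowd the two errors coincide.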

C Testing the linearity of the consensus model
The consensus model (1) assumes that the opinion change $x_i(t+1) - x_i(t)$ grows linearly with the distance between $x_i(t)$ and the mean opinion $\bar{x}(t)$. This assumption is tested against the alternative with $\gamma \neq 1$. The numerical values of the statistics are reported for the opinion change between rounds 1 and 2 for the gauging game. The same conclusions hold for the counting game and for the opinion change between rounds 2 and 3. The linearity test provided in [66], applied to our data, gives a statistic $P = -1.4 \times 10^{7}$ with empirical variance $\mathrm{var}(P) = 4 \times 10^{14}$, so that we fail to reject the null hypothesis $\gamma = 1$ (p-value = 0.5). Fig B displays the evolution $x_i(t+1) - x_i(t)$ against the distance to the mean $\bar{x}(t) - x_i(t)$, along with the result of the linear regression assuming $\gamma = 1$.
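The reported p-value can be reproduced from the statistic and its variance under one assumption that this section does not state: that, under the null hypothesis, the statistic of the test in [66] is approximately normal with zero mean, and that the test is two-sided. The helper name below is hypothetical.

```python
import math


def two_sided_p_value(statistic, variance):
    """Return (z, p) for a zero-mean normal statistic with given variance.

    ASSUMPTION: normal null distribution and a two-sided test; the actual
    procedure of the cited linearity test may differ.
    """
    z = statistic / math.sqrt(variance)
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


if __name__ == "__main__":
    z, p = two_sided_p_value(-1.4e7, 4e14)
    print(round(z, 3), round(p, 3))  # -0.7 0.484
```

With the reported values, the standardized statistic is about -0.7 and the two-sided p-value is about 0.48, which rounds to the 0.5 quoted above.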

D Influenceability and personality
Is influenceability related to personality? To answer this question, we required the participants to provide information regarding their personality, gender, highest level of education, and whether they were native English speakers. The questionnaire regarding personality comes from a piece of work by Gosling and Rentfrow [39] and was used to estimate the five general personality traits. The questionnaire page is reported in Fig F. For each of the five traits, the participants rated how well they identified with a set of synonyms (rating $s \in \{1, \ldots, 7\}$) and with a set of antonyms (rating $a \in \{1, \ldots, 7\}$). This redundancy allows for testing the consistency of the answers of each participant. Participants whose difference $(8 - a) - s$ was too far from 0 were discarded (threshold values were found using the Iglewicz and Hoaglin method based on the median absolute deviation [47]). Partial Pearson linear correlations are first reported between the individual traits measured by the questionnaire (see Table A). The correlation signs are found to be consistent with the related literature on the topic [67]. This indicates that our measure of the Big Five factors is trustworthy. Partial correlations are then provided to link the personal traits to influenceability. As shown in Table B, none of the measured personal traits is able to explain the variability in the influenceability parameter. The only exceptions concern gender and being a native English speaker, with a weak level of significance (p-value $\in [0.01, 0.05]$). However, these relations are consistent neither
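The consistency filter on synonym and antonym ratings can be sketched as follows. This is only an illustration: it uses the modified z-score of Iglewicz and Hoaglin, $M = 0.6745\,(d - \mathrm{median}) / \mathrm{MAD}$, with their commonly recommended cutoff of 3.5, and it is an assumption that the study applied exactly this cutoff. The function names and the ratings are made up.

```python
from statistics import median


def modified_z_scores(values):
    """Modified z-scores based on the median absolute deviation (MAD)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # The score is undefined when more than half the values are equal.
        raise ValueError("MAD is zero; modified z-scores are undefined")
    return [0.6745 * (v - med) / mad for v in values]


def consistent_participants(ratings, cutoff=3.5):
    """Return indices of participants kept after the consistency check.

    ratings: list of (s, a) pairs with s, a in 1..7 (synonym and antonym
    ratings for one trait). The inconsistency is d = (8 - a) - s.
    cutoff=3.5 is an ASSUMED threshold, not one confirmed by the text.
    """
    d = [(8 - a) - s for s, a in ratings]
    scores = modified_z_scores(d)
    return [i for i, m in enumerate(scores) if abs(m) <= cutoff]


if __name__ == "__main__":
    ratings = [(6, 2), (5, 3), (7, 1), (4, 4), (6, 3), (2, 2), (5, 2), (1, 1)]
    print(consistent_participants(ratings))  # [0, 1, 2, 3, 4, 6]
```

In this made-up example, the two participants who rated both the synonyms and the antonyms low (indices 5 and 7) are discarded, while small inconsistencies of one point are tolerated.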

E.1 Confidence intervals for prediction errors
Fig C displays error bars for the 95% confidence intervals of the RMSEs. This figure reveals that the two methods depending on training set size do not perform significantly better than the consensus model with a single pair of typical influenceabilities, even for large training set sizes. This is an argument in favor of the model in which the whole population shares a single pair of influenceabilities (α(1), α(2)).

E.2 Prediction accuracy in terms of Mean Absolute Errors
Measuring prediction accuracy in terms of MAEs may appear more intuitive for comparing prediction methods. Fig D assesses the models using an absolute linear scale, where the errors are deliberately unscaled for the counting game. The prediction methods rank identically whether measured in terms of MAE or RMSE. Notice that, due to the nonlinear relation between RMSE and MAE, on this alternative scale, the errors of the consensus models are now closer to the null model than to the unpredictability error. For comparison, recall that for the gauging game, the judgments range between 0 and 100 while they range between 0 and 500 for the counting game.
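The nonlinear relation between the two measures can be seen on a toy example. The sketch below uses made-up prediction errors, not values from the study: two error sets with the same MAE have different RMSEs, because the RMSE weights large errors more heavily.

```python
import math


def mae(errors):
    """Mean absolute error."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    """Root mean square error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


if __name__ == "__main__":
    uniform_errors = [5, 5, 5, 5]   # all errors of equal size
    skewed_errors = [1, 1, 1, 17]   # one large error, same MAE
    print(mae(uniform_errors), rmse(uniform_errors))  # 5.0 5.0
    print(mae(skewed_errors), rmse(skewed_errors))    # 5.0 and sqrt(73)
```

Since the RMSE is never smaller than the MAE and the gap depends on how the errors are spread, distances between methods on one scale need not carry over to the other, even when the ranking is preserved.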