Correction
6 Jan 2016: The PLOS Computational Biology Staff (2016) Correction: Improving Collective Estimations Using Resistance to Social Influence. PLOS Computational Biology 12(1): e1004713. https://doi.org/10.1371/journal.pcbi.1004713
Abstract
Groups can make precise collective estimations in cases like the weight of an object or the number of items in a volume. However, in other tasks, for example those requiring memory or mental calculation, subjects often give estimations with large deviations from factual values. Allowing members of the group to communicate their estimations has the additional perverse effect of shifting individual estimations even closer to the biased collective estimation. Here we show that this negative effect of social interactions can be turned into a method to improve collective estimations. We first obtained a statistical model of how humans change their estimation when receiving the estimates made by other individuals. Using existing experimental data, we confirmed its prediction that individuals use the weighted geometric mean of private and social estimations. We then used this result and the fact that each individual uses a different value of the social weight to devise a method that extracts the subgroups resisting social influence. We found that these subgroups of individuals resisting social influence can make very large improvements in group estimations. This is in contrast to methods using the confidence that each individual declares, for which we find no improvement in group estimations. Also, our proposed method does not need to use historical data to weight individuals by performance. These results show the benefits of using the individual characteristics of the members in a group to better extract collective wisdom.
Author Summary
We modelled how humans interact, and used the models to find strategies that can make groups more accurate. Each individual in a group combines private and public information to make estimations. But when the public information is biased, social information has the effect of making groups agree even more on an incorrect collective estimation. We reasoned that not all individuals should be influenced equally by the incorrect public information. We obtained a model to understand how private and social information are combined, and used it to obtain a value of social resistance for each individual. We then used these values of social resistance obtained from the model to extract the subgroup of people resisting social influence, and found that they give an improved collective estimation. Collective intelligence is thus maximal when taking into account individuality in human behavior.
Citation: Madirolas G, de Polavieja GG (2015) Improving Collective Estimations Using Resistance to Social Influence. PLoS Comput Biol 11(11): e1004594. https://doi.org/10.1371/journal.pcbi.1004594
Editor: Aldo A. Faisal, Imperial College London, UNITED KINGDOM
Received: April 28, 2015; Accepted: October 12, 2015; Published: November 13, 2015
Copyright: © 2015 Madirolas, de Polavieja. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited
Data Availability: Experimental data from Lorenz et al. [9] can be downloaded from http://www.pnas.org/content/108/22/9020?tab=ds.
Funding: We acknowledge funding from Spanish Ministerio de Economía y Competitividad (BFU2009-09967) to G.G.d.P including a doctoral contract to G.M. and from the Champalimaud Neuroscience Programme (Portugal). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Francis Galton was the first to experimentally demonstrate the advantages of collective estimations [1]. At a farmers’ fair, he found that the median of the independent estimations made by 784 farmers of the weight of a slaughtered ox was better than any of their individual estimations. Since then, collective estimations, computed as mean, median or geometric mean values of the group, have been shown to improve upon the estimations of most individuals of a group in several different contexts, an effect popularly known as wisdom of crowds (WOC) [2–8]. However, human crowds can also be notoriously bad at making collective estimations for many estimation tasks [7, 9]. Social interactions can have an additional negative effect in biased crowds [8, 9]. When individuals learn the estimations of the other members of the group, they typically change their own estimation towards the more common values. After social influence, the collective has thus a distribution of estimations more strongly peaked around the biased solution. This can give the collective perception of an agreement but the value agreed upon can be far from the truth [9].
We propose to turn the negative effect of social interactions to our advantage and improve collective estimations. We do so by taking into account the individuality of the members of the group. Francis Galton argued for each individual counting the same in the collective estimation [1]. But for situations in which most individuals are strongly biased, we would be in a better position with methods selecting the unbiased individuals. Of course, this can be done by finding how well each individual performs in a domain of knowledge and weighting them accordingly for similar tasks [10–12].
Here we do not consider the case of access to a classification of individuals by performance. Instead we used the impact of social interactions on estimations to extract individuals in the following way. We first obtained a model of estimation in a collective and used it to measure how much each individual of the collective resists social influence. We tested the model by reanalyzing a dataset in which subjects made estimations before and after social influence [9]. This is a rich dataset that can be used as a reference to test models of social influence [13]. In these experiments subjects were asked to privately estimate the answer to six questions [9]: ‘What is the length of the border between Switzerland and Italy in kilometers?’, ‘How many rapes were officially registered in Switzerland in 2006?’, ‘How many assaults were officially registered in Switzerland in 2006?’, ‘What is the population density of Switzerland in inhabitants per square kilometer?’, ‘How many murders were officially registered in Switzerland in 2006?’ and ‘How many more inhabitants did Zurich gain in 2006?’ After their private estimation for each question, each subject could receive social interactions consisting in either receiving on a computer screen a diagram depicting the private estimations of each member of the group (‘full information’ condition) or more simply their arithmetic mean (‘aggregated information’ condition). To test that the observed effects were due to social interactions, they also used control groups that also estimated twice but without social influence in between (‘no information’ condition). The experimental data was obtained using 144 people organized in 12 groups of 12 people. Each group was asked 6 questions, 2 in each of the three conditions.
We used our model to classify each individual by their resistance to social influence as a measure of confidence on their private information. Our proposal is then to use the geometric mean of the estimations of individuals with high social resistance as a better estimator than the WOC, as we show for the dataset from reference [9].
Results
To understand the effect of social interactions in estimation, we first tested whether we could model each person in a group as an estimator of some quantity according to their private and the social information. We have already used this modeling approach for fish and ant groups choosing among a low number of options [14, 15]. Here we adapt it to the case of human data in which individuals estimate quantities that can take any positive real number and the distribution of estimations before social interactions is a log-normal [9, 16–18]. For the analysis of experimental data it is thus useful to take the logarithm of the raw estimations {x_i} to obtain {y_i ≡ log x_i}, whose distribution is then a Gaussian. We obtained that if before social interactions this Gaussian has mean μp and standard deviation σp, after social interactions the distribution of estimations is predicted to be of the form (see S1 Text)
$$P(y) = \frac{1}{\sqrt{2\pi}\,\sigma_f} \exp\left(-\frac{(y-\mu_f)^2}{2\sigma_f^2}\right), \qquad \mu_f = w_p\,\mu_p + w_s\,\mu_s, \qquad \sigma_f = w_p\,\sigma_p, \tag{1}$$
where μf and σf are the mean and standard deviation after social interactions.
The predicted distribution in Eq 1 is also a Gaussian in the logarithm of estimations, but its mean and standard deviation have changed. The mean μf is a combination of the private mean μp and a parameter μs that summarizes the impact of social information, $\mu_f = w_p\,\mu_p + w_s\,\mu_s$, with $w_p$ and $w_s$ the private and social weights, with values between 0 and 1 and with $w_p + w_s = 1$. The form of μs in Eq 1 depends on the type of social interactions, and we considered two types. In the first, each individual receives the estimations from all N members of the group, for which we found that μs is of the form μs ≡ log(xs), where xs is the geometric mean of the estimations (see S1 Text):
$$x_s = \left(\prod_{i=1}^{N} x_i\right)^{1/N} \tag{2}$$
We also considered a second form of interaction in which each individual receives only the mean of the estimations of the N members of the group, for which the social information is the arithmetic mean of estimations (see S1 Text):
$$x_s = \frac{1}{N}\sum_{i=1}^{N} x_i \tag{3}$$
These two types of social information impact Eq 1 differently, with only the second of them changing the mean after social interactions. This is because in the first case, as the expected value of the geometric mean of a sample following a log-normal distribution is the median of the population [19, 20], we have on average that xs = exp(μp) (see S1 Text), making the mean the same as before social interactions, μf = μp. In the second form of social interactions, via the arithmetic mean, the expected value is $\langle x_s \rangle = \exp(\mu_p + \sigma_p^2/2)$ (see S1 Text), making the mean shift to higher values after interactions, $\mu_f = \mu_p + w_s\,\sigma_p^2/2$, with $w_s$ the social weight. Social interactions can change not only the mean but also the standard deviation of estimations. The predicted standard deviation after social interactions in Eq 1 is reduced to $\sigma_f = (1 - w_s)\,\sigma_p$, more reduced the higher the social weight $w_s$, making the group agree more closely around the final mean.
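The contrast between the two kinds of social information can be checked numerically. The following is a minimal sketch using synthetic log-normal data, not the experimental dataset; the values of `mu_p` and `sigma_p` and the sample size are arbitrary assumptions chosen for illustration.

```python
import math
import random


def geometric_mean(xs):
    """Geometric mean, computed in the log domain for numerical stability."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def arithmetic_mean(xs):
    """Plain arithmetic mean."""
    return sum(xs) / len(xs)


# Hypothetical private distribution: log-estimations are Gaussian(mu_p, sigma_p).
mu_p, sigma_p = 5.0, 1.0
rng = random.Random(0)  # fixed seed so the sketch is reproducible
sample = [rng.lognormvariate(mu_p, sigma_p) for _ in range(200_000)]

# Social information in the log domain for each type of interaction.
log_gm = math.log(geometric_mean(sample))    # stays close to mu_p: no shift
log_am = math.log(arithmetic_mean(sample))   # close to mu_p + sigma_p**2 / 2
predicted_log_am = mu_p + sigma_p ** 2 / 2

print(f"log geometric mean:  {log_gm:.3f} (private mean {mu_p})")
print(f"log arithmetic mean: {log_am:.3f} (predicted {predicted_log_am})")
```

In this sketch the geometric mean leaves the centre of the log-distribution where it was, while the arithmetic mean sits above it by roughly half the variance, which is the origin of the shift described for the second type of interaction.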
We first tested that the predicted distribution in Eq 1 is consistent with the experimental data in [9]. We standardized the estimations made by each individual using a z-score as z ≡ (y-μp)/σp, with y the logarithm of the estimation and μp and σp the mean and standard deviation for each time in which a group answered a question. This transformation of variables allowed us to pool together estimations from different groups and questions, each having its own mean and standard deviation. The distribution of the z-score values before social influence is a Gaussian with mean 0 and standard deviation 1 (Fig 1A and 1B, blue). According to Eq 1, after social influence it transforms into a Gaussian with mean 0 and standard deviation $1 - w_s$, with $w_s$ the social weight, for the ‘full information’ condition (see S1 Text). This correctly predicts that the distribution of z-score values after social interactions cannot be distinguished from a Gaussian (p>0.27; Kolmogorov-Smirnov test), does not change the mean (p = 0.14, permutation test; see Methods) and reduces the standard deviation ($p<10^{-9}$, permutation test). Unless otherwise stated, in the remainder of the paper we use permutations to obtain p-values. The predicted form gives a very good fit to the data, and the standard deviation of the data fixes the value of the social weight $w_s$ in Eq 1 (Fig 1A, red).
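The standardization and the permutation tests can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: the data are synthetic, the social weight `w_s = 0.4` is a hypothetical value, and the generic two-sample permutation scheme shown here may differ from the one described in the article's Methods.

```python
import math
import random


def sample_mean(values):
    return sum(values) / len(values)


def sample_std(values):
    m = sample_mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def zscores(estimates, mu_p, sigma_p):
    """z = (log x - mu_p) / sigma_p, using the statistics before social influence."""
    return [(math.log(x) - mu_p) / sigma_p for x in estimates]


def permutation_pvalue(a, b, statistic, n_perm=1000, seed=0):
    """Two-sided p-value for a difference in `statistic` between samples a and b."""
    rng = random.Random(seed)
    observed = abs(statistic(a) - statistic(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of labels
        diff = abs(statistic(pooled[:len(a)]) - statistic(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Synthetic 'before' estimations (log-normal) for one hypothetical experiment.
rng = random.Random(1)
before = [rng.lognormvariate(6.0, 1.2) for _ in range(144)]
logs = [math.log(x) for x in before]
mu_p, sigma_p = sample_mean(logs), sample_std(logs)

# Synthetic 'after' estimations: every individual moves toward the geometric mean
# with the same hypothetical social weight.
w_s = 0.4
after = [math.exp((1 - w_s) * y + w_s * mu_p) for y in logs]

z_before = zscores(before, mu_p, sigma_p)
z_after = zscores(after, mu_p, sigma_p)
p_std = permutation_pvalue(z_before, z_after, sample_std)
p_mean = permutation_pvalue(z_before, z_after, sample_mean)
print(f"std after: {sample_std(z_after):.2f}, p(std) = {p_std:.4f}, p(mean) = {p_mean:.2f}")
```

With these synthetic data the standard deviation of the z-scores shrinks while the mean stays at zero, mirroring the qualitative prediction for the ‘full information’ condition.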
(A) Probability distribution of estimations before (no info, blue) and after (full info, red) receiving the estimations made by other members of the group. Estimations are pooled from 24 different experiments obtained using different groups and questions, and are plotted together using a z-score, z ≡ (log(x)-μp)/σp, with x the estimation and μp and σp the mean and standard deviation before social interactions for each experiment. Points are experimental frequencies sampled at intervals of width 0.25 and solid line is a Gaussian fit. Shadowed surface is the area in which 95 per cent of the experiments are expected by the Gaussian fit. The statistical prediction is that after social interactions the distribution of answers is also a Gaussian in the logarithmic domain with the same mean and smaller standard deviation. (B) Same as (A) but before (no info, blue) and after (aggregated info, red) giving subjects the mean of the estimation of all subjects. The statistical prediction is that after social interactions the distribution of answers is also a Gaussian in the logarithmic domain with higher mean and smaller standard deviation. (C) Real vs predicted estimations after social interactions, with predictions from Eq 4 using a single value of the social weight for all individuals. Different colors correspond to the six estimation tasks. (D) Distribution of experimental social weights with Gaussian kernel smoothing (see Methods). Data taken from Lorenz et al. [9].
For the ‘aggregated information’ condition, the distribution in Eq 1 for the z-score values after social interactions is a Gaussian with mean $w_s\,\sigma_p/2$ and standard deviation $1 - w_s$, with $w_s$ the social weight (see S1 Text). The value of the final mean depends on the standard deviation of estimations before the interaction, σp, which is different for each of the 24 experiments [9] in which each of the 12 groups answered two questions in the ‘aggregated information’ condition. Using for each experiment the value of σp before social interactions and the value of $w_s$ for the same group but in the ‘full information’ condition, we can predict the shift in mean and the reduction of standard deviation for the 24 experimental cases (S1 Fig). However, a simpler analysis can be made by neglecting the variability of σp across the 24 experiments, pooling all the data, and considering the prediction that uses only the mean value $\langle\sigma_p\rangle$ = 1.39 and the value of $w_s$ from the ‘full information’ condition. The predicted shift in the mean, $w_s\,\langle\sigma_p\rangle/2$, and the reduction in standard deviation, to $1 - w_s$, correspond well with the experimental data (Fig 1B, red) and with the more complete prediction using the sum of 24 Gaussians predicted for each experiment (S1 Fig). It correctly predicts a shift of the mean to higher values ($p<10^{-6}$) and a reduced standard deviation ($p<10^{-6}$) that was not found to be different to the one in the ‘full information’ condition (p = 0.45). An alternative Bayesian test [21] shows similar results for the problems studied here (see Methods and S1 Table for a summary of permutations and Bayesian significance tests). In addition, this test identifies when two quantities are likely taking the same value, and not only when they are not found to be different, as in the case of the standard deviation in the ‘aggregated information’ and ‘full information’ conditions (S1 Table).
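The z-score predictions for the two conditions follow in a few steps from the Gaussian form of Eq 1. The derivation below is a sketch that assumes the final mean is the weighted combination of μp and μs with weights $1 - w_s$ and $w_s$, where $w_s$ denotes the social weight (the symbol is an assumption of this sketch).

```latex
\begin{align*}
z &\equiv \frac{y-\mu_p}{\sigma_p}, \qquad
\mu_f = (1-w_s)\,\mu_p + w_s\,\mu_s, \qquad
\sigma_f = (1-w_s)\,\sigma_p \\[4pt]
\langle z \rangle_{\text{after}} &= \frac{\mu_f-\mu_p}{\sigma_p}
   = \frac{w_s\,(\mu_s-\mu_p)}{\sigma_p}, \qquad
\operatorname{sd}(z)_{\text{after}} = \frac{\sigma_f}{\sigma_p} = 1-w_s \\[4pt]
\text{full information: } & \mu_s = \mu_p
   \;\Rightarrow\; \langle z \rangle_{\text{after}} = 0 \\[4pt]
\text{aggregated information: } & \mu_s = \mu_p + \tfrac{1}{2}\sigma_p^2
   \;\Rightarrow\; \langle z \rangle_{\text{after}} = \tfrac{1}{2}\,w_s\,\sigma_p
\end{align*}
```

With the pooled value of 1.39 for the mean σp, the predicted shift of the z-score mean is therefore about 0.70 times the social weight.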
In the ‘no information’ condition, subjects repeat the estimation with no social interactions in between and we found no significant change in the parameters of the distribution of estimations (S2 Fig, p>0.5; see also Bayesian test in S1 Table). This shows that the effects seen after social interactions are due to the interaction and not to a repetition of the estimation.
Once we tested the close correspondence between the statistical model in Eq 1 and the experimental data, we considered a simple model for an individual that is consistent with the statistical predictions. Specifically, an individual that privately estimates x1 and, upon reception of the social information, gives a new estimation x2 related to x1 through a linear combination in the logarithmic domain,
$$y_2 = w_p\,y_1 + w_s\,\mu_s, \tag{4}$$
with {y1,2 ≡ log x1,2} and with $w_p = 1 - w_s$ and $w_s$ the private and social weights, is consistent with the statistics in Eq 1. This implies that the second estimation can be predicted from the first estimation and the social information as $\hat{y}_2 = (1 - w_s)\,y_1 + w_s\,\mu_s$, which is found to be a good approximation for the data when the same social weight is used for all individuals (Fig 1C). A more common rule used in the modelling of social influence in humans is the linear combination rule [13, 22–24], but Eq 4 is a linear combination in the logarithmic domain or, equivalently, a weighted geometric mean between private and social information,
$$x_2 = x_1^{\,w_p}\,x_s^{\,w_s}. \tag{5}$$
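The individual update rule can be written in a few lines. The sketch below assumes the private weight is one minus the social weight; the function names and the numbers in the example are hypothetical.

```python
import math


def updated_estimate(x1, x_s, w_s):
    """Weighted geometric mean of the private estimate x1 and the social value x_s."""
    if x1 <= 0 or x_s <= 0:
        raise ValueError("estimates must be positive")
    w_p = 1.0 - w_s  # private weight
    return x1 ** w_p * x_s ** w_s


def updated_log_estimate(y1, mu_s, w_s):
    """The same rule as a linear combination in the logarithmic domain."""
    return (1.0 - w_s) * y1 + w_s * mu_s


# Hypothetical example: private estimate 100, social value 400, equal weights.
x2 = updated_estimate(100.0, 400.0, 0.5)
y2 = updated_log_estimate(math.log(100.0), math.log(400.0), 0.5)
print(x2, math.exp(y2))
```

Both forms give the same second estimate, here the geometric midpoint between 100 and 400, which lies well below the arithmetic midpoint of 250 that a linear combination rule would give.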
So far we have assumed that each individual uses the same value of the social weight $w_s$. However, there might be individual differences, with some individuals less influenced by social information. Using $w_p = 1 - w_s$ and Eq 4 we can obtain a different value of the social weight for each individual i as
$$w_{s,i} = \frac{y_{2,i} - y_{1,i}}{\mu_s - y_{1,i}}. \tag{6}$$
The distribution across the group of values of the social weight in Eq 6 shows a striking structure of individual differences (Fig 1D). Some individuals resist social influence (peak at $w_s = 0$ in Fig 1D), others shift almost completely to the social information (peak at $w_s = 1$), others combine private and social information (values between 0 and 1), and some even shift to values farther from the private value than the social value ($w_s > 1$) or to values in a direction opposite to the social value ($w_s < 0$).
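The individual social weight and the types of behaviour just described can be illustrated as follows. This is a sketch under the assumption that the social weight is the fraction of the log-distance to the social value that the individual covers; the tolerance `tol` and the category labels are hypothetical and are not taken from the article.

```python
import math


def social_weight(x1, x2, x_s):
    """Fraction of the log-distance from x1 to x_s covered by the second estimate x2."""
    y1, y2, mu_s = math.log(x1), math.log(x2), math.log(x_s)
    if mu_s == y1:
        return math.nan  # undefined when private and social values coincide
    return (y2 - y1) / (mu_s - y1)


def describe(w_s, tol=0.05):
    """Coarse label for a social weight; `tol` is an arbitrary illustrative tolerance."""
    if math.isnan(w_s):
        return "undefined"
    if w_s < -tol:
        return "moves away from the social value"
    if w_s <= tol:
        return "resists social influence"
    if w_s < 1 - tol:
        return "combines private and social information"
    if w_s <= 1 + tol:
        return "adopts the social value"
    return "overshoots the social value"


# Hypothetical individuals: (first estimate, second estimate), social value 400.
for x1, x2 in [(100, 100), (100, 200), (100, 400), (100, 800), (100, 50)]:
    w = social_weight(x1, x2, 400)
    print(x1, x2, round(w, 2), describe(w))
```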
We took advantage of the individuality and extracted the geometric mean of those individuals that resist social influence. To gain intuition on how to perform this extraction we considered the following exploration of the data. We obtained the joint density of social weights and private estimations y = log(x1) for the question ‘What is the length of the Swiss/Italian border?’ (Fig 2A). To obtain different levels of resolution, we used the following Gaussian smoothing of the data [25]
$$P(w_s, y) = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{2\pi\,h_w h_y}\exp\left(-\frac{(w_s - w_{s,i})^2}{2h_w^2} - \frac{(y - y_i)^2}{2h_y^2}\right), \tag{7}$$
with $w_{s,i}$ and yi = log(x1,i) the social weight and the private estimation of individual i, respectively, and with the bandwidths $h_w$ and $h_y$ set by the resolution coefficients $\gamma_w$ and γy and by $\sigma_w$ and $\sigma_y$, the sample standard deviation of each variable. We varied the resolution coefficient $\gamma_w$ while keeping γy at its optimal value of γy = 6 [25] to see whether there is a consistent tendency for individuals with different social weights to give different estimations (Fig 2A). At the lowest resolution considered, $\gamma_w$ = 6, there is a clear tendency of individuals with lower social weight to give higher estimations of the border length between Switzerland and Italy (Fig 2A, $\gamma_w$ = 6). At $\gamma_w$ = 2, 3 and 4 the density splits into two peaks, one at high and another at low social weight (Fig 2A, $\gamma_w$ = 2, 3, 4). It is thus clear that for this question the individuals with lower social weight tend to give higher estimations.
(A) Joint probability density of social weights and estimations y = log(x1), computed by Gaussian smoothing, Eq 7, of data (one black dot per individual). Smoothing from lowest resolution in the direction of the social weight (resolution coefficient $\gamma_w$ = 6) to highest resolution ($\gamma_w$ = 2). (B) Geometric mean of estimations for groups containing individuals with social weight below a threshold. At low thresholds the groups are formed by individuals resisting social influence. Blue dots: groups with prediction not significantly different to wisdom of the crowd (WOC). Green dots: groups significantly different from WOC at p<0.05. Red dots: p<0.01. Value labelled ‘resist 1’ computed from individuals with low social weights, with thresholds of higher significance contributing more (Eq 8). Value labelled ‘resist 2’ computed as ‘resist 1’ but weighting all significant thresholds equally. Line labelled ‘resist 3’ corresponds to the threshold with highest significance. (C) Two clusters in the space of estimations and social weights obtained using Gaussian mixtures [26]. White ellipses delimit the area that contains 95% of the probability density for each of the bivariate Gaussians [27]. (D) Visual summary of the relative errors made by WOC, the three variants of the method in (B) and the center of the clusters obtained at low social weight at four levels of resolution in (C). Data taken from Lorenz et al. [9].
We then extracted the individuals with lowest social weight. A simple method consists in extracting all individuals with a social weight below the value that gives a result significantly different to WOC (Fig 2B). Specifically, we started from the complete group and its geometric mean as the WOC value. For this case, the WOC value is 302 km (Fig 2B). We then eliminated individuals one by one from highest to lowest values of the social weight, keeping those with social weight below a threshold, a decreasing positive real number. With the remaining individuals, we computed the geometric mean. For thresholds in the interval between 0.1 and 0.5, which select individuals with high resistance, the geometric mean increases to values close to 800 km. At the lowest thresholds there is a drop in the geometric mean, but the number of individuals is also low. To isolate the relevant individuals, we found which thresholds give a geometric mean significantly different from the WOC (Fig 2B, green dots for p<0.05 and red dots for p<0.01). The significant thresholds are in the interval from 0.06 to 0.45, which correspond to groups whose geometric mean lies between 816 and 464 km, respectively. We then tested that we obtain similar estimations using the complete interval of significant thresholds or only the threshold giving the highest significance. Specifically, for the complete interval of significant thresholds we used a measure (Eq 8) that combines the geometric means of the estimations of the individuals below each threshold, weighting each threshold according to its p-value so that thresholds with higher significance count more, and only counting those groups with sufficiently low social weight. The prediction obtained in this way is 714 km, which deviates only -2.7% from the true value of 734 km, while the WOC value of 302 km deviates -59% (Fig 2B, ‘resist 1’, ‘truth’ and ‘WOC’). An alternative to Eq 8 would also use the thresholds giving significance but weight all of them equally, giving 689 km, -6.2% off the true value (Fig 2B, ‘resist 2’).
Another variant would only take into account the single threshold with the highest significance (p = 0.0002). This gives the prediction of 780 km, 6.3% off the true value (Fig 2B, ‘resist 3’). The three variants give very similar predictions and a large improvement over WOC.
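The threshold-based extraction can be sketched on toy data. Everything below is an illustrative assumption: the twelve (social weight, estimate) pairs are invented, and the significance test compares the selected subgroup with random subgroups of the same size, which may not be the exact permutation scheme used in the article.

```python
import math
import random


def geometric_mean(xs):
    if not xs:
        raise ValueError("empty group")
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def subgroup_below(data, threshold):
    """Estimates of the individuals whose social weight is below the threshold."""
    return [x for w, x in data if w < threshold]


def subgroup_pvalue(data, threshold, n_draws=5000, seed=0):
    """How often a random subgroup of equal size deviates from WOC as much as the selected one."""
    rng = random.Random(seed)
    everyone = [x for _, x in data]
    chosen = subgroup_below(data, threshold)
    log_woc = math.log(geometric_mean(everyone))
    observed = abs(math.log(geometric_mean(chosen)) - log_woc)
    hits = 0
    for _ in range(n_draws):
        random_group = rng.sample(everyone, len(chosen))
        if abs(math.log(geometric_mean(random_group)) - log_woc) >= observed:
            hits += 1
    return (hits + 1) / (n_draws + 1)


# Invented toy group: (social weight, private estimate in km).
data = [(0.0, 800), (0.05, 700), (0.1, 750), (0.3, 600), (0.5, 300), (0.6, 250),
        (0.8, 200), (0.9, 220), (1.0, 180), (1.0, 210), (0.7, 260), (0.4, 280)]

woc = geometric_mean([x for _, x in data])
resisting = geometric_mean(subgroup_below(data, 0.2))
p_value = subgroup_pvalue(data, 0.2)
print(f"WOC: {woc:.0f} km, resisting subgroup: {resisting:.0f} km, p = {p_value:.4f}")
```

Scanning the threshold from high to low values and recording the geometric mean and p-value at each step would give a curve of the kind shown in Fig 2B.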
We also used a second class of methods based on the finding that resisting individuals can form peaks in the joint distribution of estimations and social weight (Fig 2A). Methods using the peaks will in general use fewer individuals but should be valuable when the peaks are clear in the distribution, that is, when they are sharp and separated from other peaks. Specifically, we used clustering by Gaussian mixtures [26]. The advantage of this method is that, although it depends on the distribution and therefore on the value of the resolution coefficient $\gamma_w$, it is very robust to changes in its value. For the question about the length of the Swiss/Italian border, we obtained that the geometric mean of the cluster of people with low social weight is 422, 481, 512 and 491 km for $\gamma_w$ = 2, 3, 4 and 6, respectively (Fig 2C). In particular, it is not necessary that the value chosen for the clustering corresponds with a distribution showing peaks. For example, the distribution with $\gamma_w$ = 6 does not show peaks and it is clustered into approximately the same two clusters as the distribution with $\gamma_w$ = 3, which shows two clear peaks. The values obtained are -42%, -34%, -30% and -33% off the true value of 734 km. The cluster at high social weight corresponds to individuals with larger errors (-69%, -67%, -71% and -67% for $\gamma_w$ = 2, 3, 4 and 6, respectively). WOC is typically a value between the ones at low and at high social weights, here 302 km, -59% off the true value.
So far we have seen that using the individuals with lowest social weight we can estimate ‘What is the Swiss/Italian border length?’ better than using WOC. The results were robust under changes in the method to extract the individuals with low social weights, with a total of 7 variants of the methods used improving over WOC (Fig 2D). We then applied the same methods to the remaining 5 questions from the experiments in [9]. We found a subpopulation with a significant resistance to social influence in 3 of the remaining questions (Fig 3 and Table 1 for a summary; see S4 Fig for the other two questions).
Analysis as in Fig 2B and 2C but for the questions (A, B) ‘How many rapes were officially registered in Switzerland in 2006?’, (C, D) ‘How many assaults were officially registered in Switzerland in 2006?’, and (E, F) ‘What is the population density of Switzerland in inhabitants per square kilometer?’ See S3 Fig for densities in (D) and (F) without ellipses. Data taken from Lorenz et al. [9].
resist 1: computed from individuals with low social weights, with thresholds of higher significance contributing more (Eq 8). resist 2: computed as ‘resist 1’ but weighting all significant thresholds equally. resist 3: corresponds to the threshold with highest significance. $\gamma_w$ = 6, 4, 3, 2: central values of the peaks at low social weights obtained from a Gaussian mixture, at the resolution in the direction of social weight obtained by introducing these values of the resolution coefficient $\gamma_w$ in Eq 7. Border, ‘What is length of the Swiss/Italian border?’ Rapes, ‘How many rapes were officially registered in Switzerland in 2006?’ Assaults, ‘How many assaults were officially registered in Switzerland in 2006?’ Population, ‘What is the population density of Switzerland in inhabitants per square kilometer?’
For the question of ‘Number of rapes in 2006 in Switzerland’, the geometric mean of individuals of low social weight as measured by Eq 8 and its two variants give the same value, 624, as there is a single significant group. This is much larger than the WOC result of 257 (Fig 3A, ‘resist 1,2,3’) and corresponds to a much smaller error (-2.3%) than the WOC (-60%) with respect to the truth at 639. The distribution of estimations does not show a structure of two peaks separated at low and high social weight (Fig 3B, resolution coefficient $\gamma_w$ = 3, 4, 6), and at high resolution there are too many peaks with very few individuals each (Fig 3B, $\gamma_w$ = 2), so a method based on peaks is not appropriate for this question.
For the ‘Number of assaults in 2006 in Switzerland’, the geometric mean in Eq 8 and the two variants considered have a large deviation from the WOC value of 3685, giving 6654, 6313 and 7557, respectively (Fig 3C, ‘resist 1’, ‘resist 2’, ‘resist 3’). They correspond to errors of -28%, -32% and -18%, respectively, much smaller than the -60% error of WOC. The clustering method obtains the same value of 7699 for resolution coefficients $\gamma_w$ = 3, 4 and 6 (Fig 3D, $\gamma_w$ = 3, 4, 6), while for $\gamma_w$ = 2 the resolution is too high and reveals at least four peaks with very few individuals per peak (Fig 3D, $\gamma_w$ = 2). For $\gamma_w$ = 3, 4 and 6 the error is -17% of the true value 9272, compared to the -60% error of the WOC of 3685.
For the question about the ‘Population density of Switzerland’, the geometric mean in Eq 8 does not find a subpopulation resisting social influence with estimations significantly different to WOC (Fig 3E). The clustering method finds for resolution coefficients $\gamma_w$ = 2, 3, 4 and 6 the values 174, 177, 177 and 171, respectively (Fig 3F, $\gamma_w$ = 2, 3, 4, 6). These values are -5.7%, -4.0%, -4.0% and -7.2% off the true value of 184, while the WOC value of 115 is -38% off.
Our analysis shows that estimation is improved when there is a subpopulation significantly resisting social influence. The seven variants of the methods improve upon WOC and in many cases the improvement is very large (Table 1). The success of the method rests on the correlation between resistance to social influence and closeness to the true value seen in the data. It is also interesting to consider some properties of the resisting individuals. The proportion of these individuals is 25±13% using the methods based on Eq 8 and 10±3% for the methods based on the peaks of the distribution. The individuals that resist social influence are not the same in all questions. We only find a significant overlap between questions 1 and 2 (S5A Fig, p<0.05).
Resistance to social information may be viewed as a behavioral measure of confidence, and the estimation of those resisting social influence as ‘wisdom of the confident’. Its success is not a trivial result, as other measures of confidence like declared confidence on a scale from 1 to 6 do not improve accuracy [28–31]. We thus decided to examine why the two measures give different results. We found a significant but very low correlation between resistance to social information and declared confidence (S5B Fig, p<0.001, $R^2$ = 0.03). While there are approximately equal numbers of resisting and non-resisting individuals (Fig 1D), most of the population declares low values of confidence, even the majority of those resisting social influence (S5C Fig, triangles). Individuals declaring high values of confidence (S5D Fig, triangles) in general resist social influence more than those with low values, but a relevant proportion does not resist social influence. The two measures are correlated but are very different, and it is then unsurprising that a method like the one proposed here for social resistance does not work for declared confidence (S6 Fig).
Discussion
We have here proposed to extract information from the collective using those individuals resisting social influence. The methods proposed extract the information a collective considers of high private quality. We obtained better collective estimations than the ‘wisdom of crowds’ [1–9] using the data from [9], especially for cases in which the crowd shows a very large bias. The methods work because resistance to social influence correlates with closeness to the true value. The correlation does not need to be very strong, that is, we do not need experts [10–12]. Instead, we use the geometric mean of those individuals that get influenced less by social information and this group can still show a large standard deviation.
We used two types of methods. The first is based on Eq 8, taking all individuals below a value of social weight that gives a result different from WOC. This method gave predictions very close to true values for those cases in which the joint distribution of estimations and social weight does not show a complex structure at low social weights. When this method does not give significant results, one can resort to a method based on clustering in the space defined by estimations and social weights. This second type of method takes into account fewer individuals, but we found that it improves upon WOC. The two methods together can be used to understand the relevant subjects in the estimation. For example, Eq 8 does not give significant results for the question on the ‘Population density of Switzerland’ (Fig 3E). Inspection of the density shows that while there is a strong peak at low social weight with an estimation very different from WOC (Fig 3F), there are individuals giving much lower estimations and thus making the geometric mean of individuals with low social weight not different from WOC.
Our proposal makes use of individuality to improve upon WOC. It is interesting to speculate what type of individuality is most compatible with our results. One type of individuality would simply be that all individuals use a similar procedure to answer a question but their levels of noise are different. One way to model this would be to extend our models to incorporate that all individuals are most likely to give the correct answer but they have different levels of noise (S1 Text). This model gives very poor predictions (S1 Text). The reason is that the data seem more compatible with different subgroups of people with different biases from the truth, for example the low and high peaks in the joint density in Fig 2A. This can be modelled by shifting the most probable estimation away from the true value, with different biases in different individuals. As biases are defined with respect to the truth, this extension of the models would not be predictive. Instead, we propose the methods in the main text, by which we extract the subgroup of individuals of low social weight as the more accurate ones on average.
The idea that different individuals or subgroups of individuals have different biases is compatible with the existence in the population of different procedures to solve a problem, each of them with a different bias. According to this view, a possible origin of the data for the question about the Swiss/Italian border as an example could be the following. This question might be answered estimating the approximate length of a straight line separating the two countries, which is 288 km as measured from a map in http://www.freemaptools.com/measure-distance.htm. Interestingly, the cluster of individuals with highest social weight is characterized by an estimation of 216±157 km compatible with these very low values. A procedure more sophisticated than simply the length of a straight line consists in using the shape of the border. Another procedure is to use memorized data to retrieve its value. The cluster at low social weight is characterized by an estimation of 512 ±269 km and the geometric mean at low social weights by values in the interval 650–800 km, compatible with these more sophisticated procedures. This idea of different procedures might also explain the different susceptibilities to social information. Those individuals using the shape of the Swiss/Italian border would in general not consider as very important social information with values so much lower than their estimations. This is because these values would be incompatible with the shape, for example values closer to a straight line. In contrast, individuals using a straight line approach might be willing to consider higher values, as they might have only taken this approach as a very rough approximation they could make because they had difficulties finding how to estimate the full shape. All individuals might declare low confidence levels as they can be very noisy within their approach, but they might still consider differently values more compatible with other approaches.
A second and complementary explanation of individuality is that individuals have different levels of expertise on the subject or even in general exercises of estimation. This level of expertise is probably not high enough for the individuals to declare it, but it would be enough to act upon it when confronted with social influence.
The methods proposed to improve upon WOC do not correspond to a common situation in which humans interact naturally. Instead, they constitute a protocol that can be used to extract high quality information in human collectives even if it is present only in a minority of the group. Its value relies on improving upon WOC by eliminating the people that are not confident in their private estimations. Using how much each individual is influenced by others as a measure of confidence seems to extract the correct individuals, unlike methods based on declared confidence [28–31]. Our results point to measures of confidence not based on declaration as a means to gather high quality private information in a group. Response time, perseverance or pay-offs in decision systems might be implementations to test experimentally. An open problem is under which circumstances social influence or these other measures of confidence can be used by humans to improve individual and collective decisions in naturalistic settings.
Methods
Experimental data from Lorenz et al. [9] can be downloaded from http://www.pnas.org/content/108/22/9020?tab=ds. In those experiments subjects were asked to give an estimation five consecutive times for each of the six questions described in the main text.
Smoothing of distributions
The distributions were calculated using Gaussian kernel smoothing [25]. The 1D version of Gaussian kernel smoothing was applied to the social weights in Fig 1D,
$$P(w_s) = \frac{1}{N h \sqrt{2\pi}} \sum_{i=1}^{N} \exp\left[-\frac{(w_s - w_{s,i})^2}{2h^2}\right] \qquad (9)$$
with $w_{s,i}$ the values of the social weights obtained from experiments using Eq 6, $N$ the length of the sample and $h = \gamma \sigma N^{-1/5}$ the bandwidth, with $\sigma$ the standard deviation of the sample and $\gamma$ the resolution coefficient. We set the resolution coefficient to half its optimal value [25], a value that allows the visualization of the main structure of the distribution. We were interested in the interval [0,1] and therefore did not consider points outside (-1,2) in our calculations of the bandwidth, avoiding tail effects. The 2D case of Gaussian kernel smoothing is described in the main text, Eq 7.
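As an illustration of the 1D smoothing described above, the following is a minimal sketch in Python, not the code used for the figures. It assumes a Silverman-type bandwidth h = γσN^(-1/5), takes γ = 0.53 as a stand-in for "half the optimal value" (half of Silverman's 1.06 for a Gaussian kernel), and assumes that both σ and N in the bandwidth are computed only from points inside (-1,2) while the density sums over all points. The function names are hypothetical.

```python
import math
import statistics


def silverman_bandwidth(samples, gamma=0.53, window=(-1.0, 2.0)):
    """Bandwidth h = gamma * sigma * N**(-1/5) (assumed form).

    Only points strictly inside `window` enter sigma and N, which
    limits the effect of the tails on the bandwidth (assumption).
    """
    core = [w for w in samples if window[0] < w < window[1]]
    sigma = statistics.stdev(core)
    return gamma * sigma * len(core) ** (-1.0 / 5.0)


def gaussian_kde_1d(samples, grid, gamma=0.53, window=(-1.0, 2.0)):
    """Evaluate a 1D Gaussian kernel density estimate at each grid point."""
    h = silverman_bandwidth(samples, gamma, window)
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - w) / h) ** 2) for w in samples)
        for x in grid
    ]
```

Calling `gaussian_kde_1d(social_weights, grid)` with a grid covering [0,1] would give a curve comparable in spirit to Fig 1D; a smaller γ resolves more structure at the cost of a noisier estimate.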
Significance tests used for the difference of means or variances
A complete list of significance tests can be found in S1 Table. In the main text, unless otherwise stated, we computed p-values explicitly, without assumptions about the data, as the probability that the experimental result is obtained at random. For example, to find whether two distributions have a significantly different value of some parameter θ (in our case, the mean or the variance), we performed a permutation test. We mixed the two samples and randomly divided the resulting set into two subsets. Then, we computed the sample value of the parameter in each of the subsets and extracted the difference d ≡ |θ1-θ2|. We repeated this process $10^6$ times, obtaining a distribution of differences d. The significance p is the proportion of d values bigger than the difference of the parameters between the two original samples.
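The permutation procedure can be sketched as follows. This is a minimal illustration rather than the original analysis code. It assumes that the two random subsets have the same sizes as the two original samples, and it uses a default number of permutations much smaller than in the text to keep the run time short. The function name and its parameters are hypothetical.

```python
import random
import statistics


def permutation_p_value(a, b, stat=statistics.mean, n_perm=10_000, seed=0):
    """Proportion of random splits whose |stat difference| exceeds the observed one."""
    rng = random.Random(seed)
    observed = abs(stat(a) - stat(b))
    pooled = list(a) + list(b)
    larger = 0
    for _ in range(n_perm):
        # Mix the two samples and divide them again at random.
        rng.shuffle(pooled)
        d = abs(stat(pooled[:len(a)]) - stat(pooled[len(a):]))
        if d > observed:
            larger += 1
    return larger / n_perm
```

Passing `stat=statistics.variance` gives the corresponding test for the difference of variances.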
To find whether the group of individuals with social weight below a given threshold in Figs 2 and 3 has a geometric mean significantly different from WOC, we used the following procedure. Each threshold corresponds to a subgroup of individuals. We obtained $10^5$ random sets of estimations from the whole crowd and computed the geometric mean of each set, g. The significance of the geometric mean of the subgroup is the proportion of values of g at least as far from the wisdom of the crowd (geometric mean) as the geometric mean of the subgroup.
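A minimal sketch of this comparison against random sets is given below. Several details are assumptions made for illustration only: the subgroup is defined by a strict threshold on the social weight, the random sets have the same size as the subgroup and are drawn without replacement, and the distance to the wisdom of the crowd is measured on a logarithmic scale. The function names are hypothetical.

```python
import math
import random


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def subgroup_p_value(estimates, social_weights, threshold, n_sets=10_000, seed=0):
    """Geometric mean of the low-social-weight subgroup and its significance."""
    rng = random.Random(seed)
    log_woc = math.log(geometric_mean(estimates))
    subgroup = [x for x, w in zip(estimates, social_weights) if w < threshold]
    g_sub = geometric_mean(subgroup)
    observed = abs(math.log(g_sub) - log_woc)
    as_far = 0
    for _ in range(n_sets):
        # Random set of the same size as the subgroup (assumption).
        g = geometric_mean(rng.sample(estimates, len(subgroup)))
        if abs(math.log(g) - log_woc) >= observed:
            as_far += 1
    return g_sub, as_far / n_sets
```

Scanning `threshold` over a range of values would give one geometric mean and one significance per threshold, in the spirit of the curves in Figs 2 and 3.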
Significance test used for the method using the distributions
To divide the region of maximum density into two clusters, we performed an Expectation Maximization (EM) algorithm to obtain a mixture of two Gaussians [26]. More specifically, for each value of we selected those individuals whose social weight and estimation lay in the zone of maximum probability, defined as that where the probability in Eq 7 is at least half of the maximum. Then an EM algorithm was applied to the selected data points to find the maximum likelihood estimates of the parameters of a Gaussian mixture with two components.
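To make the EM step concrete, the sketch below fits a mixture of two Gaussians by EM in one dimension. This is a deliberate simplification: the procedure above is applied to points in the plane of social weight and estimation, which would require 2D means and covariance matrices. The initialization (splitting the sorted data in two halves), the fixed number of iterations and the variance floor are all assumptions, and the function names are hypothetical.

```python
import math


def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, n_iter=200, min_var=1e-6):
    """Maximum likelihood fit of a two-component 1D Gaussian mixture by EM."""
    xs = sorted(data)
    if len(xs) < 2:
        raise ValueError("need at least two data points")
    half = len(xs) // 2
    # Initial guess (assumption): lower and upper halves of the sorted data.
    mus = [sum(xs[:half]) / half, sum(xs[half:]) / (len(xs) - half)]
    spread = (xs[-1] - xs[0]) or 1.0
    variances = [(spread / 4.0) ** 2, (spread / 4.0) ** 2]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [weights[k] * normal_pdf(x, mus[k], variances[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p] if total > 0.0 else [0.5, 0.5])
        # M-step: update weights, means and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk == 0.0:
                continue
            weights[k] = nk / len(xs)
            mus[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var, min_var)
    return weights, mus, variances
```

The returned means and variances play the role of the cluster centers and spreads; assigning each point to the component with the larger responsibility would split the selected region into two clusters.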
Significance test of whether two questions share the same resisting individuals
To find whether two questions shared a significant number of individuals with low social weight (ws), we used the exact expression for the probability that two samples from a finite population have a certain number of elements in common (see S1 Text).
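The expression in S1 Text is not reproduced here. As a hedged illustration, if the two groups of resisting individuals were drawn independently and at random from the same population, the number of shared individuals would follow a hypergeometric distribution, which can be sketched as follows. Taking the upper tail as the significance is an assumption, and the function names are hypothetical.

```python
from math import comb


def overlap_pmf(population, n1, n2, k):
    """P(exactly k common elements) for random subsets of sizes n1 and n2."""
    if k < max(0, n1 + n2 - population) or k > min(n1, n2):
        return 0.0
    return comb(n1, k) * comb(population - n1, n2 - k) / comb(population, n2)


def overlap_p_value(population, n1, n2, k_observed):
    """P(at least k_observed common elements) under random sampling."""
    return sum(
        overlap_pmf(population, n1, n2, k)
        for k in range(k_observed, min(n1, n2) + 1)
    )
```

Here `population` would be the number of subjects who answered both questions, and `n1` and `n2` the numbers of resisting individuals in each question.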
Supporting Information
S1 Fig. Distribution of estimations before and after receiving the mean estimation for each experiment.
Same analysis as in Fig 1B, but for each of the 24 experiments (A) and the sum of the 24 Gaussians (B) before (blue) and after (red) receiving the mean value of the estimations. Points are experimental frequencies at intervals of width 1 (A) and 0.25 (B). Shaded surface is the area where 95 per cent of the experiments are expected given the theoretical fit. Data taken from Lorenz et al. [9].
https://doi.org/10.1371/journal.pcbi.1004594.s001
(TIFF)
S2 Fig. Distributions of estimations without interactions.
As in Fig 1A and 1B in main text, probability distribution of the z-score when estimating twice without interactions in between (first: blue, second: red). Points are experimental frequencies at intervals of width 0.25. Solid line is a Gaussian fit. Shaded surface is the area where 95 per cent of the experiments are expected given the theoretical fit. Data taken from Lorenz et al. [9].
https://doi.org/10.1371/journal.pcbi.1004594.s002
(TIFF)
S3 Fig. Joint probability distributions without ellipses.
(A) ‘How many assaults were officially registered in Switzerland in 2006?’, and (B) ‘What is the population density of Switzerland in inhabitants per square kilometer?’ Data taken from Lorenz et al. [9]
https://doi.org/10.1371/journal.pcbi.1004594.s003
(TIFF)
S4 Fig. Collective estimations of those resisting social influence for two questions not analyzed in main text.
Same as in Fig 3 of main text but for the two remaining experimental questions: (A, B) ‘How many murders were officially registered in Switzerland in 2006?’, and (C, D) ‘How many more inhabitants did Zurich gain in 2006?’ No significant subgroup is found using the method of the geometric mean value (A, C). Using the joint distribution (B, D), we do not find a clear separation into a peak for a group of individuals resisting social influence (ws<0.5) and a peak for individuals not resisting the influence (ws>0.5). Data taken from Lorenz et al. [9].
https://doi.org/10.1371/journal.pcbi.1004594.s004
(TIFF)
S5 Fig. Characterization of individuals resisting social information.
(A) Significance of the coincidence of resisting individuals (ws<0.5) for every pair of the 4 questions analyzed in main text. There is only a significant overlap of individuals resisting influence for questions 1 and 2 (‘What is the length of the border between Switzerland and Italy in kilometers?’ and ‘How many rapes were officially registered in Switzerland in 2006?’). (B) Correlation of social weight (only for |ws|≤1) and declared confidence is significant (p<0.0003) but weak (R² = 0.03) with respect to linear regression (straight line). Triangles mark the mean social weight for each confidence value. Colors show the joint distribution of social weights and confidence values, which has a large dispersion around the regression line. (C) Probability of the declaration of confidence for individuals resisting (red triangles) and not resisting (blue circles) social influence. (D) Probability that an individual has a given social weight when they declare a low (blue circles) or high (red triangles) confidence. Data taken from Lorenz et al. [9].
https://doi.org/10.1371/journal.pcbi.1004594.s005
(TIFF)
S6 Fig. Collective estimations for individuals declaring confidence.
We used a method analogous to that of Figs 2B, 3A, 3C and 3E in main text but for declared confidence instead of social weight. Geometric mean of individuals declaring a value of confidence (conf) in their estimation higher than or equal to an integer threshold. No value is found to be significant (pmin>0.08). The experimental questions are: (A) ‘What is the length of the Swiss/Italian border?’, (B) ‘How many rapes were officially registered in Switzerland in 2006?’, (C) ‘How many assaults were officially registered in Switzerland in 2006?’, (D) ‘What is the population density of Switzerland in inhabitants per square kilometer?’, (E) ‘How many murders were officially registered in Switzerland in 2006?’, and (F) ‘How many more inhabitants did Zurich gain in 2006?’ Data taken from Lorenz et al. [9].
https://doi.org/10.1371/journal.pcbi.1004594.s006
(TIFF)
S1 Table. Kolmogorov-Smirnov, permutations and Bayesian significance tests.
Summary of the results of the significance tests in main text. Kolmogorov-Smirnov tests were run with Matlab to check normality. Permutation tests were performed as explained in the main text (Methods) to test for the equality of means and the equality of variances. For the tests of no difference of means, two-sample t-tests were run with Matlab to check compatibility with the permutation method. For the tests of no difference of variances, two-sample F-tests were run with Matlab with the same purpose. No discrepancies in the acceptance/rejection of the null hypothesis were found in any of the no-difference tests. Bayesian tests are based on the likelihood of the experimental data given a certain value of the parameters. More specifically, we follow reference [21] in the main text: Kruschke JK (2013) Bayesian estimation supersedes the t test. J. Exp. Psychol. Gen. 142(2), 573. The method generates a probability distribution of the most credible values of the parameters (or of their difference, when two distributions are compared). If a value falls outside the 95% highest density interval (HDI) then it is not considered to be a credible value of the parameter or difference of parameters. For the distribution to be considered credibly normal, a value for the degrees of freedom parameter of log10(v) > log10(30)≈1.48 is required. Only one discrepancy was found with the null hypothesis methods: the Bayesian test cannot accept the normality of the estimation distribution generated in the second trial of the ‘aggregated information’ condition. Although the Kolmogorov-Smirnov test did not reject the normality hypothesis, the p-value was only slightly above 0.05. In the main text and in S1 Fig this poor value is explained by the fact that the distribution is better described by the sum of 24 Gaussians with very similar parameters.
https://doi.org/10.1371/journal.pcbi.1004594.s008
(DOCX)
Acknowledgments
We are grateful to Eva Kobak, Andres Laan, Zach Mainen and Alfonso Pérez-Escudero for discussions, Victòria Brugada, Maria Cano-Colino, Raúl Gil de Sagredo, Antonia Gronenberg, Robert C. Hinz, Francisco Romero-Ferrero, Ángel Carlos Román and Julián Vicente-Page for a critical reading of the manuscript, and Olga Simón for help with visual art.
Author Contributions
Conceived and designed the experiments: GM GGP. Performed the experiments: GM GGP. Analyzed the data: GM GGP. Contributed reagents/materials/analysis tools: GM GGP. Wrote the paper: GM GGP.
References
- 1. Galton F (1907) Vox populi. Nature 75, 450–451.
- 2. Surowiecki J (2005) The Wisdom of Crowds. Random House LLC.
- 3. Page SE (2008) The Difference: How the Power of Diversity Creates Better Groups, Firms, Schools, and Societies. Princeton University Press.
- 4. Lee MD, Shi J (2010) The accuracy of small-group estimation and the wisdom of crowds. Proc. 32nd Annu. Conf. Cognit. Sci. Soc. 1124–1129.
- 5. Wagner C, Vinaimont T (2010) Evaluating the wisdom of crowds. Proc. Issues Inf. Syst. 11(1), 724–732.
- 6. Easley D, Kleinberg J (2010) Networks, Crowds, and Markets. Cambridge University Press.
- 7. Krause J, Ruxton GD, Krause S (2010) Swarm intelligence in animals and humans. Trends. Ecol. Evol. 25(1), 28–34. pmid:19735961
- 8. King AJ, Cheng L, Starke SD, Myatt JP (2012) Is the true ‘wisdom of the crowd’ to copy successful individuals? Biol. Lett. 8(2), 197–200. pmid:21920956
- 9. Lorenz J, Rauhut H, Schweitzer F, Helbing D (2011) How social influence can undermine the wisdom of crowd effect. Proc. Natl. Acad. Sci. 108(22), 9020–9025. pmid:21576485
- 10. Wolfers J, Zitzewitz E (2004) Prediction markets. NBER Work. Pap.-Nat. Bur. w10504.
- 11. Whitehill J, Wu TF, Bergsma J, Movellan JR, Ruvolo PL (2009) Whose vote should count more: Optimal integration of labels from labelers of unknown expertise. Adv. Neur. In. 2035–2043.
- 12. Lee MD, Steyvers M, De Young M, Miller B (2012) Inferring expertise in knowledge and prediction ranking tasks. Top. Cogn. Sci. 4(1), 151–163. pmid:22253187
- 13. Mavrodiev P, Tessone CJ, Schweitzer F (2013) Quantifying the effects of social influence. Sci. Rep. 3.
- 14. Pérez-Escudero A, de Polavieja GG (2011) Collective animal behavior from Bayesian estimation and probability matching. PLoS Comput. Biol. 7(11), e1002282. pmid:22125487
- 15. Arganda S, Pérez-Escudero A, de Polavieja GG (2012) A common rule for decision making in animal collectives across species. Proc. Natl. Acad. Sci. 109(50), 20508–20513. pmid:23197836
- 16. Dehaene S (2003) The neural basis of the Weber–Fechner law: a logarithmic mental number line. Trends. Cogn. Sci. 7(4), 145–147. pmid:12691758
- 17. Longo MR, Lourenco SF (2007) Spatial attention and the mental number line: Evidence for characteristic biases and compression. Neuropsychologia 45(7), 1400–1407. pmid:17157335
- 18. Petzschner FH, Glasauer S (2011) Iterative Bayesian estimation as an explanation for range and regression effects: a study on human path integration. J. Neurosci. 31(47), 17220–17229. pmid:22114288
- 19. Parkin TB, Robinson JA (1993) Statistical evaluation of median estimators for lognormally distributed variables. Soil. Sci. Soc. Am. J. 57(2), 317–323.
- 20. Limpert E, Stahel WA, Abbt M (2001) Log-normal distributions across the sciences: keys and clues. BioScience 51(5), 341–352.
- 21. Kruschke JK (2013) Bayesian estimation supersedes the t test. J. Exp. Psychol. Gen. 142(2), 573. pmid:22774788
- 22. DeGroot MH (1974) Reaching a consensus. J. Am. Stat. Assoc. 69(345), 118–121.
- 23. Friedkin NE, Johnsen EC (1990) Social influence and opinions. J. Math. Sociol. 15(3–4), 193–206.
- 24. Moussaïd M, Kämmer JE, Analytis PP, Neth H (2013) Social influence and the collective dynamics of opinion formation. PloS One 8(11), e78433. pmid:24223805
- 25. Silverman BW (1986) Density Estimation for Statistics and Data Analysis. 26, 45, 76, 86–87. CRC Press.
- 26. Peel D, McLachlan G (2000) Finite Mixture Models. John Wiley & Sons.
- 27. Ribeiro MI (2004) Gaussian probability density functions: Properties and error characterization. Institute for Systems and Robotics, Lisboa, Portugal. Available: http://hans.fugal.net/comps/papers/ribeiro_2004.pdf
- 28. Sniezek JA, Henry RA (1989) Accuracy and confidence in group judgment. Organ. Behav. Hum. Dec. 43(1), 1–28.
- 29. Bahrami B, Olsen K, Latham PE, Roepstorff A, Rees G, Frith CD (2010) Optimally interacting minds. Science 329(5995), 1081–1085. pmid:20798320
- 30. Koriat A (2012) When are two heads better than one and why? Science 336(6079), 360–362. pmid:22517862
- 31. Mahmoodi A, Bang D, Ahmadabadi MN, Bahrami B (2013) Learning to make collective decisions: the impact of confidence escalation. PloS One 8(12), e81195. pmid:24324677