## Abstract

Animals living in groups make movement decisions that depend, among other factors, on social interactions with other group members. Our present understanding of social rules in animal collectives is mainly based on empirical fits to observations, with less emphasis on first-principles approaches that allow their derivation. Here we show that patterns of collective decisions can be derived from the basic ability of animals to make probabilistic estimations in the presence of uncertainty. We build a decision-making model with two stages: Bayesian estimation and probability matching. In the first stage, each animal makes a Bayesian estimation of which behavior is best to perform, taking into account personal information about the environment and social information collected by observing the behaviors of other animals. In the probability matching stage, each animal chooses a behavior with a probability equal to the Bayesian-estimated probability that this behavior is the most appropriate one. This model derives very simple rules of interaction in animal collectives that depend only on two types of reliability parameters, one that each animal assigns to the other animals and another given by the quality of the non-social information. We test our model by obtaining theoretically a rich set of observed collective patterns of decisions in three-spined sticklebacks, *Gasterosteus aculeatus*, a shoaling fish species. The quantitative link shown between probabilistic estimation and collective rules of behavior allows closer contact with other fields such as foraging, mate selection, neurobiology and psychology, and gives predictions for experiments directly testing the relationship between estimation and collective behavior.

## Author Summary

Animals need to act on uncertain data and with limited cognitive abilities to survive. It is well known that our sensory and sensorimotor processing uses probabilistic estimation as a means to counteract these limitations. Indeed, the way animals learn, forage or select mates is well explained by probabilistic estimation. Social animals have an interesting new opportunity since the behavior of other members of the group provides a continuous flow of indirect information about the environment. This information can be used to improve their estimations of environmental factors. Here we show that this simple idea can derive basic interaction rules that animals use for decisions in social contexts. In particular, we show that the patterns of choice of *Gasterosteus aculeatus* correspond very well to probabilistic estimation using the social information. The link found between estimation and collective behavior should help to design experiments of collective behavior testing for the importance of estimation as a basic property of how brains work.

**Citation:** Pérez-Escudero A, de Polavieja GG (2011) Collective Animal Behavior from Bayesian Estimation and Probability Matching. PLoS Comput Biol 7(11): e1002282. https://doi.org/10.1371/journal.pcbi.1002282

**Editor:** Iain D. Couzin, Princeton University, United States of America

**Received:** April 7, 2011; **Accepted:** October 5, 2011; **Published:** November 17, 2011

**Copyright:** © 2011 Pérez-Escudero, de Polavieja. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Funding:** This work was funded by MICINN (Spain) as Plan Nacional (http://www.micinn.es) and as partners of the ERASysBio+ initiative supported under the EU ERA-NET Plus scheme in FP7 (http://www.erasysbio.net/), and by Biociencia program (CAM, Spain) (http://www.madrimasd.org/). A.P-E. acknowledges an FPU fellowship from MICINN (Spain). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests:** The authors have declared that no competing interests exist.

## Introduction

Animals need to make decisions without certainty about which option is best. This uncertainty is due to the ambiguity of sensory data but also to limited processing capabilities, and is an intrinsic and general property of the representation that animals can build about the world. A general way to make decisions in uncertain situations is to make probabilistic estimations [1], [2]. There is evidence that animals use probabilistic estimations, for example in the early stages of sensory perception [3]–[11], sensory-motor transformations [12]–[14], learning [15]–[17] and behaviors in an ecological context such as strategies for food patch exploitation [18]–[20] and mate selection [21], among others [13], [17], [21], [22].

An additional source of information about the environment may come from the behavior of other animals (social information) [23]–[28]. This information can have different degrees of ambiguity. In particular cases, the behavior of conspecifics directly reveals environmental characteristics (for example, food encountered by another individual informs about the quality of a food patch). Cases in which social information correlates well with the environmental characteristic of interest have been very well studied [29]–[37]. But in most cases social information is ambiguous and potentially misleading [26], [38]. In spite of this ambiguity, there is evidence that in some cases such as predator avoidance [39], [40] and mate choice [41], animals use this kind of information.

Social animals have a continuous flow of information about the environment coming from the behaviors of other animals. It is therefore possible that social animals use it at all times, making probabilistic estimations to counteract its ambiguity. If this is the case, estimation of the environment using both non-social and social information might be a major determinant of the structure of animal collectives. To test this hypothesis, we have developed a Bayesian decision-making model that includes both personal and social information and that naturally weights them according to their reliability in order to get a better estimate of the environment. All members of the group can then use these improved estimations to make better decisions, and collective patterns of decisions then emerge from these individuals interacting through their perceptual systems.

We show that this model derives social rules that economically explain detailed experiments of decision-making in animal groups [42], [43]. This approach should complement the empirical approach used in the study of animal groups [42]–[47] by identifying which mathematical functions should correspond to each experimental problem and by suggesting experiments that relate estimation and collective motion. The Bayesian structure of our model also builds a bridge between the field of collective behavior and other fields of animal behavior, such as optimal foraging theory [18]–[22] and others [21], [22]. Further, it explicitly includes in a natural way different cognitive abilities, making more direct contact with neurobiology and psychology [3]–[10], [17].

## Results

### Estimation model

We derived a model in which each individual decides based on an estimate of which behavior is best to perform. These behaviors can be to go to one of several different places, to choose among behaviors such as foraging, exploring or running away, or any other set of options. For clarity, here we particularize to the case of choosing the best of two spatial locations, $x$ and $y$ (see *Text S1* for more than two options). ‘Best’ may correspond to the safest location, the one with the highest food density, or the most interesting one for any other reason. We assume that each decision maker uses both non-social and social information to estimate the best location. Non-social information may include sensory information about the environment (e.g. shelter properties, potential predators, food items), memory of previous experiences and internal states. Social information consists of the behaviors performed by other decision-makers. Each individual estimates the probability that each location, say $y$, is the best one, using its non-social information ($C$) and the behavior of the other individuals ($B$),

$$P(Y \mid C, B), \tag{1}$$

where $Y$ stands for ‘$y$ is the best location’. $P(X \mid C, B) = 1 - P(Y \mid C, B)$, because there are only two locations to choose from. We can compute the probability in Eq. **1** using Bayes' theorem,

$$P(Y \mid C, B) = \frac{P(B \mid Y, C)\,P(Y \mid C)}{P(B \mid X, C)\,P(X \mid C) + P(B \mid Y, C)\,P(Y \mid C)}. \tag{2}$$

By simply dividing numerator and denominator by the numerator we find an interesting structure,

$$P(Y \mid C, B) = \frac{1}{1 + a\,S}, \tag{3}$$

where

$$a = \frac{P(X \mid C)}{P(Y \mid C)} \tag{4}$$

and

$$S = \frac{P(B \mid X, C)}{P(B \mid Y, C)}. \tag{5}$$

Note that $a$ does not contain any social information, so it can be understood as the “non-social term” of the estimation. We can also understand $S$ as the “social term” because it contains all the social information, although it also depends on the non-social information $C$. The non-social term $a$ is the likelihood ratio for the two options given only the non-social information. This kind of likelihood ratio is the basis of Bayesian decision-making in the absence of social information [5], [11]–[14]. Eq. **3** now tells us that this well-known term interacts with the social term $S$ simply through multiplication.

We are seeking a model based on probabilistic estimation that can simultaneously give us insight into social decision-making and fit experimental data. For this reason we simplify the model by assuming that the focal individual does not make use of the correlations among the behaviour of others, but instead assumes their behaviours to be independent of each other. This is a strong hypothesis but allows us to derive simple explicit expressions with important insights. The section ‘Model including dependencies’ at the end of Results shows that this assumption gives a very good approximation to a more complete model that takes into account these correlations.

The assumption of independence means that the probability of a given set of behaviors is just the product of the probabilities of the individual behaviors. We apply it to the probabilities needed to compute $S$ in Eq. **5**, getting

$$P(B \mid X, C) = Z \prod_{i=1}^{N-1} P(b_i \mid X, C), \tag{6}$$

where $B$ is the set of all the behaviors of the other $N-1$ animals at the time the focal individual chooses, $B = \{b_i\}_{i=1}^{N-1}$, and $b_i$ denotes the behavior of one of them, individual $i$. $Z$ is a combinatorial term counting the number of possible decision sequences that lead to the set of behaviors $B$, and it cancels out in the next step. Substituting Eq. **6** and the corresponding expression for $P(B \mid Y, C)$ into Eq. **5**, we get

$$S = \prod_{i=1}^{N-1} \frac{P(b_i \mid X, C)}{P(b_i \mid Y, C)}. \tag{7}$$

Instead of an expression in terms of as many behaviors as individuals, it may be more useful to consider a discrete set of behavioral classes. For example, in our two-choice example, these behavioral classes may be ‘go to $x$’ (denoted $\beta_x$), ‘go to $y$’ ($\beta_y$) and ‘remain undecided’ ($\beta_u$). Frequently, these behavioral classes (or simply ‘behaviors’) will be directly related to the choices, so that each behavior will consist of choosing one option. For example, behaviors $\beta_x$ and $\beta_y$ are directly related to choices $x$ and $y$, respectively. But there may be behaviors not related to any option, as in the case of indecision, $\beta_u$, or related to choices in an indirect way. These behaviors can still be informative because they may be more consistent with one of the options being better than the other (for example, indecision may increase when there is a predator, so the presence of undecided individuals may bias the decision against the place where the non-social information suggests the presence of a predator). Let us consider $L$ different behavioral classes, $\{\beta_k\}_{k=1}^{L}$. We do not consider here individual differences for animals performing the same behavior (say, behavior $\beta_1$), so they have the same probabilities $P(\beta_1 \mid X, C)$ and $P(\beta_1 \mid Y, C)$. Thus, if for example the first $n_1$ individuals are performing behavior $\beta_1$, we have that $\prod_{i=1}^{n_1} P(b_i \mid X, C) = P(\beta_1 \mid X, C)^{n_1}$. We can then write Eq. **7** as

$$S = \prod_{k=1}^{L} s_k^{-n_k}, \tag{8}$$

where $n_k$ is the number of individuals performing behavior $\beta_k$, and

$$s_k = \frac{P(\beta_k \mid Y, C)}{P(\beta_k \mid X, C)}. \tag{9}$$

The term $s_k$ is the probability that an individual performs behavior $\beta_k$ when $y$ is the best option, over the probability that it performs the same behavior when $x$ is the best choice. The higher $s_k$, the more reliably behavior $\beta_k$ indicates that $y$ is better than $x$, so we can understand $s_k$ as the reliability parameter of behavior $\beta_k$. If $s_k \to \infty$, observing behavior $\beta_k$ indicates with complete certainty that $y$ is the best option, while for $s_k = 1$ behavior $\beta_k$ gives no information. For $s_k < 1$, observing behavior $\beta_k$ favors $x$ as the best option, and more so the closer $s_k$ is to 0. Note that $P(\beta_k \mid X, C)$ and $P(\beta_k \mid Y, C)$ are not the actual probabilities of performing behavior $\beta_k$, but estimates of these probabilities that the deciding animal uses to assess the reliability of the other decision-makers. These estimates may be ‘hard-wired’ as a result of evolutionary adaptation, but may also be subject to change due to learning.

To summarize, using Eqs. **3** and **8**, the probability that $y$ is the best choice, given both social and non-social information, is

$$P(Y \mid C, B) = \frac{1}{1 + a \prod_{k=1}^{L} s_k^{-n_k}}, \tag{10}$$

with $a$ in Eq. **4** and $s_k$ in Eq. **9**.
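The estimation stage can be sketched numerically. The snippet below is a minimal illustration, assuming Eq. **10** has the form $1/(1 + a\prod_k s_k^{-n_k})$, with reliability parameters above 1 favoring the option whose probability is being estimated. The function name `estimate_probability` and all numerical values are hypothetical and chosen only for illustration.

```python
from math import prod


def estimate_probability(a, reliabilities, counts):
    """Estimated probability that the focal option is the best one.

    a             -- non-social term (likelihood ratio from non-social information)
    reliabilities -- reliability parameter s_k of each behavioral class
    counts        -- number n_k of individuals observed performing each class
    """
    if len(reliabilities) != len(counts):
        raise ValueError("one count is needed per behavioral class")
    # Social term: product over behavioral classes of s_k ** (-n_k).
    social = prod(s ** (-n) for s, n in zip(reliabilities, counts))
    return 1.0 / (1.0 + a * social)


# Hypothetical example with three classes: one animal went to the other
# option (s = 0.5), three went to the focal option (s = 2.0), and two are
# undecided and uninformative (s = 1.0). Non-social information is neutral.
p = estimate_probability(a=1.0, reliabilities=[0.5, 2.0, 1.0], counts=[1, 3, 2])
print(p)
```

Uninformative classes ($s_k = 1$) drop out of the product, so adding undecided individuals leaves the estimate unchanged in this sketch.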

### Decision rule: Probability matching

We have so far only considered the perceptual stage of decision-making, in which the deciding individual estimates the probability that each behavior is the best one. Now it must decide according to this estimation. A simple decision rule would be to go to $y$ when $P(Y \mid C, B)$ is above a certain threshold. This rule maximizes the number of correct choices when the probabilities do not change [48], but is not consistent with the experimental data considered in this paper. Applying this deterministic rule strictly, without any noise sources, all individuals would behave in exactly the same way when facing the same stimuli, but in the experiments considered here this is not the case. Instead, we used a different decision rule called probability matching, which has been experimentally observed in many species, from insects to humans [49]–[55]. According to this rule, an individual chooses each option with a probability that is equal to the probability that it is the best choice. Therefore, in our case the probability of going to $y$ ($P_y$) is the same as the estimated probability that $y$ is the best location ($P(Y \mid C, B)$), so

$$P_y = P(Y \mid C, B). \tag{11}$$

Probability matching does not maximize the number of right choices if we assume that the probabilities always stay the same, but in many circumstances it can be the optimal behavior, such as when there is competition for resources [56], [57], when the estimated probabilities are expected to change due to learning [53], [55], or for other reasons [53], [58].

Finally, using Eqs. **10** and **11**, the probability that the deciding individual goes to $y$ is

$$P_y = \frac{1}{1 + a \prod_{k=1}^{L} s_k^{-n_k}}. \tag{12}$$

The assumption of probability matching has the advantage that the final expression for the decision in Eq. **12** is identical to the one given by Bayesian estimation in Eq. **10**, with no extra parameters. Alternative decision rules could be noisy versions of the threshold rule, but at the price of adding at least one extra parameter to describe the noise. Also, decision rules might not depend on estimation alone, but also on other factors or constraints. These more complicated rules fall beyond the scope of this paper.
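The contrast between the two decision rules discussed above can be illustrated with a short simulation. This is only a sketch: the function names, the threshold of 0.5, the seed and the estimated probability of 0.8 are assumptions made for the example.

```python
import random


def probability_matching(p_best, rng):
    """Choose the option with probability equal to its estimated probability."""
    return rng.random() < p_best


def threshold_rule(p_best, threshold=0.5):
    """Deterministic rule: choose the option whenever the estimate exceeds a threshold."""
    return p_best > threshold


rng = random.Random(0)  # fixed seed so the sketch is reproducible
p_best = 0.8            # hypothetical estimated probability that the option is best

# Under probability matching, individuals facing identical stimuli still differ.
choices = [probability_matching(p_best, rng) for _ in range(10_000)]
matching_fraction = sum(choices) / len(choices)

# Under the strict threshold rule, every individual makes the same choice.
threshold_choices = {threshold_rule(p_best) for _ in range(10_000)}

print(matching_fraction, threshold_choices)
```

The fraction of simulated individuals choosing the option approaches the estimated probability under probability matching, whereas the threshold rule yields a single outcome for all of them.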

In the following sections, we particularize Eq. **12** to different experimental settings to test its results against existing rich experimental data sets that have previously been fitted to different mathematical expressions [42], [43].

### Symmetric set-up

We first considered the simple case of two identical equidistant sites, $x$ and $y$ (Fig. 1*A*). For a set-up made symmetric by experimental design there is no true best option. But deciding individuals must act, as in any other case, using only their incomplete sensory data to make the best possible decision. Even when non-social sensory data indicate no relevant difference between the two sites, the social information can bias the estimation of the best option towards one of the two sites.

**Figure 1.** (*A*) Schematic diagram of individuals choosing between two identical locations $x$ and $y$ when there are already $n_x$ ($n_y$) individuals at $x$ ($y$). (*B*) Probability of going to $y$ as a function of the difference between the number of individuals at $y$ and $x$, $\Delta n = n_y - n_x$, Eq. **17**. (*C*) Sequential application of the behavioural rule in Eq. **17**, for a fixed value of $s$, in the simple case of a group of two individuals (bottom). The width of the arrows is proportional to the probability of each transition. The 3 possible final configurations, with different proportions of individuals going to $y$ (0, 0.5 and 1), have different probabilities of taking place, with both fish together at $x$ or $y$ being more probable than a group split (top).

Using Eq. **12** and the fact that the three possible behaviors are ‘go to $x$’ ($\beta_x$), ‘go to $y$’ ($\beta_y$) and ‘remain undecided’ ($\beta_u$), we obtain

$$P_y = \frac{1}{1 + a\, s_x^{-n_x}\, s_y^{-n_y}\, s_u^{-(N - 1 - n_x - n_y)}}, \tag{13}$$

where $n_x$ and $n_y$ are the number of individuals that have already chosen $x$ and $y$, respectively, and $N$ is the size of the group containing our focal individual and $N-1$ other animals. As the set-up is symmetric, the sensory information available to the deciding individual is the same for both options, so $P(X \mid C) = P(Y \mid C)$ and then $a = 1$ according to Eq. **4**. Also, since indecision is not related to any particular choice, symmetry imposes $P(\beta_u \mid X, C) = P(\beta_u \mid Y, C)$, so indecision is not informative, $s_u = 1$ (Eq. **9**). For the other two behaviors, going to $x$ ($\beta_x$) and going to $y$ ($\beta_y$), Eq. **9** gives

$$s_x = \frac{P(\beta_x \mid Y, C)}{P(\beta_x \mid X, C)}, \qquad s_y = \frac{P(\beta_y \mid Y, C)}{P(\beta_y \mid X, C)}. \tag{14}$$

$P(\beta_x \mid X, C)$ and $P(\beta_y \mid Y, C)$ are the estimated probabilities of making the right choice, that is, going to $x$ when $x$ is the best option, or going to $y$ when $y$ is the best option. Since in this case the sensory information is identical for both options, the probability of making the correct choice must be the same for both options, $P(\beta_x \mid X, C) = P(\beta_y \mid Y, C)$. An analogous argument holds for the incorrect choices, $P(\beta_x \mid Y, C) = P(\beta_y \mid X, C)$, giving

$$s_x = \frac{1}{s_y}. \tag{15}$$

In cases in which $s_x = 1/s_y$, we find it convenient to express reliability more generally as

$$s \equiv s_y = \frac{1}{s_x}, \tag{16}$$

which is the ratio of the probability of making the correct choice and the probability of making a mistake, for both behaviors. Using this definition and given that $a = 1$ and $s_u = 1$, Eq. **13** reduces to

$$P_y = \frac{1}{1 + s^{-\Delta n}}, \tag{17}$$

with the variable $\Delta n = n_y - n_x$. Eq. **17** describes a sigmoidal function that is steeper the higher the value of $s$ (Fig. 1*B*). Therefore, for very reliable behaviors (high $s$, meaning individuals that are much more likely to make correct choices than erroneous ones), $P_y$ grows fast with $\Delta n$ and the deciding individual then goes to $y$ with high probability when taking into account the behaviors of only very few individuals.
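The sigmoidal rule for the symmetric set-up can be tabulated directly. The sketch below assumes the rule has the form $1/(1 + s^{-\Delta n})$, where $\Delta n$ is the number of individuals at the option in question minus the number at the other option. The name `p_choose` and the values of $s$ are hypothetical.

```python
def p_choose(delta_n, s):
    """Probability of choosing an option that leads by delta_n individuals."""
    return 1.0 / (1.0 + s ** (-delta_n))


# Tabulate the rule for two hypothetical reliability values.
for s in (2.0, 4.0):
    row = [round(p_choose(dn, s), 3) for dn in range(-3, 4)]
    print(f"s = {s}: {row}")

# The rule is symmetric around delta_n = 0 and steeper for larger s.
balanced = p_choose(0, 2.0)
one_ahead_low_s = p_choose(1, 2.0)
one_ahead_high_s = p_choose(1, 4.0)
```

With $s = 4$ a lead of a single individual already gives a probability of 0.8, compared with about 0.67 for $s = 2$, which illustrates why highly reliable behaviors need only a few observed individuals to bias the decision strongly.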

The behavior of the group is obtained by applying the decision rule in Eq. **17** sequentially to each individual (see *Methods*). After each behavioural choice, we update the number of individuals at $x$ and $y$, using the new $n_x$ and $n_y$ for the next deciding individual (Fig. 1*C*, bottom). Repeating this procedure for all the individuals in the group, we can compute the probability for each possible final outcome of the experiment (Fig. 1*C*, top).
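The sequential procedure can be sketched as a propagation of probabilities over states. The code below is an illustration under stated assumptions, not the authors' implementation: it assumes the individual rule $1/(1 + s^{-\Delta n})$ with $\Delta n = n_y - n_x$, treats any replicas as already-decided individuals, and uses a hypothetical value of $s$. The name `final_distribution` is also an assumption.

```python
def final_distribution(n_fish, s, n_x0=0, n_y0=0):
    """Probability that k = 0..n_fish deciding fish end up at y.

    n_x0, n_y0 -- individuals (e.g. replicas) already at x and y at the start.
    """
    # probs[k] is the probability that k of the fish decided so far chose y.
    probs = [1.0]
    for decided in range(n_fish):
        new_probs = [0.0] * (decided + 2)
        for k, p_state in enumerate(probs):
            n_y = n_y0 + k
            n_x = n_x0 + (decided - k)
            p_y = 1.0 / (1.0 + s ** (-(n_y - n_x)))
            new_probs[k + 1] += p_state * p_y       # next fish goes to y
            new_probs[k] += p_state * (1.0 - p_y)   # next fish goes to x
        probs = new_probs
    return probs


# Two fish, no replicas, hypothetical s = 3.
two_fish = final_distribution(2, 3.0)
print(two_fish)

# Eight fish with one replica already at y, same hypothetical s.
eight_fish_biased = final_distribution(8, 3.0, n_y0=1)
print([round(p, 3) for p in eight_fish_biased])
```

States with the same counts can be merged because the rule depends only on $n_x$ and $n_y$. For two fish and $s = 3$ the outcomes 0, 1 and 2 fish at $y$ have probabilities 0.375, 0.25 and 0.375, so both fish ending together is more probable than a split.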

The relevance of the symmetric case is that the model has a single parameter and a single variable, enabling a powerful comparison against experimental data. We tested the model using an existing rich data set of collective decisions in three-spined sticklebacks [42], a shoaling fish species. This data set was obtained using a group of fish choosing between two identical refugia, one on their left and another one on their right (Fig. 2*A*), equivalent to locations $x$ and $y$ in the model (Fig. 1*A*). At the start of the experiment, $n_x$ ($n_y$) replica fish made of resin were moved along lines on the left (right) towards the refugia (Fig. 2*A*). The experimental results consisted of the statistics of collective decisions between the two refugia for 19 different cases using different group sizes $N$ = 2, 4 or 8 and different numbers of replicas going left and right, $n_x$∶$n_y$ = {1∶1, 2∶2, 0∶1, 1∶2, 0∶2, 1∶3, 0∶3} (Fig. 2*B*, blue histograms). To compare against these experimental data, we calculated the probability of finding a collective pattern by applying the individual behavioural rule in Eq. **17** iteratively over each fish for the 19 experimental settings. We found a good fit of the model to the experimental data using the same value of $s$ for the 19 graphs (Fig. 2*B*, red line). The model is robust, with good fits over an interval of values of $s$ (Fig. 3, red line).

**Figure 2.** (*A*) Schematic diagram of symmetric set-up with a group of sticklebacks (in black) choosing between two identical refugia $x$ and $y$ with different numbers of replica fish (in red) going to $x$ and $y$. (*B*) Experimentally measured statistics of final configurations of fish choices from 20 experimental repetitions [42] (blue histogram) and results from the model in Eq. **17** in the main text (red line: model for one value of the reliability parameter $s$; red region: 95% confidence interval; green line: model for another value of $s$). Different graphs correspond to different stickleback group sizes and different numbers of replicas going to $x$ and $y$.

**Figure 3.** **Red:** Symmetric case (plots in Fig. 2). **Green:** Case with different replicas at each side (plots in Fig. 6; the ratios $s_r/s_R$ are re-optimized for each value of $s$). **Blue:** Asymmetric set-up with predator on one side (plots in Fig. 7; parameter $a$ is re-optimized for each value of $s$). (*A*) Root mean squared error (RMSE) between the data and the probabilities predicted by the model. Grey dashed line shows the mean RMSE for the three cases. The absolute values for each case depend on the shape of the data and are not comparable; only the trends and the position of the minima should be compared. (*B*) Logarithm of the probability that the data come from the model. The height of each curve depends on the number of data points for each experiment; only the trend and the position of the maxima should be compared. Grey dashed line shows the sum of the three coloured lines, shifted by 1000 so that it fits on the scale. The peak of this global probability indicates the value of $s$ that best fits the three datasets.

Despite the simplicity of the behavioral rule in Eq. **17**, it reproduces the experimental results, including the dependence on the total number of fish $N$, even though the rule is independent of this parameter, except for determining the range of possible values of $\Delta n$. The dependence of the final distributions on $N$ emerges from the application of the rule to the individuals in the group, as illustrated in Fig. 4. Each small box represents a state of the system in which $n_x$∶$n_y$ fish have already decided to go to $x$ and $y$, respectively. The lines connecting each box with another two boxes on top represent the decision made by the next deciding individual, which takes the system to the next state. The width of the lines is proportional to the probability of the decision. As more individuals decide, the central states become less likely simply because they accumulate more unlikely decisions. Therefore, the U-shape or J-shape becomes more pronounced for larger groups, even though the individual decision rule in Eq. **17** is independent of the total number of individuals $N$.

**Figure 4.** **Bottom:** Decision-making process according to Eq. **17** (for a fixed value of $s$). Time runs from bottom to top. Each box represents a state with a given number of fish having already decided $x$ or $y$ ($n_x$∶$n_y$). Each state can lead to another two states in the following time step, depending on whether the focal fish decides to go to $x$ or $y$. The width of the lines connecting states is proportional to the probability of that transition (equal to the probability of the prior state times the probability of the focal fish making the decision that leads to the later one). **Top:** Probability of each state after 8 fish have made their decisions. (*A*) Case with no replicas, in which the final outcome is U-shaped. (*B*) Case with one replica going to $y$ (so the initial state is already 0∶1), in which the final outcome is J-shaped.

Group decision-making in three-spined sticklebacks shows a single type of distribution in which probability is minimum at the center and increases monotonically towards the edges, denoted here as a U-shaped distribution (or J-shaped when there is a bias to one of the two options). The model in Eq. **17**, however, also gives two other types of distributions (Fig. 5*A*). For non-social behavior ($s = 1$) the histogram is bell-shaped due to combinatorial effects. A bell shape is also compatible with social animals for a certain range of $s$ and group size (white region on the bottom-left of Fig. 5*A*). For higher values of $s$, the histograms are M-shaped, with two maxima located between the center and the sides (region coloured in black and blue in Fig. 5*A*). The M shape becomes clear only with a sufficient number of bins, because the drop in probability near the edge or at the center of the distribution disappears when binning is too coarse, producing a bell-shaped or U-shaped histogram (Fig. 5*B*). This is an important practical issue, because the amount of data that can be collected rarely allows for more than 5 bins. The colorscale in Fig. 5*A* reflects the number of bins needed to observe the M shape (black has been reserved for exactly 5 bins). For high values of $s$, the histograms are U-shaped (white region on the top of Fig. 5*A*). Also, all the M-region above the black zone becomes of type U when the binning is too coarse.

**Figure 5.** (*A*) Shape of the histogram of final configurations as a function of $s$ and the group size. Bell-shaped: white region on the bottom-left. M-shaped: region coloured in black and blue. As the observation of the M shape depends on the number of bins, the colorscale reflects the number of bins needed to observe the M shape (black has been reserved for exactly 5 bins). U-shaped: white region on the top. Also, all the M-region above the black zone becomes U when the binning is too coarse. There is also a small region below the black zone where the M shape becomes a bell shape when the binning is too coarse. (*B*) Dependence of the apparent shape on the number of bins: Top, 80 bins. Middle, 10 bins. Bottom, 5 bins. On the left, a probability that seems U-shaped for 5 bins, but is M-shaped for a higher number of bins. On the right, a probability that stays M-shaped for any number of bins. (*C–F*) Dynamics of the probability as the number of individuals increases, for a different value of $s$ in each of the panels *C*, *D*, *E* and *F*.

An interesting prediction of our model is that, for a given number of bins, the shape of the distribution of choices changes with the number of decided individuals, and the dynamics of this change depend on $s$. For high values of $s$, the probability is U-shaped from the beginning and becomes steeper as more individuals decide (as is the case for the stickleback dataset), Fig. 5*C*. For lower values of $s$, we observe M-shaped distributions for the first individuals and then U-shaped ones when more individuals decide, Fig. 5*D*. For even lower values of $s$, we observe bell-shaped distributions for the first individuals, then M-shaped and finally U-shaped, Fig. 5*E,F*.

### Symmetric set-up with modified replicas of animals

An interesting modification of the experimental set-up consists in using replicas of the animals that we can modify to potentially alter their reliability as estimated by the animals. We considered the particular case, motivated by experiments in [43], of two types of modified replicas with different characteristics (for example, fat or thin), Fig. 6*A*. We considered 7 behaviors: ‘animal goes to $x$’ ($\beta_{fx}$), ‘animal goes to $y$’ ($\beta_{fy}$), ‘most attractive replica goes to $x$’ ($\beta_{Rx}$), ‘most attractive replica goes to $y$’ ($\beta_{Ry}$), ‘least attractive replica goes to $x$’ ($\beta_{rx}$), ‘least attractive replica goes to $y$’ ($\beta_{ry}$), and ‘animal remains undecided’ ($\beta_u$). The probability of going to $y$ in Eq. **12** then reduces to

$$P_y = \frac{1}{1 + a\, s_{fx}^{-n_{fx}}\, s_{fy}^{-n_{fy}}\, s_{Rx}^{-n_{Rx}}\, s_{Ry}^{-n_{Ry}}\, s_{rx}^{-n_{rx}}\, s_{ry}^{-n_{ry}}\, s_u^{-n_u}}, \tag{18}$$

where subindex ‘f’ refers to real fish and ‘R’ (‘r’) to replicas of the most (least) attractive type. As in the previous section, symmetry imposes that $a = 1$ and $s_u = 1$. It also imposes the following relations between the reliability parameters: $s_{fx} = 1/s_{fy}$, $s_{Rx} = 1/s_{Ry}$, $s_{rx} = 1/s_{ry}$. Therefore,

$$P_y = \frac{1}{1 + s^{-\Delta n}\, s_R^{-\Delta n_R}\, s_r^{-\Delta n_r}}, \tag{19}$$

where $s \equiv s_{fy}$, $s_R \equiv s_{Ry}$ and $s_r \equiv s_{ry}$, and $\Delta n$, $\Delta n_R$ and $\Delta n_r$ are the differences between the numbers at $y$ and at $x$ of real fish, most attractive replicas and least attractive replicas, respectively. In the particular case of only two different replicas, one going to $x$ and the other to $y$, and for notational simplicity taking the convention that the most (least) attractive replica goes to $y$ ($x$), we have $\Delta n_R = 1$ and $\Delta n_r = -1$. Therefore,

$$P_y = \frac{1}{1 + \dfrac{s_r}{s_R}\, s^{-\Delta n}}. \tag{20}$$

Note that the probability in Eq. **20** does not depend on $s_R$ and $s_r$ separately, but only on their ratio. Therefore, in this case the model uses only two parameters ($s$ and $s_r/s_R$). We compared the model with the stickleback data set from [43], Fig. 6. The data in Fig. 6*B* have a different type of replica pair in each row, so in principle we would fit a different ratio $s_r/s_R$ for each row. But note that the first three rows correspond to experiments with the same three replicas (large, medium and small), combined in different pairs. The same can be said for the second and third groups of three rows. Therefore, there are only two free parameters for each three rows. On the other hand, $s$ should have the same value for all cases. The model again reproduces the experimental results reported in reference [43] (Fig. 6*B*). The result is robust, with good fits over a range of values of $s$ (Fig. 3, green line), in accord with the value obtained for the case shown in Fig. 2*B*.
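The effect of a pair of unequal replicas can be sketched with a two-parameter function. The code assumes that the rule for this case takes the form $1/(1 + \rho\, s^{-\Delta n})$, where $\rho$ is the reliability of the least attractive replica divided by that of the most attractive one, $\Delta n$ counts real fish only, and the probability refers to the side chosen by the most attractive replica. The function name and all numbers are hypothetical.

```python
def p_follow_attractive(delta_n, s, ratio):
    """Probability of choosing the side taken by the most attractive replica.

    delta_n -- real fish at that side minus real fish at the other side
    s       -- reliability parameter assigned to real fish
    ratio   -- reliability of the least attractive replica divided by
               that of the most attractive one (below 1 if they differ)
    """
    return 1.0 / (1.0 + ratio * s ** (-delta_n))


# Hypothetical values: with no real fish decided yet, the first fish is
# biased towards the most attractive replica whenever ratio < 1.
first_fish_bias = p_follow_attractive(0, s=2.5, ratio=0.5)
no_bias = p_follow_attractive(0, s=2.5, ratio=1.0)
print(first_fish_bias, no_bias)
```

Only the ratio of the two replica reliabilities enters the expression, so equally attractive replicas (`ratio = 1`) recover the unbiased symmetric rule.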

**Figure 6.** (*A*) Schematic diagram of symmetric set-up with a group of sticklebacks (in black) choosing between two identical refugia $x$ and $y$ with one replica fish going to $x$ and a different one (in size, shape or pattern) going to $y$ (in red). (*B*) Experimentally measured statistics of final configurations of fish choices from 20 experimental repetitions [43] (blue histogram) and results from the model in Eq. **20** in the main text (red line: model with one value of the reliability parameter $s$ and $s_r/s_R$ = 0.35, 0.7, 0.5, 0.52, 0.69, 0.75, 0.43, 0.55, 0.78, 0.43, for each row from top to bottom; red region: 95% confidence interval; green line: model with another value of $s$ and the same ratios as for the red line). Different graphs correspond to different stickleback group sizes and different types of replicas going to $x$ and $y$.

### Asymmetric set-up

We finally considered the case in which sites $x$ and $y$ are different and the three behaviors are ‘go to $x$’ ($\beta_x$), ‘go to $y$’ ($\beta_y$) and ‘remain undecided’ ($\beta_u$). Eq. **12** reduces to

$$P_y = \frac{1}{1 + a\, s_x^{-n_x}\, s_y^{-n_y}\, s_u^{-n_u}}. \tag{21}$$

The term $a$ represents the non-social information, and in general $a \neq 1$ because the set-up is asymmetric by design. This asymmetry might also affect how a deciding animal takes into account the behaviors of other animals depending on which side they chose, making in general $s_x \neq 1/s_y$. Also, indecision might be informative. For example, if non-social information indicates the possible presence of a predator at $y$, the indecision of other animals might confirm this to the deciding individual, further biasing the decision towards $x$. Therefore, we may have $s_u \neq 1$. But it may also be the case that the set-up's asymmetry does not affect the social terms, so we also tested a simpler model in which $s_u = 1$ and $s \equiv s_y = 1/s_x$, giving

$$P_y = \frac{1}{1 + a\, s^{-\Delta n}}. \tag{22}$$

The stickleback dataset reported in reference [42] is ideally suited to test the asymmetric model, using the experiments that were performed with a replica predator at the right arm (Fig. 7*A*). The best fit of the model in Eq. **22** to the data is shown in Fig. 7*B*, and the fit is robust over a range of parameter values (Fig. 3, blue line). The more complex model in Eq. **21** gives fits very similar to those of the simpler model. Specifically, parameter $s_u$ was rejected by the Bayesian Information Criterion [59], [60], suggesting that fish do not rely on undecided individuals. The possibility that fish rely differently on other fish depending on the option they have taken could not be ruled out by the Bayesian Information Criterion, but in any case the impact of this difference on the data is small.
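Model comparison with the Bayesian Information Criterion can be sketched as follows. The formula $\mathrm{BIC} = k \ln n - 2 \ln \hat{L}$ is the standard definition, with $k$ free parameters, $n$ observations and maximized likelihood $\hat{L}$. The log-likelihoods, parameter counts and number of observations below are hypothetical and are not taken from the stickleback data.

```python
from math import log


def bic(log_likelihood, n_params, n_observations):
    """Bayesian Information Criterion; lower values indicate the preferred model."""
    return n_params * log(n_observations) - 2.0 * log_likelihood


# Hypothetical comparison: a simple model with two parameters against a
# more complex one with one extra parameter and a slightly better fit.
n_obs = 200
bic_simple = bic(log_likelihood=-120.0, n_params=2, n_observations=n_obs)
bic_complex = bic(log_likelihood=-119.5, n_params=3, n_observations=n_obs)

extra_parameter_rejected = bic_complex > bic_simple
print(bic_simple, bic_complex, extra_parameter_rejected)
```

In this made-up example the small gain in likelihood does not compensate for the penalty $\ln n$ on the additional parameter, so the simpler model is preferred.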

**Figure 7.** (*A*) Schematic diagram of asymmetric set-up (predator at $y$, large fish depicted in red) with a group of sticklebacks (in black) choosing between two refugia, and replica fish (small fish depicted in red) going to $y$. (*B*) Experimentally measured statistics of final configurations of fish choices from 20 experimental repetitions [42] (blue histogram) and results from the model in Eq. **22** in the main text (red line: model with one pair of values of $a$ and $s$; red region: confidence interval; green line: model with another value of $s$ and the same $a$ as for the red line). Different graphs correspond to different stickleback group sizes and different numbers of replicas going to $y$.

In the experiments in Fig. 2 and Fig. 7, we have assumed that the replicas are perceived by fish as real animals. However, it is reasonable to think that fish might perceive the difference, and rely differently on replicas and real fish. To test this, we considered different behaviors for fish and replicas, such as ‘fish goes to $x$’ and ‘replica goes to $x$’. Making that distinction, Eq. **12** reduces to

$$P_y = \frac{1}{1 + a\, s_{fx}^{-n_{fx}}\, s_{fy}^{-n_{fy}}\, s_{rx}^{-n_{rx}}\, s_{ry}^{-n_{ry}}\, s_u^{-n_u}}, \tag{23}$$

where subindex ‘f’ refers to real fish and ‘r’ to replicas. The Bayesian Information Criterion rejects only one of these parameters. However, the addition of the new parameters that distinguish replicas from real fish gives very small improvements in the fits compared to the results of the simpler models in Eq. **17** and Eq. **22** (see Fig. S1 and S3), suggesting that fish follow replicas as much as they follow real fish.

### Model including dependencies

In this section we will remove the hypothesis of independence among the behaviors of the other individuals (Eq. **6**). We now consider that the focal individual takes into account not only the behaviors of the other animals at the time of decision but also the specific sequence of decisions that has taken place before, , being the number of individuals that have decided before the focal one. For example, the sequence may give different information to the focal individual than the sequence . This is illustrated in Fig. 8*A*, where there are two possible paths leading to states labeled as 1∶1, but these two states are in different branches of the tree (in contrast with Fig. 4, in which these two states were collapsed into a single one).

(*A*) Decision-making process according to the model with dependencies, Eq. **25**–**33**. Time runs from bottom to top. Each box represents one state, and each edge represents one option of the deciding individual, that either goes to or to . Edge width is proportional to the probability of the decision. (*B*) Probability of choosing as a function of the difference of the number of individuals that have already chosen each option (), for . In the new model the probability does not depend any more on alone, so states with the same have different values for the probability (black dots). The area of the dots is proportional to the probability of observing each state. Red line shows the expected value of the probability for each value of . The green line shows the probability for the model that neglects dependencies (Eq. **17**), for .

To calculate the probability of the observed sequence of behaviors provided that is the correct choice, (24) one can apply repeatedly to obtain (25). This expression replaces the assumption of independence in Eq. **6**. Each of the terms in the product is simply the probability that the individual makes its decision, given the previous decisions, and also given that is the correct choice. This result was expected: in the tree in Fig. 8*A*, the probability of reaching a given state is simply the product of the probabilities of choosing the appropriate branch at each step.
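The product structure described above can be illustrated with a minimal sketch. It only shows the chain-rule bookkeeping: the conditional rule `toy_rule`, its functional form, the default value `s=2.5` and the labels `"x"` and `"y"` are assumptions made for illustration, and they are not the conditional probabilities of the model with dependencies.

```python
from typing import Callable, Sequence


def sequence_probability(
    sequence: Sequence[str],
    step_probability: Callable[[Sequence[str], str], float],
) -> float:
    """Probability of an ordered sequence of choices.

    Computed as the product, over decisions, of the probability of each
    decision given all the decisions made before it (chain rule).
    """
    prob = 1.0
    for i, choice in enumerate(sequence):
        history = sequence[:i]  # decisions taken before the i-th one
        prob *= step_probability(history, choice)
    return prob


def toy_rule(history: Sequence[str], choice: str, s: float = 2.5) -> float:
    """Hypothetical conditional rule, used only to exercise the function.

    Assumed form: the probability of "x" grows with the difference between
    the numbers of earlier "x" and "y" choices.
    """
    n_x = list(history).count("x")
    n_y = list(history).count("y")
    p_x = 1.0 / (1.0 + s ** (-(n_x - n_y)))
    return p_x if choice == "x" else 1.0 - p_x


if __name__ == "__main__":
    # Two different paths that end in the same 1:1 state.
    print(sequence_probability(["x", "y"], toy_rule))
    print(sequence_probability(["y", "x"], toy_rule))
```

With a rule that depends on the full history rather than on counts alone, the two paths printed above would in general receive different probabilities, which is the point of keeping them as separate branches in the tree.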

So the problem reduces to computing the individual decision probabilities . We assume in the following that these probabilities are calculated by the focal individual by assuming that all animals use the same rules to make a decision. The rule for the focal individual is, as in previous sections, (26) where the non-social and social terms are (27) and (28), respectively, and where we have added subscript to , and to reflect that they apply to the focal individual, which makes its decision in the place.

The assumption that all animals apply the same rules translates into the following. To apply an equation like Eq. **26** but on a different individual (say, individual ) it is necessary to know the non-social information . Remember that all these computations are made from the point of view of the focal individual, and obviously the focal individual does not have access to the non-social information of the other individuals. It may seem reasonable for the focal animal to assume that all the other individuals have the same non-social information (), but this would result in no social behavior at all (if the other individuals have the same non-social information, their behaviors will not give any extra information). Instead, one can assume that the other individuals may have different non-social information, . Furthermore, this non-social information depends on which is the best choice, because if for example is the best choice the other individuals have some probability of detecting it, and therefore their non-social information will be on average biased towards . We approximate this average bias by assuming that, if () is the best choice, all the other individuals will have non-social information () that will bias the decision towards (). It is therefore the same to assume that () is the best option as to assume that all the other individuals have non-social information (). Therefore, for the probabilities of individual behaviors in Eq. **25**, we have that (29) where now applies to the individual, so we can compute this probability simply by applying Eq. **26** to the individual, (30) where (31). Then, if we denote , we have that (32). These are the individual probabilities needed in Eq. **25**, which takes into account the correlations among the other individuals. So we can already calculate using Eq. **28**, (33). Eqs. **30** and **33** are related recursively, because we need the probabilities up to step to compute , and then we need to compute the probabilities in step . At the beginning no individual has made any choices, so we start with and work recursively from there until we obtain the probabilities for individual , which allow us to compute . Then, we can use Eq. **26** to compute the decision probability of the focal individual, this time using its actual non-social term (which is 1 for the symmetric cases, and fitted to the data in the non-symmetric case).

The equations above constitute the model taking into account dependencies. The new parameters of this model are and , which replace and in the previous models, so the number of parameters is exactly the same. In the symmetrical case we must have that , so the model has a single parameter. For the non-symmetrical case these parameters may be independent of each other, but we find good results even assuming that they are not, as was the case for the simplified model. So for simplicity we always assume that (34). For the case with different replicas at each side, each of them has a different value of , thus making one replica more attractive than the other.

The new model also matches very well with the experimental data discussed in this paper. Results for the case of two different replicas are shown in Fig. 9, for the symmetric case in Fig. S4 and for the case with predator in Fig. S5. Fits are robust, and all cases are well explained by the model with the same value of , Fig. S6. See Figs. S1, S2, S3 for a comparison of all models.

(*A*) Schematic diagram of symmetric set-up with a group of sticklebacks (in black) choosing between two identical refugia and with one replica fish going to and a different one (in size, shape or pattern) going to (in red). (*B*) Experimentally measured statistics of final configurations of fish choices from 20 experimental repetitions [43] (blue histogram) and results from model that takes dependencies into account (red line, with and = 21.4, 11.8, 0.6, 9.9, 4.8, 0.9, 13, 8, 0.7, 14.5, 0.9, for each type of replica (large, medium, small, fat, etc.); red region: 95% confidence interval; green line with and same as for red line). Different graphs correspond to different stickleback group sizes and different types of replicas going to and .

We now ask how different the model including dependencies is from the model that neglects them. To compare the two models, we plot the probability of going to as a function of for the new model, as we did in Fig. 1*B* for the old one. The inclusion of dependencies has the consequence that the probability of going to does not depend only on , since now different states with the same may have different probabilities. Therefore, when we plot the probability of going to as a function of we obtain different values of the probability for each value of . This is shown by the black dots in Fig. 8*B*, where the size of the dots is proportional to the probability of observing each state when starting from 0∶0. The red line shows the average probability for each , taking into account the probability of each state. Both the dots and this line correspond to , which is the value that fits the data best. The green line corresponds to the probability for the simplest model neglecting dependencies, with the value that best fits the data (). This line is close to the mean probability for the new model and to the values with highest probability of occurrence, so the simple model is a good approximation to the model with dependencies.

We find an interesting prediction of the new model: there are some states in which the most likely option is to choose the option chosen by *fewer* individuals (for example, note in Fig. 8*B* that some points with are above 0.5). This surprising result comes from the fact that, as more fish accumulate at one side, their choices become less and less informative (because it is very likely that they are simply following the others). If one fish then goes to the opposite side, its behavior is very informative, because it is contradicting its social information. This effect can be so strong that it may outweigh the effect of all the other individuals, resulting in a higher probability of following this last individual than of following all the individuals that decided before.

## Discussion

We have shown that probabilistic estimation in the presence of uncertainty can explain collective animal decisions. This approach generated a new expression for each experimental manipulation, Eq. **17**–**22**, and was naturally extended to test for more refined cognitive capacities, Eq. **23**. The model was found to have a good correspondence with the data in three experimental settings (Figs. 2, 6 and 7), always giving a good fit with the social reliability parameter in the interval 2–4. Indeed, all the data have a very good fit with (Figs. 2, 6 and 7, green lines). According to Eq. **9**, this value for has the interpretation that, for the behaviors relevant for these experiments, the fish assume that their conspecifics make the right choice 2.5 times more often than the wrong choice.

For the data used in this paper, previous empirical fits used more parameters [42] (Figs. S1, S2, S3, blue line), and added more complex behavioral rules when the basic model failed [43] (Fig. S2, blue line). Our approach thus gains in simplicity. It also finds an expression for each set-up, with the expressions for complex set-ups obtained as add-ons to those of simpler set-ups, making the model scalable and easier to understand in terms of simpler experiments. Also, taking the models as fits to experimental data, the Bayesian Information Criterion finds our models to be better than those in [42] and [43] (see captions in Figs. S1, S2, S3 for details).

Collective animal behavior has been subject to a particularly careful quantitative analysis. Previous studies have given descriptions led by the powerful idea that complex collective behaviors can emerge from simple individual rules. In fact, some systems have been found empirically to obey rules that are mathematically similar or the same as some of the ones presented in this paper, further supporting the idea that probabilistic estimation might underlie collective decision rules in many species. For example, a function like the one in Eq. **17** has been used to describe the behavior of Pharaoh's ant [61], a function like Eq. **22** for mosquito fish [62], and a function like the one in the right-hand-side of Eq. **22** for meerkats [63]. But despite the importance of group decisions in animals, little is known about the origin of such simple individual rules. This paper argues that probabilistic estimation can be an underlying substrate for the rules explaining collective decisions, thus helping in their evolutionary explanation. Also, this connection between patterns in animal collectives and a cognitive process helps to explain the similarities that exist between decision-making processes at the level of the brain and at the level of animal collectives [64], [65].

Our model is naturally compatible with other theories that use a Bayesian formalism to study different aspects of behavior and neurobiology, thus contributing to a unified approach to information processing in animals. For example, it may be combined with the formalism of Bayesian foraging theory [18], through an expansion of the non-social reliability . Related to this case, a very well studied example of the use of social information is the one in which one individual can observe directly the food collected by another individual [29]–[33]. In this case the social information is as unambiguous as the non-social one, so both types of information should have a similar mathematical form [29]–[33]. This is consistent with our model, which in this case gives a similar expression for and . Other kinds of social information (such as another individual's decision to leave a food patch or choices of females in mating [41]) would enter naturally in our reliability terms . In discussing these and similar problems, it has been proposed that animals should use social information when their personal information is poor, and ignore it otherwise [25], [26], [41]. Our model provides a quantitative framework for this problem, predicting that social information is always used, only with different weights with respect to other sources of information. Bayesian estimation is also a prominent approach to study decisions in neurobiology and psychology [3]–[17] and it would be of interest to explore the mechanisms and role played by the multiplicative relation between non-social and social terms.

Our approach also makes a number of predictions. For example, it derives the probability of choosing among options (see Eq. **S16** of the *Text S1*), that for the symmetric case reduces to (35), which is also predicted to fit the data for cases with options.

We also predict a quantitative link between estimation and collective behavior. The parameters and in our model are in fact not merely fitting parameters, but true experimental variables. Manipulations of and should make it possible to test whether changes in collective behavior follow the predictions of the model. A counterintuitive prediction about the manipulation of is that external factors unrelated to the social component can nevertheless modify it. For example, a fish that usually finds food in a given environment should interpret a sudden turn of one of its mates as an indication that the mate has found food, and will therefore follow it. In contrast, another fish that does not expect to find food in that environment will not interpret the sudden turn as indicative of food, and will not follow. Thus, the model predicts that the *a priori* probability of finding food (to which each fish can be trained in isolation) will modify its propensity to follow conspecifics. An alternative approach that would not need manipulation of the reliabilities would consist in showing that the probability of copying a behavior increases with how reliably the behavior informs about the environment.

We can also extend the estimation model to use, instead of the location of animals, their predicted location. We would then find expressions like the ones in this paper but for the number or density of individuals estimated for a later time. Consider for example the case without non-social information, described in Eq. **17** for two options and in Eq. **35** for more options. We can rewrite these equations as with one of the options and is the normalization, , where is the number of options. Then, we would have for the continuous case using prediction. Future positions at times (where does not need to be constant) in terms of variables at present time would be given by for animals moving at constant velocity . Consider then a simple case of an animal located at and estimating the future position of a compact group at and moving with velocity . The deciding animal would be predicted to move with a high probability in the direction . Estimation of future locations thus naturally predicts in this simple case a particular form of the ‘attraction’ and ‘alignment’ forces of dynamical empirical models [46], [66], as attraction to future positions, but in the general case also deviations from these simple rules.

## Methods

### Obtaining group behavior from the model of an individual

The estimation rules presented in this paper refer to a single individual. To simulate the behavior of a group, we use the following algorithm: The current individual decides between and . After the decision, we recompute the relevant parameters of the model and use the new values for the next deciding individual. The undecided individuals are only those that are waiting for their turn to decide. We tested an alternative implementation in which individuals may remain undecided or in which two individuals can decide simultaneously, obtaining no relevant differences.
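The sequential algorithm described above can be sketched as follows. This is a minimal illustration and not the code in the Protocol files: the decision rule `prob_choose_x`, its functional form, the parameter names `s` and `a`, the default `s=2.5`, and the option labels x and y are all assumptions made here, and replicas are simply counted as individuals that have already decided.

```python
import random


def prob_choose_x(n_x, n_y, s=2.5, a=1.0):
    """Assumed decision rule (illustrative): probability of choosing x
    given how many individuals already chose x (n_x) and y (n_y).
    s plays the role of a social reliability, a of a non-social term."""
    return 1.0 / (1.0 + a * s ** (-(n_x - n_y)))


def simulate_group(group_size, replicas_x=0, replicas_y=0,
                   s=2.5, a=1.0, rng=None):
    """Let the individuals decide one at a time and return how many
    real individuals (replicas excluded) ended up at x."""
    rng = rng or random.Random()
    n_x, n_y = replicas_x, replicas_y  # replicas count as earlier deciders
    fish_at_x = 0
    for _ in range(group_size):
        # The counts are updated after every decision, so the next
        # individual decides with the new state of the group.
        if rng.random() < prob_choose_x(n_x, n_y, s, a):
            n_x += 1
            fish_at_x += 1
        else:
            n_y += 1
    return fish_at_x


def final_distribution(group_size, trials=10000, seed=0, **kwargs):
    """Estimate the distribution of final configurations by repeating
    the sequential simulation many times."""
    rng = random.Random(seed)
    counts = [0] * (group_size + 1)
    for _ in range(trials):
        counts[simulate_group(group_size, rng=rng, **kwargs)] += 1
    return [c / trials for c in counts]


if __name__ == "__main__":
    print(final_distribution(4, replicas_x=1, replicas_y=0))
```

The list returned by `final_distribution` is the kind of quantity that can be compared with a histogram of final configurations; an exact computation over the decision tree would avoid the sampling noise of this Monte Carlo version.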

For the case of the model including dependencies, the model always starts at state 0∶0, with . Most experiments have initial conditions in which several replicas are already going to either side, and the fish have no information about the path followed to reach this state. In these cases, we average the probabilities of all the paths that might have possibly led to the initial state to compute the initial value of .

*Protocol S1* and *Protocol S2* contain Matlab functions that run the models (the extension of the files must be changed from .txt to .m to make them operative). *Protocol S1* corresponds to the model without dependencies, and *Protocol S2* corresponds to the model with dependencies. These functions have been used to generate all the theoretical results presented in this paper.

### Fits

We computed the log likelihood as the logarithm of the probability that the histograms come from the model. We searched for the model parameters giving the highest value of the log likelihood, corresponding to the best fit. This search was performed by optimizing each parameter separately (keeping the rest constant) and iterating through all parameters until convergence. In all cases convergence was rapidly achieved. We performed multiple searches for the best-fitting parameters starting from random initial conditions and always found convergence to the same values, suggesting that there are no additional local maxima. Indeed, we observed that the log likelihood is smooth and has a single maximum in all the cases with 1 or 2 parameters (see Fig. 3 for an example).
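The parameter-by-parameter search described above is a form of coordinate ascent, which can be sketched as follows. The one-dimensional optimizer (a golden-section search), the bounds, the tolerances and all function names are assumptions chosen for this illustration; the text does not specify how each single-parameter optimization was carried out.

```python
import math


def golden_section_max(f, lo, hi, tol=1e-6):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if f(c) > f(d):
            hi = d  # the maximum lies in [lo, d]
        else:
            lo = c  # the maximum lies in [c, hi]
    return (lo + hi) / 2


def coordinate_ascent(log_likelihood, start, bounds, tol=1e-6, max_sweeps=100):
    """Optimize one parameter at a time, keeping the others fixed, and
    sweep through all parameters until the log likelihood stops improving."""
    params = list(start)
    best = log_likelihood(params)
    for _ in range(max_sweeps):
        for i, (lo, hi) in enumerate(bounds):
            def along_axis(value, i=i):
                trial = params.copy()
                trial[i] = value
                return log_likelihood(trial)
            params[i] = golden_section_max(along_axis, lo, hi, tol)
        new = log_likelihood(params)
        if new - best < tol:
            break
        best = new
    return params, log_likelihood(params)


if __name__ == "__main__":
    # Toy smooth surface with a single maximum at (1, 2), standing in
    # for a two-parameter log likelihood.
    def toy(p):
        return -(p[0] - 1) ** 2 - (p[1] - 2) ** 2 - 0.5 * (p[0] - 1) * (p[1] - 2)

    print(coordinate_ascent(toy, start=[0.0, 0.0], bounds=[(-5, 5), (-5, 5)]))
```

Restarting `coordinate_ascent` from several random `start` values and checking that the results agree mirrors the check for multiple maxima mentioned in the text.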

### Bayesian Information Criterion

For model comparison we used the Bayesian Information Criterion (BIC) [59], [60], which takes into account both goodness of fit and the number of parameters. According to this criterion, among several models that have been fitted to maximize log likelihood, one should select the one for which (36) is largest, where is the logarithm of the probability that the data come from the model once its parameters have been optimized to maximize this probability, is the number of parameters of the model and is the number of measurements (which in our case is the same for all models).

More intuitive than the direct values in Eq. **36** are the BIC weights, defined as [60] (37) when we assume that all models are *a priori* equally likely. Roughly speaking, can be interpreted as the probability that model is the most correct one [60].
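A minimal sketch of this kind of model comparison is given below. The exact expressions of Eq. **36** and Eq. **37** are not reproduced here, so the code assumes the standard Schwarz form written so that larger is better, namely the log probability minus half the number of parameters times the logarithm of the number of measurements, and weights proportional to the exponential of that score. The function names and the numbers in the example are illustrative only and are not results from this paper.

```python
import math


def bic_score(logprob, n_params, n_measurements):
    """Assumed larger-is-better form of the criterion:
    logprob - (k / 2) * ln(n). This equals -BIC / 2 under the common
    definition BIC = -2 * logprob + k * ln(n)."""
    return logprob - 0.5 * n_params * math.log(n_measurements)


def bic_weights(scores):
    """Relative weight of each model, assuming all models are equally
    likely a priori. Subtracting the best score before exponentiating
    avoids numerical underflow and leaves the weights unchanged."""
    top = max(scores)
    unnormalized = [math.exp(s - top) for s in scores]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


if __name__ == "__main__":
    # Hypothetical log probabilities and parameter counts for three models
    # fitted to the same 200 measurements.
    models = [(-310.0, 1), (-308.5, 2), (-308.0, 4)]
    scores = [bic_score(lp, k, 200) for lp, k in models]
    print(bic_weights(scores))
```

In this form, an extra parameter is only favoured when it raises the log probability by more than half the logarithm of the number of measurements.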

We used BIC to compare different versions of our model, and also to compare our model with those of references [42], [43] (see Figs. S1, S2, S3). The models of refs. [42], [43] were originally fitted by minimizing the mean squared error instead of by maximizing logprob, so they score very poorly in BIC with their reported parameters. We therefore re-optimized all their model parameters for maximum logprob (these parameters are, using the notation of refs. [42], [43], , , , and , with only applicable in the case of predator present). For the case of different replicas going to each side, parameter takes a different value for each row in the figure, adding up to 10 parameters. The model in ref. [43] is computationally expensive, so it is not feasible to re-optimize this many parameters. Therefore, we treated them as if they were independently measured: we fixed in each case so that the results of the trials with a single individual matched exactly the model's prediction (as reported in [43]). We also followed this procedure with the ratios of our model without dependencies, and the pairs in our model with dependencies. Then, we performed BIC taking into account neither these parameters (the ratios and the pairs ) nor the data from trials using single individuals.

## Supporting Information

### Figure S1.

**Comparison between different models for the symmetric set-up.** Experimentally measured statistics of final configurations of fish choices from 20 experimental repetitions [42] (blue histograms). Red line: results from our single-parameter model assuming independence in Eq. **17** in the main text (). Green line: Enhanced model assuming independence with different reliability for the replicas (, ). Yellow line: Model including dependencies (). Blue line: Empirical model presented in Ref. [42], using the parameters reported there. Different graphs correspond to different stickleback group sizes and different number of replicas going to and . According to the Bayesian Information Criterion (BIC, see *Methods*), the best model is our model with dependencies (yellow line, logprob , and BIC weight ). Second-best is the complicated version of the model without dependencies (green line, logprob , and BIC weight ). Third-best is our one-parameter model assuming independence (red line, , ). And last (but not far from the third one) is the model from Ref. [42] (blue line, ). For the model from Ref. [42], and correspond to a re-optimization of the model as described in *Methods*, because using the parameters reported in [42] would perform worse.

https://doi.org/10.1371/journal.pcbi.1002282.s001

(TIF)

### Figure S2.

**Comparison between different models for the condition with two different replicas.** Experimentally measured statistics of final configurations of fish choices from 20 experimental repetitions [43] (blue histograms). Red line: results from model in Eq. **20** in the main text (, = 0.35, 0.7, 0.5, 0.52, 0.69, 0.75, 0.43, 0.55, 0.78, 0.43 for each row from top to bottom). Yellow line: Model including dependencies (, = 21.4, 11.8, 0.6, 9.9, 4.8, 0.9, 13, 8, 0.7, 14.5, 0.9 for each type of replica (large, medium, small, etc.)). Blue line: Empirical model presented in Ref. [43], using the parameters reported there. Different graphs correspond to different stickleback group sizes and different types of replicas going to and . According to the Bayesian Information Criterion (BIC, see *Methods*), our model neglecting dependencies gives the best representation of the data (red line, logprob , and BIC weight ). Second-best is our model including dependencies (, ). Last, but near the second one, is the model from ref. [43] (blue line, ). For the model from Ref. [43], these values of and correspond to a re-optimization of the model as described in *Methods*, because using the parameters reported in [43] would perform worse. The values of logprob () reported here do not include the data of the single-individual experiments (see *Methods*).

https://doi.org/10.1371/journal.pcbi.1002282.s002

(TIF)

### Figure S3.

**Comparison between different models in the asymmetrical set-up.** Experimentally measured statistics of final configurations of fish choices from 20 experimental repetitions [42] (blue histograms). Red line: results from model neglecting dependencies in Eq. **22** in the main text (, ). Green line: Enhanced model neglecting dependencies with different reliability for the fish going to different locations and for the replicas (, , , . has no effect because there are no replicas going to ). Yellow line: Two-parameter model including dependencies (, ). Blue line: Empirical model presented in Ref. [42], using the parameters reported there. Different graphs correspond to different stickleback group sizes and different number of replicas going to . According to the Bayesian Information Criterion (BIC, see *Methods*), the best two models are our complicated version neglecting dependencies (green line, logprob , and BIC weight ) and our two-parameter model including dependencies (yellow line, , ). Next (but very near) is our simplified model (red line, , ). And last (and significantly worse) is the model from Ref. [42] (blue line, ). For the model from Ref. [42], the values of and correspond to a re-optimization of the model as described in *Methods*, because using the parameters reported in [42] would perform worse. In two of the graphs for group size 1, for which there are no data, the predictions of the model from Ref. [42] and of our model (especially the simplest version) are opposite. The comparison might therefore have changed completely depending on the results for these graphs, had the experiments been performed. We found that this is not the case: we performed simulations, adding data in these two graphs. Even in the extreme case that the fabricated results matched exactly the predictions of the model in Ref.
[42], BIC would still favour two of our models (we would get , for our model with dependence, , for our complicated model neglecting dependence, , for our simplified model neglecting dependence and , for the model in [42]).

https://doi.org/10.1371/journal.pcbi.1002282.s003

(TIF)

### Figure S4.

**Comparison between model including dependencies and stickleback choices in symmetric set-up.** (*A*) Schematic diagram of symmetric set-up with a group of sticklebacks (in black) choosing between two identical refugia and with different numbers of replica fish (in red) going to and . (*B*) Experimentally measured statistics of final configurations of fish choices from 20 experimental repetitions [42] (blue histogram) and results from the model that takes into account dependencies (red line using ; red region: 95% confidence interval; green line with ). Different graphs correspond to different stickleback group sizes and different number of replicas going to and .

https://doi.org/10.1371/journal.pcbi.1002282.s004

(TIF)

### Figure S5.

**Comparison between model including dependencies and stickleback choices in asymmetric set-up.** (*A*) Schematic diagram of asymmetric set-up (predator at , large fish depicted in red) with a group of sticklebacks (in black) choosing between two refugia, and replica fish (small fish depicted in red) going to . (*B*) Experimentally measured statistics of final configurations of fish choices from 20 experimental repetitions [42] (blue histogram) and results from the model that takes into account the dependencies (red line using , ; red region: confidence interval. Green line using and ). Different graphs correspond to different stickleback group sizes and different number of replicas going to .

https://doi.org/10.1371/journal.pcbi.1002282.s005

(TIF)

### Figure S6.

**Goodness of fit of the model including dependencies for different values of .** **Red:** Symmetric case (data in Fig. S4). **Green:** Case with different replicas at each side (data in Fig. 9. The parameters are re-optimized for each value of ). **Blue:** Asymmetric set-up with predator on one side (data in Fig. S5; Parameter is re-optimized for each value of ). (*A*) Root mean squared error between the data and the probabilities predicted by the model. Grey dashed line shows the mean RMSE for the three cases. The absolute values for each case depend on the shape of the data and are not comparable; only the trends and the position of the minima should be compared. (*B*) Logarithm of the probability that the data come from the model. The height of each curve depends on the number of data for each experiment; only the trend and the position of the maxima should be compared. Grey dashed line shows the sum of the three coloured lines, but shifted by 1000 so that it fits on the scale. The peak of this global probability indicates the value of that best fits the three datasets ().

https://doi.org/10.1371/journal.pcbi.1002282.s006

(TIF)

### Protocol S1.

**Algorithm for the model that neglects dependencies.** This file contains Matlab code that runs the model without dependencies. Please, change extension from .txt to .m to make it operative. It can be run without any input argument. Once the extension is changed to .m, simply type ProtocolS1 in Matlab's command window to get results for default parameters. Documentation is given inside the file. Type help ProtocolS1 in Matlab's command window to see the documentation.

https://doi.org/10.1371/journal.pcbi.1002282.s007

(TXT)

### Protocol S2.

**Algorithm for the model that takes dependencies into account.** This file contains Matlab code that runs the model with dependencies. Please, change extension from .txt to .m to make it operative. It can be run without any input argument. Once the extension is changed to .m, simply type ProtocolS2 in Matlab's command window to get results for default parameters. Documentation is given inside the file. Type help ProtocolS2 in Matlab's command window to see the documentation.

https://doi.org/10.1371/journal.pcbi.1002282.s008

(TXT)

### Text S1.

**Derivation of the model with more options.** This file contains the derivation of the model for the more general case of different options (instead of only 2, as presented in the main text).

https://doi.org/10.1371/journal.pcbi.1002282.s009

(PDF)

## Acknowledgments

We acknowledge useful comments by Sara Arganda, Larissa Conradt, Iain Couzin, Jacques Gautrais, David Sumpter, Guy Theraulaz, Julián Vicente Page and COLMOT 2010 participants.

## Author Contributions

Conceived and designed the experiments: APE GGdP. Performed the experiments: APE GGdP. Analyzed the data: APE GGdP. Wrote the paper: APE GGdP.

## References

- 1. Box G, Tiao G (1973) Bayesian inference in statistical analysis. New York: Addison-Wesley. Available: http://onlinelibrary.wiley.com/doi/10.1002/9781118033197.fmatter/summary.
- 2. Jaynes ET, Bretthorst LG (2003) Probability Theory: The Logic of Science (Vol 1). Cambridge University Press.
- 3. Helmholtz H (1925) Physiological Optics, Vol. III: The perceptions of Vision. Rochester, NY, USA: Optical Society of America.
- 4. Mach E (1980) Contributions to the Analysis of the Sensations. Chicago, IL, USA: Open Court Publishing Co.
- 5. Knill DC, Pouget A (2004) The Bayesian brain: the role of uncertainty in neural coding and computation. Trends Neurosci 27: 712–9.
- 6. Jacobs R (1999) Optimal integration of texture and motion cues to depth. Vision Res 39: 3621–3629.
- 7. Knill DC, Saunders JA (2003) Do humans optimally integrate stereo and texture information for judgments of surface slant? Vision Res 43: 2539–2558.
- 8. Ernst MO, Banks MS (2002) Humans integrate visual and haptic information in a statistically optimal fashion. Nature 415: 429–33.
- 9. Battaglia PW, Jacobs RA, Aslin RN (2003) Bayesian integration of visual and auditory signals for spatial localization. J Opt Soc Am A 20: 1391.
- 10. Alais D, Burr D (2004) The ventriloquist effect results from near-optimal bimodal integration. Curr Biol 14: 257–262.
- 11. Gold JI, Shadlen MN (2001) Neural computations that underlie decisions about sensory stimuli. Trends Cogn Sci 5: 10–16.
- 12. Körding KP, Wolpert DM (2004) Bayesian integration in sensorimotor learning. Nature 427: 244–247.
- 13. Körding KP, Wolpert DM (2006) Bayesian decision theory in sensorimotor control. Trends Cogn Sci 10: 319–26.
- 14. Gold JI, Shadlen MN (2007) The neural basis of decision making. Annu Rev Neurosci 30: 535–74.
- 15. Courville AC, Daw ND, Touretzky DS (2006) Bayesian theories of conditioning in a changing world. Trends Cogn Sci 10: 294–300.
- 16. Kruschke JK (2006) Locally Bayesian learning with applications to retrospective revaluation and highlighting. Psychol Rev 113: 677–99.
- 17. Tenenbaum JB, Kemp C, Griffiths TL, Goodman ND (2011) How to Grow a Mind: Statistics, Structure, and Abstraction. Science 331: 1279–1285.
- 18. Oaten A (1977) Optimal foraging in patches: A case for stochasticity. Theor Popul Biol 12: 263–285.
- 19. Biernaskie JM, Walker SC, Gegear RJ (2009) Bumblebees learn to forage like Bayesians. Am Nat 174: 413–423.
- 20. Alonso J (1995) Patch use in cranes: a field test of optimal foraging predictions. Anim Behav 49: 1367–1379.
- 21. McNamara JM, Green RF, Olsson O (2006) Bayes theorem and its applications in animal behaviour. Oikos 112: 243–251.
- 22. Valone TJ (2006) Are animals capable of Bayesian updating? An empirical review. Oikos 112: 252–259.
- 23. Valone TJ, Templeton JJ (2002) Public information for the assessment of quality: a widespread social phenomenon. Philos Trans R Soc Lond B Biol Sci 357: 1549–57.
- 24. Blanchet S, Clobert J, Danchin E (2010) The role of public information in ecology and conservation: an emphasis on inadvertent social information. Ann NY Acad Sci 1195: 149–68.
- 25. Dall SRX, Giraldeau LA, Olsson O, McNamara JM, Stephens DW (2005) Information and its use by animals in evolutionary ecology. Trends Ecol Evol 20: 187–93.
- 26. Giraldeau LA, Valone TJ, Templeton JJ (2002) Potential disadvantages of using socially acquired information. Philos Trans R Soc Lond B Biol Sci 357: 1559–66.
- 27. Wagner RH, Danchin E (2010) A taxonomy of biological information. Oikos 119: 203–209.
- 28. King AJ, Cowlishaw G (2007) When to use social information: the advantage of large group size in individual decision making. Biol Lett 3: 137–9.
- 29. Valone TJ (1989) Group Foraging, Public Information, and Patch Estimation. Oikos 56: 357–363.
- 30. Templeton JJ, Giraldeau LA (1995) Patch assessment in foraging flocks of European starlings: Evidence for the use of public information. Behav Ecol 6: 65–72.
- 31. Templeton JJ, Giraldeau LA (1996) Vicarious sampling: The use of personal and public information by starlings foraging in a simple patchy environment. Behav Ecol Sociobiol 38: 105–14.
- 32. Smith JW, Benkman CW, Coffey K (1999) The use and misuse of public information by foraging red crossbills. Behav Ecol 10: 54–62.
- 33. Clark C, Mangel M (1986) The evolutionary advantages of group foraging. Theor Popul Biol 30: 45–75.
- 34. Doligez B, Danchin E, Clobert J (2002) Public information and breeding habitat selection in a wild bird population. Science 297: 1168–70.
- 35. Boulinier T, Danchin E (1997) The use of conspecific reproductive success for breeding patch selection in terrestrial migratory species. Evol Ecol 11: 505–517.
- 36. Coolen I, van Bergen Y, Day RL, Laland KN (2003) Species difference in adaptive use of public information in sticklebacks. Proc Biol Sci 270: 2413–9.
- 37. van Bergen Y, Coolen I, Laland KN (2004) Nine-spined sticklebacks exploit the most reliable source when public and private information conflict. Proc Biol Sci 271: 957–62.
- 38. Rieucau G, Giraldeau LA (2009) Persuasive companions can be wrong: the use of misleading social information in nutmeg mannikins. Behav Ecol 20: 1217–1222.
- 39. Lima SL (1995) Collective detection of predatory attack by social foragers: fraught with ambiguity? Anim Behav 50: 1097–1108.
- 40. Proctor CJ, Broom M, Ruxton GD (2001) Modelling antipredator vigilance and flight response in group foragers when warning signals are ambiguous. J Theor Biol 211: 409–17.
- 41. Nordell SE, Valone TJ (1998) Mate choice copying as public information. Ecol Lett 1: 74–76.
- 42. Ward AJW, Sumpter DJT, Couzin ID, Hart PJB, Krause J (2008) Quorum decision-making facilitates information transfer in fish shoals. Proc Natl Acad Sci USA 105: 6948–53.
- 43. Sumpter DJT, Krause J, James R, Couzin ID, Ward AJW (2008) Consensus decision making by fish. Curr Biol 18: 1773–1777.
- 44. Couzin ID, Krause J (2003) Self-organization and collective behavior in vertebrates. Adv Stud Behav 32: 1–75.
- 45. Sumpter DJ (2006) The principles of collective animal behaviour. Philos Trans R Soc Lond B Biol Sci 361: 5–22.
- 46. Couzin ID, Krause J, Franks NR, Levin SA (2005) Effective leadership and decision-making in animal groups on the move. Nature 433: 513–516.
- 47. Katz Y, Tunstrom K, Ioannou CC, Huepe C, Couzin ID (2011) Inferring the structure and dynamics of interactions in schooling fish. Proc Natl Acad Sci USA. E-pub ahead of print.
- 48. Neyman J, Pearson E (1933) On the problem of the most efficient tests of statistical hypotheses. Philos Transact A Math Phys Eng Sci 231: 289.
- 49. Herrnstein R (1961) Relative and absolute strength of response as a function of frequency of reinforcement. J Exp Anal Behav 4: 267.
- 50. Behrend ER, Bitterman ME (1961) Probability-Matching in the Fish. Am J Psychol 74: 542–551.
- 51. Greggers U, Menzel R (1993) Memory dynamics and foraging strategies of honeybees. Behav Ecol Sociobiol 32: 17–29.
- 52. Kirk KL, Bitterman ME (1965) Probability-Learning by the Turtle. Science 148: 1484–1485.
- 53. Vulkan N (2000) An Economist's Perspective on Probability Matching. J Econ Surv 14: 101–118.
- 54. Wozny DR, Beierholm UR, Shams L (2010) Probability matching as a computational strategy used in perception. PLoS Comput Biol 6: 7.
- 55. Staddon J (1983) Adaptive Behavior and Learning. Cambridge: Cambridge University Press. Available: http://dukespace.lib.duke.edu/dspace/handle/10161/2878.
- 56. Fretwell S, Lucas H (1969) On territorial behavior and other factors influencing habitat distribution in birds. Acta Biotheor 19: 16–36.
- 57. Houston A, McNamara J (1987) Switching between resources and the ideal free distribution. Anim Behav 35: 301–302.
- 58. Gaissmaier W, Schooler LJ (2008) The smart potential behind probability matching. Cognition 109: 416–22.
- 59. Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6: 461–464.
- 60. Link WA, Barker RJ (2006) Model weights and the foundations of multimodel inference. Ecology 87: 2626–2635.
- 61. Jeanson R, Ratnieks FLW, Deneubourg JL (2003) Pheromone trail decay rates on different substrates in the Pharaoh's ant, Monomorium pharaonis. Physiol Entomol 28: 192–198.
- 62. Ward AJW, Herbert-Read JE, Sumpter DJT, Krause J (2011) Fast and accurate decisions through collective vigilance in fish shoals. Proc Natl Acad Sci USA 108: 6–9.
- 63. Bousquet CAH, Sumpter DJT, Manser MB (2011) Moving calls: a vocal mechanism underlying quorum decisions in cohesive groups. Proc Biol Sci 278: 1482–1488.
- 64. Marshall JA, Bogacz R, Dornhaus A, Planqué R, Kovacs T, et al. (2009) On optimal decisionmaking in brains and social insect colonies. J Roy Soc Interface 6: 1065–74.
- 65. Couzin ID (2009) Collective cognition in animal groups. Trends Cogn Sci 13: 36–43.
- 66. Couzin ID, Krause J, James R, Ruxton GD, Franks NR (2002) Collective memory and spatial sorting in animal groups. J Theor Biol 218: 1–11.