Logical Gaps in the Approximate Solutions of the Social Learning Game and an Exact Solution

After the social learning models were proposed, finding solutions to the games became a well-defined mathematical question. However, almost all papers on the games and their applications are based on solutions built either upon an ad-hoc argument or a twisted Bayesian analysis of the games. Here, we present the logical gaps in those solutions and offer an exact solution of our own. We also introduce a minor extension to the original game so that not only logical differences but also differences in action outcomes among those solutions become visible.


Introduction
The original version of the social learning game (see [1,2] for an introduction and a short review) is a problem with $N$ learners in which each learner (denoted as learner $j$) attempts to identify and act according to the true status of the world ($s_w$), which is in state $1$ with probability $q_{ext}=0.5$ or in state $-1$ with probability $1-q_{ext}$, from observing her own private signal ($s_j$) and all previous learners' actions ($\tilde{a}_{j-1}=(a_1,a_2,\cdots,a_{j-1})$) but without explicitly knowing the previous learners' private signals ($\tilde{s}_{j-1}=(s_1,s_2,\cdots,s_{j-1})$). In the game it is assumed that the private signal received by each learner has probability $p\geq 0.5$ of matching the true status of the world. It is usually required that one learner takes an action in every round, and the turn order of the learners' actions is usually externally given. A learner receives a positive payoff ($M_+$) when her action is the same as the status of the world, and a negative payoff ($-M_-$, here $M_+=M_-$) otherwise.
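To fix notation, the basic ingredients of this setup can be sketched in a few lines of Python (a minimal illustration of ours, not part of the original model description; all function names and default values are our own):

```python
import random

def draw_world(q_ext=0.5):
    """Draw the true world status: 1 with probability q_ext, -1 otherwise."""
    return 1 if random.random() < q_ext else -1

def draw_signal(s_w, p=0.7):
    """A private signal equals the true status with probability p >= 0.5."""
    return s_w if random.random() < p else -s_w

def payoff(action, s_w, M=1.0):
    """Symmetric payoff: +M for a correct action, -M otherwise (M_+ = M_-)."""
    return M if action == s_w else -M
```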
This model of observational social learning was proposed to describe herd behaviors such as the formation of fads, fashions, or cultural conventions [3,4]. For example, in deciding to purchase an iPhone or an Android phone, although personal opinions about the quality and features of the various phones are important, the choices of friends, both locally and on social media, are at least equally important, and sometimes even more important than personal opinions. Of course, in this case of cellphone purchasing, following friends might lead to a decision that one will later be unhappy with. The question of whether following group wisdom [5] leads to better or worse decisions has been actively investigated.
It has been shown [3,4] that in this typical setup there is an information cascade which can lead to either the correct status of the world or the wrong one, even when all of the learners are fully rational. After the cascade happens, the rest of the learners choose the same action. This model and the implied cascade phenomenon, even for fully rational learners, are regarded as, to some degree, a relevant model of the dynamics of public opinion and the formation of fashions and fads [3,4]. To use this model to describe real-world phenomena, we need a mathematical theory to calculate the action outcomes of every learner in this game.
One key problem in doing so for a learner at the $j$th place in the social learning game is how to determine the probability of the world's status being $1$ ($s_w=1$), given the historical record of previous learners' actions $(a_1,a_2,\cdots,a_{j-1})$ and her private signal $s_j$: $l_j = P(s_w=1 \mid a_1 a_2 \cdots a_{j-1}, s_j)$. Once a learner knows precisely this probability, she can always make an informed decision. Under the assumption that all other learners are as rational and capable as the $j$th learner herself, finding the right formula to calculate $l_j$ such that she will obtain the maximum payoff is a well-defined mathematical problem. An exact solution of this mathematical problem refers to a fully rational solution of the above problem from an ideal learner with potentially infinite capability of mathematical calculation. However, except in the case where the private signals are also open to the public, there is no exact procedure to calculate this $l_j$ in the literature.
A common method of avoiding the calculation of $l_j$ for the $j$th learner is to count the number of previous actions with value $1$ (denoted as $N_{j-1,a_+}$) and the number with value $-1$ ($N_{j-1,a_-}$) among the previous $j-1$ actions, and then to act in concert with the majority after including her own private signal. We call this technique the blind action-counting approach and denote the calculated probability as $l_{j,B}$ in the following. Quite often in theoretical analyses of the social learning game, one focuses on the phenomenon of ''cascading'': after observing a certain number of previous actions, the remaining learners choose the same action regardless of their private signals. Using the blind action-counting $l_{j,B}$, it is easy to find that in the original game two consecutive actions, when they are the same, determine the action of the next player and thus of all the remaining players. Therefore, one can study the game by simply enumerating all of the cases where a cascade happens. However, there is no solid mathematical foundation for the claim that this blind action-counting approach is the best or the exact solution, although this approach is commonly used in analyses of the social learning games [2,3,6-8].
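As a sketch, the blind action-counting rule described above can be written as follows (our own illustrative code; the tie-breaking rule is the random one used throughout this paper):

```python
import random

def blind_action_counting(actions, s_j, rng=random):
    """Join the majority of previous actions after adding one's own signal:
    a_j = sign(N_{j-1,a_+} - N_{j-1,a_-} + s_j), ties broken at random."""
    total = sum(actions) + s_j  # actions and signals are +/-1, so this is the count difference
    if total > 0:
        return 1
    if total < 0:
        return -1
    return rng.choice([1, -1])
```

Note that under this rule two identical consecutive actions fix all later actions: for example, `blind_action_counting([1, 1], -1)` returns `1` no matter what the private signal is.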
Another commonly used method is based on a Bayesian analysis of the game [9-14]. We will comment on these two solutions and demonstrate where the logical gaps exist in the two solutions in the next section.
The main contribution of this manuscript, in addition to showing the gaps in the two approximate solutions, is presenting an exact solution of our own. We will first present our calculation and then compare it against the two approximate solutions on the original game. Although, as we will see later, our own solution is different in principle from the other two, there is almost no difference among the three solutions on the original social learning game. While we will also provide a reason for the identical action outcomes of the three solutions, we propose a minor extension of the social learning game, to which all the solutions, if proper for the original game, should be applicable as well. We will, however, demonstrate that the three solutions lead to different average payoffs and that, on average, our exact solution has the highest payoff in the extended games. We believe this is sufficient to illustrate that the two approximate solutions are not as good as the exact one.

Logical Gaps in the Blind Action-Counting and the Twisted Bayesian Approaches to the Social Learning Game
In the original definition of the social learning game, all learners know that the true status of the world follows a known prior distribution over all possibilities, which is usually taken as $\{1,-1\}$: with probability $q_{ext}=0.5$ the status of the world is $1$, i.e., $q_{ext}=P(s_w=1)$. However, after the world's status is initialized, it stays at that status during the entire learning process. After the above-mentioned $l_j = P(s_w=1 \mid a_1 a_2 \cdots a_{j-1}, s_j)$ is known, the rest of the decision-making process is trivial, i.e., $a_j=1$ when $l_j > \frac{1}{2}$ and $a_j=-1$ when $l_j < \frac{1}{2}$. For the case of $l_j = \frac{1}{2}$, an additional tie-breaking rule is required, for example $a_j = \mathrm{random}(1,-1)$. Here $\mathrm{random}(1,-1)$ means taking one value from $1$ and $-1$ with equal probability. Other tie-breaking rules are also possible [4,9]. This relation between $a_j$ and $l_j$ can also be denoted as

$a_j = \mathrm{sign}\left(l_j - \tfrac{1}{2}\right), \quad (4)$

where in the special case of $\mathrm{sign}(0)$ the value is assumed to be $\mathrm{random}(1,-1)$ instead of its usual value $\mathrm{sign}(0)=0$. To use this relation between $a_j$ and $l_j$ conveniently in later derivations, Eq. (4) can also be represented by a distribution function

$P(a_j) = \delta\left(a_j, \mathrm{sign}\left(l_j - \tfrac{1}{2}\right)\right), \quad (6)$

where $\delta(i,j)$ is the Kronecker $\delta$ notation: it is $1$ when $i=j$ and $0$ otherwise. One can check that Eq. (4), Eq. (5) and Eq. (6) are in fact the same even though the latter takes the form of a probability distribution. The probability distribution form of Eq. (6) is very important in deriving our exact procedure, as we will see later. Now that all of our terminologies and notations have been defined, let us start our discussion of solutions to the social learning game. We have mentioned that the only non-trivial part of the decision-making process of the social learning game is the calculation of $l_j$. As we stated in the introduction, there are usually two approaches to this calculation. One is to simply count how many times action $1$ ($-1$) has been taken previously, denote the counts as $N_{j-1,a_+}$ ($N_{j-1,a_-}$), and act with the majority after including one's own private signal,

$a_j = \mathrm{sign}\left(N_{j-1,a_+} - N_{j-1,a_-} + s_j\right). \quad (8)$

From this formula, it seems that this decision-making process does not really need $l_j$.
We will show later that, in fact, it assumes a very special form of $l_j$, given in Eq. (22), from which Eq. (8) can be derived. There we will see clearly what is missing in this argument. Here, we first want to present a counterexample to show that there might be better decision-making mechanisms than this blind action-counting approach.
Consider the case of a private signal sequence $(s_1,s_2)=(1,-1)$ with the corresponding action sequence being $(a_1,a_2)=(1,1)$, which is possible under the random tie-breaking rule. Upon observing this action sequence, according to the blind action-counting approach, the third learner will definitely choose the action $a_3=1$ no matter what her private signal is. Assuming that her private signal is $s_3=-1$, there is in fact a higher chance that the world is in state $s_w=-1$ rather than $s_w=1$. However, as argued above, the third learner will choose $a_3=1$, and thus future learners as well, leading to a wrong cascade.
Generally speaking, when $(a_1,a_2)=(1,1)$ is observed, it is more likely that the world is indeed in state $s_w=1$, so it is not that wrong to choose action $a_3=1$. However, at least in principle, when the third learner gets $s_3=-1$, she should be more careful about simply discarding her own signal, especially when she is fully aware of the random tie-breaking rule. Is there any possibility of taking this into consideration? According to the blind action-counting approximate solution $l_{j,B}$, the answer is no. Will any other solution be able to take care of this and do better?
As we will see later, using the proper form of Bayesian analysis we can, in fact, do better: by figuring out all possible $\tilde{s}_{j-1}$ from the observed $\tilde{a}_{j-1}$, better decisions can be made.
The second commonly used approach [9-14] to the calculation of this $l_j$ is more involved than counting $N_{j-1,a_\pm}$. Let us rephrase the formula, originally from [9], in terms of our own notation. Assume that $\xi_{j-1} = P(s_w=1 \mid \tilde{a}_{j-1})$ is known to the $j$th learner for some reason, which will be explained later. Using the Bayes formula (Eq. (9)), this leads to

$l_j = \frac{\xi_{j-1}\, P(s_j \mid s_w=1)}{\xi_{j-1}\, P(s_j \mid s_w=1) + (1-\xi_{j-1})\, P(s_j \mid s_w=-1)}. \quad (10)$

This formula linking $\xi_{j-1}$ to $l_j$, while it has a very confusing meaning as we will show later, is mathematically sound. In order to form a closed formula system, it requires a formula linking $l_j$ to $\xi_j$, such that the next iteration will give $l_{j+1}$:

$\xi_j = l_j(\tilde{a}_{j-1}, s_j = a_j). \quad (11)$

In a sense, this assumes that upon observing $a_j=1$, the $(j+1)$th learner will effectively think that $s_j=1$, and similarly when observing $a_j=-1$. However, this step, exactly this step, is not necessarily true.
To summarize, the above Bayesian analysis iterates Eq. (10) and Eq. (11) in turn. Effectively taking $s_j = a_j$ may not be far off overall, but it is problematic and is the exact source of the logical mistake of this approach. We call the above $l_j$ calculated by this twisted Bayesian analysis, especially via Eq. (11), the twisted-Bayesian approach and denote it as $l_{j,tB}$. If we are going to follow this line of thinking, we need a better formula from $l_j$ to $\xi_j$. There is another potentially misleading part in the above derivation of Eq. (9): while it is not mathematically wrong, setting $P(s_j \mid s_w=1, \tilde{a}_{j-1}) = P(s_j \mid s_w=1)$ is logically not straightforward. On the left-hand side, we know only the action history $\tilde{a}_{j-1}$; thus we need to figure out the distribution of $s_w$ first according to this $\tilde{a}_{j-1}$, then restrict ourselves to the subset where $s_w=1$, and then calculate the probability distribution of $s_j$ within that subset. On the right-hand side, we are thinking that when $s_w$ is known, the distribution of $s_j$ depends only on $s_w$ and not on $\tilde{a}_{j-1}$. These two expressions are not at the same level of logic. One way out of this insecure practice of mathematics is to completely avoid $P(s_j \mid \tilde{a}_{j-1})$ and consider instead $P(s_j \mid s_w=1)$ and $P(\tilde{a}_{j-1} \mid s_w=1)$, which are absolutely well-defined. We do so in the next section when constructing the exact formula of $l_j$.
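For concreteness, the twisted-Bayesian recursion of Eqs. (10) and (11) can be sketched as follows (our own illustrative reading of the scheme; function and variable names are ours):

```python
def signal_likelihood(s, s_w, p):
    """P(s | s_w): a signal matches the world status with probability p."""
    return p if s == s_w else 1.0 - p

def twisted_l(xi_prev, s_j, p):
    """Eq. (10): belief l_j from the action-based prior xi_{j-1} and signal s_j."""
    num = xi_prev * signal_likelihood(s_j, 1, p)
    return num / (num + (1.0 - xi_prev) * signal_likelihood(s_j, -1, p))

def twisted_xi(xi_prev, a_j, p):
    """Eq. (11), the questionable closure step: treat the observed action a_j
    as if it were the private signal s_j."""
    return twisted_l(xi_prev, a_j, p)
```

The closure step is literally the substitution $s_j \to a_j$, which is exactly where the logical gap discussed above enters.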
The blind action-counting approach does not make use of the full information, so there is potentially room for better solutions, and the twisted Bayesian analysis misses one important step in its mathematical formalism: there is no solid mathematical ground for Eq. (11), which links $l_j$ to $\xi_j$.
In the rest of this manuscript, we will present a solution that makes use of the full information and also every step of it has a solid mathematical ground. The only catch is that it is quite mathematically involved and the idea originates from statistical physics, which might not be a common or familiar toolbox for researchers in social learning, game theory or even other fields of economics. In statistical physics, non-interacting systems are much easier to address and quite often, they provide a good starting point for building up formalism to tackle interacting systems. It is exactly this beauty of statistical physics that makes it possible to develop our own calculation of l j .
Exact Formula of $l_j$

Using the Bayes formula differently from Eq. (9), we can rearrange

$l_j = P(s_w=1 \mid \tilde{a}_{j-1}, s_j) = \frac{q_{ext}\, P(\tilde{a}_{j-1} \mid s_w=1)\, P(s_j \mid s_w=1)}{\sum_{s_w=\pm 1} P(s_w)\, P(\tilde{a}_{j-1} \mid s_w)\, P(s_j \mid s_w)}. \quad (13)$

Here, in the last step, we used the fact that, given the world status, the previous actions and the current private signal are two independent events. There is only one unknown term in Eq. (13),

$p_{\tilde{a}_{j-1}}^{s_w} = P(\tilde{a}_{j-1} \mid s_w),$

which is the probability that the history of the specific previous $j-1$ actions is $\tilde{a}_{j-1}$, given world status $s_w$.
Notice that the above $l_j$ is subjective, i.e., it is in the $j$th learner's mind how much she believes the status of the world is $s_w=1$ given the information $\tilde{a}_{j-1}$ and $s_j$. Therefore, $p_{\tilde{a}_{j-1}}^{s_w}$ is also subjective. Later, when calculating the rate of accuracy and the probability of cascading, we will need an objective probability $P_{\tilde{a}}^{s_w}$. The way to find this $P_{\tilde{a}}^{s_w}$ is to use

$P_{\tilde{a}}^{s_w=1} = \sum_{\tilde{s}} p_{\tilde{a}}^{\tilde{s}}\, p_{\tilde{s}}^{s_w=1}, \quad (15)$

where $p_{\tilde{s}}^{s_w=1}$ is totally objective and has nothing to do with the learners' decision making, and $p_{\tilde{a}}^{\tilde{s}}$ is the probability of the action outcome $\tilde{a}$ given the signal sequence $\tilde{s}$ and depends on the learners' decision making. The way to calculate $P_{\tilde{a}}^{s_w=1}$ is to determine all action outcomes $\tilde{a}$ of a given $\tilde{s}$ and then sum over all $\tilde{s}$ leading to the same $\tilde{a}$, according to Eq. (15). To calculate $p_{\tilde{a}}^{\tilde{s}}$, we need to generate all possible signal sequences $\tilde{s}$ and go through the decision-making process, according to the given approach of calculating $l_j$, to determine the action sequence $\tilde{a}$ for each of the sequences $\tilde{s}$. Notice that $p_{\tilde{a}}^{s_w=1}$ and $P_{\tilde{a}}^{s_w=1}$ are potentially different.
When the private signals are public, the probability that the world's status is $s_w=1$ can be expressed as

$l_j = P(s_w=1 \mid \tilde{s}_{j-1}, s_j) = \frac{1}{1 + \frac{1-q_{ext}}{q_{ext}}\left(\frac{1-p}{p}\right)^{\Delta N_{j-1,s}+s_j}}, \quad (21)$

where $\Delta N_{j-1,s} = \sum_{l=1}^{j-1} s_l$. When $q_{ext}=0.5$, this probability is more than $\frac{1}{2}$ whenever $\Delta N_{j-1,s}+s_j > 0$. A similar procedure for this public-signal case has also been discussed in [15]. Next, we generalize the above calculation to the case where only the actions, but not the private signals, are known to learners.
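The public-signal formula of Eq. (21) translates directly into code (a small sketch of ours under the stated assumptions; default parameter values are examples):

```python
def l_public(signals, s_j, p=0.7, q_ext=0.5):
    """Eq. (21): belief that s_w = 1 when all previous signals are public."""
    delta_n = sum(signals) + s_j             # Delta N_{j-1,s} + s_j
    prior_odds = (1.0 - q_ext) / q_ext       # (1 - q_ext) / q_ext
    return 1.0 / (1.0 + prior_odds * ((1.0 - p) / p) ** delta_n)
```

For $q_{ext}=0.5$ the belief exceeds $\frac{1}{2}$ exactly when $\Delta N_{j-1,s}+s_j>0$, in agreement with the statement above.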
Blind action-counting $l_{j,B}$

First, we want to spend a little time revisiting the blind action-counting approach. By revisiting it, we will clearly see what information is missing in the blind action-counting approach.
Even when only previous actions but not private signals are available to the public, let us assume for now that $N_{j-1,a_\pm}$ includes as much information as $N_{j-1,s_\pm}$; thus from Eq. (21) we have

$l_{j,B} = \frac{1}{1 + \frac{1-q_{ext}}{q_{ext}}\left(\frac{1-p}{p}\right)^{\Delta N_{j-1,a}+s_j}}, \quad (22)$

where $\Delta N_{j-1,a} = \sum_{l=1}^{j-1} a_l$. Learners adopting this decision-making mechanism regard action sequences as providing as much information as signal sequences. This is obviously wrong. It can be shown that, according to this formula, $l_{j,B} > \frac{1}{2}$ if $\Delta N_{j-1,a}+s_j > 0$, and therefore it leads exactly to Eq. (8).
In many previous studies of social learning, calculations of the probability of cascades and other quantities were based on this $l_{j,B}$ [2,3,6-8]. This is potentially suboptimal because action sequences are treated as being as reliable as signal sequences. Next, we are going to present a solution in which signal sequences and their distributions are first inferred from action sequences, and the decision is then made upon the inferred signal sequences. The idea is very simple: if we can turn the history of actions into a history of signals, we can make use of the above $p_{\tilde{s}_{j-1}}^{s_w}$ and then everything is done. That is to say, we want all possible signal sequences $\tilde{s}_{j-1}$ which lead to the actions $\tilde{a}_{j-1}$. Making use of the law of total probability, we have

$p_{\tilde{a}_{j-1}}^{s_w} = \sum_{\tilde{s}_{j-1}} P(\tilde{a}_{j-1} \mid \tilde{s}_{j-1}, s_w)\, p_{\tilde{s}_{j-1}}^{s_w},$

where $P(\tilde{a}_{j-1} \mid \tilde{s}_{j-1}, s_w)$ is related to the decision-making process, and the decision-making process does not directly involve $s_w$ because $s_w$ is not explicitly known to learners. Therefore, $P(\tilde{a}_{j-1} \mid \tilde{s}_{j-1}, s_w) = P(\tilde{a}_{j-1} \mid \tilde{s}_{j-1})$. We first notice that, because we assume that learners make no mistakes, $\tilde{a}_{j-1}$ is fully determined by $\tilde{s}_{j-1}$, and the signal vector $\tilde{s}_{j-2}$ of the previous learners has no direct effect on the current $(j-1)$th learner's decision. Thus,

$P(\tilde{a}_{j-1} \mid \tilde{s}_{j-1}) = P(\tilde{a}_{j-2} \mid \tilde{s}_{j-2})\, P(a_{j-1} \mid \tilde{a}_{j-2}, s_{j-1}) = P(\tilde{a}_{j-2} \mid \tilde{s}_{j-2})\, \delta\!\left(a_{j-1}, \mathrm{sign}\!\left(l_{j-1}(\tilde{a}_{j-2}, s_{j-1}) - \tfrac{1}{2}\right)\right). \quad (27)$

In the second-to-last step, we have used the fact that $a_{j-1}$ is not directly determined by $\tilde{s}_{j-2}$ but only by $\tilde{a}_{j-2}$ and $s_{j-1}$. Here $\tilde{a}_{j-1}$, $p_{\tilde{a}_{j-1}}^{s_w}$ and the action outcome $a_j$ are public information, while $l_j$ is private to the $j$th learner since $s_j$ is hidden from the public. We call the $l_j$ calculated from the procedure defined in Eq. (27) the exact solution $l_{j,A}$, since it solves exactly the mathematical problem of finding $P(s_w \mid \tilde{a}_{j-1}, s_j)$. Here the superscript $A$ refers to the fact that it is calculated from histories of actions.
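The whole procedure, Eq. (13) together with the recursion of Eq. (27), can be sketched as a pair of mutually recursive functions (our own compact implementation for small histories; $p$ and $q_{ext}$ are fixed as example module constants for simplicity):

```python
from functools import lru_cache

P_SIG, Q_EXT = 0.7, 0.5  # example values of signal accuracy p and prior q_ext

def sig_lik(s, s_w):
    """P(s | s_w)."""
    return P_SIG if s == s_w else 1.0 - P_SIG

def action_prob(a, hist, s):
    """P(a | hist, s) under the exact rule; a tie gives each action weight 1/2."""
    l = exact_l(hist, s)
    if l > 0.5:
        return 1.0 if a == 1 else 0.0
    if l < 0.5:
        return 1.0 if a == -1 else 0.0
    return 0.5

@lru_cache(maxsize=None)
def hist_prob(hist, s_w):
    """p(hist | s_w): Eq. (27)-style recursion, marginalizing hidden signals."""
    if not hist:
        return 1.0
    prev, a = hist[:-1], hist[-1]
    return hist_prob(prev, s_w) * sum(
        sig_lik(s, s_w) * action_prob(a, prev, s) for s in (1, -1))

@lru_cache(maxsize=None)
def exact_l(hist, s):
    """l_{j,A} = P(s_w = 1 | hist, s) via Eq. (13)."""
    num = Q_EXT * hist_prob(hist, 1) * sig_lik(s, 1)
    den = num + (1.0 - Q_EXT) * hist_prob(hist, -1) * sig_lik(s, -1)
    return num / den
```

With these example values, `exact_l((1,), -1)` evaluates to exactly $0.5$: the second learner can reconstruct $s_1 = 1$ from $a_1 = 1$, and her own opposite signal cancels it, which is precisely the tie situation discussed earlier.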
Examples showing that $l_{j,A}$ is potentially different from $l_{j,B}$ and $l_{j,tB}$

Next we illustrate one example of the results from the above iterative calculation $l_{j,A}$ and the other two approximate solutions $l_{j,B}$ and $l_{j,tB}$. Assuming the private signals are $\tilde{s}_4 = (1,-1,-1,-1)$, and that when the second learner breaks the tie she ends up with $a_2=1$, we get the $l_j(s_w=1)$ values and action outcomes listed in Table 1. Given this sequence of private signals, it is more likely that the true state of the world is $s_w=-1$. However, here we intentionally considered the case where the second learner chooses action $a_2=1$, so that there is a chance of misleading the rest of the population.

A theorem on new criteria of cascades
Cascades have been a central topic in studies of social learning. With the idea behind $l_{j,B}$ being to join the majority, the cascading problem becomes identifying the probability of two consecutive steps with the same action occurring while the actions before those two steps are evenly distributed. This is in fact the working criterion of cascading used in many works, including the well-known work of [3]. In this way, one gets the analytical result for the probability of a cascade occurring before the end of the game at learner $N$ (Eq. (28)). Instead of $l_{j,B}$, here we want to discuss the phenomenon of cascading based on our exact $l_{j,A}$. We will show that the event in which two consecutive learners take the same action is not a sufficient condition for cascading if decisions are made according to $l_{j,A}$.

Theorem: If the action history $\tilde{a}_{j-1}$ satisfies

$\frac{p_{\tilde{a}_{j-1}}^{s_w=1}}{p_{\tilde{a}_{j-1}}^{s_w=-1}} > \frac{p}{1-p}\cdot\frac{1-q_{ext}}{q_{ext}}, \quad (29)$

then the $j$th and all later learners take action $1$ regardless of their private signals (and symmetrically for action $-1$ when the ratio is below $\frac{1-p}{p}\cdot\frac{1-q_{ext}}{q_{ext}}$).

Proof: Cascading happens when $a_j=1$ ($a_j=-1$) no matter what value the $j$th learner's private signal $s_j$ takes. First, we find the cascading condition for the $j$th learner. In this proof, without loss of generality, we consider only the case of $a_j=1$. From Eq. (13), we know $a_j=1$ is equivalent to $l_{j,A} > 0.5$. Considering the cases $s_j=1$ and $s_j=-1$, that condition becomes, respectively,

$\frac{p_{\tilde{a}_{j-1}}^{s_w=1}}{p_{\tilde{a}_{j-1}}^{s_w=-1}} > \frac{1-p}{p}\cdot\frac{1-q_{ext}}{q_{ext}} \quad \text{and} \quad \frac{p_{\tilde{a}_{j-1}}^{s_w=1}}{p_{\tilde{a}_{j-1}}^{s_w=-1}} > \frac{p}{1-p}\cdot\frac{1-q_{ext}}{q_{ext}}.$

The second condition implies the first, and when both hold, $a_j=1$ is taken regardless of $s_j$, so that $p_{\tilde{a}_j}^{s_w} = p_{\tilde{a}_{j-1}}^{s_w}$ for both values of $s_w$. Therefore, Eq. (29) is again satisfied for $p_{\tilde{a}_j}^{s_w}$. From this theorem, we know that cascading happens whenever the action taken by a learner does not depend on her private signal. Furthermore, this theorem shows that whenever that happens, later learners cannot change this trend of cascading. We call such a sequence of actions $\tilde{a}$ a cascading $\tilde{a}$. This is different from the prior criterion of cascading [3], namely that two consecutive learners take the same action while the actions before those two were evenly divided between the two actions. In fact, if $l_{j,B}$ is plugged into Eq. (29), rather than the exact $l_{j,A}$, we arrive at the prior criterion.
However, when $l_{j,A}$ is used, such a criterion is no longer sufficient, because even after such a ''criterion'' is met it is possible for the next learner to jump out of the old ''cascade''. In fact, this is intuitively why the rate of accuracy based on $l_{j,A}$ is higher than that of $l_{j,B}$.

Comparison between the exact and the two approximate solutions
With the exact decision-making procedure $l_{j,A}$ equipped to every learner, let us now discuss action outcomes of this social learning game. Given values of $p$ and $q_{ext}$, we define the rate of accuracy as

$r = \sum_{\tilde{a}} P_{\tilde{a}}^{s_w=1}\, \frac{1}{N}\sum_{j=1}^{N} \delta(a_j s_w, 1),$

where the product $a_j s_w = 1$ when action $a_j$ is the same as $s_w$. This $r$ describes the average payoff of all learners in a game. $P_{\tilde{a}}^{s_w=1}$ is the objective probability of the action series $\tilde{a}$, as discussed in Eq. (15). Next we will compare this average payoff for the three decision-making procedures according to, respectively, $l_{j,A}$, $l_{j,B}$ and $l_{j,tB}$. We are also interested in the overall probability of cascading towards, respectively, $a_N = \pm 1$, i.e., the probability of those action sequences $\tilde{a}$ satisfying that either $l_j(\tilde{a}_{j-1}, s_j) > \frac{1}{2}$ or $l_j(\tilde{a}_{j-1}, s_j) < \frac{1}{2}$ no matter whether $s_j = \pm 1$, for a given $j \leq N$. Cascades mean that a learner's action does not depend on her private signal.
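Alongside the exact enumeration over action sequences, a Monte Carlo estimate of $r$ for any decision rule offers a quick cross-check (an illustrative harness of our own, shown here with the blind-counting rule; all names and default values are ours):

```python
import random

def simulate_accuracy(decide, N=30, p=0.7, q_ext=0.5, trials=2000, seed=1):
    """Monte Carlo estimate of the rate of accuracy r: the average fraction
    of actions that match the true world status s_w."""
    rng = random.Random(seed)
    correct, total = 0, 0
    for _ in range(trials):
        s_w = 1 if rng.random() < q_ext else -1
        actions = []
        for _ in range(N):
            s = s_w if rng.random() < p else -s_w
            a = decide(actions, s, rng)
            actions.append(a)
            correct += (a == s_w)
            total += 1
    return correct / total

def blind_rule(actions, s, rng):
    """Blind action-counting with random tie-breaking."""
    t = sum(actions) + s
    return 1 if t > 0 else (-1 if t < 0 else rng.choice([1, -1]))
```

Swapping `blind_rule` for any other decision rule of the same signature allows the three solutions to be compared under identical sampled histories.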
Here, all numerical results will be presented only for the case of $s_w=1$, since the cases of $s_w=-1$ and $s_w=1$ are exactly symmetric. For reasons that will become clear later, all reported results in the following are from games with $N=30$, if not explicitly stated otherwise.
From Fig. 1, we can see that there is no difference at all among the three solutions in either the rates of accuracy (Fig. 1(a)) or the probabilities of cascading (Fig. 1(b)). In Fig. 1(b), analytical results from [3] are plotted and are shown to agree exactly with our numerical results.
This finding agrees with our example in the previous subsection: although the numerical values of the $l_j$s from the three solutions can be different, the action outcomes are always the same because the three solutions agree with each other on whether $l_j$ is larger or smaller than $0.5$. Next, we want to further illustrate that in more general situations these different numerical values may lead to different actions.

Minor Modifications of the Game and Difference between the Exact and the Approximate Solutions
We have seen that although both $l_{j,B}$ and $l_{j,tB}$ are only approximate solutions while $l_{j,A}$ provides the exact solution of the mathematical problem of finding $l_j = P(s_w \mid \tilde{a}_{j-1}, s_j)$, and although these solutions have different numerical values, when applied to real games there is no difference among the three solutions. Why? The different numerical values agree on being larger or smaller than $\frac{1}{2}$, even though their differences from $\frac{1}{2}$ differ. Due to this observation, here we propose a slight extension of the game. The extension is so marginal that, in principle, if all three solutions are proper solutions of the original game, they should also be proper solutions to the extended game. We will, however, show that on the extended game there are visible differences among the results of the three solutions. We introduce a parameter $D$ to represent the reservation of a learner to take an action, i.e., given $l_j = P(s_w=1 \mid a_1 a_2 \cdots a_{j-1}, s_j)$, a learner $j$ makes a decision according to

$a_j = \begin{cases} 1, & l_j > \frac{1}{2}+D, \\ -1, & l_j < \frac{1}{2}-D, \\ \mathrm{random}(1,-1), & \text{otherwise.} \end{cases}$

We take $D \in \left[0, \frac{1}{2}\right]$. Such a reservation can be related to a service charge for taking actions in real game playing. In that case, allowing the learners to take no action in their turn when $\frac{1}{2}-D < l_j < \frac{1}{2}+D$, i.e., replacing $\mathrm{random}(1,-1)$ with simply $0$, makes even better sense. However, we will not discuss this additional modification or the motivation for introducing this $D$ here. What we want to argue is that nothing forbids the applicability of each of the three above $l_j$s to this extended model if they are indeed applicable to the original one, where $D=0$ is simply the only difference between the extended and the original games. From where we stand, we simply want the different numerical values of $l_j$ from the three solutions to make a difference in action outcomes. Next, we want to compare results of the three solutions on the extended model with $D \neq 0$.

$l_{j,A}$ is different from $l_{j,B}$ and $l_{j,tB}$ when $D > 0$

Above we have shown that although the resulting formulae and the numerical values differ among $l_{j,A}$, $l_{j,B}$ and $l_{j,tB}$, the resulting actions are not different at all when $D=0$. Now let us do the same comparison when $D \neq 0$. Table 2 shows the numerical values of the $l_j$s and action outcomes in the case of $D=0.1$. The third learner chooses to act randomly according to $l_{3,A}$ and chooses action $1$ according to $l_{3,B}$ and $l_{3,tB}$. This randomness in the third learner's action leads to a chance of correcting the unintentional and undesired action of the second learner (being $a_2=1$), which would lead to a wrong cascade when solution $l_{j,B}$ or $l_{j,tB}$ is used. This chance of correction makes it possible to generate higher payoffs for the rest of the population.
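The extended decision rule with reservation $D$ is a one-line change to the original rule (an illustrative sketch of ours):

```python
import random

def decide_with_reservation(l_j, D=0.1, rng=random):
    """Act only when the belief l_j is at least D away from 1/2;
    otherwise fall back to the random tie-breaking action."""
    if l_j > 0.5 + D:
        return 1
    if l_j < 0.5 - D:
        return -1
    return rng.choice([1, -1])
```

With $D=0.1$, any belief between $0.4$ and $0.6$ yields a random action, which is exactly the band in which the three solutions' differing numerical values of $l_j$ can produce different actions.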
We also see a manifestation of this difference in the simulation results. When $D=0.1$, the rate of accuracy and the probability of cascading calculated from $l_{j,A}$, shown in Fig. 2, are different from, and in fact higher than, those calculated from $l_{j,B}$ and $l_{j,tB}$. Therefore, the action sequences from $l_{j,A}$ must be different from those from $l_{j,B}$ or $l_{j,tB}$.

$N=30$ is large enough
We have argued earlier that, because a cascade quite often happens after only a few learners have taken their actions, it is not necessary to run the simulation of social learning games for a very large population size $N$. To test this further, we plot the rate of accuracy found from the exact solution for various values of $N$ in Fig. 3. This plot shows that the difference between $N=25$ and $N=30$ is very small. This confirms that it was reasonable to set $N=30$ for all previous game results.

Conclusions
In this work, an exact solution to the original social learning game is proposed, and logical gaps in the two approximate and commonly used solutions are discussed. To demonstrate that our own solution is not only different in principle but also leads to different outcomes in actions and payoffs when compared with the other two solutions, we modified the game to incorporate a parameter $D$, which stands for the level of reservation for risk taking. Our calculations and simulations indeed show that higher payoffs can be achieved in this case when using the exact solution.
With this exact solution, other essential questions in the social learning game, such as conditions for correct information cascades and mechanisms to improve the probability of correct information cascades, should all be re-analyzed. Recently, there have been extensions of the social learning game that consider the effects of complex networks [16] and changing environments [9]. Discussions of all such extended models should also, in principle, be analyzed using the exact solution.
We have also confirmed via the example calculation in Table 2 and the above numerical simulations that action outcomes and thus the received payoffs from l j,A are different from outcomes from l j,B or l j,tB in the extended model.
It is very costly, however, for a learner to really implement the exact solution, in that a large amount of tricky mathematical calculation is required to adopt $l_{j,A}$. How close are the results of real human decision-making processes, such as the dynamics of public opinion, to the exact solution? This is another interesting question. A trivial answer is that once learners understand that a more mathematically involved formula is needed to play the social learning game, they can always use a computer to help them in their decision making. However, in the real world, should we expect action outcomes to be close to those predicted by our exact solution or by the other two? This highly non-trivial question remains and should be a subject for further investigation.