Estimating a Path through a Map of Decision Making

Studies of the evolution of collective behavior consider the payoffs of individual versus social learning. We have previously proposed that the relative magnitude of social versus individual learning could be compared against the transparency of payoff, also known as the “transparency” of the decision, through a heuristic, two-dimensional map. Moving from west to east, the estimated strength of social influence increases. As the decision maker proceeds from south to north, transparency of choice increases, and it becomes easier to identify the best choice itself and/or the best social role model from whom to learn (depending on position on east–west axis). Here we show how to parameterize the functions that underlie the map, how to estimate these functions, and thus how to describe estimated paths through the map. We develop estimation methods on artificial data sets and discuss real-world applications such as modeling changes in health decisions.


Introduction
Social influence is increasingly recognized in studies of decision making and health. Coordinated behavior has benefits for groups and the individuals within them. When successful behaviors of the community are socially learned, cooperation can evolve in social networks extending beyond the limits of Hamiltonian inclusive fitness among kin [1][2][3][4][5][6][7][8]. Provided that some fraction of agents learn individually [9], either as "specialists" or "generalists" [10], social learning can be seen as an adaptive strategy among "scroungers" for the exploitation of the information gains made by the "producers" who track the environment through individual learning [11][12][13]. Most evolutionary approaches expect the most adaptive state to equilibrate to a mix of individual and social learners whose proportions are dictated by the degree of spatial and temporal autocorrelation of the environment and the cost of individual learning [7,12,14-18]. This assumption of adaptive equilibrium is an ideal, however, and not necessarily attainable in conditions of continual transition. As social learners increase in frequency, they are increasingly copying from each other, and so the quality of their information about decision payoffs likely diminishes [12]. At the same time, individual learners may be too overwhelmed by rapid change, poor information, or simply too much information to make informed decisions.
For this reason, an important factor is how well informed decision makers are, which we might call the "transparency" of payoffs in their decisions. A relevant question about online social media, for example, is whether their searchability makes decision makers better informed, or whether the deluge of social influence and similar options makes decisions less transparent in terms of payoffs [19]. Traditional decision theory typically assumes that agents are informed about their behavioral options, or if not, then are at least knowledgeable about the people from whom they might learn, preferably the most skilful, informed, or prestigious members of the group [12,20-22]. In contrast, models of collective flocking or herding behavior assume no such knowledge; agents are often represented by vectors, with choice as the direction and transparency as the magnitude. Even as most agents follow neighbors with no particular preference, a collective direction (consensus) can nonetheless favor the minority, if there exists high transparency of choice [23][24][25].
We see two major factors in decision making: social/individual learning and transparency of choice [19], as depicted in Figure 1. This heuristic map represents the relative magnitude of social versus individual learning on the horizontal axis and the transparency of a decision on the vertical axis. Following the call for evolutionary theory as the integrating principle of behavioral science [26], the map is intended to unify quantitative approaches from multiple branches of social science, ranging from rational-actor approaches in the northwest, to more anthropological social-learning theory in the northeast, to the "information overload" of the southwest and southeast.
At the macro scale, the map reduces the complexity of social decision process analysis to the coarse-grained simplicity of two axes, analogous to a principal components analysis reduced to two dominant factors. The north-south axis, which we parameterize as $\beta_t$, represents a measure of transparency in the payoff differences among available alternatives, from opaque at the south ($\beta_t = 0$) to absolutely transparent in the north ($\beta_t = \infty$). Along the east-west axis, the measured parameter $J_t$ increases from west to east, from a decision made individually at the western edge ($J_t = 0$) to pure social decision making (copying, for example) at the eastern edge ($J_t = \infty$).
The framework has broad applicability. One particular application we envisage is toward fertility decisions, for example using an exceptional long-term dataset on about 250,000 people, collected in the Matlab region of Bangladesh since 1966 [27]. Those data are excellent, long-term monthly records of the decisions that have been made over many years, along with associated (anonymous) details of the individuals making those decisions, such as total fertility, religion, surviving children, age at marriage, household income, education and other observable covariates that impact fertility. We can also consider social variables, such as density of the behaviour within the local social network. Other health-related examples would be smoking or vaccination, where national health services such as the NHS in the United Kingdom, or datasets such as ALSPAC at the University of Bristol, hold long-term data on anonymous individuals and their relevant binary choices (to smoke or not, be vaccinated or not), along with a wealth of covariate information on the individuals (wealth, education, religion, and so on) and often also on social visibility (e.g., kin members in the same dataset).
Work on peer effects in smoking behavior is vast, but we have not found any work that attempts to estimate the dynamics of the intensity of choice function, as we propose. Work that is most closely related to ours [28,29] attempts to control for the effects of self-selection into peer groups, correlated unobservables, and contextual effects that tend to bias received estimates of peer effects on smoking (e.g., estimates of our J parameter). This valuable precedent, however, does not estimate the intensity of selection function as we propose.
Our approach could be applied to scenarios far different from health, including criminal records or consumer sales, where long-term choice data are available alongside individual covariates. We see the framework as especially appropriate to online choices in the big-data era, as the covariate data could be comprehensive, including vast records of previous choices. In all cases, the characterization of social influence and transparency of choice would provide a novel insight into the decision dynamics at the population scale.
We previously described the map in terms of generalized data patterns diagnostic of each of its four quadrants [19]. We focused on population-scale data patterns and left specific empirical estimation concerning individuals to future work. Here we show how real-world data could be plotted as locations on the map in Figure 1 and, if the data allow, as trajectories across the map through time. This requires us to develop a method to estimate $\beta$ and $J$, either for each agent or for each agent's group, from real-world data. We assume the available data include the (a) covariates that may influence the agent's choice, (b) variability of the agent's choices, and (c) strength of social influences upon the agent's choices. All three of these associations may change through time.

The Model
In parameterizing our two-dimensional map, we divide transparency of choice into separable components for intrinsic utility and social influence. Our model builds on previous work in discrete choice theory by parameterizing the transparency of choice as a function of observable covariates (see Methods). To begin, let there be $G$ groups with $I$ players in each group. We can think of $I$ as being a large number so that the law of large numbers gives a good approximation in what follows. For now, assume the groups are disjoint, i.e., nonoverlapping. Agent $i$ in group $g$ chooses choice $k$ at time $t$ if the random utility agent $i$ gets from choice $k$ is greater than the random utility available from any other choice,

$$\tilde{U}_{igtk} > \tilde{U}_{igtj} \qquad (1)$$

for all $j \neq k$. Here, symbols with a tilde denote random variables (deterministic quantities will not have tildes). Assume that

$$\tilde{U}_{igtk} = U_{igtk} + \frac{1}{\beta_{igt}} \tilde{\varepsilon}_{igtk}, \qquad (2)$$

where the $U$'s are deterministic and the random variables $\tilde{\varepsilon}_{igtk}$ are Independent and Identically Distributed Extreme Values (IIDEV) across all dates, choices, groups, and individuals. Then the transparency of choice is inversely proportional to how strongly the noise in the payoff is amplified, $1/\beta_{igt}$. We then choose units so the constant of proportionality is one (so that when noise is small, transparency is high). As we will see, this noise can occur in intrinsic utility and/or social utility of the choice. The probability, $\Pr_{igt}(k)$, that agent $i$ in group $g$ chooses $k$ is then the term for choice $k$ divided by the sum of terms across all choices, $Z_{igt}$:

$$\Pr_{igt}(k) = \frac{e^{\beta_{igt} U_{igtk}}}{Z_{igt}}, \qquad Z_{igt} = \sum_{j} e^{\beta_{igt} U_{igtj}}, \qquad (3)$$

where $i$, $k$ and $g$ take the integer index values from 1 to $I$, $K_t$, and $G$, respectively. The higher the transparency of choice is, the more sensitive the probability is to the utility. Note that when transparency of choice $\beta$ is zero, utility has no effect on choice, and agents are effectively just guessing among all the choices, i.e., $\Pr(k) = 1/K_t$ when $\beta = 0$.
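As a minimal sketch of the choice probability described above (the term for choice k divided by the sum of terms across all choices), the following Python function computes logit probabilities for a given transparency of choice. The function name `choice_probabilities` and the utility values are illustrative assumptions, not part of the model specification.

```python
import math


def choice_probabilities(utilities, beta):
    """Return exp(beta * U_k) / Z for every choice k (hypothetical helper)."""
    # Subtract the largest exponent before exponentiating for numerical
    # stability; this rescales numerator and Z equally, so the
    # probabilities are unchanged.
    shift = max(beta * u for u in utilities)
    weights = [math.exp(beta * u - shift) for u in utilities]
    z = sum(weights)  # partition function Z (after the shift)
    return [w / z for w in weights]


# Illustrative deterministic utilities U_k for three choices (assumed values).
utilities = [1.0, 0.5, 0.0]
for beta in (0.0, 1.0, 10.0):
    probs = choice_probabilities(utilities, beta)
    print(beta, [round(p, 3) for p in probs])
```

With a transparency of zero every choice is equally likely; as the transparency grows, probability concentrates on the choice with the highest utility.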
In order to incorporate the "east-west" axis of social influence, one option is to add a term for frequency-dependent social learning [30], whereas another is to add a social component by which agent $i$ makes pairwise expectations on choices of $j$ others [31]. Building on both, we define utility with respect to choice $k = 0$ and then divide $\Pr_{igt}(k)$ by $\Pr_{igt}(0)$, so that the partition function, $Z_{igt}$, cancels out from equation 3, such that

$$\frac{\Pr_{igt}(k)}{\Pr_{igt}(0)} = e^{\beta_{igt} (U_{igtk} - U_{igt0})}. \qquad (4)$$

If we then take the natural logarithm of both sides, we are left with the transparency $\beta$ multiplied by the difference in utility $U$. We can then expand the utility function into an individual component and a social component as follows:

$$\ln \frac{\Pr_{igt}(k)}{\Pr_{igt}(0)} = \beta(\theta, z_{igt}) \left[ \phi_1 (x_{igtk} - x_{igt0}) + J(\phi_2, y_{igt}) (P_{tkg} - P_{t0g}) \right] \qquad (5)$$

for agent $i$, for choice $k$, in group $g$, at date $t$, where $P_{tkg}$ is the fraction of group $g$ that chose $k$ at date $t$. Table 1 in the Methods section (below) summarizes the different parameters and variables involved. Equation 5 separates, from left to right on its right-hand side, an individual-choice component, $\beta \phi_1$, and a social component, $\beta J$. The individual component of choice is governed by $\phi_1$ and acts on the payoff difference between options, $(x_1 - x_0)$. The social-influence component, governed by $J(\phi_2, y_{igt})$, acts on the popularity of the option, $(P_{tkg} - P_{t0g})$, expressed as the popularity of choice $k$ relative to the reference choice $k = 0$. Although the transparency of choice parameter, $\beta$, is part of both the individual term and the social-learning term, our map depicts the transparency of choice and social influence as orthogonal dimensions.
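The decomposition into an individual component and a social component can be sketched in a few lines of Python. This is an illustration under assumed scalar inputs: the function names and the numerical values are hypothetical, and transparency and social influence are passed in as plain numbers rather than as functions of covariates.

```python
import math


def log_odds(beta, phi1, dx, J, dP):
    """Log odds of choice k versus the reference choice 0.

    beta : transparency of choice
    phi1 : sensitivity to the payoff difference dx = x_k - x_0
    J    : strength of social influence
    dP   : popularity difference P_tkg - P_t0g within the group
    """
    return beta * (phi1 * dx + J * dP)


def prob_choice_1(beta, phi1, dx, J, dP):
    """Binary case: probability of choice 1 implied by the log odds."""
    return 1.0 / (1.0 + math.exp(-log_odds(beta, phi1, dx, J, dP)))


# Assumed scenario: the payoff favors choice 1 (dx > 0) but the majority
# of the group currently chooses 0 (dP < 0).
weak_social = prob_choice_1(beta=2.0, phi1=1.0, dx=0.5, J=0.5, dP=-0.4)
strong_social = prob_choice_1(beta=2.0, phi1=1.0, dx=0.5, J=3.0, dP=-0.4)
print(round(weak_social, 3), round(strong_social, 3))
```

In this assumed scenario, weak social influence lets the payoff difference dominate, whereas strong social influence pulls the agent toward the majority choice despite the payoff.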
The model is intended to allow the estimation of $\beta$ and $J$ from the data and potentially to map a trajectory through time for agent $i$ in group $g$. The transparency of choice, $\beta$, increases from south to north on the map and the social influence, $J$, increases from west to east (Figure 1). We may estimate $\beta_{igt} = \beta(\theta, z_{igt})$ once we have an adequate time-series data set on a vector of covariates, $\{x_{igt}\}$, and we have parameterized the transparency of choice function, $\beta_{igt} = \beta(\theta, z_{igt})$. The parameter vectors $\theta$ and $\phi_2$ can be normalized to fit the functional specification, $\beta_{igt} = \beta(\theta, z_{igt})$, and social-utility function, $J(\phi_2, y_{igt})$, respectively (discussed below).
The covariates for each agent include those that predict the propensity of the behavior, denoted by $\{x_{igt}\}$, those associated with the presence of social influence, $\{y_{igt}\}$, and those that relate how variable the choices were through time, $\{z_{igt}\}$. These realities are amplified by $\phi_1$ (individual) and $\phi_2$ (social), which govern the sensitivity to inherent differences of the choice and social influence, respectively. In other words, the parameter vectors $\phi_1$, $\phi_2$, and $\theta$ operate on aspects of the real world denoted by positive scalars $x$, $y$, and $z$, respectively. Estimating the parameter vector $\phi_1$ determines the individual sensitivity to differences in choice, $(x_1 - x_0)$. Estimating the parameter vector $\theta$, along with the scalar observable $z$, determines the transparency of choice, $\beta$. Estimating the parameter vector $\phi_2$ specifies the social-influence function, $J(\cdot)$.
We tested to see how these estimates can be used to describe a path, $\{\beta(\theta, z_{igt}), J(\phi_2, y_{igt})\}$, through the map for each agent $i$ in each group $g$ for which we have data at date $t$.

Results
We generated artificial data to yield four different paths through the map to test whether our suggested estimation procedure actually works (see Methods). We can use equation 5 for a log odds regression (equation 6), where $P_{tkg}$ is the fraction of group $g$ that chose $k$ at date $t$. We then specified the social-influence and transparency of choice functions as in equation 7. We can now use this parameterisation to explore how the parameterised map applies to artificial datasets (see Methods on how these data were generated). In Figure 2, we show simulations of the binary choice (e.g., to have a child or not) with $\phi_2 = 1$, $\phi_1 = 1$, and $\theta = 5$ from equation 7, with $G = 100$ groups and $M = 100$ agents per group. We then vary the initial proportions of the 10,000 agents (over all groups) choosing one option (blue) versus the other (red). All simulations converge to nearly 100% of agents choosing the blue option in fewer than ten time steps, even when a majority chooses red at the starting point. This is what we expect when $\phi_2$ is positive and $\theta$ is high and positive: the population selects the option with the better payoff.
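The convergence described above can be illustrated with a small simulation. This is a minimal sketch and not the simulation code behind Figure 2: it assumes fixed scalar values for the transparency `beta` and the social influence `J` instead of the functional forms of equation 7, uses fewer groups to keep it fast, and all names and default values are hypothetical.

```python
import math
import random


def simulate(n_groups=10, n_agents=100, n_steps=10, start_frac=0.3,
             beta=5.0, phi1=1.0, J=1.0, dx=1.0, seed=0):
    """Return the overall fraction choosing option 1 after each time step."""
    rng = random.Random(seed)
    n_start = int(round(start_frac * n_agents))
    # Every group starts with the same minority share choosing option 1.
    groups = [[1] * n_start + [0] * (n_agents - n_start)
              for _ in range(n_groups)]
    history = []
    for _ in range(n_steps):
        new_groups = []
        for members in groups:
            p1 = sum(members) / n_agents
            d_pop = p1 - (1.0 - p1)  # popularity difference P_1 - P_0
            log_odds = beta * (phi1 * dx + J * d_pop)
            prob_1 = 1.0 / (1.0 + math.exp(-log_odds))
            # Each agent draws a new choice from the same group-level odds.
            new_groups.append([1 if rng.random() < prob_1 else 0
                               for _ in range(n_agents)])
        groups = new_groups
        total = sum(sum(members) for members in groups)
        history.append(total / (n_groups * n_agents))
    return history


print([round(f, 3) for f in simulate()])
```

Under these assumed values the payoff advantage of option 1 outweighs the initial majority for option 0, so the population moves to option 1 within a few steps; raising `J` relative to `phi1 * dx` makes the initial majority harder to overturn.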
The specification in equation 7 allows other variations that yield more novel results. To convey the effects of varying $\theta$ and $\phi_2$, Figure 3 shows the change in behavior for a binary choice under different values of $\theta$ and $\phi_2$ (for clarity, Figure 3 shows just the proportion of one of the two choices). In varying these two parameters, we find variation not only in final outcomes after 30 time steps, but in the dynamics of choice as well (Figure 3). When $\phi_2$ is negative, for example, we move toward social independence, and, for positive $\phi_2$, decisions tend to be made socially. Similarly, when $\theta$ is negative, we move south toward ambivalence, and, as $\theta$ increases, we move north toward transparency between the binary options. If $\phi_2$ is low and $\theta$ high, a member (or a whole group) might be able to choose something different from the norm. This event, however, becomes rarer as social influence increases.
We then used the modelling to explore how the parameters $\beta$ and $J$ can be estimated from the simulated data. Figure 4 shows how some estimates of the parameters, based on the data generated via simulation, vary as we move along the axes of the map displayed in Figure 1. We use a nonlinear least squares (NLLS) method to estimate $\phi_2$ and $\theta$; for large $\phi_2$ (Figure 3, bottom row), for example, estimating $\theta$ accurately is nearly impossible because there is little variation in behaviour when the social influence of the group dominates individual utilities.

Discussion
In these tests, we found that estimation is reasonable for $\phi_2$ but less precise for $\theta$. We see the source of this "weak identification" problem in equation 5 where, because $\beta$ multiplies $\phi_1$ and $J$, there can be difficulty in disentangling parameters in $\beta$ from $\phi_1$ and parameters in $J$ unless we have the right kind of specifications of $\beta$ and $J$ as well as variation in the observables that go into estimating their parameters. In equation 11 we can also see the challenge in disentangling the size of $\beta$ from the size of the variance of the random variable on the right-hand side (multiplying numerator and denominator by a scalar cancels out). This suggests that the variance of the numerator has to be normalized to one, say, in order to identify the parameter $\theta$ in $\beta$.
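The weak-identification problem can be made concrete with a short numerical check. The sketch below assumes the transparency, individual sensitivity, and social influence are constants (so no covariate variation is available to separate them); the function name and values are illustrative only.

```python
import math


def log_odds(beta, phi1, J, dx, dP):
    """Transparency multiplies both the individual and the social term."""
    return beta * (phi1 * dx + J * dP)


c = 4.0  # arbitrary rescaling factor (assumed)
original = log_odds(beta=1.0, phi1=2.0, J=3.0, dx=0.5, dP=0.2)
rescaled = log_odds(beta=1.0 * c, phi1=2.0 / c, J=3.0 / c, dx=0.5, dP=0.2)

# Both parameter sets imply the same log odds, so choice data alone
# cannot tell them apart.
print(math.isclose(original, rescaled))
```

Multiplying the transparency by a constant while dividing the individual and social sensitivities by the same constant leaves every predicted probability unchanged, which is why a normalization, or a specification in which transparency varies with an observable, is needed to pin the parameters down.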
We note that the inability to correctly estimate $\theta$ for large values of $\phi_2$ is not important because the uncertainty around $\theta$ when $\phi_2$ is large shows that payoff/costs are irrelevant when social influence is extremely high. In future work we will focus on how to better estimate $\theta$. Moreover, as we focus on $\phi_2$ and $\theta$, we may consider that $\phi_1$ is unnecessary. We prefer to keep $\phi_1$ embedded in the model, as it allows for a priori assumptions regarding the strength of individual vs social learning, from $\phi_1 < 0$ (aversion to choice 1) to $\phi_1 = 0$ (no individual bias) to $\phi_1 > 1$ (bias towards choice 1). Further, removing $\phi_1$ would change the inference on $\beta$ and $J$; that is, $e^{a + bx}$ and $a e^{bx}$ can be equivalent representations when $a$ and $b$ are positive, but otherwise the relevance of $a$ depends on the magnitude of $x$ in the former and allows negative outputs in the latter.
The simple bi-axial map of behavior in Figure 1 aims to extract from aggregated data the transparency of decisions (north-south) and the extent to which a behavior is acquired socially versus individually (east-west). We have proposed a means by which to parameterize the functions that underlie the map and thus estimate paths through it. Rather than assume how well agents are informed in their learning, we can let transparency of choice be a variable parameter in our models, with the aim of using the models to infer transparency of choice from real data [19,23]. For example, we might have a vector $\theta$ such that the transparency of choice would be parameterized as the specification $\beta = \theta_0 + z \theta_1$.
A hypothetical real-world example for $z$ might be the fraction of unvaccinated in group $g$ at time $t$. We would consider an idealized binary choice of whether or not to get vaccinated at time $t$, where $k = 1$ would designate vaccinated and $k = 0$ designate not vaccinated. In this example, $\beta$ grows linearly with $z \theta_1$, which would imply that the more unvaccinated there were in the group, the more transparent the decisions would become about vaccination. Other binary-choice examples might include whether or not to use contraception, smoke, use hand sanitizer, or perhaps the fertility decision of whether or not to have a child.
Social-influence studies that treat transparency of choice as a variable suggest that it has a complex interaction with social learning. Some of this interaction might be captured, for example, by a specification such as $J = \phi_{20} + y \phi_{21}$, by which, assuming $\phi_{21}$ is positive, the presence of social influence increases with the scalar $y$. Hypothetical real-world examples for a group-specific $y$ include the fraction in the group that are high income or perhaps the Gini coefficient of the group. Given the formulation for $J$, the scalar factor $y$ then affects the social influence associated with other observables.
These parameters relate to debates on modelling fertility decisions, for example, as explanations range from an intrinsic individual utility decision [27] to the social influence of the frequency of a particular fertility level in the local community [32]. For example, it may be that poor, uneducated women living in a wealthy, educated group tend to adopt the low-fertility level of the group rather than the higher fertility that would otherwise be associated with their low income and low education as individuals [32]. In this case the social-choice transparency, $\phi_{21}$, might reflect the tendency to have the same number of children as other mothers, whose success and/or education has become more socially visible.
Fertility research also generates the sort of long-term, time-stratified demographic datasets that are appropriate to our proposed estimation method for the map. More generally, the growth of so-called "big data" on collective decisions also seems suited to this map [19], which links different scales of analysis, such as the microprocesses that produce observed scaling relationships in social-network formation [33].
As new digital technologies filter and search social influences and information, transparency of choice may be increased, but conversely if agents are overwhelmed by the online deluge of information, options, and social influences [34,35], then transparency of choice, $\beta_t$, may decrease (by decreasing $\theta$ and/or $z_{igt}$). This may be central to herding effects in online product ratings [36], for example. Also, the transparency of payoffs may well be changing for many health decisions; the rapidly changing conditions of the modern world may effectively lower $\beta$ as the connection between the decision and its actual future payoffs is obscured by the "noise" of socio-economic change. Seemingly straightforward social interventions may therefore have unanticipated consequences [37].
The dimensions of the map are also relevant to studies that compare technological complexity with population size [38][39][40][41], which assumes relatively transparent individual and social learning. Adding agents who are uninformed (payoffs not transparent) tends to cause a group consensus to regress to a single mode [24,42]. When it is much easier, and less costly (essentially free), to see what others do, then the balance could shift to the east and south. When survey respondents, for example, can see the aggregated guesses from other people, they simply change from their original, individual guesses in linear proportion to the distance from the group mean [43].

Conclusion
Having presented a two-dimensional map (Figure 1) as a schematic abstraction of human decision-making [19], we have now gone further toward making this into an empirical tool to project population-scale decision data onto axes of social influence and transparency of choice. As the decision maker proceeds from south to north, the precision of understanding which choice is best increases. As the decision maker moves from west to east, the strength of social or peer-group influence on which choice is best increases. Starting with a basis in discrete-choice modeling with social influence, we have discussed how a path through the map for a group of decision makers can be estimated from data sets. Through experiments with artificial data sets, we showed how the suggested estimation methods work and how parametric specifications can be estimated. For smaller datasets, we recommend maximum likelihood as the best way to estimate a path through the map, and then for larger datasets it becomes possible to use NLLS as the estimation method.
The map can now be applied to real-world case studies, especially those that feature large, time-stratified demographic data sets on binary decisions, such as those regarding health decisions. The parametrization we have presented allows us to extract, from these sorts of datasets, locations on the map representing degree of social learning and transparency of choice. As we apply this method in the future, we may be surprised to find that standard, universal assumptions regarding certain decisions may be becoming less appropriate, as the nature of such decisions changes through time or in different cultural contexts.

Artificial data generation
We generated data for $T$ periods and $G$ groups with $M$ members each, as follows. We first generated a random noise component, $\varepsilon_{igt1} - \varepsilon_{igt0}$, for each agent choice over the time span, which means $T \times G \times M$ logistic random variates with mean 0 and variance $\sigma^2$. We then simulated variability in payoffs and social influence for all of these choices as well. In doing so, we generated three sets of $T \times G \times M$ normally distributed random numbers, each with time-varying means and variance, one set for the $x$'s, one set for the $y$'s and one set for the $z$'s. In this case we allowed the means of $x$, $y$, and $z$ to increase over time, by choosing $x_{igt1}$ and $x_{igt0}$ from normal distributions with mean $t/10$ and variance 0.1, and choosing both $y_{igt}$ and $z_{igt}$ from normal distributions with mean $10t$ and variance 1. In generating these artificial data sets, we found that our simulation outcomes were well determined after $T = 20$ time steps, during which the effect of varying the number of groups and members per group was minimal when both $G$ and $M$ are greater than 100. We then specified the social-influence and transparency of choice functions as indicated in equation 7.
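The data-generation steps above can be sketched with the Python standard library. This is a hedged illustration rather than the code used for the reported simulations: the function names, the record layout, and the default sizes are assumptions, and the logistic noise is drawn by inverting the logistic distribution function.

```python
import math
import random


def logistic_variate(rng, scale):
    """Draw a mean-zero logistic variate by inverse-cdf sampling."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() can return exactly 0.0
        u = rng.random()
    return scale * math.log(u / (1.0 - u))


def generate_artificial_data(T=20, G=5, M=10, sigma=1.0, seed=0):
    """Return one record per (date, group, agent); layout is hypothetical."""
    rng = random.Random(seed)
    # A logistic variate with scale s has variance s^2 * pi^2 / 3.
    scale = sigma * math.sqrt(3.0) / math.pi
    sd_x = math.sqrt(0.1)  # random.gauss expects a standard deviation
    rows = []
    for t in range(1, T + 1):
        for g in range(G):
            for i in range(M):
                rows.append({
                    "t": t, "g": g, "i": i,
                    "eps_diff": logistic_variate(rng, scale),
                    "x1": rng.gauss(t / 10.0, sd_x),  # payoff covariate, choice 1
                    "x0": rng.gauss(t / 10.0, sd_x),  # payoff covariate, choice 0
                    "y": rng.gauss(10.0 * t, 1.0),    # social-influence covariate
                    "z": rng.gauss(10.0 * t, 1.0),    # transparency covariate
                })
    return rows


rows = generate_artificial_data()
print(len(rows), sorted(rows[0]))
```

Each record holds the noise difference and the three covariates for one agent at one date; the choice functions would then be applied to these records to produce simulated decisions.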

Functional forms and estimation
In order to identify parameters of the model that describe movements across the map, we need to separate transparency of choice from social influence. To do this, we start with the north-south axis of the map (the transparency of choice) and then add, via the east-west axis, social influence on individual choices. In discrete-choice theory, we assume we have a certain number of choices available and a certain amount of utility that is divided up among those choices. We then effectively toss the choices randomly into bins of a certain utility and find how many choices we expect in each bin. To start, consider a population of individuals making a binary choice ($-1$ or $+1$), each seeking to maximize a payoff function $U$ (equation 8), in which $k_i$ represents the binary choice and $X_i$ represents covariates of agent $i$ such as family, peer group, previous choices, or education level. The parameter $\varepsilon_i$ represents idiosyncrasies, which are treated as random, even if privately sensible to each individual agent. Following [31], we will assume that the values of $\varepsilon_i$ are what are known as Independent and Identically Distributed Extreme Values (IIDEV). For a given individual, a standard approach assumes the probability to make a particular choice is equivalent to the probability that the difference in idiosyncrasies, $\varepsilon_{i,+} - \varepsilon_{i,-}$, is less than some threshold $z$ (equation 9), where $\beta_i$ is transparency of choice for agent $i$. To illustrate, Figure 5 shows, for two values of $\beta_i$, how the probability that option $-1$ (versus option $+1$) is chosen depends on this difference, $\varepsilon_{i,+} - \varepsilon_{i,-}$. The probability transition is more abrupt or decisive for the higher value of $\beta_i$ (farther north on our map), representing greater transparency of choice.
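The effect of transparency on the sharpness of the probability transition can be illustrated numerically. The sketch below assumes the standard logistic (Fermi) form, 1 / (1 + exp(-beta * z)), for the probability that the difference in idiosyncrasies falls below a threshold z; the function name and the two transparency values are illustrative assumptions.

```python
import math


def fermi(z, beta):
    """Assumed logistic (Fermi) form of the choice probability."""
    return 1.0 / (1.0 + math.exp(-beta * z))


thresholds = [-1.0, -0.1, 0.0, 0.1, 1.0]
for beta in (1.0, 10.0):  # low versus high transparency (assumed values)
    print(beta, [round(fermi(z, beta), 3) for z in thresholds])
```

For the higher transparency the probability switches from near zero to near one over a much narrower range of thresholds, which corresponds to a more decisive choice farther north on the map.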
Our use of the Fermi/Boltzmann function as our equation (9) is fairly standard in discrete-choice theory, but it is also seen in some studies of evolutionary games in finite populations, in which a ''temperature,'' or ''noise,'' parameter is varied (e.g., taken to zero) in order to characterize the equilibrium in terms of cooperators in the population. The same function has been used, for example, to model the probability of outcome between two randomly selected individuals playing Prisoners Dilemma or related pairwise game [44,45]. In that case, the parameter (analogous to temperature in the Boltzmann function) is intensity of selection rather than our transparency of choice, which operates on the payoff difference. The difference in our approach from game-theoretic approaches is twofold. First, rather than play pairwise games, agents choose among available options and the model relates how well choice popularity corresponds to covariates among the individuals of the population. Second, our focus is on econometric identification of parameters and estimation of parameters as well as on the ability to retrieve the model parameters from noisy data. In particular, we are interested in estimating the intensity of selection as a function of observable covariates. This effort appears new to the literature on estimation of social influences on choice, and it raises difficult identification issues that we have addressed through simulation methods. We developed this approach in order to show that our method works before applying it to field data.
Although this established approach does not model social influence directly, it has been used as a baseline to infer it from appropriate datasets. Aral et al. [46], for example, applied this to daily data on social-network links and the date when individuals downloaded a certain mobile-service application (app). Aral et al. [46] considered individuals of similar propensity, $p_{it}$, to have adopted the app by time $t$, which for agent $i$ was estimated using a logistic regression (equation 10), where $\beta_{it}$ functions as transparency of choice and $X_{it}$ is a vector of observable characteristics and behaviors for agent $i$ at time $t$ (we have subsumed one of their other parameters into the idiosyncrasies term $\varepsilon_{it}$). Having collected data over a 4-month period, Aral et al. [46] were able to distinguish homophily (the tendency of similar individuals to associate with each other) from genuine influence (roughly 50/50 in their final estimation). Now, to build from this background to an explicit consideration of social influence and transparency of choice, suppose that we have $G$ different subpopulations, each with $I$ individuals. Within each population $g = 1, 2, \ldots, G$, individuals are considered potential peers. Suppose we have observed covariates for all dates $t$ and each agent $i$ in every group $g$, as well as the estimated propensity to be vaccinated, denoted by $\{x_{igt}\}$. Also suppose, based on previous studies, we have another set of social-influence covariates, $\{y_{igt}\}$, on agent $i$ and group $g$ at date $t$, and yet another set of covariates, $\{z_{igt}\}$, that relate how variable the choices were through time.
Our goal is to plot $J(\hat{\phi}_2, y_{igt})$ and $\beta(\hat{\theta}, z_{igt})$ for each agent $i$, each date $t$, and each group $g$ on the map, by which we could describe a temporal path for each agent $i$ in each group $g$. We start with the scalar covariate case where choice 1 is made over choice 0 at date $t$. This happens when the utility of choice 1 versus choice 0, comprising the individual and social-choice components, exceeds the random variable given the choice transparency, i.e.,

$$\phi_1 (x_{igt1} - x_{igt0}) + J(\phi_2, y_{igt}) (P_{t1g} - P_{t0g}) > \frac{\tilde{\varepsilon}_{igt0} - \tilde{\varepsilon}_{igt1}}{\beta(\theta, z_{igt})}, \qquad (11)$$

where, again, the covariates, $\{x, y, z\}$, are all positive, one-dimensional scalars. Note that $\phi_2$ and $\theta$ can be either positive or negative. Table 1 (below) summarizes the different parameters and variables involved. The parameter vector, $(\phi_1, \phi_2; \theta)$, represents the intrinsic and social sensitivities, given the transparency of payoffs. The estimates of these from the data set are denoted $(\hat{\phi}_1, \hat{\phi}_2; \hat{\theta})$; we use "hats" to denote estimates. With sufficient data on $x$, $y$, and $z$ from a particular case study, we can estimate the parameter vector, $(\phi_1, \phi_2; \theta)$, of the structural model in equations 3 and 5 using the observed fractions $\{P_{t1g}\}$ of vaccinated individuals in group $g$ at date $t$. The model predicts that agent $i$ in group $g$ gets vaccinated at date $t$ if the difference in noisy payoff $\tilde{U}$ is greater than zero. In other words, the probability of positive payoff for vaccination, $\Pr\{\tilde{U}_{igt1} - \tilde{U}_{igt0} > 0\}$, is equivalent to the probability favoring the intrinsic plus social payoffs of vaccination over the random idiosyncrasies of choice:

$$\Pr\{\tilde{U}_{igt1} - \tilde{U}_{igt0} > 0\} = F\left( \beta(\theta, z_{igt}) \left[ \phi_1 (x_{igt1} - x_{igt0}) + J(\phi_2, y_{igt}) (P_{t1g} - P_{t0g}) \right] \right). \qquad (12)$$

Here, $F(x) := \Pr\{\tilde{\varepsilon}_{igt0} - \tilde{\varepsilon}_{igt1} \le x\}$ is the cumulative distribution function of the random variable $\tilde{\varepsilon}_{igt0} - \tilde{\varepsilon}_{igt1}$.

Functional forms for $\beta$, $J$
To specify $\beta(\theta, z_{igt})$ we might start with the simple specification $\beta(\theta, z_{igt}) = \exp(\theta z_{igt})$, with $z_{igt}$ representing the variability of choices through time $t$. With this specification, the larger $\beta(\theta, z_{igt})$ is, the less variable the choices of agent $i$ are through time.
We can then discuss several different specifications of the social-influence function $J(\phi_2, y_{igt})$. To work within the borders of the map we might, for example, specify the social-influence function as

$$J(\phi_2, y_{igt}) = \max\{0, \phi_{20} + \phi_{21} y_{igt}\}. \qquad (13)$$

This function allows $J(\phi_2, y_{igt})$ to take the value zero with positive probability, and we require $J(\phi_2, y_{igt}) \ge 0$, i.e., we do not allow $J(\phi_2, y_{igt}) < 0$, so that the farthest west part of the map corresponds to $J(\phi_2, y_{igt}) = 0$. In the absence of social influence, $y_{igt} = 0$. If we assume that $J(\phi_2, y) = J(\bar{\phi}_2, y)$ for an open set of $y$'s implies $\phi_2 = \bar{\phi}_2$, and that $\beta(\theta, z) = \beta(\bar{\theta}, z)$ for an open set of $z$'s implies $\theta = \bar{\theta}$, then the absence of social influence at $y_{igt} = 0$ implies $\phi_{20} \le 0$ and $\bar{\phi}_{20} \le 0$. Further, if the data have a wide enough range of values of $y_{igt}$ over individuals, groups, and dates, it must be the case that social influence is zero for all values of the parameters that cannot be specified by the data alone. This level of identification can be enough when we simply want to determine the strength of social influence over time for different individuals and groups in different choice settings. We discuss another specification of the social-influence function $J(\phi_2, y_{igt})$ in Section 5, where we test the estimation procedure on artificial data.
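The two functional forms can be combined to compute positions on the map. The sketch below is illustrative: it uses the exponential specification for transparency and a social-influence function clamped at zero so that it is never negative, as the text requires. The function names, parameter values, and covariate series are hypothetical.

```python
import math


def transparency(theta, z):
    """North-south coordinate: beta = exp(theta * z)."""
    return math.exp(theta * z)


def social_influence(phi20, phi21, y):
    """East-west coordinate: linear in y, clamped so that J >= 0."""
    return max(0.0, phi20 + phi21 * y)


def map_path(theta, phi20, phi21, ys, zs):
    """Return (J, beta) map positions for covariate series ys and zs."""
    return [(social_influence(phi20, phi21, y), transparency(theta, z))
            for y, z in zip(ys, zs)]


# Assumed covariate series for one agent over four dates.
ys = [0.0, 1.0, 3.0, 4.0]
zs = [0.0, 0.5, 1.0, 1.5]
for J, beta in map_path(theta=0.1, phi20=-1.0, phi21=0.5, ys=ys, zs=zs):
    print(round(J, 3), round(beta, 3))
```

With a non-positive intercept the agent sits on the western edge of the map (zero social influence) until the covariate y becomes large enough, after which the path moves east.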

Estimation methods
In order to consider two popular estimation methods, Maximum Likelihood (ML) and Non-Linear Least Squares (NLLS), we consider a binary decision via equation 12, with the probability statement of equation 16, where $F(\cdot)$ is the cumulative distribution function of the random variable $\tilde{\varepsilon}_{itg0} - \tilde{\varepsilon}_{itg1}$. We have added a constant term, $\phi_{10}$, and a slope term, $\phi_{11}$, in equation 16 to capture variation among agents in how they respond to different alternatives, independent of social influences. Now, denote by $S_{igt} \in \{1, 0\}$ the random variable that equals 1 if agent $i$ in group $g$ succeeds (or chooses ''yes'') at date $t$ and 0 if the agent fails (or chooses ''no'') at date $t$. From equation 16 we can write the likelihood function for $\phi_1$, $\phi_2$, and $\theta$, given the real-world data ([47], section 17.3), which relates to how well the model predicts all the observed successes (yeses) and failures (nos).
In the standard versions of estimation of discrete-choice models, the transparency of choice, $\beta(\theta, z_{igt})$, is typically assumed to be constant (absorbed into the other parameters by a normalization convention). We are interested in how $\beta$ varies, however, so we must modify the conventional textbook approach [47]. One popular way of proceeding is to formulate dynamic discrete-choice models [48], which often use Markov chain formulations or hazard-function formulations. However, for simplicity we wish to remain as close as possible to the static framework with independent stochastic drivers. Therefore, we shall work with the likelihood function (equation 3.7) above, where the ultimate stochastic drivers are IIDEV across individuals, groups, and dates.
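
The likelihood described above can be sketched in Python as a Bernoulli log-likelihood. The sketch assumes a logistic $F$, the exponential forms $\beta = e^{\theta z}$ and $J = e^{\phi_2 y}$, and an index $\beta \cdot [\phi_{10} + \phi_{11} x + J \cdot \Delta P]$, with $x$ standing for the covariate difference between the two choices and $\Delta P$ for the difference in choice fractions. The data layout and all names are hypothetical.

```python
import math

def log1pexp(v):
    """Numerically stable log(1 + exp(v))."""
    return v + math.log1p(math.exp(-v)) if v > 0 else math.log1p(math.exp(v))

def log_likelihood(params, data):
    """Sum over observations of S*log F(u) + (1 - S)*log(1 - F(u)), logistic F."""
    phi10, phi11, phi2, theta = params
    total = 0.0
    for s, x, y, z, dP in data:
        # Assumed index: transparency times (individual + social) components.
        u = math.exp(theta * z) * (phi10 + phi11 * x + math.exp(phi2 * y) * dP)
        total += s * (-log1pexp(-u)) + (1 - s) * (-log1pexp(u))
    return total

# Hypothetical observations: (S, x, y, z, dP).
data = [(1, 1.0, 0.5, 0.2, 0.1), (0, -0.5, 1.0, 0.4, -0.2), (1, 2.0, 0.2, 0.1, 0.3)]
ll_good = log_likelihood((0.0, 1.0, 0.0, 0.0), data)   # slope agrees with the choices
ll_bad = log_likelihood((0.0, -1.0, 0.0, 0.0), data)   # slope contradicts the choices
```

A parameter vector whose sign pattern agrees with the observed choices yields the higher log-likelihood, which is the quantity an ML solver would maximize.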
In the scalar case, formulas for the partial derivatives of the likelihood function with respect to $\phi_{10}$, $\phi_{11}$, $\phi_2$, and $\theta$ are straightforward. The maximum-likelihood estimator, at the peak of the likelihood function, is found by setting these four partial derivatives equal to zero, which yields four nonlinear equations in four unknowns. These equations show that when, for some reason, the social-influence function is always zero, the pair $\phi_{10}, \phi_{11}$ is determined only up to scale. The nonlinearity of the social influence helps resolve this particular identification problem.
If the social-influence function is zero, however, or restricted to be zero, we normalize the four equations by dividing them by $\phi_{10} \neq 0$, which can be further simplified by setting $\phi_{10} = 1$ and solving the remaining three equations for $\phi_{11}$, $\phi_2$, and $\theta$. Solving three nonlinear equations in three unknowns is still more challenging than simple Ordinary Least Squares regression analysis.
Packages such as MATLAB or R are well suited to general ML estimation, which works well even when we have few observations per cell, provided there are enough observations overall to allow for logistic regression [49]. Also, ML estimation requires the distribution of the errors to be specified (here through $F$), whereas NLLS relies only on an orthogonality condition on the regression errors, and one can use least-squares estimates as starting points for the ML solver. ML estimation is less demanding of data sets than NLLS [47, chapter 14], but when a large amount of data is available, we can also consider NLLS estimation methods, which require larger data sets. Equation 5 suggests the NLLS regression equation for the log odds of agent $i$ in group $g$ choosing choice 1 rather than choice 0 at date $t$,

$$\ln\frac{\Pr_{igt}(1)}{\Pr_{igt}(0)} = \beta(\theta, z_{igt})\left\{\left[\phi_{10} + \phi_{11}(x_{i1gt} - x_{i0gt})\right] + \cdots\right\}, \tag{19}$$

in which the right-hand side again consists of transparency of choice multiplied by individual, social, and noise components (note that the noise terms $\varepsilon$ in equation 19 are part of the standard regression-equation framework and are not the same as the terms in the logit equations above). Here, we assume the standard regression orthogonality condition on the regression errors. To avoid some problems of large sample size and data overflow, an improved NLLS estimation process might use a growing window in time, i.e., start with points corresponding to time $0 \le t \le 5$ and locate a plausible region of the parameter space, then add more points for time $0 \le t \le 10$ and update, and so on.
Consider the functional-form specifications for the transparency-of-choice function and the social-influence function,

$$\beta(\theta, z_{igt}) = e^{\theta z_{igt}}, \qquad J(\phi_2, y_{igt}) = e^{\phi_2 y_{igt}}.$$

Let $\eta_{igt}$ denote the prediction error of the model, and let SSE denote the sum of squared prediction errors. In other words, NLLS chooses the parameter vector to minimize the sum of squared prediction errors. Taking the four partial derivatives of SSE with respect to $\phi_{10}$, $\phi_{11}$, $\phi_2$, and $\theta$, setting all four of them equal to zero, and writing $x_{igt}$ for $x_{i1gt} - x_{i0gt}$, we have the following four nonlinear equations in four unknowns:

$$\begin{aligned}
\frac{\partial SSE}{\partial \phi_{10}} &= \sum_{i \in g,\, g,\, t} \eta_{igt}\left(-e^{\theta z_{igt}}\right) = 0 \\
\frac{\partial SSE}{\partial \phi_{11}} &= \sum_{i \in g,\, g,\, t} \eta_{igt}\left(-e^{\theta z_{igt}} x_{igt}\right) = 0 \\
\frac{\partial SSE}{\partial \phi_{2}} &= \sum_{i \in g,\, g,\, t} \eta_{igt}\left\{-y_{igt}\, e^{\phi_2 y_{igt}} (P_{t1g} - P_{t0g})\, e^{\theta z_{igt}}\right\} = 0 \\
\frac{\partial SSE}{\partial \theta} &= \sum_{i \in g,\, g,\, t} \eta_{igt}\left\{-z_{igt}\left[\phi_{10} + \phi_{11} x_{igt} + e^{\phi_2 y_{igt}} (P_{t1g} - P_{t0g})\right] e^{\theta z_{igt}}\right\} = 0
\end{aligned} \tag{25}$$

We can see that if for some reason the term $P_{t1g} - P_{t0g}$ is always zero, then the pair of parameters $\phi_{10}, \phi_{11}$ of the direct utility difference is determined only up to scale. To put it another way: if we set the social-influence function $J(\phi_2, y_{igt})$ equal to zero, then the parameter pair $\phi_{10}, \phi_{11}$ is not identified, i.e., any parameter pair $\lambda\phi_{10}, \lambda\phi_{11}$ will solve the last equation of 25, with the third equation dropped, for all values of $\lambda$.
We resolve this problem, if it occurs, by ''normalizing'', i.e., by setting $\phi_{10} = 1$ and dropping the first equation of 25. We recommend the same procedure for the ML estimation above.
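
The NLLS first-order conditions and the growing-window idea can be sketched together in Python. The sketch assumes that the prediction error is the observed log odds minus $e^{\theta z}[\phi_{10} + \phi_{11} x + e^{\phi_2 y}\Delta P]$, with $\Delta P$ standing for $P_{t1g} - P_{t0g}$, and it defines the objective as half the sum of squares so that its gradient has the same form as the four derivative expressions. The simple step-halving descent, the data values, and all names are hypothetical choices for illustration rather than a recommended solver.

```python
import math

def residual(params, obs):
    """Prediction error eta for one observation (assumed additive-error form)."""
    phi10, phi11, phi2, theta = params
    log_odds, x, y, z, dP = obs
    fitted = math.exp(theta * z) * (phi10 + phi11 * x + math.exp(phi2 * y) * dP)
    return log_odds - fitted

def sse(params, data):
    """Half the sum of squared prediction errors; the 1/2 only rescales the gradient."""
    return 0.5 * sum(residual(params, obs) ** 2 for obs in data)

def sse_gradient(params, data):
    """Analytic partial derivatives with respect to (phi10, phi11, phi2, theta)."""
    phi10, phi11, phi2, theta = params
    grad = [0.0, 0.0, 0.0, 0.0]
    for obs in data:
        _, x, y, z, dP = obs
        eta = residual(params, obs)
        b = math.exp(theta * z)
        social = math.exp(phi2 * y) * dP
        grad[0] += eta * (-b)
        grad[1] += eta * (-b * x)
        grad[2] += eta * (-y * social * b)
        grad[3] += eta * (-z * (phi10 + phi11 * x + social) * b)
    return grad

def descent_step(params, data, step=1.0):
    """One gradient step, halving the step until SSE does not increase."""
    base, grad = sse(params, data), sse_gradient(params, data)
    for _ in range(40):
        trial = [p - step * g for p, g in zip(params, grad)]
        try:
            if sse(trial, data) <= base:
                return trial
        except OverflowError:
            pass  # exp overflowed: treat as a rejected trial
        step *= 0.5
    return list(params)

def growing_window_fit(start, dated_data, horizons, steps_per_window=100):
    """Re-fit on windows 0 <= t <= horizon, warm-starting from the previous fit."""
    params = list(start)
    for horizon in horizons:
        window = [obs for t, obs in dated_data if t <= horizon]
        for _ in range(steps_per_window):
            params = descent_step(params, window)
    return params

# Hypothetical observations: (t, (log_odds, x, y, z, dP)).
dated_data = [
    (1, (0.4, 1.0, 0.5, 0.2, 0.1)),
    (3, (-0.3, 0.5, 1.0, 0.4, -0.2)),
    (6, (1.2, 2.0, 0.2, 0.1, 0.3)),
    (9, (0.1, 1.5, 0.8, 0.3, 0.0)),
]
start = [0.2, 0.5, 0.3, 0.1]
fitted = growing_window_fit(start, dated_data, horizons=(5, 10))
```

Each window starts from the estimate of the previous, shorter window, so the early data locate a plausible region of the parameter space before later data refine it. In practice, a library optimizer would replace the step-halving descent used here.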