Between Pleasure and Contentment: Evolutionary Dynamics of Some Possible Parameters of Happiness

We offer and test a simple operationalization of hedonic and eudaimonic well-being (“happiness”) as mediating variables that link outcomes to motivation. In six evolutionary agent-based simulation experiments, we compared the relative performance of agents endowed with different combinations of happiness-related traits (parameter values), under four types of environmental conditions. We found (i) that the effects of attaching more weight to longer-term than to momentary happiness and of extending the memory for past happiness are both stronger in an environment where food is scarce; (ii) that in such an environment “relative consumption,” in which the agent’s well-being is negatively affected by that of its neighbors, is more detrimental to survival; and (iii) that having a positive outlook, under which agents’ longer-term happiness is increased by positive events more than it is decreased by negative ones, is generally advantageous.


Introduction
Happiness and other emotional states play a central role in human existence by mediating the regulation of behavior [1][2][3][4]. While this construal of emotions was originally motivated by classical control theory [5], it fits well within the emerging integrative computational framework for understanding the brain/mind, which holds that minds are bundles of computational processes implemented by embodied and physically and socially situated brains [6].
The computational framework allows one to put forward and test very explicit functional models of emotions. In this paper, we use such a model to investigate, in an evolutionary setting, a series of questions pertaining to happiness. These include (i) the role of balancing momentary well-being against longer-term contentment (the "happiness of pursuit" [7]); (ii) the effects of drawing a contrast between oneself and one's social circle (what economists term "relative consumption" [8,9]; cf. the concept of social comparison [10,11]); and (iii) the adaptive role of differential sensitivity to positive and negative turns in momentary well-being.
Happiness and other emotions are experienced subjectively; indeed, subjective well-being (SWB) is what psychologists who study happiness ask the participants in their experiments to report [12]. To study happiness in simple computational models that are obviously devoid of any phenomenality or subjectivity [13,14], we need an objective "handle" onto emotions, which would make explicit their role in behavior and evolution. For this purpose, it suffices to limit our consideration to the valuation aspects of emotions [12,15]. Our theoretical approach is therefore based on the following set of interlinked premises:
1. Subjectivity, including phenomenal awareness, evolves to serve as an effective tool for connecting action outcomes to motivation [16,17].
2. Subjective valuation and the affective states or emotions that mediate it, including happiness, serve as a key pressure point through which evolution acts on the mind [7,18,19].
5. Happiness-related traits affect the evolutionary fitness of their carriers [26], in ways that depend on physical and social circumstances.
In the remainder of this paper, we report the results of six experiments in which we explored the evolutionary dynamics of happiness as a mediator between action outcomes and life evaluation on the one hand and action selection on the other hand. We begin with a brief review of related work and of the literature that supports our working assumptions.

Related work and the present approach
In this section, we briefly discuss (i) the role of emotions and motivation in driving behavior; (ii) agent-based evolutionary simulation as a tool for studying these topics; (iii) the hedonic and eudaimonic components of happiness and their further factorization; (iv) social context as a key factor; and (v) the role of variables that control the temporal dynamics of hedonic and eudaimonic well-being.

Emotion, motivation, action
Behavior is considered to be motivated if it is at least partly determined by its expected consequences [2]. Because the consequences of a planned or future action are not available prior to its execution, the control of motivated behavior involves internal states, which represent goals or expected outcomes. For evolutionary reasons briefly mentioned above, some such states come to be experienced as emotional. Specifically, emotional states, including happiness, convey valuational information, about which the agent by definition cares, and which therefore serves as an effective motivational mechanism for action selection [2,27]. Motivation is influenced by many types of emotions in addition to happiness. Our goal in the present paper is not to compare the effects of different emotions or to model them; rather, we are interested specifically in the effects of hedonic and eudaimonic well-being.

Agent-based evolutionary simulation modeling
While in principle it is possible to study the complex interplay of emotions, motivation, and evolution analytically (e.g., [28]), it is often more practical to do so by resorting to simulation, using an evolutionary agent-based modeling (ABM) approach [29][30][31]. In ABM, simulated actors (agents) carrying various traits of interest share an environment in which they undertake actions and compete for resources; the agent's cumulative outcomes determine its fitness, which in turn affects its chances for reproduction. The effectiveness of traits can then be assessed by tracking their prevalence in the population over evolutionary time.
The ABM approach has been previously applied to the study of emotions and motivation (e.g., [32][33][34]). For instance, Malfaz and Salichs [32] used it to model motivation as a combination of internal drives and external stimuli. The drive variables were energy, thirst, health, sociability, and fear; the external stimuli were water, food, and the presence of other agents.
In an ABM setting, the relative contributions of the various factors that jointly affect behavior can be tuned using any of a number of learning approaches, in particular reinforcement learning (RL) [35][36][37], as was done in [32]. Because our goal in the present work was to determine the effects of specific combinations of the values of relevant parameters, we chose not to allow our agents to learn; a companion paper (Gao and Edelman, in preparation) will report results from a learning-enabled version of our model.

The factorization of happiness
A distinction is commonly drawn between two major components of happiness, operationalized as subjective well-being: the hedonic component, estimated through responses to questions such as "How happy are you right now?", and the eudaimonic component, based on responses to questions such as "How happy are you with your life in general?" [12,38]. To be useful in an ABM setting, each of these components must be given an explicit mathematical definition in terms of the independent variables of the model.
The precise form of such a definition can itself be the subject of an investigation. For instance, Rutledge et al. [39] considered various ways of quantifying subjects' momentary hedonic SWB H in response to outcomes in an economic game and showed that it is best modeled by combining current task earnings (CR), recent reward expectations (EV), and reward prediction errors (RPE), as follows:

$$H(t) = w_0 + w_1 \sum_{j=1}^{t} \gamma^{t-j}\,\mathrm{CR}_j + w_2 \sum_{j=1}^{t} \gamma^{t-j}\,\mathrm{EV}_j + w_3 \sum_{j=1}^{t} \gamma^{t-j}\,\mathrm{RPE}_j$$

where t indexes the current trial, $w_0, \dots, w_3$ are weights, and $0 \le \gamma \le 1$ is a forgetting factor that makes more recent trials more influential than earlier ones. In the present project, we likewise assume that happiness is related to a time average of outcomes (excluding the RL factors, as noted above) and explore the evolutionary dynamics of traits that control the contributions of momentary and time-averaged values.
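To make the structure of the Rutledge et al. model concrete, the following Python sketch evaluates it for a given sequence of trials. It is an illustration only, not their fitting code: the weights w0 to w3 and the forgetting factor are free parameters that would normally be estimated from data, and the values in the usage line are made up.

```python
from typing import Sequence


def momentary_happiness(cr: Sequence[float], ev: Sequence[float],
                        rpe: Sequence[float], w: Sequence[float],
                        gamma: float) -> float:
    """Momentary happiness after trial t = len(cr), in the Rutledge et al. form:
    w0 + w1*sum_j gamma**(t-j)*CR_j + w2*sum_j gamma**(t-j)*EV_j
       + w3*sum_j gamma**(t-j)*RPE_j, with j running over trials 1..t.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    if not len(cr) == len(ev) == len(rpe):
        raise ValueError("CR, EV and RPE must cover the same trials")
    t = len(cr)
    # Trial j (1-based) is weighted by gamma**(t - j); the latest trial gets weight 1.
    decay = [gamma ** (t - j) for j in range(1, t + 1)]

    def discounted(xs: Sequence[float]) -> float:
        return sum(d * x for d, x in zip(decay, xs))

    w0, w1, w2, w3 = w
    return w0 + w1 * discounted(cr) + w2 * discounted(ev) + w3 * discounted(rpe)


# Two trials with illustrative (made-up) values and weights:
print(momentary_happiness([1.0, 0.0], [0.0, 0.0], [0.0, 1.0], (0.0, 1.0, 1.0, 1.0), 0.5))
# prints 1.5
```

With gamma close to 1 the model approaches an unweighted sum over all trials; with gamma = 0 only the latest trial counts.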

The social dimension of hedonic well-being
In behavioral economics, it is well known that people's perception of their own standing with regard to so-called positional goods depends on the standing of their social circle or comparison group [8,9]. Intuitively, subjects can be more or less happy with the same absolute level of a positional good (say, a house of a given size), depending on the levels of their neighbors. The study of Baggio and Papyrakis [33] focused on the effects of this type of social comparison. Specifically, they assumed that the hedonic SWB H of agent k in a given year t depended positively on its own income that year, $Y^k_t$, and negatively on both the average social income for the same period, $\bar{Y}_t$, and its own income in the previous year, $Y^k_{t-1}$, with the strength of these dependencies governed by two sensitivity parameters, α and β, in the range [0, 1]. They then examined the dependence of H on individual income and social comparison (economic inequality) by considering the effects of the sensitivity parameters in an ABM setting.

The dynamic relationships between hedonic and eudaimonic well-being
Intuitively, the eudaimonic SWB, which we shall refer to as E, is expected to be related to the integral of the hedonic SWB, H, but the details of this relationship are up to the modeler. Taking inspiration from Strogatz's model of love [40], Sprott [41,42] offered a second-order linear differential equation for cumulative happiness, which he denoted by R, driven by a time-dependent function F(t) that quantifies the effects of external events. This equation can be rewritten in terms of the momentary well-being, which we call H. The rewritten form makes explicit the dependence of happiness on outcomes, as well as on its history and changes (the integral and derivative terms, respectively).

Methods
We now turn to the description of our own work, which uses evolutionary agent-based simulation and draws on some of the ideas mentioned above. In this section, we state the details of our model (see Algorithm 1 for a pseudocode formulation and the S1 Appendix for the ODD protocol [43,44]).

The individual agent: motivation, action, rewards, and well-being
We simulate a population of foraging agents, each of which operates according to the action-outcome-valuation cycle illustrated in Fig 1 (for a more elaborate conception of what the block diagram of an autonomous agent could look like, see [45]).

Fig 1. Motivation prompts actions, which lead to outcomes. Outcomes reap external hedonic rewards (food-related ${}^{f}H$ and social ${}^{s}H$) and affect reproductive fitness. Hedonic states influence motivation both directly, with the weight c, and through longer-term ("eudaimonic") well-being E, with the weight 1 − c. The parameters Δt and λ control, respectively, the time window over which E is estimated and the relative contributions of positive and negative changes of H (see Eq 9). After a set number of action cycles, each agent in the top half of the fitness distribution is allowed to produce offspring, which form the next generation; agents that belong to the current generation are terminated.

The motivation M of agent k is a weighted sum of its hedonic and eudaimonic well-being, H and E, with the parameter 0 ≤ c ≤ 1 controlling their relative direct contributions; note that H also contributes to M indirectly, via its effect on E (see below):

$$M^k_t = c\,H^k_t + (1 - c)\,E^k_t \tag{4}$$

The effect of motivation is controlled by a threshold θ, which depends on the agent's past motivation: the threshold $\theta^k_t$ for agent k at time t is a weighted sum of past motivation values, with a forgetting factor γ ∈ [0, 1] that gives more weight to recent motivations. When an agent's motivation exceeds the threshold θ, it chooses a more aggressive action, venturing farther away from its present location in a random direction. If the motivation is below the threshold, the exploration range is shorter.

After carrying out the chosen action, the agent updates H, which consists of two components, ${}^{f}H$, based on finding food during exploration, and ${}^{s}H$, based on social comparison:

$$H^k_t = s_k\,{}^{s}H^k_t + (1 - s_k)\,{}^{f}H^k_t \tag{6}$$

where $s_k$, agent k's sociality/food weighting parameter, is in the range [0, 1]. The food-based component ${}^{f}H$ (Eq 7) grows with F, the number of food units that the action yielded, and $\alpha_k$ controls the contribution of food rewards to H. The social component ${}^{s}H$ compares the agent's hedonic well-being with the average over its social comparison group, where the trait $n_k$ represents the size of agent k's social comparison group. Intuitively, an agent is happier when it is doing better than the group average and less happy otherwise; $\beta_s$ is the base hedonic well-being gain from socialization.

The agent's eudaimonic well-being E is then computed (Eq 9) from its present value of H, the memory of the past values of H extending over a number of cycles, and the rates of rise and fall of H. Here $\Delta t_k$ is the extent of the memory window for agent k. The step function ${}^{p}s$ selects upswings of H and ${}^{n}s$ selects downswings; ${}^{p}\lambda_k$ and ${}^{n}\lambda_k$ are the respective weights. Thus, every individual can in principle value positive and negative events differently. In parallel with computing H and E (which are subsequently fed back and used to determine motivation, as per Eq 4), the agent's fitness is updated: the fitness of agent k at time t is the total amount of food $F^k_t$ that agent k consumed, less the cost $A^k_t$ of its actions (with aggressive and conservative actions weighted appropriately) and a base metabolic expenditure $\beta_0$.
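The per-cycle logic of an agent can be sketched in Python as follows. The motivation rule follows the description of c (Eq 4), and the weighting of the social and food components by the sociality parameter follows the description of s_k. Everything else is a hypothetical choice made for illustration, because those equations are not reproduced here: the two exploration ranges, the exact threshold form (a normalized, exponentially weighted average of past motivations), and the linear food and social components.

```python
import random
from dataclasses import dataclass, field
from typing import List

# HYPOTHETICAL exploration ranges: the text only says "farther" vs. "shorter".
LONG_RANGE, SHORT_RANGE = 10.0, 3.0


def make_social_circles(n_agents: int, n_k: int, rng: random.Random) -> List[List[int]]:
    """Fixed, randomly chosen comparison group of n_k other agents per agent,
    regardless of map position (as described for Experiment 3)."""
    return [rng.sample([j for j in range(n_agents) if j != k], n_k)
            for k in range(n_agents)]


@dataclass
class Agent:
    c: float       # weight of H (vs. E) in motivation, Eq 4
    s: float       # weight of the social (vs. food) component of H
    alpha: float   # contribution of food rewards to H
    gamma: float   # forgetting factor for the motivation threshold
    H: float = 0.0
    E: float = 0.0
    past_M: List[float] = field(default_factory=list)

    def motivation(self) -> float:
        # Eq 4: M = c*H + (1 - c)*E
        return self.c * self.H + (1.0 - self.c) * self.E

    def threshold(self) -> float:
        # ASSUMPTION: normalized exponentially weighted average of past
        # motivations, with the most recent value weighted most heavily.
        if not self.past_M:
            return 0.0
        n = len(self.past_M)
        weights = [self.gamma ** (n - 1 - i) for i in range(n)]
        return sum(w * m for w, m in zip(weights, self.past_M)) / sum(weights)

    def exploration_range(self) -> float:
        """Aggressive (long) move if motivation exceeds the threshold."""
        m, theta = self.motivation(), self.threshold()
        self.past_M.append(m)
        return LONG_RANGE if m > theta else SHORT_RANGE

    def update_H(self, food_units: int, circle_H: List[float], beta_s: float) -> None:
        # ASSUMPTION: the food component is linear in the food found.
        f_H = self.alpha * food_units
        # ASSUMPTION: social component = base gain + (own H - circle average H).
        s_H = beta_s + self.H - sum(circle_H) / len(circle_H)
        # Convex combination weighted by the sociality parameter s.
        self.H = self.s * s_H + (1.0 - self.s) * f_H


rng = random.Random(0)
circles = make_social_circles(400, 8, rng)
a = Agent(c=0.3, s=0.2, alpha=1.0, gamma=0.9, H=1.0, E=2.0)
print(a.exploration_range())  # motivation ~1.7 exceeds the initial threshold 0.0 -> 10.0
```

Starting the threshold at zero for an agent with no history is another arbitrary choice; it simply makes every agent's first move an aggressive one.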
At the end of each generation, agents within the top 50% of the fitness distribution reproduce, passing their traits to their offspring, whose number is proportional to the parent's fitness. All agents in the current generation are then terminated.
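The selection step can be sketched as follows. The sketch assumes that the population size is kept constant, that offspring are exact clones of their parent, and that negative fitness counts as zero weight; the rounding scheme (largest remainder) is likewise an illustrative choice rather than a detail given in the text.

```python
from typing import List, Tuple


def next_generation(pop: List[Tuple[dict, float]], size: int) -> List[dict]:
    """pop holds (traits, fitness) pairs; returns trait dicts for the next generation."""
    ranked = sorted(pop, key=lambda p: p[1], reverse=True)
    parents = ranked[: max(1, len(ranked) // 2)]   # top 50% of the fitness distribution
    # ASSUMPTION: negative fitness is treated as zero weight.
    weights = [max(f, 0.0) for _, f in parents]
    total = sum(weights)
    if total == 0:
        weights, total = [1.0] * len(parents), float(len(parents))
    quotas = [size * w / total for w in weights]   # offspring proportional to fitness
    counts = [int(q) for q in quotas]
    # Largest-remainder rounding so the counts add up to `size` exactly.
    order = sorted(range(len(parents)), key=lambda i: quotas[i] - counts[i], reverse=True)
    for i in order[: size - sum(counts)]:
        counts[i] += 1
    offspring = []
    for (traits, _), k in zip(parents, counts):
        offspring.extend(dict(traits) for _ in range(k))  # clones inherit traits unchanged
    return offspring   # the current generation is discarded by the caller


pop = [({"c": 0.2}, 3.0), ({"c": 0.8}, 1.0), ({"c": 0.5}, 0.5), ({"c": 0.9}, 0.1)]
print([t["c"] for t in next_generation(pop, 4)])   # [0.2, 0.2, 0.2, 0.8]
```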

The simulated environment
We simulated four types of environment, which differed in the spatial distribution of food, as illustrated in Fig 2: random scarce (top left); random average (top right); patchy average (bottom left); and patchy abundant (bottom right). Each 200×200 environment was populated in every generation by 400 agents. At the end of every generation cycle, the environments were initialized anew with their corresponding food distributions.
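The four map types can be approximated with a short generator such as the one below. The food counts, the number of patches, and the patch spread are hypothetical values chosen only to reproduce the qualitative contrast between scarce, average, and abundant food and between random and patchy layouts; the settings actually used in the experiments are not reproduced here.

```python
import random
from typing import List, Tuple

SIZE = 200  # 200 x 200 map, as in the text
Cell = Tuple[int, int]


def random_map(n_food: int, rng: random.Random) -> List[Cell]:
    """Food units scattered uniformly at random over the map."""
    return [(rng.randrange(SIZE), rng.randrange(SIZE)) for _ in range(n_food)]


def patchy_map(n_food: int, n_patches: int, spread: float, rng: random.Random) -> List[Cell]:
    """Food units clustered around a few randomly placed patch centers."""
    centers = [(rng.uniform(0, SIZE), rng.uniform(0, SIZE)) for _ in range(n_patches)]
    food = []
    for _ in range(n_food):
        cx, cy = rng.choice(centers)
        # Gaussian scatter around the patch center, clipped to the map edges.
        x = min(SIZE - 1, max(0, int(rng.gauss(cx, spread))))
        y = min(SIZE - 1, max(0, int(rng.gauss(cy, spread))))
        food.append((x, y))
    return food


# HYPOTHETICAL food counts; only the qualitative ordering follows the text.
rng = random.Random(1)
maps = {
    "random scarce": random_map(800, rng),
    "random average": random_map(2000, rng),
    "patchy average": patchy_map(2000, 10, 8.0, rng),
    "patchy abundant": patchy_map(4000, 10, 8.0, rng),
}
print({name: len(cells) for name, cells in maps.items()})
```

Regenerating the maps from scratch at the start of every generation, as the text describes, amounts to calling these functions again in the generation loop.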
In each of the experiments described below, the parameter (trait) of interest was discretized as appropriate, so that equal proportions of the population carried each level of the trait; the other parameters were kept fixed. Each experiment was repeated 10 times; each of Figs 3-13 below shows, plotted against generation number, the means and 95% confidence intervals of the number of agents in the population carrying each level of the trait of interest.
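For reference, a t-based 95% confidence interval over 10 runs can be computed as below. The text does not state which interval construction was used, so the Student's t approach (critical value 2.262 for 9 degrees of freedom) is an assumption, and the per-run counts are invented for illustration.

```python
import math
import statistics
from typing import List, Tuple

# Two-sided 95% critical value of Student's t with 9 degrees of freedom (n = 10 runs).
# Pass a different t_crit if the number of runs differs.
T_CRIT_DF9 = 2.262


def mean_ci95(values: List[float], t_crit: float = T_CRIT_DF9) -> Tuple[float, float, float]:
    """Mean and t-based 95% confidence interval over independent runs."""
    m = statistics.mean(values)
    half = t_crit * statistics.stdev(values) / math.sqrt(len(values))
    return m, m - half, m + half


# Hypothetical cohort sizes at one generation, one value per run (10 runs).
runs = [52, 47, 55, 60, 49, 51, 58, 44, 53, 50]
print(mean_ci95(runs))
```

Applying the function to each cohort at each generation yields the points and error bars of the kind plotted in the figures.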
The implementation of our ABM model and the code for the experiments are available at https://www.openabm.org/model/4934. In addition, the details of the implementation are described in the S1 Appendix, which follows the ODD (Overview, Design concepts, and Details) protocol [43,44].

Experiment 1: the effect of the relative contributions of H and E to motivation
In this experiment, we explored the effect of the parameter c, which, as per Eq 4, determines the relative contributions of the agent's hedonic (H) and eudaimonic (E) well-being to its motivation (cf. Fig 1). For values of c close to 1, the dominant contribution is that of H, and the agent is motivated primarily by the immediate external outcomes of its actions (that is, food or social rewards). For values of c close to 0, the longer-term well-being E dominates. To estimate the effect of c over its entire range of [0, 1], we first stepped its value from 0.0 to 1.0 in increments of 0.1, resulting in 11 cohorts of 400/11 ≈ 36 agents each, with all members of a cohort sharing the same value of c in the first generation. Fig 3 shows how the number of agents in each cohort evolved over successive generations; the four panels correspond to the four map types shown in Fig 2. The data for each panel were generated by repeating the experiment 10 times with the same parameter settings; the points and error bars show the means and 95% confidence intervals. Different colors represent different values of c, as indicated in the legend.
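Because 400 is not divisible by 11, the cohorts in this experiment cannot all be the same size: 400 = 11 × 36 + 4, so four cohorts have 37 agents and seven have 36. The helper below makes this split explicit; assigning the four extra agents to the first cohorts is an arbitrary choice, not one documented in the text.

```python
from typing import Dict, List


def make_cohorts(n_agents: int, levels: List[float]) -> Dict[float, int]:
    """Split a population as evenly as possible across trait levels.
    ASSUMPTION: leftover agents go to the first cohorts."""
    base, extra = divmod(n_agents, len(levels))
    return {lvl: base + (1 if i < extra else 0) for i, lvl in enumerate(levels)}


c_levels = [i / 10 for i in range(11)]   # c = 0.0, 0.1, ..., 1.0
cohorts = make_cohorts(400, c_levels)
print(sorted(set(cohorts.values())))      # [36, 37]: 4 cohorts of 37, 7 of 36
```

The same helper covers the dichotomous follow-up runs, where two levels split the population into cohorts of 200.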
Because only the fitter half of the agents is allowed to reproduce at the end of each generation, the number of agents carrying different values of the trait c changes over evolutionary time. After about 40 generations, the population sizes begin to stabilize. As Fig 3, top left, suggests, when food is dispersed and scarce, agents with c = 0.2 predominate. When food is no longer scarce (top right), agents with this value of c still have a higher mean population than the rest. When food distribution is patchy, a higher value, c = 0.8 (that is, more weight given to immediate well-being H), is more advantageous. When food is abundant (bottom right), agents with c = 0.9 are much more dominant after 40 generation cycles.
To focus on the contrast between lower and higher values of c, we repeated this experiment with an initially dichotomous population in which half of the agents had c = 0.3 and the other half c = 0.7. The results are shown in Fig 4. The fates of agents carrying each of the two traits now diverge clearly in three of the four environment types. Specifically, when food is random and scarce (top left), having the lower value of c (that is, being motivated more by longer- than by shorter-term well-being) is advantageous. When food is random and no longer scarce (top right), there is no clear advantage to either value of c. In the remaining two environment types (bottom panels), agents with the higher value of c dominate. Thus, the evolutionary performance of agents with different contributions of hedonic and eudaimonic well-being to motivation depends on the environment type and resource distribution.

Experiment 2: the effect of the contribution of food to H
In this experiment, we investigated the effects of the amount of hedonic boost that the agent gets from each encounter with food, controlled by the parameter α in Eq 7. To that end, we first explored the general effect of this parameter, by running simulations with six initial cohorts, created by discretizing α into six values in the range of [0.5, 3] with a step size of 0.5.
The results appear in Fig 5. Not surprisingly, in an environment where food is random and scarce (top left), assigning food a larger weight is advantageous. In comparison, when food is abundant (bottom right), this advantage is smaller. Pitching cohorts with α = 0.5 and α = 2 against each other (Fig 6) offers a starker contrast between the evolution of the two populations. The direction of the effect is, however, the same as before.

Experiment 3: the effect of the relative preference for exploration vs. socialization as a factor in H
The social-competitive aspect of well-being is simulated in our experiments by letting the average hedonic well-being H of the agent's friends contribute negatively to its own hedonic well-being. As per Eq 6, agent k's hedonic well-being H depends on the trait $s_k$, an individual parameter that determines the relative contribution of socialization and exploration, that is, how much weight the agent assigns to social comparison vs. food. In terms of the social component of hedonic well-being, the assumption is that an agent is happier when its H is above the average of its comparison group and less happy when it is below the average. Agent k's social comparison group is determined at the outset by randomly choosing $n_k$ other agents, regardless of their positions on the map; the agent then keeps the same social circle throughout its lifetime. In the following experiment, the size of the social circle $n_k$ was set to 8 for all agents.

Fig 7 shows the effect of the parameter $s_k$, which controls how much social comparison contributes to H. After 40 generations, the population sizes begin to stabilize. As Fig 7, top left, suggests, when food is dispersed and scarce, agents with s = 0.8 dominate. When food is more abundant, the advantage of having social comparison contribute more towards H diminishes. When food distribution is patchy and abundant (bottom right), agents with s = 0.2 and s = 0.4 dominate; that is, agents that are less prone to social comparison perform better when resources in the environment are plentiful.

To focus on the contrast between lower and higher values of s, we repeated this experiment with an initial population in which half of the agents had s = 0.2 and the other half s = 0.8. The results are shown in Fig 8. The fates of agents carrying each of the two traits now clearly diverge in each of the four environment types. Specifically, when food is random and scarce (top left), having the higher value of s (that is, being more socially aware) is advantageous. When food is patchy and no longer scarce (bottom left), having a lower value of s is more advantageous. Thus, the overall effect of social comparison seems to be to promote success in harsh environments (at least for the chosen settings of the other relevant parameters, notably the social circle size $n_k$, for which a range of values around 8 was explored, with similar results).

Figure caption (Experiment 2): The effect of α, which sets the contribution of food rewards to H. As before, the plots show, for each successive generation, the mean cohort sizes and 95% confidence intervals over 10 runs. The results indicate that letting food contribute more strongly to hedonic well-being, and hence to motivation, is more advantageous under conditions of food scarcity.

Figure caption (Experiment 3): Social competitiveness (allowing the H of one's social group to drag down one's own happiness) is shown to play an important role. When food is abundant, agents that weight foraging more heavily are more successful. When food is scarce, agents that weight socialization more heavily emerge as more successful.

Experiment 4: the effect of the memory window for H
This experiment and the next one (described in the following section) focus on the parameters that control the dynamics of the contribution of hedonic well-being H to eudaimonic well-being E. In Experiment 4, we studied the effect of the duration of the temporal window over which H is accumulated before contributing to E, that is, the parameter Δt in Eq 9. The larger Δt, the stronger the smoothing effect that E exerts over H as they jointly influence motivation.

The results of Experiment 4A, in which five cohorts with different values of Δt (1, 2, 4, 8, 16) were tested, are shown in Fig 9. The four panels correspond to the four environment types of Fig 2. As the top left panel shows, when food is scarce, agents that have a longer memory of hedonic states (Δt = 16) do better. Agents with this trait also do better in the map where food is random-average; however, as more food appears on the map, it takes more generation cycles for this trait to dominate. When food is patchy and abundant (bottom right), it no longer does better.
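The roles of the memory window Δt and of the asymmetric weights ${}^{p}\lambda$ and ${}^{n}\lambda$ (examined in Experiment 5) can be illustrated with the sketch below. Because the exact form of Eq 9 is not reproduced here, the way the terms are combined (current H, plus the mean of the last Δt values of H, plus the weighted change in H) is a hypothetical stand-in that only preserves the ingredients named in the text.

```python
from collections import deque


class EudaimonicState:
    """HYPOTHETICAL form of E: current H, plus the mean of the last delta_t
    values of H, plus an asymmetrically weighted change in H."""

    def __init__(self, delta_t: int, lam_pos: float, lam_neg: float) -> None:
        if delta_t < 1:
            raise ValueError("delta_t must be at least 1")
        self.history = deque(maxlen=delta_t)   # memory window of past H values
        self.lam_pos, self.lam_neg = lam_pos, lam_neg

    def update(self, H: float) -> float:
        past_mean = sum(self.history) / len(self.history) if self.history else 0.0
        dH = H - self.history[-1] if self.history else 0.0
        # Step functions: rises in H use lam_pos, falls use lam_neg.
        weight = self.lam_pos if dH > 0 else self.lam_neg
        E = H + past_mean + weight * dH
        self.history.append(H)
        return E


# A positive-outlook agent (lam_pos > lam_neg) experiencing a rise and then a fall in H.
e = EudaimonicState(delta_t=2, lam_pos=2.0, lam_neg=0.5)
print([e.update(h) for h in (1.0, 3.0, 1.0)])   # [1.0, 8.0, 2.0]
```

A larger `delta_t` averages H over more past cycles, which is one simple way to obtain the smoothing effect described above; swapping `lam_pos` and `lam_neg` turns the agent into one with a negative outlook.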
In Experiment 4B, the population consisted of two cohorts, one with Δt = 2 and the other with Δt = 8. The results appear in Fig 10. The fates of agents carrying each of the two traits now clearly diverge in each of the four environment types. Specifically, when food is random and scarce (top left), having a longer memory of past hedonic well-being (Δt = 8) is advantageous. When food is average (top right and bottom left), having a higher value of Δt is still advantageous. However, when food is abundant (bottom right), agents with a smaller Δt do better.

Experiment 5: the effect of positive and negative changes in H
In Experiment 5, we investigated the effect of balancing the contributions to E of positive and negative changes in H. In Eq 9, these contributions are represented by the possibly different weights, ${}^{p}\lambda$ and ${}^{n}\lambda$, assigned to the time derivative of H. Intuitively, the eudaimonic ("life evaluation") state of an agent with a negative outlook is affected more strongly by a drop than by a rise in H (${}^{n}\lambda > {}^{p}\lambda$); for an agent with a positive outlook, the relationship is the opposite (cf. the computational formulation of optimism and pessimism in [46]). Fig 11 shows the results of pitching against each other two cohorts, one with ${}^{p}\lambda = 2.0$, ${}^{n}\lambda = 0.5$ and the other with ${}^{p}\lambda = 0.5$, ${}^{n}\lambda = 2.0$. The four plots correspond to the four map types of Fig 2. It is interesting to observe that having a positive outlook has an advantage in all four types of environment that we have tested. This finding may be compared to the positive-mood bias that characterizes the general human population and to the evolutionary accounts that have been offered for this bias [21,47].

Experiment 6: positive and negative changes in H in mixed-valence environments
In natural foraging environments, agents typically experience both positive events (e.g., encounters with food) and negative ones (e.g., encounters with noxious or poisonous items). In this experiment, we investigated how agents perform in an environment that contains both positive and negative events.
Specifically, we focus on the contributions to E of positive and negative changes in H in such environments.
To characterize the relative prevalence of the two types of events, we introduce a variable p, in the range [0, 1], which controls the proportion of negative ("poison") items on the map. In this experiment, we used maps with p = 0.7, illustrated in Fig 12. The agent's fitness is assumed to decrease by one unit when a poison item is consumed. When p is small, its influence on the convergence of the parameters ${}^{p}\lambda$ and ${}^{n}\lambda$, which control the weights of the rise and fall of hedonic well-being, is minimal. For instance, when we set p = 0.1, the pairwise comparison of ${}^{p}\lambda$ and ${}^{n}\lambda$ yielded results very similar to those of Experiment 5 (Fig 11). It is interesting to observe that, with p = 0.7, having a negative outlook has an advantage in all four types of environment. This finding suggests that in harsh environments, agents with a conservative outlook (larger ${}^{n}\lambda$) have higher fitness.
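A mixed-valence map of the kind used here can be generated as follows. The item count, the uniform placement, and the +1 value of a food item are assumptions for illustration; the text specifies only the proportion p of poison items and the one-unit fitness loss per poison item consumed.

```python
import random
from typing import List, Tuple

SIZE = 200  # 200 x 200 map


def mixed_valence_items(n_items: int, p: float, rng: random.Random) -> List[Tuple[int, int, int]]:
    """Place n_items at random cells; a fraction p are poison (value -1), the rest food (+1)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    n_poison = round(p * n_items)
    values = [-1] * n_poison + [1] * (n_items - n_poison)
    rng.shuffle(values)
    return [(rng.randrange(SIZE), rng.randrange(SIZE), v) for v in values]


def fitness_change(eaten_values: List[int]) -> int:
    """+1 per food unit and -1 per poison item consumed (action costs omitted)."""
    return sum(eaten_values)


rng = random.Random(0)
items = mixed_valence_items(1000, 0.7, rng)
print(sum(1 for _, _, v in items if v < 0))   # 700 poison items
print(fitness_change([1, -1, -1, 1, 1]))       # 1
```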

Summary and Discussion
The evolutionary experiments described in this paper operationalized the happiness of an agent as a pair of state variables, momentary H (for hedonic) and cumulative E (for eudaimonic), that jointly help regulate behavior by mediating between outcomes and motivation. The agent's behavior and interaction with the environment arose from the dynamics of the control loop (from motivation to action to outcome and back, via happiness, to motivation; Fig 1), along with a handful of parameters. Lifetime outcomes accrued to form fitness, which in turn determined how many, if any, of the agent's clones became part of the next generation.
As an exercise in simulated evolution, this setup is limited in many respects. In particular, the agents' parameters were fixed, making them capable only of what Dennett called "learning by death" [48]. This necessitated a manual search for a viable combination of parameters (so as to avoid unprovoked mass extinctions) prior to running the actual experiments. This search, however, turned out not to be too difficult, which suggests that our results are relatively general. More importantly, our working hypothesis regarding the nature of happiness and its role in behavioral control proved fruitful in that it yielded unambiguous and intuitively interpretable results. Stated concisely, our main findings are as follows:
Exp. 3 The effects of the weight given to social comparison: When food is scarce, agents that give more weight to a comparison of their outcomes with those of their "friends" do better. When food is abundant, this effect is reversed.
Exp. 4 The effects of memory for past H values: When food is scarce, agents that integrate their hedonic states over a longer time window dominate. When food is abundant, the advantage of having longer memory dissipates.
Exp. 5 The effects of the differential contribution of positive and negative changes in H: Agents with a more positive outlook, which attach more importance to upswings in H than to downswings, dominate in all four types of environment.
Exp. 6 The effects of the differential contribution of positive and negative changes in H in environments with both positive and negative events: In relatively harsh mixed-valence environments, agents with a more conservative outlook (larger ${}^{n}\lambda$) have higher fitness.
The general lesson from these findings is that the mix of happiness-related behavioral control parameters that works best (in the sense that agents endowed with it come to dominate the population) depends on the environment. In reality, environmental conditions (such as the amount and the distribution of food) are subject to change, typically on multiple time scales. It would be interesting, therefore, to see what happiness and happiness-tuning traits emerge under various schedules of environmental stress. Note that coping with environmental changes does not necessarily require learning in the phenotype [49], although such an ability (as in, for instance, reinforcement learning [21,36]) may make for smarter and more efficient agents [48].
A more specific lesson that can be drawn from our results is that the effects of attaching more weight to longer-term than to momentary happiness and of extending the memory for past happiness are both stronger in an environment where food is scarce. Furthermore, in such an environment the "arms race" of relative consumption [8,9], in which the agent's well-being is diminished if its neighbors are also well or better off, is more detrimental to survival. Finally, we saw that a positive outlook, under which longer-term happiness is increased more by positive events than it is decreased by negative ones, is generally advantageous, except in particularly harsh environments.
On a normative-philosophical note, this set of findings may be loosely compared to the sentiment expressed in Laozi's Dao De Jing under the heading of curbing desire: "The satisfaction of contentment is an everlasting competence" [50]. It may indeed be advisable, at least under conditions of scarcity or adversity, to focus on longer-term well-being or eudaimonia ("contentment") over momentary pleasures and to be less envious of one's neighbors; also, in general, to mark happy events more than unhappy ones.
To make the parallels between the findings of evolutionary agent-based simulations and psychological (let alone philosophical) work on happiness somewhat less strained, a number of extensions to the present work can be undertaken. Specifically, the studies reported here should be repeated with more realistic agents, endowed with evolvable genotypes and capable of reinforcement learning (e.g., [37]). Such agents should then be faced with changing environments: it would be interesting to see whether mechanisms can evolve that not only use happiness for dynamically controlling actions, but also control the dynamics of happiness in response to environmental and other stress. On the basis of previously offered arguments regarding the importance of "learning the world" [51], we conjecture that agents capable of intrinsically motivated model-based reinforcement learning in particular would, like people, attach no less value to the pursuit of happiness than to its attainment [7].
Supporting Information
S1 Appendix. The supporting information follows the ODD protocol. It provides further details, such as the agents' initialization parameters and their values or ranges of values, as well as the values of the key parameters used in each experiment. (PDF)