Some Work and Some Play: Microscopic and Macroscopic Approaches to Labor and Leisure

Given the option, humans and other animals elect to distribute their time between work and leisure, rather than choosing all of one and none of the other. Traditional accounts of this partial allocation have characterised behavior on a macroscopic timescale, reporting and studying the mean times spent in work or leisure. However, averaging over the more microscopic processes that govern choices is known to pose tricky theoretical problems, and also forecloses any possibility of direct contact with the neural computations involved. We develop a microscopic framework, formalized as a semi-Markov decision process with possibly stochastic choices, in which subjects approximately maximise their expected returns by making momentary commitments to one or other activity. We show how macroscopic utilities arise from microscopic ones, and demonstrate how facets such as imperfect substitutability can emerge in a more straightforward microscopic manner.

If we consider the rewards to be continuous (instead of quantised: delivered exactly when the price is attained), or if we consider only the expected times spent in work or leisure, we can construct a budget constraint (BC): the total amount of work ω and leisure l equals the trial duration T,

ω + l = N P + l = T,    (A-1)

where P and N are the price and the number of rewards earned, respectively. Note that this budget constraint is linear in N and l.
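As a minimal numeric illustration of Eq. (A-1), the following sketch checks that work time ω = NP and leisure l partition the trial; the trial duration, price, and reward count are hypothetical values chosen for the example:

```python
# Budget constraint (A-1): work omega = N*P plus leisure l fills a trial of length T.
T, P = 3600.0, 12.0        # hypothetical trial duration (s) and price (s per reward)
N = 200                    # hypothetical number of rewards earned
omega = N * P              # total time spent working
l = T - omega              # leisure is whatever time remains
assert omega + l == T      # Eq. (A-1): N*P + l = T
print(N, omega, l)
```

Because the constraint is linear in N and l, doubling the number of rewards earned simply trades work time for leisure one-for-one along this line.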
In general, given that the subject maximises macroscopic utility (according to the function in Eq. (1)) subject to a BC, we can derive the time allocation, which increases with RI and (for s ≥ 0) decreases with price (Figure 2).

A-2 Micro SMDP methods
We formulate our model as an infinite-horizon (unichain) Semi-Markov Decision Process (SMDP) [1]. A state S contains all the information necessary for making a decision. The subject's next state S′ depends on its current state S, the action a, and the duration τ_a of that action, but is independent of all other states, actions and durations in the past. We further assume that subjects jointly choose both the actions and their durations, as in [2][3][4]. We discretize leisure durations into 1 s time-bins in our software implementations, but verified that our results are unchanged for even finer discretizations. Note that when choosing an action a for duration τ_a, the subject commits to executing this action to completion, rather than choosing actions at each moment in time. An alternative to our model is one in which choices are made at the finest possible temporal granularity rather than having determinable durations, so that a bout of leisure is equivalent to a sequence of 'leisure-leisure-leisure' choices. We have called this a 'nanoscopic' model [5], and noted its straightforward formal relationship to our microscopic SMDP model. The distinction between these formulations cannot be made behaviorally, but may be possible in terms of their neural implementations.
A choice rule or policy π([a, τ_a]|S) specifies the subject's probability of taking action a for time τ_a in state S. Under a given policy, we can define the expected reward rate, or the average reward per unit time,

ρ^π = lim_{T→∞} (1/T) E^π[ Σ_{t=1}^{T} (r_t^+ − r_t^−) ],

where r_t^+ and r_t^− denote the benefits and costs at time point t. Note that the expected reward rate is independent of the starting state. Suppose r̄^+(S, [a, τ_a]) and r̄^−(S, [a, τ_a]) are the expected benefits and costs of taking action a for duration τ_a from state S.
Normatively, a subject should try to (approximately) maximise its expected return. The expected return, or (differential) Q-value, of taking action a for duration τ_a from state S is

Q^π(S, [a, τ_a]) = r̄^+(S, [a, τ_a]) − r̄^−(S, [a, τ_a]) − ρ^π τ_a + E[V^π(S′)],    (S-3)

where

V^π(S) = E_{π([a, τ_a]|S)}[ Q^π(S, [a, τ_a]) ]    (S-4)

is the value of state S, averaged across all actions and their durations. The subject forgoes average reward ρ^π τ_a for taking action a for time τ_a [2][3][4].
Other choices of optimization criterion would be possible, notably an exponentially discounted return. However, it is conventional to use the long-run average reward in recurrent problems such as this [2,6], particularly given that macroscopic measurements are themselves typically couched in terms of rates (in this case, of responding). Further, not only does using the average reward obviate the requirement to set a discount rate, but it is also known that sufficiently shallow exponential discounting leads to exactly the same policy as the average reward [1,7].
When simultaneously solving Eqs. (S-3) and (S-4) for the reward rate and the Q-values, we have more unknowns than equations. As is conventional, we therefore set the value of one of the states to 0 and solve for the Q-values relative to this baseline. The Q-values reported here are therefore differential rather than absolute; we drop the differential denotation and simply refer to them as Q-values. Since the policies depend on the Q-values, which themselves recursively depend on the policies, we cannot solve for them in closed form, except in the case of the optimal policy. We use policy iteration to find them [1,8].
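As a concrete sketch of the policy-iteration loop for this cyclic work-leisure problem, the following alternates policy evaluation (computing the average reward of the pre → post → pre cycle) with greedy improvement over discretized leisure durations. All parameter values (RI, P, K_L, the 30-bin duration grid) are hypothetical, and the greedy improvement step is a deliberate simplification of the stochastic policies used in the paper: with a linear leisure utility and ρ > K_L, the greedy policy collapses onto the shortest available bout.

```python
import numpy as np

RI, P, K_L = 5.0, 4.0, 0.2        # hypothetical reward intensity, price, leisure utility slope
durations = np.arange(1, 31)      # discretized leisure durations (1 s bins)

tau = durations[-1]               # initial policy: longest leisure bout
for _ in range(50):
    # policy evaluation: average reward of one pre -> post -> pre cycle under this policy
    rho = (RI + K_L * tau) / (P + tau)
    # policy improvement: greedy leisure duration, with V(pre) = 0 as baseline
    q = K_L * durations - rho * durations    # differential Q(post, tau)
    new_tau = durations[np.argmax(q)]
    if new_tau == tau:
        break
    tau = new_tau
print(tau, rho)
```

The collapse to the shortest bout under greedy improvement is precisely why the approximately optimal, stochastic (softmax-like) policies analysed below are of interest.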
A-3 Linear microscopic utility of leisure yields exponentially distributed leisure durations

For simplicity, we assume that if the subject works, it works continuously for the entire duration of the price, after which it engages in leisure in the post-reward state. The Q-value of working in the pre-reward state then comprises: (i) the reward of reward intensity RI, (ii) an average forgone reward (AFR) ρ^π P, and (iii) the value of the post-reward state:

Q^π(pre, [W, P]) = RI − ρ^π P + V^π(post).

Since we define leisure to be possible in the post-reward state only, we simplify notation by dropping the state argument. For a linear microscopic utility of leisure, C_L(τ_L) = K_L τ_L, the leisure-duration policy π(τ_L) ∝ exp(β(C_L(τ_L) − ρ^π τ_L)) is an exponential distribution with mean E[τ_L] = 1/(β(ρ^π − K_L)). Thus, for linear C_L(·), leisure bout durations are always exponentially distributed, with a mean that depends on the reward rate: the greater the reward rate, the shorter the mean leisure bout.
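The exponential form can be checked numerically: normalizing π(τ_L) ∝ exp(β(K_L τ_L − ρ^π τ_L)) on a fine grid and computing its mean should reproduce 1/(β(ρ^π − K_L)). The parameter values below are hypothetical:

```python
import numpy as np

beta, rho, K_L = 2.0, 1.0, 0.2                 # hypothetical inverse temperature, reward rate, leisure slope
tau = np.arange(1e-4, 50, 1e-4)                # fine grid over leisure durations (s)
w = np.exp(beta * (K_L * tau - rho * tau))     # unnormalized pi(tau) for linear C_L
dt = tau[1] - tau[0]
p = w / (w.sum() * dt)                         # normalize numerically
mean_num = (tau * p).sum() * dt
mean_closed = 1.0 / (beta * (rho - K_L))       # closed-form exponential bout mean
print(mean_num, mean_closed)
```

Raising rho in this sketch shortens mean_num, matching the claim that higher reward rates produce shorter leisure bouts.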

A-4 Logarithmic microscopic utility of leisure yields gamma distributed leisure durations
For a logarithmic microscopic utility of leisure, C_L(τ_L) = (k − 1) log(τ_L), the Q-value of engaging in leisure in the post-reward state is a unimodal bump. The leisure duration distribution is the exponential of this bump:

π(τ_L) = ((βρ^π)^k̄ / Γ(k̄)) τ_L^{β(k−1)} exp(−βρ^π τ_L).

This is a gamma distribution with shape parameter k̄ = β(k − 1) + 1 and scale parameter 1/(βρ^π). The mode of this gamma distribution is (k − 1)/ρ^π. Thus, if the reward rate does not change substantially, neither does this mode. For the special case of k = 1, the gamma distribution becomes an exponential distribution.
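The gamma form can likewise be verified numerically: the mean of the normalized density should equal shape × scale = k̄/(βρ^π), and the mode should equal (k − 1)/ρ^π. Parameter values are hypothetical:

```python
import numpy as np

beta, rho, k = 2.0, 1.0, 3.0                  # hypothetical parameters
kbar = beta * (k - 1) + 1                     # gamma shape parameter
tau = np.arange(1e-4, 60, 1e-4)               # fine grid over leisure durations (s)
w = tau ** (beta * (k - 1)) * np.exp(-beta * rho * tau)   # unnormalized pi(tau)
dt = tau[1] - tau[0]
p = w / (w.sum() * dt)                        # normalize numerically
mean_num = (tau * p).sum() * dt
mode_num = tau[np.argmax(p)]
print(mean_num, kbar / (beta * rho))          # gamma mean = shape * scale
print(mode_num, (k - 1) / rho)                # mode = (kbar - 1)/(beta*rho) = (k-1)/rho
```

Note that the mode depends on rho but not on beta, consistent with the observation that the mode is stable when the reward rate is.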

A-5 Reward rate and mean leisure duration for a linear microscopic utility of leisure
For a linear C_L(·), the reward rate and mean leisure duration can be derived analytically and self-consistently. The reward rate in Eq. (2) is simply

ρ^π = (RI + K_L E[τ_L|post]) / (P + E[τ_L|post]).    (S-6)

As discussed above, leisure durations in the post-reward state are exponentially distributed with mean

E[τ_L|post] = 1 / (β(ρ^π − K_L)).    (S-7)

Re-arranging terms of this equation,

ρ^π = K_L + 1 / (β E[τ_L|post]).    (S-8)

Equating Eqs. (S-6) and (S-8) and solving for the mean leisure duration E[τ_L|post], we derive

E[τ_L|post] = P / (β(RI − K_L P) − 1).    (S-9)

This is the mean leisure duration as long as RI − K_L P > 1/β, and E[τ_L] → ∞ otherwise. When the former condition holds, we may substitute Eq. (S-9) into Eq. (S-6) and solve for ρ^π:

ρ^π = (βRI − 1) / (βP).    (S-10)
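The self-consistency of this derivation can be checked by fixed-point iteration: alternately updating the reward rate given the mean bout (Eq. (S-6)) and the mean bout given the reward rate (Eq. (S-7)) should converge to the closed forms (S-9) and (S-10). Parameter values are hypothetical:

```python
RI, P, K_L, beta = 5.0, 4.0, 0.2, 2.0       # hypothetical task parameters
assert RI - K_L * P > 1.0 / beta            # condition for finite leisure bouts

E = 1.0                                     # initial guess for mean leisure duration
for _ in range(200):
    rho = (RI + K_L * E) / (P + E)          # Eq. (S-6): cycle reward rate
    E = 1.0 / (beta * (rho - K_L))          # Eq. (S-7): exponential bout mean
E_closed = P / (beta * (RI - K_L * P) - 1)  # Eq. (S-9)
rho_closed = (beta * RI - 1) / (beta * P)   # Eq. (S-10)
print(E, E_closed, rho, rho_closed)
```

The iteration is a contraction for these parameters, so a couple of hundred updates is ample; when the condition RI − K_L P > 1/β fails, the update drives E toward infinity, mirroring the divergence noted above.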

A-6 Macroscopic utility derived from linear and non-linear microscopic utilities
The point of the utility function in Eq. (11) is to lead to choices whose macroscopic characterization is the same as that of the micro-SMDP. In particular, this means that if we maximize U(l, ω) subject to a budget constraint l + ω = T for some total duration T, then we should recover what we know to be true of the optimum: l*P/ω* = E^π[τ_L|post] = P/(β(RI − K_L P) − 1) (Eq. (S-9)). Given the form of the optimal microscopic policy associated with Eqs. (3) and (4), we also require that the Lagrange multiplier ξ associated with Eq. (10) take on the value ρ* = (RI + E^{π*}[C_L(τ_L)]) / (P + E^{π*}[τ_L]).

As required for macroscopic utility functions considered in economics, the marginal macroscopic utilities with respect to both work (or rewards) (∂U/∂ω) and leisure (∂U/∂l) are positive. Then, since macroscopic utility is constant along an indifference curve, the total derivative with respect to a good (say leisure) is zero:

dU/dl = ∂U/∂l + (∂U/∂ω)(dω/dl) = 0.

This shows that indifference curves have negative slopes (dω/dl = −(∂U/∂l)/(∂U/∂ω) < 0). The optimum (l*, ω*) associated with the budget constraint occurs where the slope of the indifference curve matches that of the BC, i.e., dω/dl = −1.

Consider the case of a linear microscopic utility of leisure, C_L(τ_L) = K_L τ_L. In this case, the optimum π* of Eq. (10) is exponential, implying that π*(τ_L)|l, ω = (ω/(lP)) exp(−(ω/(lP)) τ_L), whose entropy is H(π*) = log(lP/ω) + 1. Substituting these quantities yields the derived macroscopic utility function in Eq. (11), and it then turns out that the optimum has just the correct properties in terms of choice. We merely claim that this is a possible g(·, ·); it need not be unique.
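The entropy expression for the exponential optimum can be verified by direct quadrature: the differential entropy of an exponential distribution with mean m is log(m) + 1, where m here stands in for l*P/ω* (the value below is hypothetical):

```python
import numpy as np

m = 2.5                                   # hypothetical mean leisure bout, m = l*P/omega*
tau = np.arange(1e-6, 80, 1e-4)           # fine grid over leisure durations (s)
p = np.exp(-tau / m) / m                  # pi*(tau): exponential with mean m
dt = tau[1] - tau[0]
H_num = -(p * np.log(p)).sum() * dt       # differential entropy by quadrature
H_closed = np.log(m) + 1                  # H(pi*) = log(l P / omega) + 1
print(H_num, H_closed)
```

This is the entropy term that enters the derived macroscopic utility for the linear-C_L case.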
If, instead, the microscopic utility of leisure is logarithmic, C_L(τ_L) = (k − 1) log(τ_L), then, as in the case of linear C_L(·), we can derive E^π[C_L(τ_L)] and H(π) analytically in closed form for policies associated with Eq. (10).
Here Γ(·) and ψ(·) represent the gamma and digamma functions, respectively, and k̄ = β(k − 1) + 1 as above. It is easy to see that for the special case of k = 1, i.e., when the gamma distribution becomes an exponential distribution, k̄ becomes simply 1. In that case, H(π) = log(E^π[τ_L]) + 1 as above, since all other quantities in Eq. (S-15) vanish. Further, if we considered a general microscopic utility that was a sum of logarithmic and linear components, Ĉ_L(τ_L) = (k − 1) log(τ_L) + K_L τ_L, then we could treat the linear version as a special case by simply setting k = 1. Using the quantities in Eq. (S-15), we may derive a macroscopic utility from a microscopic logarithmic utility whose maximization subject to a BC yields the appropriate mean leisure duration E^π[τ_L|post] = l*P/ω* when C_L(·) is logarithmic, but which also reduces to the derived macroscopic utility for a linear C_L(·) (when k = 1; see Eqs. (S-13) and (S-14)). Furthermore, as required for self-consistency, the Lagrange multiplier ξ that leads to the policy π*(τ_L) ∝ exp(β(C_L(τ_L) − ξτ_L)) is the "shadow price" ξ = ρ* = (RI + E^{π*}[C_L(τ_L)]) / (P + E^{π*}[τ_L]), which is the average reward rate. If we had enforced the full budget constraint via a Lagrange multiplier, the same average reward rate would have been the shadow price for this too, i.e., the extra (macroscopic) utility arising from relaxing the total budget T (i.e., taking an extra second of total time for work and/or leisure).
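The gamma and digamma terms can be checked against a quadrature estimate of the bout-distribution entropy, using the standard differential entropy of a gamma distribution with shape k̄ and scale 1/(βρ^π), consistent with the Γ(·) and ψ(·) terms appearing in Eq. (S-15). Parameter values are hypothetical, and scipy is assumed available for gammaln and digamma:

```python
import numpy as np
from scipy.special import gammaln, digamma

beta, rho, k = 2.0, 1.0, 3.0                  # hypothetical parameters
kbar = beta * (k - 1) + 1                     # gamma shape parameter
theta = 1.0 / (beta * rho)                    # gamma scale parameter

# closed-form differential entropy of a gamma(kbar, theta) distribution
H_closed = kbar + np.log(theta) + gammaln(kbar) + (1 - kbar) * digamma(kbar)

# numeric check by quadrature over the bout distribution
tau = np.arange(1e-4, 60, 1e-4)
p = tau ** (kbar - 1) * np.exp(-tau / theta)
dt = tau[1] - tau[0]
p /= p.sum() * dt
H_num = -(p * np.log(p)).sum() * dt
print(H_closed, H_num)
```

Setting k = 1 (so k̄ = 1) makes the gammaln and digamma terms vanish and H_closed collapse to log(theta) + 1 = log(E[τ_L]) + 1, the exponential special case noted above.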