Abstract
For optimal stopping problems with time-inconsistent preferences, we measure the inherent level of time-inconsistency by the time required to turn the naive strategies into the sophisticated ones. In particular, in a repeated experiment, the naive agent can observe her actual sequence of actions, which is inconsistent with what she planned at the initial time, and she then chooses her immediate action based on the observations of her later actual behavior. The procedure is repeated until her actual sequence of actions is consistent with her plan at any time. We show that for cumulative prospect theory preferences, in which the time-inconsistency is due to probability distortion, the higher the degree of probability distortion, the more severe the level of time-inconsistency, and the more time is required to turn the naive strategies into the sophisticated ones.
Citation: Hu S, Zhou Z (2024) From time-inconsistency to time-consistency for optimal stopping problems. PLoS ONE 19(11): e0310774. https://doi.org/10.1371/journal.pone.0310774
Editor: Vijayalakshmi Kakulapati, Sreenidhi Institute of Science and Technology, INDIA
Received: June 7, 2024; Accepted: September 5, 2024; Published: November 12, 2024
Copyright: © 2024 Hu, Zhou. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting information files.
Funding: This work was supported by the National Natural Science Foundation of China (Grant No. 12271462) and the Science, Technology and Innovation Commission of Shenzhen Municipality (2022 College Stable Support Program).
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
Optimal stopping problems arise in many economic and financial decision-making scenarios—for example, when an entrepreneur completes a project, an investor sells stock, or a gambler quits gambling at the casino. The stopping decision depends on the preference of the entrepreneur, the investor, or the gambler. In a dynamic decision process, the agent plans a sequence of actions to be taken at each time point. Then, it is possible that later when the agent revisits the same optimal stopping problem at a different time than the initial time, she finds that her previously planned sequence of actions is no longer optimal according to her current preference. This phenomenon is called time-inconsistency.
Time-inconsistency can be observed in many dynamic decision problems. In the casino gambling problem, for example, a gambler may first plan her strategy as follows: she would continue gambling if she gains and stop gambling if she loses—termed the “loss-exit” strategy. When she actually gambles, however, her behavior may be the opposite of what she has planned: she stops gambling if she gains and continues gambling if she loses—termed the “gain-exit” strategy. This is because her preference, characterized by cumulative prospect theory, leads to time-inconsistency. See [1–3] for more detailed discussions. Other scenarios that induce time-inconsistency include mean-variance preferences, state-dependent references, and non-exponential discount factors.
[4] concludes that there are three approaches to dealing with time-inconsistency. First, the agent ignores the time-inconsistency issue and adopts, once and for all, the optimal strategy derived at the initial time. This is called the pre-committed type. Second, the agent constructs a strategy such that her current decision is the best given her expectations of her future decisions. This is called the sophisticated type. Third, the agent continually derives her current best strategy and deviates from what she has planned before. This is called the naive type. This classification of agents parallels the one used in the literature on hyperbolic discounting, which is itself one source of time-inconsistency; see [5, 6] for more discussion of the three types of agents.
A large literature studies the behavior of the three types of agents in various decision problems, but the connections between them, and even possible transformations of one into another, remain unclear. [7] first prove that, for a one-dimensional diffusion process with a payoff functional satisfying regularity conditions, the sophisticated equilibrium of the stopping problem can be obtained as a fixed point of an operator that represents strategic reasoning taking the future selves’ behaviors into account. In other words, strategic reasoning may turn a naive agent into a sophisticated one.
In this work, we consider the time-inconsistent optimal stopping problem with randomization in a discrete-time setting. In particular, we propose a measure of time-inconsistency that quantifies the gap between the naive strategy and the sophisticated strategy. In a finite time horizon, we use a binomial tree to describe the underlying state process, where each node represents a pair of time and state. The sophisticated strategies can be derived backward from the terminal time to the initial time, whereas naive strategies are derived by optimizing the stopping problem at each node. By taking the naive agent’s actual behavior into consideration, the agent’s strategies eventually match the sophisticated strategies after several rounds of training in strategic reasoning. In the example of cumulative prospect theory (CPT) preferences, one of the well-known non-expected utility theories, the higher the degree of probability distortion, the more severe the level of time-inconsistency, and the more time is required to turn the naive strategies into the sophisticated ones; details are shown in Section 4.
According to the time-inconsistency measure, we design an algorithm to transform the naive strategy into the sophisticated one. The algorithm can be applied to any time-inconsistent stopping problem. In addition to the cumulative prospect theory preferences—where the time-inconsistency is due to the probability distortion—we also consider the present-biased preferences—where the time-inconsistency is due to the non-exponential discount factor. We derive analytical results on how many rounds are needed to achieve the sophisticated strategy from the naive one in the optimal stopping problem with immediate cost and with immediate reward. The transformation on the stopping strategies can also be made in other types of time-inconsistent stopping problem.
In addition to the above-mentioned literature, our work is related to the following studies of time-inconsistent optimal stopping with cumulative prospect theory: [8–13]. The key difference is that these works study the optimal stopping problem in continuous time, while our focus is on discrete time. There is also an extensive literature on general optimal stopping problems, e.g., [14, 15]. The following works discuss general time-inconsistent decision problems: [16–25].
The rest of this paper is organized as follows. In section 2, we introduce the model of the optimal stopping problem with randomization, time-inconsistent preferences, and the different types of agents. In section 3, we establish iterations to turn the naive strategies into the sophisticated ones as a measure of time-inconsistency. We illustrate the iteration procedure in section 4, using CPT as an example of time-inconsistent preferences. Section 5 presents the analytical results of transforming the time-inconsistent strategies into time-consistent ones under the present-biased preferences. Section 6 provides the conclusion.
2 Model
2.1 Optimal stopping
Consider an optimal stopping problem faced by an agent in a discrete-time simple symmetric random walk. Let Δ0,0 stand for the set of all feasible time-state pairs in the simple symmetric random walk up to time T. Consider Markovian stopping strategies with external randomization allowed. At time 0 with initial state 0, the agent determines her stopping strategy by choosing a sequence of actions , where a0,0(t, x) ∈ [0, 1] stands for the probability of stopping at node (t, x). If a0,0(t, x) = p for some p ∈ [0, 1], the agent tosses a (biased) coin, for which the probability of tails is p, to determine whether to stop: if the coin lands on heads, she continues; if it lands on tails, she stops. In particular, if a0,0(t, x) = 1, the agent stops for sure at node (t, x); if a0,0(t, x) = 0, she continues for sure at node (t, x). See Fig 1 for an illustration.
The pair above each node stands for the current time and state.
Suppose that the agent tosses a head at time 0, which means that she would continue. Then, at time 1 and state j, where j ∈ {−1, 1}, let Δ1,j stand for the set of all the remaining feasible time-state pairs in simple symmetric random walk after time 1 and starting at state j, up to time T. Suppose that the agent revisits the problem to choose her action, regardless of a0,0(1, j)—the plan she made at time 0 for node (1, j). Again, at time 1 she chooses a sequence of actions denoted by , where a1,j(t, x) ∈ [0, 1] stands for the probability to stop at node (t, x) according to the plan made at time 1. This pattern continues until the terminal time T > 0 if the agent has not stopped yet.
2.2 Time-inconsistent preferences
Let Vt,x(a) denote the preference value of the agent at time t and state x when applying a sequence of actions a = {a(t, x)} afterward. The preference value function V is called time-inconsistent if there exist (t, x) and (t′, x′) with t < t′ and (s, y) ∈ Δt,x ∩ Δt′, x′ such that
where
is the optimal action sequence planned at (t, x), that is,
where
stands for the n-dimensional vector taking values in [0, 1] in a point-wise manner, and |A| is the number of elements in set A. In other words, as long as the actions planned at different times differ somewhere, the preference value V is time-inconsistent. It is straightforward to verify that the total number of nodes in the set Δt,x is (T − t + 1)(T − t + 2)/2. Note that there exists a one-to-one correspondence between the elements of the |Δt,x|-dimensional vector a and the actions a(s, y) taken at nodes (s, y) with s ≥ t and y ∈ {x − (s − t), x − (s − t) + 2, …, x + (s − t)}. In particular, with current time t and state x, the j-th element of the vector a corresponds to at,x(s, y), where s is such that (s − t)(s − t + 1)/2 < j ≤ (s − t + 1)(s − t + 2)/2 and y = x + (s − t) − 2j + (s − t)(s − t + 1) + 2. Hence, we do not differentiate between the vector a and the action sequence {a(t, x)} in what follows.
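This quadratic index map can be checked numerically. The sketch below is an illustration only: the enumeration order (earlier times first, higher states first within each time level), the helper names, and the convention that y is measured relative to the current state x are our own assumptions.

```python
def node_of_index(t, x, j):
    # recover (s, y) from the 1-based index j of the flattened sub-tree of (t, x)
    d = 0
    while (d + 1) * (d + 2) // 2 < j:   # d = s - t is the level offset
        d += 1
    s = t + d
    y = x + d - 2 * j + d * (d + 1) + 2
    return s, y

def enumerate_subtree(t, x, T):
    # nodes of the sub-tree rooted at (t, x), top state first within each level
    return [(s, y) for s in range(t, T + 1)
                   for y in range(x + (s - t), x - (s - t) - 1, -2)]

t, x, T = 2, -2, 6
subtree = enumerate_subtree(t, x, T)
assert len(subtree) == (T - t + 1) * (T - t + 2) // 2   # = 15 nodes here
for j, node in enumerate(subtree, start=1):
    assert node_of_index(t, x, j) == node
```

The check enumerates every node of a sub-tree and confirms that the closed-form map recovers the same (s, y) for each index j.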
Time-inconsistent preference values arise in many situations. For example, the mean-variance preference is time-inconsistent because the variance term is nonlinear in the probability distribution. A non-exponential discount factor also makes a preference time-inconsistent. Moreover, some behavioral preferences, such as cumulative prospect theory, include a probability distortion component in the preference value, which makes the preference value nonlinear in the probability distribution; therefore, the CPT preference is time-inconsistent.
2.3 Agents
Regarding the time-inconsistency issue, naturally, one may raise the question of which plan is going to be adopted by the agent. According to whether the agent is aware of time-inconsistency and whether the agent can commit herself to a predetermined plan, we classify the agents into three categories: naive, sophisticated, and pre-committed.
2.3.1 Naive agent.
A naive agent does not realize the time-inconsistency. At any given time the naive agent just seeks an optimal solution at that moment, but she is only able to implement this solution at that moment. In other words, the naive agent continuously deviates from the plans that were made by herself previously. Her actual stopping strategies are called naive strategies.
At time 0, the naive agent determines her plan at that time to be that solves
She then takes action . If she does not stop at time 0 and continues to time 1, then at time 1 with state j ∈ {−1, 1}, she disregards
—the action she should take at that time according to the optimal plan
made at time 0. Instead, she determines her strategy at that time to be
that solves
Then, her actual action taken at time 1 with state j is . Under a time-inconsistent preference,
may be completely different from
. The naive agent continuously seeks the optimal solution to be her action taken at time t until terminal time T if she has not stopped yet. Her actual strategy is as follows:
where Xt ∈ {−t, −(t − 2), …t − 2, t} is the state variable at time t.
2.3.2 Sophisticated agent.
A sophisticated agent realizes the time-inconsistency but has no commitment to any predetermined plan. Hence, the sophisticated agent chooses consistent planning in the sense that she optimizes today by expecting her actions in the future. The agent’s selves at different times are considered to be the players of a game, and a consistent plan becomes an intra-personal equilibrium of the game, from which no selves are willing to deviate. Let be the strategy determined by the agent at time T with state XT. Note that the agent can only choose to stop at time T, that is,
Then, at time T − 1, the sophisticated agent determines her action taken at time T − 1 according to the action taken at time T. Denote her plan at time T − 1 to be , which solves
This decision coincides with the naive agent’s decision at time T − 1, because both essentially choose the best action in the same single-period problem. The situation is different at time T − 2. The sophisticated agent determines her action at time T − 2 according to the actions taken at times T − 1 and T. Denote her plan at time T − 2 to be , which solves
Compared with the decision made by a naive agent at time T − 2, there is the additional constraint that
The sophisticated agent determines her strategy sequentially until time 0 in the same fashion. Denote her plan at time t to be , which solves
Consequently, the sophisticated agent’s plan at any time is consistent with her actual strategy which is
2.3.3 Pre-committed agent.
The pre-committed agent can commit to the plan made at time 0, although her preference is time-inconsistent. Hence, her actual stopping strategy is consistent with her plan, which is called a pre-committed strategy. It is simply the optimal solution that maximizes the preference value V0,0 at time 0 with the initial state 0, which is the same as the problem faced by the naive agent at time 0:
Since the pre-committed strategy is obtained when solving for the naive strategy, we focus on the naive and sophisticated strategies in what follows.
3 Turn to time-consistency
In this section, we provide an algorithm to “train” a naive agent into a sophisticated one as a measure of time-inconsistency. Unlike a pre-committed agent, both the naive and sophisticated agents have no commitment device. To achieve a consistent plan, a naive agent needs to be trained to realize the time-inconsistency and modify her decisions accordingly.
Suppose that in a repeated experiment, the naive agent can observe her actual stopping behavior, based on which she realizes that her optimal plan made at any time is later deviated from by the decisions of her future selves. Then at time 0, when she plans a sequence of actions to be taken in a simple symmetric random walk up to time T, she (possibly incorrectly) anticipates that her future selves will adopt a strategy consistent with her observation of her actual stopping behavior. This anticipation changes her optimal solution at time 0 to be , which solves
Note that compared with the agent without training at time 0, there exist additional constraints such that
Suppose that the agent does not stop at time 0 and continues to time 1. Then she revisits the problem by the same logic: she anticipates that her future selves are going to adopt the actual stopping strategy and then chooses her optimal solution at time 1 with state j ∈ {−1, 1} accordingly. Denote her plan at time 1 to be , which solves
At time t with state Xt, she plans her action to be taken according to her actual stopping strategy afterwards. Denote her plan at time t to be , which solves
After one round of training, the naive strategy becomes
If aN(1) = aS, then the naive agent has been successfully trained into a sophisticated one with consistent strategies. If aN(1) ≠ aS, then the naive agent is still not fully sophisticated and requires more rounds of training. In other words, suppose that after k rounds of training, the naive agent’s actual strategy does not equal the sophisticated agent’s strategy, that is, aN(k) ≠ aS, k ≥ 1. Then, at time t with state Xt, the naive agent at the (k + 1)-th round plans her action to be taken according to her actual k-th round of stopping strategy. Denote her (k + 1)-th round’s plan at time t to be , which solves
The constraints a(s, Xs) = aN(k)(s, Xs) show that the agent anticipates that at time s ≥ t + 1, she is going to behave according to k-th stopping strategy.
In a T-period time horizon as above, the naive strategies are turned into the sophisticated ones after at most T − 1 rounds of training, because each round of training brings the naive strategies at least one time step closer to the sophisticated ones. The following proposition presents this result.
Proposition 1 Consider a T-period binomial tree. The naive strategies are the same as the sophisticated ones after T − 1 rounds of training, that is, aN(T−1) = aS.
Indeed, fewer than T − 1 rounds may suffice to turn the naive strategies into the sophisticated ones. Once the naive agent’s actual stopping strategy coincides with the sophisticated strategy, no more training is needed, because the naive agent’s actual stopping strategy is then consistent with her plan. The total number of rounds hence measures the level of time-inconsistency: the more rounds needed, the higher the level of time-inconsistency. The iteration steps are summarized in the following algorithm.
Algorithm 1 From the naive strategies to the sophisticated strategies in a T-horizon binomial tree
1: for each do
2: for each do
3: x = t − 2 * (j − 1);
4: optimize Vt,x(a);
5: take the optimal solution at node (t, x) to be the action aN(0)(t, x);
6: end for
7: end for
8: for each do
9: t = T − l;
10: for each do
11: x = t − 2 *(j − 1);
12: optimize Vt,x(a), subject to a(s, y) = aS(s, y), , y = s − 2 * (b − 1) for
;
13: take the optimal solution at node (t, x) to be the action aS(t, x);
14: end for
15: end for
16: k = 1;
17: while aN(k−1) ≠ aS do
18: for each do
19: for each do
20: x = t − 2 * (j − 1);
21: optimize Vt,x(a), subject to a(s, y) = aN(k−1)(s, y), , y = s − 2 * (b − 1) for
;
22: take the optimal solution at node (t, x) to be the action aN(k)(t, x);
23: end for
24: end for
25: k = k + 1;
26: end while
4 CPT preferences
In this section we apply the measure of time-inconsistency defined in the previous section to a specific optimal stopping problem. In particular, we consider the cumulative prospect theory of [26], which is a time-inconsistent preference because of the probability distortion in the preference value.
4.1 Cumulative prospect theory
The expected utility (EU) framework of classical economic theory has been challenged by a growing body of empirical evidence. As an alternative to EU, the cumulative prospect theory (CPT) proposed by [26] is one of the well-known non-expected utility theories and has been widely studied in recent years. Cumulative prospect theory can accommodate both risk-averse and risk-seeking behaviors, which are difficult to reconcile in the classical expected utility framework, thus providing new explanations for many well-known empirical puzzles, such as the disposition effect in [27, 28] and the equity premium puzzle in [29–31]. In this subsection, we briefly review the cumulative prospect theory.
In evaluating uncertain payoffs according to CPT, four important features differentiate CPT from traditional EU. First, there is a reference point in the utility function u(⋅). Values above the reference point are called gains, and those below the reference point are losses. In CPT the utility function is applied to gains and losses, rather than to the total wealth level as in EU. Second, there is diminishing sensitivity in both gains and losses, which implies an S-shaped utility function: the utility function is concave in gains and convex in losses. Third, for the same magnitude of gains and losses, one is more sensitive to the disutility of losses than to the utility of gains, which is termed loss aversion. [26] proposed an analytical form of such an S-shaped utility u(⋅):
u(x) = x^α+ for x ≥ 0, and u(x) = −λ(−x)^α− for x < 0, (1)
where 0 < α± < 1 signifies that u(⋅) is concave in the gain domain and convex in the loss domain, and λ > 1 is the degree of loss aversion.
Fourth, probability weighting functions w±(⋅) are applied in the preference evaluation. The probability weighting makes the evaluation of a risky payoff nonlinear in the probability distribution because the agent does not use objective probabilities to evaluate events. An inverse-S-shaped probability weighting function is concave in the lower-left corner, for small probabilities close to 0, and convex in the upper-right corner, for large probabilities close to 1. Note that w+ is applied to gains and w− to losses. Inverse-S-shaped probability weighting functions then lead to the effect that events with small probabilities of large gains or losses are overweighted, events with large probabilities of small gains or losses are overweighted, and events with moderate probabilities of moderate gains or losses are underweighted. [26] suggested an analytical form of w±(⋅), which is inverse-S-shaped:
w±(p) = p^δ± / (p^δ± + (1 − p)^δ±)^(1/δ±), (2)
where δ± ∈ (0.278, 1) are the degrees of probability distortion in gains and losses, and δ± = 1 corresponds to no distortion.
Suppose that a sequence of actions a is applied in the simple symmetric random walk. Let p(n) be the probability of achieving state n, starting from (0, 0), with strategy a, n = −T, …, −1, 0, 1, …, T. Suppose that the reference point is the initial state 0. Then at time 0 the CPT preference value of this sequence of actions is
V0,0(a) = ∑_{n=1}^{T} u(n)[w+(∑_{m=n}^{T} p(m)) − w+(∑_{m=n+1}^{T} p(m))] + ∑_{n=−T}^{−1} u(n)[w−(∑_{m=−T}^{n} p(m)) − w−(∑_{m=−T}^{n−1} p(m))]. (3)
Note that the CPT preference is time-inconsistent because of the probability distortion. From a mathematical point of view, V is nonlinear in both the probabilities p(n) and the cumulative probabilities ∑_{m=n}^{T} p(m). The agent at different time points evaluates the same event with inconsistent probability weights, since the probabilities are distorted differently. In particular, at time t = 1, …, T − 1 with state x ∈ {−t, −(t − 2), …, (t − 2), t}, the preference value of applying a sequence of actions is
Vt,x(a) = ∑_{n=1}^{T} u(n)[w+(∑_{m=n}^{T} pt,x(m)) − w+(∑_{m=n+1}^{T} pt,x(m))] + ∑_{n=−T}^{−1} u(n)[w−(∑_{m=−T}^{n} pt,x(m)) − w−(∑_{m=−T}^{n−1} pt,x(m))], (4)
where pt,x(n) stands for the probability of achieving state n, starting from node (t, x), with strategy a, n = −T, …, −1, 0, 1, …, T.
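Concretely, the evaluation in (3) can be sketched as follows, using the Tversky–Kahneman functional forms with the parameter values of the later examples (α± = 0.9, δ± = 0.5, λ = 1.5). The helper names and the single distortion parameter shared by gains and losses are our own simplifications.

```python
def u(x, alpha=0.9, lam=1.5):
    # S-shaped utility (1): concave in gains, convex in losses, loss aversion lam
    return x ** alpha if x >= 0 else -lam * (-x) ** alpha

def w(p, delta=0.5):
    # inverse-S probability weighting (2)
    return 0.0 if p <= 0 else p ** delta / (p ** delta + (1 - p) ** delta) ** (1 / delta)

def cpt_value(p, T):
    """CPT value of a stopped-state distribution p = {n: prob}, reference point 0:
    gains are weighted through distorted decumulative (tail) probabilities and
    losses through distorted cumulative (head) probabilities."""
    tail = lambda lo: sum(m for n, m in p.items() if n >= lo)
    head = lambda hi: sum(m for n, m in p.items() if n <= hi)
    v = sum(u(n) * (w(tail(n)) - w(tail(n + 1))) for n in range(1, T + 1))
    v += sum(u(n) * (w(head(n)) - w(head(n - 1))) for n in range(-T, 0))
    return v

# A fair one-step gamble (+1 or -1 with probability 1/2): loss aversion makes
# the CPT value negative even though the expected value is zero.
print(cpt_value({1: 0.5, -1: 0.5}, T=1))
```

A sure outcome has no distortion, since w(1) = 1; for instance, the distribution concentrated at state 2 evaluates exactly to u(2).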
4.2 Five-period example
We show in this subsection the procedure of training the naive strategies into the sophisticated ones through numerical examples in a five-period binomial tree. Recall the utility function (1) and probability weighting function (2).
First, let α± = 0.9, δ± = 0.5, λ = 1.5. The first graph of Fig 2 shows the naive strategy in a five-period binomial tree, which is essentially stopping in gains except node (1, 1), continuing in losses, and taking randomization at node (2, 0) with probability to stop equal to 0.23454. The initial action is continuing at node (0, 0). After observing such actual behavior and taking the subsequent actions into consideration, the naive agent updates her strategy at each node, as shown in the second graph in Fig 2. The action at node (0, 0) is stopping, in sharp contrast to the previous one. After one more round of training the naive strategy becomes exactly the same as the sophisticated one shown in the third graph of Fig 2. In other words, the naive strategy is trained into the sophisticated one after two rounds. The corresponding objective value as characterized by CPT increases as the naive strategy approaches the sophisticated one.
After two rounds the naive strategy is turned into the sophisticated strategy. The black nodes stand for stopping, the white nodes stand for continuing, and the grey nodes stand for randomization with the number above being the probability to stop.
Next, let α± = 0.5, δ± = 0.9, λ = 1.5. The first graph of Fig 3 shows the naive strategy under this group of parameter values, which is essentially stopping in gains and continuing in losses. The initial action is stopping at node (0, 0). After observing such actual behavior and taking the subsequent actions into consideration, the naive agent updates her strategy at each node, as shown in the second graph in Fig 3. The action at node (0, 0) is no longer stopping, but taking randomization with a large probability to stop. The naive strategy becomes exactly the same as the sophisticated one shown in the second graph after only one round of training. The corresponding objective value also becomes larger as the naive strategy approaches the sophisticated one.
The naive strategy is turned into the sophisticated strategy after one round. The black nodes stand for stopping, the white nodes stand for continuing, and the grey nodes stand for randomization with the number above the node being the probability to stop.
Note that the degree of probability distortion is governed by the value of δ: the smaller the δ, the higher the degree of probability distortion. Since the time-inconsistency is due to the probability distortion, the higher the degree of probability distortion, the more severe the level of time-inconsistency. It therefore takes more rounds to turn the naive strategies into the sophisticated ones when δ = 0.5 than when δ = 0.9.
4.3 Without randomization or arbitrary start
We consider two extensions of the previous examples. First, we assume that the strategies are pure ones without randomization. This means that the agent can only choose a probability of 1 or 0 to be her action at each node. We find that the results are consistent with the previous ones with randomization. Fig 4 shows that for α± = 0.9, δ± = 0.5, λ = 1.5, after two rounds of training, the naive strategy is the same as the sophisticated one. Fig 5 shows that for α± = 0.5, δ± = 0.9, λ = 1.5, the naive strategy is exactly the same as the sophisticated one so no training is needed.
After two rounds the naive strategy without randomization is turned into the sophisticated strategy without randomization. The black nodes stand for stopping and the white nodes stand for continuing.
The naive strategy without randomization is the same as the sophisticated strategy without randomization. The black nodes stand for stopping and the white nodes stand for continuing.
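The whole pipeline of Algorithm 1 can be sketched end to end for pure strategies as above. The following is a minimal illustration, not the paper’s implementation: the horizon T = 3, all helper names, and the tie-breaking rule (continue when indifferent) are our own choices; round 0 brute-forces each node’s pre-committed plan, which is feasible only for small trees.

```python
import itertools

ALPHA, DELTA, LAM = 0.9, 0.5, 1.5   # CPT parameters of the first example
T = 3                               # small horizon keeps brute force cheap

def u(x):   # S-shaped utility (1), reference point 0
    return x ** ALPHA if x >= 0 else -LAM * (-x) ** ALPHA

def w(p):   # inverse-S probability weighting (2), same delta for gains and losses
    return 0.0 if p <= 0 else p ** DELTA / (p ** DELTA + (1 - p) ** DELTA) ** (1 / DELTA)

def nodes():
    return [(t, x) for t in range(T + 1) for x in range(-t, t + 1, 2)]

def stopped_dist(t0, x0, a):
    """Distribution of the state at which the walk from (t0, x0) stops under a."""
    level, dist = {x0: 1.0}, {}
    for t in range(t0, T + 1):
        nxt = {}
        for x, m in level.items():
            if t == T or a[(t, x)] == 1:
                dist[x] = dist.get(x, 0.0) + m
            else:
                nxt[x + 1] = nxt.get(x + 1, 0.0) + m / 2
                nxt[x - 1] = nxt.get(x - 1, 0.0) + m / 2
        level = nxt
    return dist

def V(t0, x0, a):
    """CPT value: distorted tail sums on gains, distorted head sums on losses."""
    d = stopped_dist(t0, x0, a)
    tail = lambda lo: sum(m for x, m in d.items() if x >= lo)
    head = lambda hi: sum(m for x, m in d.items() if x <= hi)
    val = sum(u(n) * (w(tail(n)) - w(tail(n + 1))) for n in range(1, T + 1))
    val += sum(u(n) * (w(head(n)) - w(head(n - 1))) for n in range(-T, 0))
    return val

def naive_round0():
    """At every node, solve the full (pre-committed) problem by brute force over
    pure strategies and keep only the immediate action."""
    a0 = {}
    for (t, x) in nodes():
        sub = [(s, y) for (s, y) in nodes() if s >= t and abs(y - x) <= s - t]
        free = [nd for nd in sub if nd[0] < T]
        best, best_plan = None, None
        for combo in itertools.product((0, 1), repeat=len(free)):
            plan = {nd: 1 for nd in sub}            # time-T nodes must stop
            plan.update(dict(zip(free, combo)))
            val = V(t, x, plan)
            if best is None or val > best:
                best, best_plan = val, plan
        a0[(t, x)] = best_plan[(t, x)]
    return a0

def best_response(a_future):
    """Optimize only the immediate action at each node, future actions fixed."""
    a = {}
    for (t, x) in nodes():
        if t == T:
            a[(t, x)] = 1
            continue
        cont = dict(a_future); cont[(t, x)] = 0
        a[(t, x)] = 1 if u(x) > V(t, x, cont) else 0   # stopping now is worth u(x)
    return a

def sophisticated():
    aS = {(T, x): 1 for x in range(-T, T + 1, 2)}
    for t in range(T - 1, -1, -1):
        for x in range(-t, t + 1, 2):
            cont = dict(aS); cont[(t, x)] = 0
            aS[(t, x)] = 1 if u(x) > V(t, x, cont) else 0
    return aS

def train():
    a, aS, rounds = naive_round0(), sophisticated(), 0
    while a != aS:
        a, rounds = best_response(a), rounds + 1
        assert rounds <= T - 1, "Proposition 1: at most T - 1 rounds"
    return a, rounds
```

Per Proposition 1, the loop must terminate within T − 1 rounds; the assertion makes that explicit.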
One can also start with an arbitrary strategy and then update it based on strategic reasoning, which eventually turns it into the sophisticated strategy. For α± = 0.9, δ± = 0.5, λ = 1.5, if one initially chooses the strategy as shown in the first graph of Fig 6—half the chance to continue and half the chance to stop—after two rounds of training, this randomized strategy is turned into the sophisticated one, with an increasing CPT preference value. For α± = 0.5, δ± = 0.9, λ = 1.5, if one initially chooses the “half-half” strategy shown in the first graph of Fig 7, then after two rounds of training this randomized strategy is also turned into the sophisticated one.
After two rounds the “half-half” strategy is turned into the sophisticated strategy. The black nodes stand for stopping, the white nodes stand for continuing, and the grey nodes stand for randomization with the number above being the probability to stop.
After two rounds the “half-half” strategy is turned into the sophisticated strategy. The black nodes stand for stopping, the white nodes stand for continuing, and the grey nodes stand for randomization with the number above being the probability to stop.
5 Present-biased preferences
In this section, we consider the time-inconsistency problem due to present-biased preferences. Following [32], we consider two types of optimal stopping problems—with immediate cost and with immediate reward—and show how naive strategies are turned into sophisticated strategies through a finite number of steps of reasoning.
5.1 Immediate cost
Suppose the optimal stopping problem with immediate cost has a T-period horizon. If the agent stops immediately, she incurs an immediate cost c, while the reward v is paid in the future and hence is discounted by a factor β ∈ (0, 1]; if the agent stops later, both the cost and the reward are discounted, and the reward is reduced by an amount k per period of delay. In other words, if the agent chooses to stop immediately, her preference value is βv − c. If the agent chooses to stop at time 1, her preference value is β(v − k − c). If the agent chooses to stop at time 2, the preference value of such a strategy perceived at time 0 is β(v − 2k − c), and so on. If the agent stops at the terminal time T, her preference value is β(v − Tk − c). Suppose k is small enough (k < v/T) that the reward is always positive.
Note that the problem is state-independent, so the vertical nodes at the same time in the binomial tree can be collapsed into one. Consider time t ∈ {0, 1, …, T}. Let pj be the probability of stopping at time t + j from the perspective of time t, j = 0, 1, …, T − t. Then p0 + p1 + … + pT−t = 1. Let a = (p0, p1, …, pT−t). Then the present-biased preference value of a at time t is
If (1 − β)c ≤ βk, it is always optimal for the naive agent to stop immediately at any time. Hence,
where 1 means that the agent stops immediately and 0 means that she continues for sure. It is straightforward to check that this naive strategy is equivalent to the sophisticated strategy, where the latter is derived through backward induction.
On the other hand, if (1 − β)c > βk, it is never optimal for the naive agent to stop before the terminal time T, due to the relatively large immediate cost c. Hence, the naive strategy is to stop at the terminal time T:
Observing such stopping behavior, the naive agent would then update her strategy. If, furthermore, (1 − β)c ≤ 2βk, then
and so forth; the updated strategies eventually equal the sophisticated strategy:
In general, the following proposition shows how many steps are required to transform a purely naive strategy into a sophisticated one through strategic reasoning.
Proposition 2 Suppose (1 − β)c > βk. Let ϱ ≔ ⌈(1 − β)c/(βk)⌉. Then ϱ ≥ 2. The naive strategy is trained into the sophisticated one after 2(⌈(T + 1)/ϱ⌉ − 1) rounds.
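Proposition 2 can be checked numerically by iterating the training map for this state-independent problem. The sketch below uses our own helper names and resolves ties in favor of stopping; it is an illustration of the iteration, not the paper’s code.

```python
def train_immediate_cost(T, beta, c, k, v):
    """Training iteration for the immediate-cost problem. The problem is
    state-independent, so a strategy is a 0/1 list over times; a[t] = 1 means
    the actual action at time t is to stop. Ties are resolved by stopping."""
    assert (1 - beta) * c > beta * k and k < v / T   # nontrivial case, positive reward
    stop_now = lambda t: beta * (v - t * k) - c      # cost now, reward discounted
    stop_at = lambda s: beta * (v - s * k - c)       # cost and reward both in the future

    # round 0: the best later option is s = t + 1, and (1 - beta)c > beta*k
    # makes waiting strictly better, so the naive agent only stops at T
    a = [1 if stop_now(t) >= stop_at(t + 1) else 0 for t in range(T)] + [1]

    # sophisticated strategy by backward induction on the next planned stop
    aS, nxt = [0] * T + [1], T
    for t in range(T - 1, -1, -1):
        if stop_now(t) >= stop_at(nxt):
            aS[t], nxt = 1, t

    rounds = 0
    while a != aS:                                   # one round of training
        rounds += 1
        nxts = [min(s for s in range(t + 1, T + 1) if a[s]) for t in range(T)]
        a = [1 if stop_now(t) >= stop_at(nxts[t]) else 0 for t in range(T)] + [1]
        assert rounds <= 2 * T, "failed to converge"
    return rounds

# beta = 0.7, c = 2, k = 0.3: rho = ceil((1 - beta)c / (beta k)) = ceil(2.857...) = 3,
# so Proposition 2 predicts 2 * (ceil((T + 1) / rho) - 1) = 6 rounds for T = 10.
print(train_immediate_cost(10, 0.7, 2, 0.3, 10))
```

Note that v cancels in every comparison, so only the ratio (1 − β)c/(βk) drives the dynamics.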
5.2 Immediate reward
Suppose the optimal stopping problem with immediate reward has a T-period horizon. If the agent stops immediately at time 0, she receives an immediate reward θ^T v, θ < 1; each period of delay scales the reward up by a factor 1/θ, while future payoffs are discounted by the factor β ∈ (0, 1]. In other words, if she stops at time 0, the preference value is θ^T v. If the agent stops at time 1, the preference value is βθ^(T−1) v. If the agent chooses to stop at time 2, the preference value is βθ^(T−2) v. If the agent chooses to stop at the terminal time T, the preference value is βv.
Similar to the case of immediate cost, the stopping problem with immediate reward is also state-independent. Consider time t ∈ {0, 1, …, T}. Let pj be the probability of stopping at time t + j from the perspective of time t, j = 0, 1, …, T − t. Then p0 + p1 + … + pT−t = 1. Let a = (p0, p1, …, pT−t). Then the preference value of a at time t is
If θ^T ≥ β, it is always optimal for the naive agent to stop immediately at any time. Hence,
which is equivalent to the sophisticated strategy. If θ < β, it is optimal for the naive agent to stop as late as possible. Hence,
which is also equivalent to the sophisticated strategy. The following proposition covers the remaining case, showing how the naive strategy is transformed into the sophisticated one.
Proposition 3 Suppose θ ≥ β > θ^T. Let ν ≔ ⌊log β/log θ⌋. Then 1 ≤ ν ≤ T − 1, and the naive strategy is turned into the sophisticated one after ⌈T/ν⌉ rounds.
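As with Proposition 2, the round count in Proposition 3 can be checked by simulation. The sketch below is our illustration, not the paper's code: it uses the comparison that the agent at time t stops iff θ^{T−t} ≥ βθ^{T−s}, i.e. θ^{s−t} ≥ β, where s is the next stopping time in the strategy observed in the previous round.

```python
def reward_training_rounds(T, theta, beta):
    """Rounds of strategic reasoning for the immediate-reward case
    theta >= beta > theta**T, starting from the naive plan (stop only at T).
    Sketch under our inferred update rule."""
    plan = [0] * T + [1]        # naive plan when beta > theta**T: stop at T
    rounds = 0
    while True:
        new = plan[:]
        for t in range(T):
            # next stopping time after t in the previously observed strategy
            s = next(u for u in range(t + 1, T + 1) if plan[u] == 1)
            # stop now iff theta**(T - t) >= beta * theta**(T - s),
            # equivalently theta**(s - t) >= beta
            new[t] = 1 if theta ** (s - t) >= beta else 0
        if new == plan:         # fixed point: sophisticated strategy reached
            return rounds
        plan, rounds = new, rounds + 1

# Example: T = 7, theta = 0.9, beta = 0.7 gives nu = floor(log 0.7 / log 0.9) = 3,
# and the simulation converges after ceil(T/nu) = 3 rounds.
```

Each round extends the stopping region downward by ν periods, so the simulated round count matches ⌈T/ν⌉.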
6 Conclusion
We consider optimal stopping problems with time-inconsistency in a discrete-time setting in which randomization is allowed. Because of time-inconsistency, the naive agent deviates from any of her predetermined plans by optimizing her action at each time-state node, whereas the sophisticated agent forms a consistent strategy by taking her future selves' actions into consideration. When the naive agent can observe her actual behavior and take her subsequent real actions into account, her strategy eventually matches the sophisticated strategy after several rounds of training. Under cumulative prospect theory preferences, where the time-inconsistency is due to probability distortion, the higher the degree of probability distortion, the more time is required to turn the naive strategies into the sophisticated ones, and hence the more severe the level of time-inconsistency. For strategies without randomization, or for arbitrary initial strategies, the same algorithm can be applied to turn them into sophisticated strategies. For time-inconsistency due to present-biased preferences, we provide analytical results on transforming time-inconsistent strategies into time-consistent ones in optimal stopping problems with immediate cost and with immediate reward. The analysis shows that strategic reasoning is powerful in achieving time-consistent plans in time-inconsistent problems.
Proofs
Proof of Proposition 1 For a T-period binomial tree, the action taken at the terminal time T must be 1, because the agent stops at time T if she has not stopped yet. Therefore, for both the naive and the sophisticated agents, aN(0)(T, x) = aS(T, x) = 1, x ∈ {−T, −(T − 2), …, T − 2, T}. Meanwhile, the naive agent's action at time T − 1 is aN(0)(T − 1, x) = p*, which is obtained by optimizing VT−1,x(a), where a = (p, a(T, x + 1), a(T, x − 1)), subject to p ∈ [0, 1] and a(T, x + 1) = a(T, x − 1) = 1. This is exactly the same as the sophisticated agent's action planned at time T − 1; that is, aN(0)(T − 1, x) = aS(T − 1, x) for x ∈ {−(T − 1), −(T − 3), …, T − 3, T − 1}. After the first round of training, the naive agent's action taken at time T − 2 becomes aN(1)(T − 2, XT−2) = p*, which is obtained by optimizing VT−2,XT−2(a), where a = (p, a(T − 1, XT−1), a(T, XT)), subject to p ∈ [0, 1], a(T − 1, XT−1) = aN(0)(T − 1, XT−1), and a(T, XT) = aN(0)(T, XT). This is exactly the same as the sophisticated agent's action planned at time T − 2; that is, aN(1)(T − 2, XT−2) = aS(T − 2, XT−2). By the same logic, one obtains aN(k−1)(T − k, XT−k) = aS(T − k, XT−k) for k = 1, 2, …, T. Then, after T − 1 rounds, aN(T−1)(t, x) = aS(t, x) for any node (t, x). Q.E.D.
Proof of Proposition 2 If ϱ > T, then (1 − β)c > Tβk. In this case, realizing that she is going to stop at the terminal time T according to aN(0), the naive agent finds that doing so is indeed optimal. Then aN(1) is exactly the same as aN(0), which means that the naive strategy is already the sophisticated one.
If ϱ ≤ T, then (1 − β)c ≤ Tβk. Then, comparing the preference value of stopping at the terminal time T according to aN(0) with the value of stopping immediately at time t < T, the naive agent updates her strategy to be
Similarly, her strategy through another round of strategic reasoning becomes
If T − ϱ + 1 ≤ ϱ, then no further change is made and aN(3) is the same as aN(2), which means that the strategy has become the sophisticated one. If T − ϱ + 1 > ϱ, then the third round of strategic reasoning leads to
and so forth, until the strategy is turned into the sophisticated one:
where ω ≔ ⌈(T + 1)/ϱ⌉. In total, it takes 2(ω − 1) rounds to achieve aS. Q.E.D.
Proof of Proposition 3 Note that according to the definition of ν, we have θ^ν ≥ β > θ^{ν+1}. Then it is not optimal for the naive agent to stop immediately unless the remaining time horizon is no longer than ν; that is,
If T − ν ≤ ν,
which is equivalent to the sophisticated strategy. If T − ν > ν,
and so forth, until it is equivalent to the sophisticated strategy:
In total, it takes ⌈T/ν⌉ rounds to go from the naive strategy to the sophisticated one. Q.E.D.
Supporting information
S1 Code. Description of code.
This file explains how to generate strategies and figures in the examples in Section 4.
https://doi.org/10.1371/journal.pone.0310774.s001
(PDF)
References
- 1. Barberis N. A Model of Casino Gambling. Management Science. 2012 Jan;58(1):35–51.
- 2. Hu S, Obłój J, Zhou XY. A Casino Gambling Model Under Cumulative Prospect Theory: Analysis and Algorithm. Management Science. 2023;69:2474–2496.
- 3. He XD, Hu S, Obłój J, Zhou XY. Optimal exit time from casino gambling: strategies of precommitted and naive gamblers. SIAM Journal on Control and Optimization. 2019;57(3):1845–1868.
- 4. Björk T, Murgoci A. A General Theory of Markovian Time Inconsistent Stochastic Control Problems; 2010. SSRN:1694759.
- 5. Machina MJ. Dynamic Consistency and Non-expected Utility Models of Choice under Uncertainty. Journal of Economic Literature. 1989;27:1622–1668.
- 6. He XD, Hu S, Obłój J, Zhou XY. Randomized and Path-Dependent Strategies in Barberis’ Casino Gambling Model. Operations Research. 2017;65:97–103.
- 7. Huang YJ, Nguyen-Huu A, Zhou XY. General Stopping Behaviors of Naïve and Non-Committed Sophisticated Agents, with application to Probability Distortion. Mathematical Finance. 2020;30(1):310–340.
- 8. Xu ZQ, Zhou XY. Optimal Stopping under Probability Distortion. Annals of Applied Probability. 2012;23(1):251–282.
- 9. Ebert S, Strack P. Until the bitter end: on prospect theory in a dynamic context. American Economic Review. 2015;105(4):1618–1633.
- 10. Ebert S, Strack P. Never, Ever Getting Started: On Prospect Theory Without Commitment; 2018. SSRN:2765550.
- 11. Henderson V, Hobson D, Tse A. Randomized Strategies and Prospect Theory in a Dynamic Context. Journal of Economic Theory. 2017;168:287–300.
- 12. Belomestny D, Krätschmer V. Optimal stopping under probability distortions. Mathematics of Operations Research. 2017;42(3):806–833.
- 13. Henderson V, Hobson D, Tse ASL. Probability weighting, stop-loss and the disposition effect. Journal of Economic Theory. 2018;178:360–397.
- 14. Dayanik S, Karatzas I. On the optimal stopping problem for one-dimensional diffusions. Stochastic Processes and their Applications. 2003;107(2):173–212.
- 15. Shiryaev AN. Optimal stopping rules. Springer Science and Business Media; 2007.
- 16. Strotz RH. Myopia and inconsistency in dynamic utility maximization. The Review of Economic Studies. 1955-1956;23(3):165–180.
- 17. Ekeland I, Lazrak A. Being serious about non-commitment: subgame perfect equilibrium in continuous time; 2006. ArXiv:math/0604264.
- 18. Ekeland I, Pirvu T. Investment and consumption without commitment. Mathematics and Financial Economics. 2008;2(1):57–86.
- 19. Ebert S, Wei W, Zhou XY. Weighted discounting—On group diversity, time-inconsistency, and consequences for investment. Journal of Economic Theory. 2020;189.
- 20. Tan KS, Wei W, Zhou XY. Failure of smooth pasting principle and nonexistence of equilibrium stopping rules under time-inconsistency. SIAM Journal on Control and Optimization. 2021;59:4136–4154.
- 21. Christensen S, Lindensjö K. On finding equilibrium stopping times for time-inconsistent Markovian problems. SIAM Journal on Control and Optimization. 2018;56(6):4228–4255.
- 22. Christensen S, Lindensjö K. On time-inconsistent stopping problems and mixed strategy stopping times. Stochastic Processes and their Applications. 2020;130(5):2886–2917.
- 23. Huang YJ, Nguyen-Huu A. Time-consistent stopping under decreasing impatience. Finance and Stochastics. 2018;22(1):69–95.
- 24. Huang YJ, Yu X. Optimal stopping under model ambiguity: a time-consistent equilibrium approach. Mathematical Finance. 2021;31:979–1012.
- 25. He XD, Zhou XY. Who are I: Time inconsistency and intrapersonal conflict and reconciliation. In: Yin G, Zariphopoulou T, editors. Stochastic Analysis, Filtering, and Stochastic Optimization: A Commemorative Volume to Honor Mark H. A. Davis's Contributions. Switzerland: Springer; 2022.
- 26. Tversky A, Kahneman D. Advances in Prospect Theory: Cumulative Representation of Uncertainty. Journal of Risk and Uncertainty. 1992;5(4):297–323.
- 27. Shefrin H, Statman M. The disposition to sell winners too early and ride losers too long: Theory and evidence. Journal of Finance. 1985;40(3):777–790.
- 28. Odean T. Are Investors Reluctant to Realize Their Losses. Journal of Finance. 1998 Oct;53(5):1775–1798.
- 29. Mehra R, Prescott EC. The Equity Premium: A Puzzle. Journal of Monetary Economics. 1985 Mar;15(2):145–161.
- 30. Benartzi S, Thaler RH. Myopic Loss Aversion and the Equity Premium Puzzle. Quarterly Journal of Economics. 1995 Feb;110(1):73–92.
- 31. Barberis N, Huang M. Mental Accounting, Loss Aversion and Individual Stock Returns. Journal of Finance. 2001 Aug;56(4):1247–1292.
- 32. O’Donoghue T, Rabin M. Doing It Now or Later. American Economic Review. 1999 Mar;89(1):103–124.