A theoretical foundation of state-transition cohort models

Following its introduction over three decades ago, the cohort model has been used extensively to model population trajectories over time in decision-analytic modeling studies. However, the stochastic process underlying cohort models has not been properly described. In this study, we explicate the stochastic process underlying a cohort model by carefully formulating the dynamics of populations across health states and assigning probability rules to these dynamics. From this formulation, we derive a mathematical representation of the system, which is given by the master equation. We solve the master equation using the probability generating function method to obtain the explicit form of the probability of observing a particular realization of the system at an arbitrary time. The resulting generating function is used to derive the analytical expressions for the mean and the variance of the process. Next, we represent the cohort model by a difference equation for the number of individuals across all states. From the difference equation, a continuous-time cohort model is recovered and takes the form of an ordinary differential equation. To show the equivalence between the derived stochastic process and the cohort model, we conduct a numerical exercise and demonstrate that the population trajectories generated from the formulas match those from the cohort model simulation. In summary, the commonly used cohort model represents the average of a continuous-time stochastic process on a multidimensional integer lattice governed by a master equation. Knowledge of the stochastic process underlying a cohort model provides a theoretical foundation for the modeling method.


Introduction
Decision models have been used in applications ranging from clinical decision making to screening guideline development. The objective of decision modeling is to integrate and present evidence within a coherent and explicit mathematical structure that can be used to link evidence to decision-relevant outcomes. In decision modeling, a state-transition Markov model is often used to simulate the prognosis of a patient or a group of patients following an intervention. Beck and Pauker [1] introduced the modeling method over 30 years ago with the aim of providing a simple tool for prognostic modeling and for practical use in medical decision making. They applied standard methods from Markov chain theory [2,3] to simulate the life history of an individual, structured as transitions across various health states over time, i.e. a stochastic process on a finite state space ($S$). However, the current literature in decision modeling lacks clarity in the following two concepts. First, a cohort model, an extension of the state-transition model from one individual to a group of individuals, is often used to introduce and describe a […] dead. In contrast, $\mathbb{N}^{|S|}$ may refer to a partitioning of a cohort of individuals into the numbers of healthy, sick, and dead individuals. The conflation of these two different processes in the decision modeling literature leads to the second issue: practitioners are taught that cohort models capture the average behavior of the individuals [7,8]; however, this claim is often stated without any clear reference to an underlying stochastic process. Frederix et al. [9] propose an ordinary differential equation (ODE)-based method to approximate Markov models whilst acknowledging that the ODEs can describe only "the typical (mean) change," albeit without providing any rationale or reference.

One standard textbook [7] describes a cohort model as a representation of the average experience of patients in the cohort, without providing any rationale to support this description. Furthermore, wide acceptance of methodologies does not automatically imply veracity. [10] In the context of cohort models, this wide acceptance is ingrained by the ISPOR-SMDM Modeling Good Research Practices Task Force-3. [11] The published best practices cite the work of Beck et al. [1,4] as the main and only references for cohort models, thereby standardizing cohort models as the recommended method for simulating population trajectories, despite the lack of a proper description of the theoretical support in the decision modeling area.

This paper aims to address the aforementioned problems by placing cohort models in a larger framework based on the existing literature on stochastic processes. Therefore, in this paper, we explicate the stochastic process underlying a cohort model and derive its corresponding probability function, on which the prevailing notion of an average is based. We adopt a continuous-time perspective, i.e. we assume that for a discrete-time process a corresponding continuous-time process exists (the embeddability assumption). In addition, we assume that transition rates are time-invariant and that the process of interest is Markovian, i.e. knowledge of the current state conveys all information that is relevant to forecasting future states. This paper is organized into four main parts. Firstly, we review the prevailing descriptions of cohort models in the decision modeling literature and formulate their mathematical representations to substantiate our claim of a lack of clarity in the current description of a cohort model.

Secondly, we postulate a precise description of the cohort dynamics and the probability rules on these dynamics to derive the underlying stochastic process. Thirdly, we solve for the probability function associated with the stochastic process by the method of evolution equations. To show that the derived stochastic process corresponds to the cohort model, we also derive the mathematical formulas for the average and the variance of the process. Fourthly, the correspondence is demonstrated by showing the equivalence between the first two moments of the process and the cohort model via a numerical example. Throughout this exposition, we try to strike a balance between mathematical rigor and accessibility to practitioners.

Prevailing views of cohort model

Beck and Pauker [1] introduced a discrete-time Markov model as a method for representing a patient's prognosis, which can be viewed as a sequence of states of health in discrete time steps. We distinguish between three prevalent descriptions of a Markov model [4,5,7,8]: (1) a model for an individual, or a Markov chain on $S$; (2) a cohort model, or a process on $\mathbb{N}^{|S|}$; and (3) the continuous-time analogue of a cohort model, from which the ODE-based method [9] arises.

Markov chain on S

A discrete-time Markov chain on a finite set of mutually exclusive and collectively exhaustive states, $S = \{S_1, S_2, \ldots, S_s\}$ with $|S| = s$, is a stochastic process $\{S(n)\} = \{S(t_n)\}$, where $S(n) \in S$, $t_n = t_0 + n\tau$, $t_0$ is the initial time, $\tau$ is the Markov cycle, and $n \in \{0, 1, \ldots, N\}$, given an $s \times s$ stochastic matrix of transition probabilities governing the transitions among states, $P_\tau(t) = (p_{ij}(t))_{1 \le i,j \le s}$, and a $1 \times s$ vector of initial probabilities $p(t_0) = [1\ 0\ \cdots\ 0]$ if, without loss of generality, we assume an individual starts in $S_1$. Each $p_{ij}(t)$ has the usual interpretation of the probability of an individual transitioning from $S_i$ to $S_j$ within a cycle of length $\tau$ beginning at time $t$. To calculate the distribution over states in the next cycle, $p(t+\tau)$, we post-multiply the distribution in the current cycle, $p(t)$, by $P_\tau(t)$:

$$p(t + \tau) = p(t)\, P_\tau(t). \qquad (\text{Eq 1})$$

Cohort simulation

An $s$-state cohort simulation [1,7] is a recursive algebraic calculation that generates a trajectory of population counts (state-configurations):

$$N(t_0), N(t_1), \ldots, N(t_N), \qquad (\text{Eq 2})$$

where $N(t) \in \mathbb{R}^s$, whose $i$-th component denotes the number of individuals in state $S_i$ at time $t \ge t_0$, given $P_\tau(t)$ as defined above and an initial distribution $N(t_0) = [n_0\ 0\ \cdots\ 0]$. To calculate the distribution of individuals across all states at time $t + \tau$, $N(t+\tau)$, the state-configuration at time $t$, $N(t)$, is projected forward in time using the linear operator $P_\tau(t)$:

$$N(t + \tau) = N(t)\, P_\tau(t). \qquad (\text{Eq 3})$$
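The cohort recursion (Eq 3) takes only a few lines of code. The following minimal sketch uses a hypothetical three-state model (e.g. Healthy, Sick, Dead) with made-up transition probabilities; neither the states nor the numbers come from the paper:

```python
import numpy as np

# Hypothetical three-state model with a made-up one-cycle
# transition-probability matrix P_tau (each row sums to one).
P_tau = np.array([
    [0.90, 0.08, 0.02],
    [0.00, 0.85, 0.15],
    [0.00, 0.00, 1.00],
])

N = np.array([1000.0, 0.0, 0.0])  # N(t_0): all n_0 = 1000 individuals in S_1

trajectory = [N.copy()]
for _ in range(10):                # ten Markov cycles
    N = N @ P_tau                  # N(t + tau) = N(t) P_tau
    trajectory.append(N.copy())

# Because each row of P_tau sums to one, the cohort size is conserved.
print(trajectory[-1], trajectory[-1].sum())
```

Note that the recursion manipulates population counts, not the state of any single individual, which is precisely the distinction between this process and the Markov chain on $S$ above.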

The recursive formula (Eq 3) can be written as a difference equation, and if we take the limit of this difference equation with respect to the Markov cycle, we obtain the differential form of a cohort model:

$$\frac{dN(t)}{dt} = N(t)\, Q, \qquad (\text{Eq 4})$$

where $Q = (\hat{c}_{kl})_{1 \le k,l \le s}$ is the infinitesimal generator matrix of size $s \times s$, for the […]

[…] decision modeling often involves a disease-progression component, which is an inherently random system, i.e. one that can be described only up to a certain degree of confidence. In contrast to a deterministic model, in which, given an initial condition, the sequence of the states of the world can be determined with probability one, the solution to a stochastic model is a multidimensional probability function at any $t$ following the initial time $t_0$. Therefore, the state of the system at any given time can only be described with a probabilistic statement, i.e., the probability of observing a particular state-configuration, $N(t) = n \in \mathbb{Z}^s$ at time $t$, or $P(N(t) = n) = P(n, t)$.
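The continuous-time form (Eq 4) can be evaluated through the matrix exponential, $N(t) = N(0)\exp(Qt)$. In this sketch the rates are hypothetical, and a truncated Taylor series stands in for a library routine (such as `scipy.linalg.expm`) so the example is self-contained:

```python
import numpy as np

def expm_taylor(A, n_terms=40):
    """Matrix exponential by truncated Taylor series (adequate for small A*t)."""
    out = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, n_terms):
        term = term @ A / k
        out = out + term
    return out

# Hypothetical transition rates c_kl (per unit time) for a three-state model;
# the generator Q carries the rates off-diagonal and minus the row sums on
# the diagonal, so each row of Q sums to zero.
C = np.array([
    [0.0, 0.10, 0.02],
    [0.0, 0.00, 0.20],
    [0.0, 0.00, 0.00],
])
Q = C - np.diag(C.sum(axis=1))

N0 = np.array([1000.0, 0.0, 0.0])
t = 5.0
Nt = N0 @ expm_taylor(Q * t)   # solution of dN/dt = N Q:  N(t) = N(0) exp(Qt)
print(Nt)
```

Because each row of $Q$ sums to zero, $\exp(Qt)$ is row-stochastic and the total cohort size is conserved, mirroring the discrete-time recursion.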

Master equation

Given the formulation strategy, we formalize the description of the process in this section. An event of transitioning from $S_k$ to $S_l$ is denoted by the transition $R_{kl}$, where $k, l \in \{1, \ldots, s\}$. Since the occurrence of each transition alters the state-configuration, each $R_{kl}$ is associated with a vector representing this change, defined as $s_{kl} = e_l - e_k$, where $e_i$ is the $i$-th column of the identity matrix of size $s$. We associate each transition with a propensity function, $\nu_{kl}(n, t) = c_{kl}(t)\, h_{kl}(n_k, n_l)$, where $c_{kl}(t)$ is the transition rate for $R_{kl}$, $N_k(t) = n_k$, and $N_l(t) = n_l$. The function $h_{kl}(n_k, n_l)$ determines the number of ways $R_{kl}$ can occur, which is equal to the number of individuals in $S_k$: $h_{kl}(n_k, n_l) = n_k$. The propensity function denotes the probability of a (any) transition occurring in continuous time as a function of the state-configuration and the rate constant. The probability of one particular individual making transition $R_{kl}$ in a small time interval $\tau$ is assumed to be linear in time, $c_{kl}(t)\tau + o(\tau)$, where $o(\tau)$ collects terms that decay to zero faster than $\tau$. Therefore, the probability of a transition from $S_k$ to $S_l$ in a time step $\tau$ is given by $n_k c_{kl}(t)\tau + o(\tau)$ (see S1 Transition probability). A transition $R_{kl}$, occurring with probability $\nu_{kl}(n, t)\tau$ in a time step $\tau$, alters the current state-configuration $N(t) = n$ to $n + s_{kl}$. Following these mathematical descriptions of the possible changes in the state-configuration and the probabilities of these changes, we derive the evolution equation for $P(n, t)$, i.e., the master equation (also known as the forward Kolmogorov equation), which describes how $P(n, t)$ changes in continuous time.
The master equation is obtained by enumerating all possible events contributing to a change in the probability of observing a particular state-configuration in an infinitesimal time step, and is given by (see S2 Master equation):

$$\frac{\partial P(n,t)}{\partial t} = \sum_{k=1}^{s} \sum_{\substack{l=1 \\ l \neq k}}^{s} \big[\nu_{kl}(n - s_{kl}, t)\, P(n - s_{kl}, t) - \nu_{kl}(n, t)\, P(n, t)\big]. \qquad (\text{Eq 6})$$

The terms $\nu_{kl}(n, t) P(n, t)$ represent the probability transferred from one state-configuration to another as a result of transition $R_{kl}$ (the probability flux). In principle, a master equation expresses the change per unit of time of the probability of observing a state-configuration $n$ as the sum of two opposing effects: the probability flux into and out of $n$. A master equation is therefore essentially a rate equation for the probability of observing a state-configuration (cf. Eq 4). The solution to the master equation, $P(n, t)$, contains all information about the modeled system and can be obtained, under some conditions on $c_{kl}(t)$, by applying the generating function technique (see S3 Generating function method). A probability generating function (PGF) of $P(n, t)$ is defined by

$$G(x, t) = \sum_{n} x_1^{n_1} x_2^{n_2} \cdots x_s^{n_s}\, P(n, t).$$

The explicit solution to Eq 6 is obtained by solving a first-order linear partial differential equation (PDE) for $G(x, t)$. [14] We solve for the PGF under two initial conditions: (1) an arbitrary initial distribution, and (2) all individuals starting in $S_1$. From the solution to the PDE, we recover $P(n, t)$ by using the definition of the PGF, and we obtain the first two moments of the process, i.e., the mean and the variance, from the partial derivatives of $G(x, t)$ (see S4 Mean and variance of the master equation).
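A master equation of this form can also be sampled exactly, one transition at a time, via the stochastic simulation algorithm [28] (Gillespie's direct method). The sketch below uses hypothetical rates for a small three-state system; the propensities and state-change vectors follow the definitions above:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical time-invariant rates c_kl for a three-state model.
C = np.array([
    [0.0, 0.10, 0.02],
    [0.0, 0.00, 0.20],
    [0.0, 0.00, 0.00],
])
s = C.shape[0]

def gillespie(n0, t_end):
    """One exact sample path of the master equation (Gillespie's direct method)."""
    n = np.array(n0, dtype=float)
    t = 0.0
    while True:
        nu = C * n[:, None]          # propensities nu_kl(n) = c_kl * n_k
        total = nu.sum()
        if total == 0.0:             # absorbing configuration: nothing can fire
            break
        t += rng.exponential(1.0 / total)   # waiting time to the next transition
        if t >= t_end:
            break
        # Pick which transition R_kl fires, proportional to its propensity,
        # then apply the state change s_kl = e_l - e_k.
        k, l = np.unravel_index(rng.choice(s * s, p=(nu / total).ravel()), (s, s))
        n[k] -= 1.0
        n[l] += 1.0
    return n

final = gillespie([100, 0, 0], t_end=5.0)
print(final)
```

Averaging many such paths approximates the moments of $P(n, t)$; the analytic results derived below make that averaging unnecessary for the first two moments.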

Solution to master equation

The solution to the master equation, i.e., the probability function of a state-configuration $n$ at time $t$ for an arbitrary initial state-configuration $P(n, 0)$, is given by:

[…]

where $k, l \in \{1, \ldots, s\}$, $B(t) = \exp(At)$ is the matrix exponential, and $A$ is an $s \times s$ matrix with elements of the form $a_{kl} = c_{kl} - \gamma_k \delta_{kl}$, where $\gamma_k = \sum_{l=1}^{s} c_{kl}$ and $\delta_{kl}$ is the Kronecker delta ($\delta_{kl} = 1$ if $k = l$ and $0$ otherwise). Matrix $A$ is identical to matrix $Q$ in Eq 4, the generator matrix for the continuous-time ODE equivalent of a cohort model. For the case where all $n_0$ individuals start in $S_1$ ($N_1(0) = n_0$), the probability function of a state-configuration $n$ at time $t$ is the multinomial distribution

$$P(n, t) = \frac{n_0!}{n_1! \cdots n_s!}\, \prod_{i=1}^{s} \big(b_{1i}(t)\big)^{n_i},$$

where $b_{1i}(t)$ denotes the $(1, i)$ entry of $B(t)$. The mean of the number of individuals in the $i$-th state at time $t$ is given by

$$E[N_i(t)] = n_0\, b_{1i}(t), \qquad (\text{Eq 8})$$

and the variance of the number of individuals in the $i$-th state at time $t$ is given by

$$\operatorname{Var}[N_i(t)] = n_0\, b_{1i}(t)\big(1 - b_{1i}(t)\big). \qquad (\text{Eq 9})$$

For an arbitrary initial distribution $N(0) = (n_1(0), \ldots, n_s(0))$, the mean of the number of individuals in state $S_i$ is given by

$$E[N_i(t)] = \sum_{k=1}^{s} n_k(0)\, b_{ki}(t),$$

and the variance is given by

$$\operatorname{Var}[N_i(t)] = \sum_{k=1}^{s} n_k(0)\, b_{ki}(t)\big(1 - b_{ki}(t)\big).$$

Numerical verification

We conduct a numerical exercise to verify the validity of the derived stochastic process. In particular, we aim to verify the formulas for the mean (Eq 8) and the variance (Eq 9). For this verification exercise, we consider a stochastic process with $s = 4$, i.e. […] the formulas from Welton and Ades. [15] The resulting population trajectories are given in Fig 2. We observe no difference between the empirical estimates from the microsimulation and the analytic formulas for both the mean and the variance of the process associated with the cohort model.

[…] first-order uncertainty [17]) and may be useful for quantifying the extent of variability in the mean prediction.
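The mean (Eq 8) and variance (Eq 9) formulas can be checked numerically along the lines of the verification exercise. In this sketch the rates are hypothetical and three states are used instead of the paper's four-state example; since all individuals start in $S_1$, replicated microsimulations can be replaced by direct multinomial sampling:

```python
import numpy as np

def expm_taylor(A, n_terms=40):
    """Matrix exponential by truncated Taylor series (adequate for small A*t)."""
    out, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, n_terms):
        term = term @ A / k
        out = out + term
    return out

# Hypothetical rates; A has elements a_kl = c_kl - gamma_k * delta_kl.
C = np.array([[0.0, 0.10, 0.02],
              [0.0, 0.00, 0.20],
              [0.0, 0.00, 0.00]])
A = C - np.diag(C.sum(axis=1))

n0, t = 1000, 5.0
b = expm_taylor(A * t)[0]          # b_1i(t): first row of B(t) = exp(At)
b = b / b.sum()                    # guard against tiny series-truncation error

mean_analytic = n0 * b             # Eq 8
var_analytic = n0 * b * (1.0 - b)  # Eq 9

# With all individuals starting in S_1, N(t) is multinomial(n0, b_1.(t)),
# so replicated "microsimulations" can be sampled in one call.
rng = np.random.default_rng(7)
samples = rng.multinomial(n0, b, size=20000)
print(mean_analytic, samples.mean(axis=0))
print(var_analytic, samples.var(axis=0))
```

The empirical moments of the 20,000 replicates agree with the closed-form expressions up to sampling noise, echoing the paper's verification result.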
In our simulation example, we show that to estimate the variance of the population counts across states, we need to replicate the microsimulated population a sufficiently large number of times, which is a computationally intensive task similar to applying uncertainty analysis to a microsimulation model. [18] Our theoretical result provides a more direct and less computationally demanding approach through the explicit formula for the variance. The derived probability function, when all individuals start in one state (a typical scenario in most applications), takes the form of a multinomial distribution and is a known result from the stochastic compartmental model literature. [19] In applications where the relevant population comprises an actual cohort (e.g. a birth cohort), the individuals are often, prior to the implementation of an intervention, distributed across health states. For this purpose, we also derive the probability function and the formulas for the moments under an arbitrary initial condition. Lastly, we show the equivalence between the average of the derived stochastic process and the cohort model, thereby substantiating the prevailing notion of cohort models as an average process.

In the introduction of cohort models [1] for use in decision modeling, the authors emphasized the clinical utility of a Markov process (as opposed to a decision tree model) for modeling a patient's prognosis with ongoing risks: a model for an inherently stochastic system. A stochastic representation of a system amounts to making probabilistic statements about its state given a set of probabilistic rules for the system dynamics.

Therefore, the solution to any stochastic process or model takes the form of a probability function on a set of random variables [2], i.e., in our context, $P(n, t)$, which is the solution to the master equation. However, in their expositions, population counts were used as a representation of the stochastic system via a cohort simulation. The […] distinct processes: a process on $S$ and a Markov chain on $\mathbb{N}^s$. The notion of the average as represented by the cohort simulation is based on the latter. To date, ours is the only study that delineates the conflation of these two different processes, which has been perpetuated by tutorials and textbooks in decision modeling. [7,20]

Having an explicit form of the evolution equation for $P(n, t)$ is advantageous: in some cases (e.g. time-invariant rates) it allows finding the analytical form of $P(n, t)$, and in most cases (e.g. time-varying rates) it allows deriving the mean and the variance of the process from their evolution equations without knowing the explicit form of $P(n, t)$. In addition, given this knowledge of the evolution equation associated with a cohort model, we can identify the relationship between cohort models and other known methodologies in stochastic processes. In the classification tree of Markov processes [21], a cohort model as represented by the master equation is derived from the differential Chapman-Kolmogorov equation without the drift and diffusion coefficients. If the cohort size is sufficiently large, a cohort model can be approximated by van Kampen's expansion method [22], which is analytically more tractable than the master equation. In addition, if we relax the integrality constraint on the state space to allow for nonnegative real numbers, a cohort model may be represented by a Fokker-Planck equation or a stochastic differential equation (SDE). [21,23] Our results therefore provide practitioners and researchers with a proper platform for porting methods from the stochastic process literature into decision modeling applications.

Knowledge of the underlying process will also provide a proper context for efforts to resolve or clarify methodological challenges or issues, for example, in estimating the appropriate time step for cohort models. [24,25] Practitioners of cohort models are taught to add an additional half cycle of life years at the end of the cohort simulation to account for the fact that individuals are assumed to transition at the end of each time step. This half-cycle correction is a consequence of discretizing a continuous-time scale. However, the refinement of the half-cycle correction method was based on the average process [26], i.e. without taking into account the variance of the process or other higher-order moments. Naimark et al. [26] proposed several methods, including numerical integration techniques, to determine the appropriate time step and bypass the need for half-cycle correction. Elbasha and Chhatwal [27] quantified the approximation errors from using these numerical integration methods in the context of a three-state cohort model. However, these numerical approximation methods were applied to the "state membership […] The stochastic simulation [28] and tau-leap [29] methods numerically solve a master equation by simulating all possible events one at a time or in batches, respectively.

Despite this limitation, our analytical results remain applicable in many situations, since no restriction is placed on the number of states or on the allowable transition patterns on the set of states. In addition, the analytical results for an arbitrary initial condition can be utilized to approximate a process with time-varying rates by a piecewise process. Each piece, or time interval, corresponds to a time-invariant process, and the initial condition of each interval is the distribution of individuals at the end of the previous interval. We plan to introduce this piecewise method in a future study.
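The piecewise idea can be sketched as follows, with hypothetical rates that double halfway through the horizon: the mean state-configuration at the end of each time-invariant piece seeds the next piece via the arbitrary-initial-condition formulas:

```python
import numpy as np

def expm_taylor(A, n_terms=40):
    """Matrix exponential by truncated Taylor series (adequate for small A*t)."""
    out, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, n_terms):
        term = term @ A / k
        out = out + term
    return out

def generator(C):
    """Generator matrix: rates off-diagonal, minus the row sums on the diagonal."""
    return C - np.diag(C.sum(axis=1))

# Hypothetical piecewise-constant rates: all rates double after t = 5.
C_early = np.array([[0.0, 0.10, 0.02],
                    [0.0, 0.00, 0.20],
                    [0.0, 0.00, 0.00]])
C_late = 2.0 * C_early

# The end state of each time-invariant piece is the initial condition
# of the next piece.
N = np.array([1000.0, 0.0, 0.0])
for C, duration in [(C_early, 5.0), (C_late, 5.0)]:
    N = N @ expm_taylor(generator(C) * duration)
print(N)
```

Finer pieces would allow the piecewise-constant rates to track a continuously varying rate function more closely.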

This paper also assumes that the discrete-time transition probability matrix is embeddable in a continuous-time model, i.e. the limit of $P_\tau$ exists and is equal to the generator matrix of the process (Eq 4). Although discrete-time Markov models have been the method of choice in many decision modeling applications, the processes under consideration, e.g. biological processes or disease progression, evolve continuously; hence the appropriate modeling framework would be a continuous-time model. [30] In addition, the greater popularity of the discrete-time structure stems from its simpler mathematical nature, not from considerations of veracity. [31] However, in some instances, one can argue that an individual-level model tends to follow a discrete-time process, e.g. the transitions among living arrangements for an elderly patient occur in discrete time steps. However, if we consider a cohort of individuals, the probability of a patient transitioning in a small time interval would very likely be nonzero. Nevertheless, we believe that future studies investigating whether the typical Markov models in decision modeling are embeddable are warranted.
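Embeddability can be probed numerically: if the matrix logarithm of the one-cycle matrix $P_\tau$ is a valid generator (nonnegative off-diagonal entries, zero row sums), the discrete-time model embeds in a continuous-time one. The matrix below is hypothetical, and a Mercator-series logarithm stands in for a library routine such as `scipy.linalg.logm`:

```python
import numpy as np

def logm_series(P, n_terms=200):
    """Matrix logarithm via the Mercator series log(I + X), X = P - I;
    converges when ||P - I|| < 1, which holds for this small example."""
    X = P - np.eye(P.shape[0])
    out = np.zeros_like(X)
    term = np.eye(P.shape[0])
    for k in range(1, n_terms):
        term = term @ X
        out = out + ((-1.0) ** (k + 1) / k) * term
    return out

# Hypothetical one-cycle transition-probability matrix (tau = one cycle).
P = np.array([[0.90, 0.08, 0.02],
              [0.00, 0.85, 0.15],
              [0.00, 0.00, 1.00]])

Q = logm_series(P)
off_diag = Q[~np.eye(3, dtype=bool)]
# A valid generator has nonnegative off-diagonal rates and zero row sums;
# if both hold, this P is embeddable and exp(Q) recovers P.
print(np.all(off_diag >= -1e-9), np.allclose(Q.sum(axis=1), 0.0))
```

For this particular $P$ both conditions hold; a published transition matrix for which the off-diagonal entries of $\log P$ turn negative would be an example of a non-embeddable discrete-time model.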

The cohort model provides an easy-to-implement method for modeling recurrent events over time and has been used extensively in many applications ranging from clinical […]

Supporting information

S1 Transition probability

In this section, we show the derivation of the probability of transitioning from $S_k$ to $S_l$ in a time step $\tau$, which is given by

$$n_k c_{kl} \tau + o(\tau). \qquad (\text{Eq 12})$$

The probability of a particular transition $k \to l$ ($R_{kl}$) by one individual in a small time step $\tau$ is postulated to be linear in time: $c_{kl}\tau + o(\tau)$. This postulate is conceptually a first-order Taylor approximation to an instantaneous transition probability; the transition rate constant $c_{kl}$ is interpreted as the derivative of the transition probability at $\tau = 0$. We also restrict the number of allowable transitions in $\tau$ to one. If the number of individuals in $S_k$ at time $t$ is equal to $n_k$, then any of these $n_k$ individuals, assumed to be indistinguishable, is at risk of transitioning. Since individuals are indistinguishable, each individual transition can be considered a Bernoulli trial with success probability $c_{kl}\tau + o(\tau)$. The probability of exactly one individual transitioning is $(c_{kl}\tau + o(\tau))(1 - c_{kl}\tau + o(\tau))^{n_k - 1}$, and since there are $n_k$ ways for this transition to occur, we have

$$n_k\, (c_{kl}\tau + o(\tau))(1 - c_{kl}\tau + o(\tau))^{n_k - 1}. \qquad (\text{Eq 13})$$

Eq 12 is then recovered by expanding the binomial term in Eq 13 and collecting the $\tau$ terms of higher order into $o(\tau)$.

S2 Master equation

Enumerating, over a small time step $\tau$, all transitions into and out of a state-configuration $n$ yields

$$P(n, t + \tau) = \sum_{k=1}^{s} \sum_{\substack{l=1 \\ l \neq k}}^{s} c_{kl}(t)(n_k + 1)\tau\, P(n - s_{kl}, t) + \Big(1 - \sum_{k=1}^{s} \sum_{\substack{l=1 \\ l \neq k}}^{s} c_{kl}(t)\, n_k \tau\Big) P(n, t) + o(\tau). \qquad (\text{Eq 14})$$

Rearranging Eq 14 and dividing it by $\tau$ yields

$$\frac{P(n, t + \tau) - P(n, t)}{\tau} = \sum_{k=1}^{s} \sum_{\substack{l=1 \\ l \neq k}}^{s} c_{kl}(t)(n_k + 1)\, P(n - s_{kl}, t) - \sum_{k=1}^{s} \sum_{\substack{l=1 \\ l \neq k}}^{s} c_{kl}(t)\, n_k\, P(n, t) + \frac{o(\tau)}{\tau}. \qquad (\text{Eq 15})$$

The master equation is then established after taking the limit $\tau \to 0$.
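The first-order postulate behind Eq 12 and Eq 13 can be checked numerically. With hypothetical values of $c$ and $n_k$, and taking the exponential form $1 - e^{-c\tau}$ as the exact per-individual transition probability (an assumption consistent with a constant rate), the relative error of $n_k c \tau$ against the exact one-transition probability vanishes as $\tau \to 0$:

```python
import numpy as np

# Hypothetical values: rate c and n_k individuals at risk in S_k.
c, n_k = 0.2, 50

rel_errors = []
for tau in [0.1, 0.01, 0.001]:
    p = 1.0 - np.exp(-c * tau)                  # exact per-individual probability
    exact = n_k * p * (1.0 - p) ** (n_k - 1)    # exactly one transition (cf. Eq 13)
    approx = n_k * c * tau                      # first-order postulate (cf. Eq 12)
    rel_errors.append(abs(exact - approx) / approx)
    print(tau, exact, approx)

# The discrepancy is o(tau): the relative error shrinks as tau -> 0.
print(rel_errors)
```

The shrinking relative error illustrates why the linear-in-time postulate is harmless in the limit used to derive the master equation, but can be a poor approximation for a coarse time step.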

S3 Generating function method

A probability generating function (PGF) for a vector $x = (x_1, x_2, \ldots, x_s)$ is defined by

$$G(x, t) = \sum_{n} x_1^{n_1} x_2^{n_2} \cdots x_s^{n_s}\, P(n, t), \qquad (\text{Eq 16})$$

where $n = (n_1, \ldots, n_s)$. Differentiating $G(x, t)$ with respect to $t$ and assuming that the series is uniformly convergent, we obtain

$$\frac{\partial G(x,t)}{\partial t} = \sum_{n} x_1^{n_1} x_2^{n_2} \cdots x_s^{n_s}\, \frac{\partial P(n, t)}{\partial t}. \qquad (\text{Eq 17})$$

Eq 17 can be simplified by (1) recognizing the following identity (shown here for $x_1$):

$$\sum_{n} n_1\, x_1^{n_1} x_2^{n_2} \cdots x_s^{n_s}\, P(n, t) = x_1 \frac{\partial}{\partial x_1} G(x, t),$$

(2) using the definition of the PGF (Eq 16), and (3) rearranging the summation index, to obtain the following first-order linear partial differential equation (PDE):

$$\frac{\partial G(x,t)}{\partial t} = \sum_{k=1}^{s} \sum_{\substack{l=1 \\ l \neq k}}^{s} c_{kl}(t)\, (x_l - x_k)\, \frac{\partial G(x,t)}{\partial x_k},$$

with the initial condition

$$G(x, 0) = \sum_{n} x_1^{n_1} x_2^{n_2} \cdots x_s^{n_s}\, P(n, 0).$$

The PDE is solved by the method of characteristics. [14,32] The characteristic equations are

$$\frac{dx(\xi)}{d\xi} = v,$$

where $v = (v_1, v_2, \ldots, v_s)$ and $v_i = \sum_{l=1, l \neq i}^{s} c_{il}(x_l - x_i)$. Putting $\beta(\xi) = G(x(\xi), t - \xi)$, we have: […]

S4 Mean and variance of the master equation

The variance of the master equation can be derived by using the following relationship:

$$\operatorname{Var}[N_i(t)] = \frac{\partial^2 G}{\partial x_i^2}\bigg|_{x=\mathbf{1}} + \frac{\partial G}{\partial x_i}\bigg|_{x=\mathbf{1}} - \left(\frac{\partial G}{\partial x_i}\bigg|_{x=\mathbf{1}}\right)^2.$$

The variance of the number of individuals in state $S_i$, given that all $n_0$ individuals start in state $S_1$, is equal to

$$\operatorname{Var}[N_i(t)] = n_0\, b_{1i}(t)\big(1 - b_{1i}(t)\big),$$

and the variance of the number of individuals in state $S_i$ for an arbitrary initial distribution is given by

$$\operatorname{Var}[N_i(t)] = \sum_{k=1}^{s} n_k(0)\, b_{ki}(t)\big(1 - b_{ki}(t)\big).$$