Estimating a Markovian Epidemic Model Using Household Serial Interval Data from the Early Phase of an Epidemic

The clinical serial interval of an infectious disease is the time between the date of symptom onset in an index case and the date of symptom onset in one of its secondary cases. It is a quantity which is commonly collected during a pandemic and is of fundamental importance to public health policy and mathematical modelling. In this paper we present a novel method for calculating the serial interval distribution for a Markovian model of household transmission dynamics. This allows the use of Bayesian MCMC methods, with explicit evaluation of the likelihood, to fit to serial interval data and infer parameters of the underlying model. We use simulated and real data to verify the accuracy of our methodology and illustrate the importance of accounting for household size. The output of our approach can be used to produce posterior distributions of population level epidemic characteristics.


A Statistical (MCMC) details
Here we give details of the MCMC routines used in the results sections. In all cases the proposals were Gaussian. Burn-in was $10^3$ samples and the next $10^5$ samples were taken, thinned by a factor of 10 to give $10^4$ samples from the posterior. Table 1 gives the parameters for the full inference model. Table 2 gives the parameters used in the Hong Kong influenza inference.
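The sampling schedule described above can be sketched as a random-walk Metropolis routine. This is a minimal illustration, not the routine used for the results: the function name `metropolis`, the proposal standard deviation `prop_sd`, and the toy target `toy_log_post` are all assumptions introduced here, and the actual proposal parameters are those listed in Tables 1 and 2.

```python
import numpy as np

def metropolis(log_post, theta0, prop_sd, n_burn=10**3, n_keep=10**5, thin=10, seed=0):
    """Random-walk Metropolis with Gaussian proposals (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    kept = []
    for it in range(n_burn + n_keep):
        # Symmetric Gaussian proposal centred on the current state.
        prop = theta + prop_sd * rng.standard_normal(theta.shape)
        lp_prop = log_post(prop)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        # Discard the burn-in, then keep every `thin`-th sample.
        if it >= n_burn and (it - n_burn) % thin == 0:
            kept.append(theta.copy())
    return np.array(kept)

# Toy target (hypothetical): log-density of a standard bivariate normal.
def toy_log_post(theta):
    return -0.5 * np.sum(theta**2)

samples = metropolis(toy_log_post, theta0=[0.0, 0.0], prop_sd=1.0)
```

With the default arguments the routine discards $10^3$ iterations and retains every tenth of the following $10^5$, so `samples` holds $10^4$ draws. In practice `log_post` would be the log-likelihood of the serial interval data plus the log-prior.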

B Sensitivity analysis of the full model
Here we show the results from a sensitivity analysis. We are mostly concerned with how the number of data points and their random nature affect the posterior distribution. Figures 3-7 show the posterior distributions for 8 sets of randomly generated serial interval distributions with 15, 50, 100, 200 and 300 data points respectively. The model parameters are: β = 2, σ = 1/4, γ = 1/2 and j = k = 2. MCMC parameters are given in Table 1.
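To illustrate how random data sets of different sizes might be generated, the following sketch simulates serial intervals for the parameter values above. It rests on several simplifying assumptions that are not stated in this appendix: a household of size N = 2, symptom onset of the index case coinciding with entry into the first infectious stage, infection at rate β while the index case is infectious, and progression out of each of the k infectious stages at rate kγ. The function name `simulate_serial_intervals` is hypothetical.

```python
import numpy as np

def simulate_serial_intervals(n, beta=2.0, sigma=0.25, gamma=0.5, j=2, k=2, seed=0):
    """Draw n serial intervals for a household of size 2 (assumed), conditioned
    on the index case infecting the second individual."""
    rng = np.random.default_rng(seed)
    total_rate = beta + k * gamma  # infection competes with stage progression
    out = []
    while len(out) < n:
        t = 0.0
        infected = False
        for _ in range(k):
            # Holding time in the current infectious stage.
            t += rng.exponential(1.0 / total_rate)
            # The stage ends with an infection with probability beta / total_rate.
            if rng.uniform() < beta / total_rate:
                infected = True
                break
        if infected:
            # Exposed period of the secondary case: Gamma with shape j and rate j*sigma.
            out.append(t + rng.gamma(shape=j, scale=1.0 / (j * sigma)))
        # Outbreaks with no secondary case yield no serial interval and are discarded.
    return np.array(out)

# One data set for each size considered in the sensitivity analysis.
datasets = {n: simulate_serial_intervals(n, seed=n) for n in (15, 50, 100, 200, 300)}
```

Repeating the call with different seeds gives independent replicates of each data set size, which is the kind of variation the sensitivity analysis examines.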

C Full inference while holding β fixed
Here we present results from fitting the full model when the parameter β is fixed. Figure S8 shows posteriors estimated from 9 random serial interval distributions. Figure S9 shows the mean serial interval distribution and confidence intervals computed from fitting to the full distribution shown in the main text and holding β fixed.

D Some analytic results
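The case N = 2. The cdf of the clinical serial interval $T_{SI}$ in the case N = 2 is given by

$$\Pr(T_{SI} \le t) = \Pr(T_{Inf} + T_{Sym} \le t) = \int_0^t \Pr(T_{Inf} \le t - s)\, f_{T_{Sym}}(s)\, ds,$$

where $T_{Inf}$ is the time for the first individual to infect the second individual, and $T_{Sym}$ is the time for the second individual to display symptoms. The exposed period is distributed $\Gamma(j, j\sigma)$, and so the probability density function of $T_{Sym}$ is

$$f_{T_{Sym}}(s) = \frac{(j\sigma)^{j}\, s^{j-1}\, e^{-j\sigma s}}{(j-1)!}, \qquad s \ge 0.$$

Now, when considering the random variable $T_{Inf}$, we have k different possible paths to the occurrence of the first secondary case, corresponding to the infectious class of the index case at time of transmission. Each of these paths has a corresponding probability $\rho(m)$ of arising (with $\sum_{m=1}^{k} \rho(m) = 1$, as a secondary case is assumed to occur). These probabilities may be evaluated by calculating the probability of the specified sequence of events, via the jump chain of the Markov chain conditioned on a secondary case. The corresponding holding times in each state of the conditioned chain are exponentially distributed, with some rate $\lambda_i$. Hence,

$$\Pr(T_{Inf} \le t) = \sum_{m=1}^{k} \rho(m)\, \Pr\!\left(\sum_{i=1}^{m} X_i \le t\right), \qquad X_i \sim \mathrm{Exp}(\lambda_i) \text{ independent}.$$

We now consider the evaluation of the required probabilities conditioned on a secondary case, in order to evaluate $\rho(m)$ and the rates $\lambda_i$. To evaluate these quantities, we use the results of Waugh [1] on conditioned Markov processes. Writing $q_{i\ell}$ for the transition rate from state i to state $\ell$ in the original process, the transition rates of the conditioned process are

$$\tilde{q}_{i\ell} = \begin{cases} q_{i\ell}/p_i, & i \in F \text{ and the transition } i \to \ell \text{ is the event of interest}, \\ q_{i\ell}\, p_\ell / p_i, & \text{otherwise}, \end{cases}$$

where F is the set of states from which the event of interest, the first secondary case, can occur, and $p_i$ is the probability of the event of interest, at least one secondary case, occurring starting from state i. For the model with k infectious classes, in which infection occurs at rate $\beta$ and each infectious class is left at rate $k\gamma$, the probability of at least one secondary case from state i (the index case in infectious class i) is

$$\alpha_i = 1 - \left(\frac{k\gamma}{\beta + k\gamma}\right)^{k-i+1}, \qquad i = 1, \dots, k.$$

We note that only two types of events can occur in the conditioned process from the states $(E_1, E_2, \dots, E_j, I_1, I_2, \dots, I_k, R) = (0, 0, \dots, 0, e_{k-1,i}, 0, 0)$, where $e_{k-1,i}$ is a vector of length $k-1$ with a 1 in the ith position. These are transitions to states $(0, 0, \dots, 0, e_{k,i+1}, 0)$ (corresponding to Progression of infectious class) and to $(1, 0, \dots, 0, e_{k-1,i}, 0, 0)$ (corresponding to a new Infection). For notational convenience, we denote the rates of these events by $\tilde{q}_{i,P}$ and $\tilde{q}_{i,I}$ respectively; they are given by

$$\tilde{q}_{i,P} = k\gamma\, \frac{\alpha_{i+1}}{\alpha_i}, \qquad \tilde{q}_{i,I} = \frac{\beta}{\alpha_i}.$$

In state $(0, 0, \dots, 0, 0, \dots, 1, 0)$ the only transition possible, in the conditioned process, is a transition to state $(1, 0, \dots, 0, 0, \dots, 1, 0)$ at rate $\tilde{q}_k = \beta/\alpha_k$.

Hence, the probability that the first infection occurs with the index case in stage $m\ (< k)$ of the k stages, in the conditioned process, is given by

$$\rho(m) = \frac{\tilde{q}_{m,I}}{\tilde{q}_{m,P} + \tilde{q}_{m,I}} \prod_{i=1}^{m-1} \frac{\tilde{q}_{i,P}}{\tilde{q}_{i,P} + \tilde{q}_{i,I}},$$

with an empty product evaluating to one, and

$$\rho(k) = \prod_{i=1}^{k-1} \frac{\tilde{q}_{i,P}}{\tilde{q}_{i,P} + \tilde{q}_{i,I}}.$$

Finally, note that

$$\lambda_i = \tilde{q}_{i,P} + \tilde{q}_{i,I} \quad (i < k), \qquad \lambda_k = \tilde{q}_k.$$

Substituting these equations into (4), it is possible to evaluate the cdf of the CSI in the case N = 2 explicitly, with the assistance of MAPLE, at least for fixed j and k.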
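As a numerical counterpart to the symbolic MAPLE evaluation, the N = 2 calculation can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: it assumes infection at rate β and progression out of each infectious class at rate kγ, so that the probability of at least one secondary case from class i is $1 - (k\gamma/(\beta + k\gamma))^{k-i+1}$. The function names `conditioned_paths`, `erlang_cdf` and `csi_cdf` are hypothetical. Under these assumptions every conditioned holding rate equals β + kγ (the code checks this with an assertion), so the time to infection along a path through m classes is Erlang distributed.

```python
import math
import numpy as np

def conditioned_paths(beta, gamma, k):
    """Return (rho, lam): path probabilities and holding rates of the chain
    conditioned on at least one secondary case."""
    r = k * gamma  # assumed progression rate out of each infectious class
    stage = np.arange(1, k + 1)
    # alpha[i-1]: probability of at least one infection from infectious class i.
    alpha = 1.0 - (r / (beta + r)) ** (k - stage + 1)
    q_inf = beta / alpha                       # conditioned infection rates
    q_prog = np.zeros(k)                       # no progression is possible from class k
    q_prog[:-1] = r * alpha[1:] / alpha[:-1]   # conditioned progression rates
    lam = q_inf + q_prog                       # holding rates
    p_inf = q_inf / lam                        # jump-chain probability of infection
    # Probability of reaching class m without an earlier infection.
    reach = np.concatenate(([1.0], np.cumprod(1.0 - p_inf[:-1])))
    return reach * p_inf, lam

def erlang_cdf(u, m, rate):
    """cdf of a sum of m independent Exp(rate) variables, for u >= 0."""
    x = rate * np.asarray(u, dtype=float)
    tail = sum(x**n / math.factorial(n) for n in range(m))
    return 1.0 - np.exp(-x) * tail

def csi_cdf(t, beta, sigma, gamma, j, k, n_grid=4001):
    """cdf of the clinical serial interval at time t for N = 2, by numerical convolution."""
    rho, lam = conditioned_paths(beta, gamma, k)
    rate = beta + k * gamma
    assert np.allclose(lam, rate)  # conditioning leaves the holding rates unchanged
    s = np.linspace(0.0, t, n_grid)
    # Gamma(j, j*sigma) density of the exposed period of the secondary case.
    f_sym = (j * sigma) ** j * s ** (j - 1) * np.exp(-j * sigma * s) / math.factorial(j - 1)
    # Mixture over the k possible paths to the first infection.
    F_inf = sum(rho[m - 1] * erlang_cdf(t - s, m, rate) for m in range(1, k + 1))
    y = F_inf * f_sym
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(s)))  # trapezoidal rule

rho, lam = conditioned_paths(beta=2.0, gamma=0.5, k=2)
```

For example, `csi_cdf(4.0, beta=2.0, sigma=0.25, gamma=0.5, j=2, k=2)` evaluates the cdf at t = 4 for the parameter values used in the sensitivity analysis. The grid size `n_grid` controls the accuracy of the quadrature.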
The N > 2 case. Unfortunately, the cdf of the CSI in the case N > 2 is more challenging to evaluate than in the N = 2 case. We now have

$$\Pr(T_{SI} \le t) = \sum_{i=1}^{N-1} \Pr(i \text{ secondary infections})\, \Pr(\min\{A_1, \dots, A_i\} \le t).$$

The difficulty in evaluating this arises due to the dependence of the $T^{1}_{Inf}, \dots, T^{i-1}_{Inf}$ random variables within $A_{i-1}$ and $A_i$. For this reason, we consider an approximation which is derived by assuming independence between these random variables. We therefore have

$$\Pr(T_{SI} \le t) \approx \sum_{i=1}^{N-1} \Pr(i \text{ secondary infections}) \left[ 1 - \prod_{j=1}^{i} \Pr(A_j > t) \right].$$

We can evaluate the probabilities $\Pr(A_j > t)$ using the results presented in the previous section for the distribution of the CSI in the N = 2 case. We simply need to account for the probability of starting in each infectious class for the jth infection, and modify the transmission rate to take past infections into account. Due to the algebraic complexity of the resulting expressions, even under the assumption of independence, we once again use MAPLE for evaluation purposes. This also comes with the benefit that we may easily exploit Laplace transforms of pdfs in order to evaluate the pdf of each $A_j$ ($j > 1$), via the product of the Laplace transforms of the individual pdfs, which follows from our assumption of independence. Finally, due to the small number of individuals which constitute households, we can evaluate the probability of having i secondary infections, for $1 \le i \le N - 1$ (conditioned on at least one secondary infection).
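The way the independence approximation combines its ingredients can be sketched as follows: the cdf is a sum over the number of secondary infections i, each term weighting one minus the product of the survival probabilities $\Pr(A_j > t)$ for $j \le i$. The function name `csi_cdf_approx` and the placeholder inputs are hypothetical; in particular, the exponential survival functions below are stand-ins chosen only to make the example run and are not derived from the household model.

```python
import math

def csi_cdf_approx(t, p_infections, survival_fns):
    """Approximate cdf of the serial interval under the independence assumption.

    p_infections[i-1] : Pr(i secondary infections | at least one), i = 1..N-1
    survival_fns[j-1] : callable returning Pr(A_j > t)
    """
    total = 0.0
    for i, p_i in enumerate(p_infections, start=1):
        # Pr(min{A_1, ..., A_i} > t) factorises when the A_j are treated as independent.
        all_survive = 1.0
        for surv in survival_fns[:i]:
            all_survive *= surv(t)
        total += p_i * (1.0 - all_survive)
    return total

# Placeholder inputs (hypothetical, for illustration only): a household of size 3.
p_infections = [0.6, 0.4]
survival_fns = [lambda t: math.exp(-0.3 * t), lambda t: math.exp(-0.2 * t)]
value = csi_cdf_approx(5.0, p_infections, survival_fns)
```

In a real calculation each survival function would come from the N = 2 style derivation with the modified transmission rate, and the weights from the final size distribution of the household.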
Unfortunately, the resulting expressions are so lengthy that they are of limited use. From a practical viewpoint, they take longer to evaluate than the numerical scheme outlined and implemented in the paper. Whilst the approximation is typically accurate, it remains an approximation; given this, combined with its computational inferiority, we have chosen to use only the numerical method herein.