Fast estimation of time-varying infectious disease transmission rates

doi:10.1371/journal.pcbi.1008124

Table 1.

Notation.

Unless otherwise stated, simulations of reported incidence time series use the reference values listed here. If a symbol is to be interpreted differently in relation to disease incidence and disease mortality data, then the correct definition is indicated by (I) and (M), respectively.

Fig 1.

Example of S(t) and β(t) estimation using the FC, S, and SI methods.

Plotted are the susceptible population size S(t) and seasonally forced transmission rate β(t) (Eq (27)) underlying 20 years of weekly reported incidence, together with time series estimates S_k and β_k obtained from the data by the FC [blue], S [green], and SI [red] methods. The reported incidence time series (Δt = 1 week, n = ⌊20 × 365/7⌋ = 1042) was simulated without process or observation error (ϵ = 0, p_rep = 1), using reference values (Table 1) for all other data-generating parameters. The three estimation methods were applied without input error, i.e., all input parameters were assigned their true (data-generating) values. [Panel A] S(t) scaled by 1/N₀, i.e., the number of susceptibles as a proportion of the initial population size. Grey lines show that the absolute error in the FC method estimate of S(t) grows linearly in time as μ_c〈S〉t, where μ_c is the constant per capita natural mortality rate and 〈S〉 is the continuous-time average of S(t). [Panel B] β(t) scaled by 1/〈β〉, i.e., the transmission rate relative to its mean. RRMSE (Eq (33)) in the β_k time series is roughly 0.3355, 0.0240, and 0.0021 for the FC, S, and SI methods, respectively.
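
The error metric used throughout these captions can be sketched as follows. This is a minimal illustration that assumes Eq (27) has the sinusoidal form β(t) = 〈β〉(1 + α cos 2πt), with t in years, and that RRMSE (Eq (33)) is the root-mean-square error divided by the mean true value; neither equation is reproduced in this section, so both forms, the function names, and the parameter values are assumptions.

```python
import numpy as np

def seasonal_beta(t_years, beta_mean, alpha):
    """Sinusoidally forced transmission rate; an assumed form of Eq (27)."""
    return beta_mean * (1.0 + alpha * np.cos(2.0 * np.pi * t_years))

def rrmse(estimate, truth):
    """RMSE divided by the mean true value; an assumed form of Eq (33)."""
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return np.sqrt(np.mean((estimate - truth) ** 2)) / np.mean(truth)

# Weekly observation times over 20 years: n = floor(20 * 365 / 7) = 1042.
n = (20 * 365) // 7
t = np.arange(n) * 7.0 / 365.0
beta_true = seasonal_beta(t, beta_mean=1.0, alpha=0.08)  # hypothetical values

# A stand-in "estimate" with 2% multiplicative noise, for illustration only.
rng = np.random.default_rng(1)
beta_est = beta_true * (1.0 + 0.02 * rng.standard_normal(n))
print(n, rrmse(beta_est, beta_true))
```

Because the metric is normalized by the mean of β(t), an RRMSE of 0.0021 (the SI value above) corresponds to a typical error of about 0.2% of 〈β〉.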

Fig 2.

Effects of process and observation error on the S and SI methods.

Plotted are the estimates [Row A] Z_k, [Row B] I_k, and [Row C] β_k of true incidence Z(t), prevalence I(t), and the seasonally forced transmission rate β(t) (Eq (27)) obtained by applying the [Left] S and [Right] SI methods without input error to each of four simulated reported incidence time series (indicated by the legend; Δt = 1 week, n = ⌊3 × 365/7⌋ = 156). The first simulation was purely deterministic [dark grey] (ϵ = 0, p_rep = 1), while the remaining three accounted for (i) environmental stochasticity [ES, light grey] (ϵ = 0.5, p_rep = 1), (ii) ES and demographic stochasticity [ES+DS, blue] (ϵ = 0.5, p_rep = 1), or (iii) ES, DS, and observation error [ES+DS+OE, red] (ϵ = 0.5, p_rep = 0.25). Reference values (Table 1) were assigned to all other data-generating parameters in all four simulations. The left and right panels in Row A are identical, because the S and SI methods compute Z_k identically (compare Eqs (25a) and (26a)). RRMSE in the β_k time series is (0.0239, 0.0375, 0.1126, 0.1432) with the S method and (0.0021, 0.0153, 0.0494, 0.0591) with the SI method (in legend order). Note that the underlying β(t) was the same in all simulations; it is not plotted in Row C, but is almost exactly reproduced by the dark grey curve in the right panel (RRMSE ≈ 0.2%). Because of process error, the underlying Z(t) and I(t) (also not shown) differed between the deterministic, ES, and ES+DS simulations.
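
To see why observation error inflates the noise in the estimates, consider a simple model of under-reporting. The sketch below assumes that each case is reported independently with probability p_rep after a fixed delay of t_rep observation intervals, and reconstructs incidence by undoing the delay and dividing by p_rep. This inversion is a hypothetical stand-in and is not necessarily identical to Eq (26a); the function names are illustrative.

```python
import numpy as np

def report(true_incidence, p_rep, t_rep, rng):
    """Assumed observation model: each case is reported independently with
    probability p_rep, after a fixed delay of t_rep observation intervals."""
    z = np.asarray(true_incidence, dtype=np.int64)
    reported = rng.binomial(z, p_rep)
    delayed = np.zeros_like(reported)
    # Shift forward by t_rep steps; the first t_rep entries receive no reports.
    delayed[t_rep:] = reported[: len(reported) - t_rep]
    return delayed

def estimate_incidence(reported, p_rep, t_rep):
    """Naive inversion of the model above: undo the delay, rescale by 1/p_rep.
    The last t_rep entries cannot be recovered and are set to NaN."""
    c = np.asarray(reported, dtype=float)
    z_hat = np.full(c.shape, np.nan)
    z_hat[: len(c) - t_rep] = c[t_rep:] / p_rep
    return z_hat

rng = np.random.default_rng(0)
z = np.full(156, 1000)                    # hypothetical constant true incidence
c = report(z, p_rep=0.25, t_rep=2, rng=rng)
z_hat = estimate_incidence(c, p_rep=0.25, t_rep=2)
print(np.nanmean(z_hat))                  # close to 1000, but each z_hat[k] is noisy
```

Under this model the reconstruction is unbiased on average, but with p_rep = 0.25 and 1000 true cases per week each z_hat[k] has a standard deviation of roughly 55 cases (about 5.5%), whereas with p_rep = 1 and t_rep = 0 the reconstruction is exact.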

Fig 3.

Bias and variance in 1-year cycles embedded in three estimates of a seasonally forced β(t).

[Panel A] In black, the seasonally forced β(t) (Eq (27)) underlying 1000 years of simulated reported incidence data. In (transparent) colour, raw estimates β_k obtained from the data by the S [green] and SI [red] methods, both applied without input error. Only the first 10 of 1000 years are shown. [Panels B and C] In black, the true 1-year cycle in the seasonally forced β(t). In light (transparent) colour, the 1000 1-year cycles embedded in the linear interpolant β_int(t) of β_k. In dark colour, the average 1-year cycle (Eq (22a)) in β_int(t). Results are shown for both the S [Panel B, green] and SI [Panel C, red] methods. [Panel D] Like Panel C, except for a smooth loess curve β_loess(t; q) (q = 53) fit to β_k, instead of the interpolant β_int(t). [Details] A reported incidence time series with 1000 years of weekly observations (Δt = 1 week, n = 52153) was simulated with environmental noise in transmission (ϵ = 0.5), demographic stochasticity, and random under-reporting of cases (p_rep = 0.25), using reference values (Table 1) for the remaining parameters.
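
The averaging in Panels B and C can be sketched as follows: the linear interpolant is sampled at the same phases in every complete cycle, and the samples are averaged phase by phase. The exact form of Eq (22a) is not reproduced here, so this is an assumed version; the function name, the number of phase points, and the example data are hypothetical.

```python
import numpy as np

def average_cycle(t_k, beta_k, period, n_phase=53):
    """Average 1-year cycle in the linear interpolant of (t_k, beta_k).

    Assumed form of Eq (22a): for each phase phi in [0, period), average
    beta_int(t_k[0] + phi + j * period) over all complete cycles j."""
    t_k = np.asarray(t_k, dtype=float)
    beta_k = np.asarray(beta_k, dtype=float)
    n_cycles = int(np.floor((t_k[-1] - t_k[0]) / period))
    phases = np.linspace(0.0, period, n_phase, endpoint=False)
    # Row j holds the interpolant sampled across cycle j.
    grid = t_k[0] + phases[None, :] + period * np.arange(n_cycles)[:, None]
    cycles = np.interp(grid, t_k, beta_k)   # shape (n_cycles, n_phase)
    return phases, cycles.mean(axis=0), cycles

# Hypothetical example: 10 years of weekly estimates with 10% noise.
t = np.arange((10 * 365) // 7) * 7.0 / 365.0
beta_true = 1.0 + 0.08 * np.cos(2.0 * np.pi * t)
rng = np.random.default_rng(2)
beta_k = beta_true * (1.0 + 0.1 * rng.standard_normal(t.size))
phases, mean_cycle, cycles = average_cycle(t, beta_k, period=1.0)
```

Averaging over many cycles suppresses the variance of the individual cycles, so any remaining gap between the average cycle and the true cycle reflects bias in the estimator; with 1000 cycles, as in the figure, that separation is much sharper than in this 9-cycle example.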

Fig 4.

Reduction in β(t) estimation error with optimal loess smoothing.

The horizontal axis measures the case reporting probability p_rep, for which 41 values equally spaced on a logarithmic scale between 0.01 and 1 were considered. For each value of p_rep, with reference values (Table 1) for all other parameters, 100 reported incidence time series (Δt = 1 week, n = 1042) were simulated with environmental noise in transmission (ϵ = 0.5), demographic stochasticity, and random under-reporting of cases (governed by p_rep). The underlying seasonally forced β(t) (Eq (27)) was estimated from reported incidence using the S and SI methods, both applied without input error, yielding two raw estimates β_k per simulation. Smooth loess curves β_loess(t; q) (q = 10, …, 110; cf. §2.2.6) were fit to each β_k time series. The optimal q for a given time series, denoted by q_opt, was defined as the value that minimized RRMSE (Eq (33)) in β_loess(t_k; q). Thus, for each value of p_rep and each β(t) estimation method (S and SI), 100 values of q_opt were obtained, one per β_k time series. Plotted as functions of p_rep are [Panel A] RRMSE in the raw estimates β_k [dashed lines] and in the optimal loess estimates β_loess(t_k; q_opt) [solid lines], and [Panel B] q_opt. Lines indicate medians and bands indicate 5th–95th percentile ranges. Results for the S and SI methods are shown in green and red, respectively.
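
The search for q_opt can be sketched as a grid search over the neighbourhood size. The sketch below uses a local-linear loess with tricube weights over the q nearest points, which may differ from the smoother defined in §2.2.6 (for example in its polynomial degree); the RRMSE definition, function names, and example data are likewise assumptions.

```python
import numpy as np

def rrmse(estimate, truth):
    """RMSE divided by the mean true value (assumed form of Eq (33))."""
    estimate, truth = np.asarray(estimate, float), np.asarray(truth, float)
    return np.sqrt(np.mean((estimate - truth) ** 2)) / np.mean(truth)

def loess(t, y, q):
    """Local-linear loess: at each t[i], weighted least squares on the q nearest
    points with tricube weights (the q-th nearest point gets zero weight)."""
    t, y = np.asarray(t, float), np.asarray(y, float)
    fitted = np.empty_like(y)
    for i, t0 in enumerate(t):
        d = np.abs(t - t0)
        idx = np.argsort(d)[:q]
        w = (1.0 - (d[idx] / d[idx].max()) ** 3) ** 3
        X = np.column_stack([np.ones(q), t[idx] - t0])
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(X * sw[:, None], y[idx] * sw, rcond=None)
        fitted[i] = coef[0]  # intercept = local fit evaluated at t0
    return fitted

def optimal_q(t, beta_k, beta_true, q_values):
    """q that minimizes RRMSE of the loess curve against the true beta."""
    errors = [rrmse(loess(t, beta_k, q), beta_true) for q in q_values]
    best = int(np.argmin(errors))
    return q_values[best], errors[best]

# Reporting probabilities as in the figure: 41 log-spaced values in [0.01, 1].
p_rep_grid = np.geomspace(0.01, 1.0, 41)

# Hypothetical raw estimates: 3 years of weekly beta_k with 10% noise.
rng = np.random.default_rng(3)
t = np.arange(156) * 7.0 / 365.0
beta_true = 1.0 + 0.08 * np.cos(2.0 * np.pi * t)
beta_k = beta_true * (1.0 + 0.1 * rng.standard_normal(t.size))
q_opt, err_opt = optimal_q(t, beta_k, beta_true, list(range(10, 60, 5)))
print(q_opt, err_opt, rrmse(beta_k, beta_true))
```

Note that q_opt can only be computed when the true β(t) is known, as in these simulation experiments; the trade-off it captures is that larger q averages away more noise but flattens the seasonal cycle.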

Fig 5.

Sensitivity of β(t) estimation error to the mean 〈β〉 and amplitude α of seasonal forcing.

Contained in each panel are heatmaps of median RRMSE (Eq (33)) in estimates of a seasonally forced β(t) (Eq (27)) from simulated reported incidence time series, as a bivariate function of the mean 〈β〉 and amplitude α of seasonal forcing. The 〈β〉 axis has been scaled to measure the basic reproduction number (Eq (2)). When simulating reported incidence, reference values (Table 1) were assigned to all data-generating parameters except 〈β〉 and α. A grid of pairs with levels and α = 0, 0.01, …, 0.2 was considered, with 〈β〉 defined for each value of via Eq (2). For each parametrization, 1000 simulations were performed with environmental stochasticity [ES] (ϵ = 0.5) and with or without demographic stochasticity [DS] and observation error [OE], as indicated by row: [Row A] without DS or OE (p_rep = 1, t_rep = 0 weeks), [Row B] with DS but without OE (p_rep = 1, t_rep = 0 weeks), [Row C] with DS and OE (p_rep = 0.25, t_rep = 2 weeks). Corresponding mock birth and natural mortality time series were created, then β(t) was estimated from the data using [Left] the S method and [Right] the SI method, all without input error. For each set of estimates of β(t) (1000 estimates per parametrization, per simulation method, per estimation method), the median RRMSE was calculated (after smoothing with fixed q; see Eq (59)) and displayed as one point in the appropriate heatmap, coloured according to the logarithmic scale on the right. The darkest blue indicates median RRMSE less than 0.01.
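
The parameter grid and colour scale can be sketched as follows. Eq (2) is not reproduced in this section, so the sketch assumes the standard relation R₀ = 〈β〉/(γ + μ) for an SIR model with births and deaths, where γ is a recovery rate; the R₀ levels, rate values, and function names are hypothetical, and the RRMSE values are placeholders rather than results.

```python
import numpy as np

def mean_beta_from_r0(r0, gamma, mu):
    """Assumed form of Eq (2), R0 = <beta> / (gamma + mu), solved for <beta>."""
    return r0 * (gamma + mu)

def heatmap_values(rrmse_samples, floor=0.01):
    """Median RRMSE per grid cell on a log10 scale, with values below
    `floor` collapsed into the lowest (darkest) colour bin."""
    med = np.median(np.asarray(rrmse_samples, float), axis=-1)
    return np.log10(np.maximum(med, floor))

alpha_levels = np.round(np.arange(21) * 0.01, 2)   # 0, 0.01, ..., 0.2
r0_levels = np.array([4.0, 8.0, 12.0, 16.0])       # hypothetical levels
gamma, mu = 365.25 / 13.0, 0.02                    # hypothetical rates (per year)
mean_beta = mean_beta_from_r0(r0_levels, gamma, mu)

# One row per R0 level, one column per alpha level, 1000 simulations per cell.
# These RRMSE values are placeholders, not results from the paper.
samples = np.full((r0_levels.size, alpha_levels.size, 1000), 0.05)
z = heatmap_values(samples)
```

Taking the median over the 1000 simulations in each cell makes the colour robust to the occasional simulation with a very poor estimate.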

Fig 6.

Sensitivity of β(t) estimation error to data-generating parameters other than 〈β〉 and α.

Plotted in each panel is the median RRMSE (Eq (33)) in estimates of a seasonally forced β(t) (Eq (27)) from simulated reported incidence time series (Δt = 1 week, n = 1042), as a univariate function of each of 5 or 6 data-generating parameters (indicated by the legend). When simulating reported incidence, reference values (Table 1) were assigned to all but the focal parameter, which was assigned 41 values logarithmically spaced between and 4 times its reference value. The horizontal axis (logarithmic scale) measures the ratio of the focal parameter’s true value to its reference value, so that commensurate deviations from the reference case can be compared across parameters. For each parametrization, 1000 simulations were performed with environmental stochasticity [ES] (ϵ = 0.5) and with or without demographic stochasticity [DS] and observation error [OE], as indicated by row: [Row A] without DS or OE (p_rep = 1, t_rep = 0 weeks), [Row B] with DS but without OE (p_rep = 1, t_rep = 0 weeks), or [Row C] with DS and OE (p_rep = 0.25 except when p_rep is the focal parameter, t_rep = 2 weeks). Corresponding mock birth and natural mortality time series were created, then β(t) was estimated from the data using [Left] the S method and [Right] the SI method, all without input error. For each set of estimates of β(t) (1000 estimates per parametrization, per simulation method, per estimation method), the median RRMSE was calculated (after smoothing with fixed q; see Eq (59)) and displayed as one point in the appropriate panel and graph.

Fig 7.

Sensitivity of β(t) estimation error to the user-specified values of input parameters.

[Panel A] Median RRMSE (Eq (33)) in estimates of β(t) from simulated reported incidence time series (Δt = 1 week, n = 1042), as a univariate function of the factor by which an input parameter was mis-specified. One thousand simulations were performed using fixed values (Table 1) for all data-generating parameters. The simulations accounted for environmental stochasticity [ES] (ϵ = 0.5), demographic stochasticity [DS], and observation error [OE] (p_rep = 0.25, t_rep = 2 weeks). For each simulation, corresponding mock birth and natural mortality time series were created, and β(t) was estimated from the data using the SI method. True (data-generating) values were specified for all input parameters except the focal parameter (indicated by the legend), for which 41 values logarithmically spaced between and 4 times the true value were specified in turn. Each input parametrization yielded 1000 estimates of β(t), whose median RRMSE was calculated (after smoothing with fixed q; see Eq (59)) and displayed as one point in the appropriate graph. [Panel B] Result of repeating the Panel A analysis with S₀ mis-specified by varying amounts, but with each erroneous initial value of S₀ updated by peak-to-peak iteration (PTPI; 25 iterations) before β(t) was estimated. The original result from Panel A, obtained without PTPI, is shown for comparison.

Fig 8.

Example of S(t) and β(t) reconstruction with an overestimate of S₀ corrected by peak-to-peak iteration (PTPI).

[Panel A] Truncation step of PTPI (Box 5). Plotted is a reconstruction of true incidence Z(t) from a simulated reported incidence time series, before [Z_k, black] and after [yellow] smoothing with a 13-point central moving average. Vertical lines indicate peaks in the smoothed series. The time of the first peak in the smoothed series is denoted by t_a, and the time of the last peak occurring at the same phase of the cycle (in this case, simply the last peak) by t_b. [Panel B] Iteration step of PTPI (Box 6), where the initial estimates of both S₀ = S(0) and S(t_a) were taken to be 4 times the true (data-generating) value of S₀. Plotted in grey are successive reconstructions of S(t) between times t_a and t_b, generated by updating the estimate of S(t_a) with the estimate of S(t_b) obtained in the previous iteration. Dashed continuations to the left of t_a show estimation of S₀ backwards in time from estimates of S(t_a). Plotted in black is the reconstruction of S(t) starting from the final estimate of S₀, which was obtained after 25 iterations and had a relative error of roughly 1.4% (compared to 300% in the initial estimate). [Panel C] The sequence of reconstructions of β(t) corresponding to the estimates of S₀ shown in Panel B. [Details] Twenty years of weekly reported incidence (Δt = 1 week, n = 1042) were simulated with environmental noise in transmission (ϵ = 0.5), demographic stochasticity, and random under-reporting of cases (p_rep = 0.25), using reference values (Table 1) for the remaining parameters. Z(t), S(t) and β(t) were reconstructed from reported incidence using the SI method without input error (apart from mis-specification of S₀).
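
The truncation step in Panel A can be sketched as smoothing followed by peak detection. The sketch assumes that a peak is a point that is the maximum of a complete, NaN-free window of ±26 weeks, and it implements "same phase of the cycle" through a hypothetical `peaks_per_cycle` parameter (1 for annual cycles, 2 for biennial ones); Box 5 is not reproduced here, so these rules and all function names are assumptions.

```python
import numpy as np

def central_moving_average(z, width=13):
    """Centred moving average; the first and last width // 2 entries are NaN."""
    z = np.asarray(z, dtype=float)
    half = width // 2
    out = np.full(z.shape, np.nan)
    out[half: len(z) - half] = np.convolve(z, np.ones(width) / width, mode="valid")
    return out

def find_peaks(x, half_window):
    """Indices i at which x[i] is the maximum of the complete, NaN-free window
    x[i - half_window : i + half_window + 1]."""
    x = np.asarray(x, dtype=float)
    peaks = []
    for i in range(half_window, len(x) - half_window):
        window = x[i - half_window: i + half_window + 1]
        if not np.isnan(window).any() and x[i] == window.max():
            peaks.append(i)
    return np.array(peaks, dtype=int)

def truncation_times(t, z, width=13, half_window=26, peaks_per_cycle=1):
    """Hypothetical helper: return (t_a, t_b), the first peak and the last peak
    at the same phase of a cycle spanning `peaks_per_cycle` peaks."""
    peaks = find_peaks(central_moving_average(z, width), half_window)
    last = (len(peaks) - 1) - (len(peaks) - 1) % peaks_per_cycle
    return t[peaks[0]], t[peaks[last]]

t = np.arange(1042) * 7.0 / 365.0                  # 20 years, weekly
z = 1000.0 + 500.0 * np.cos(2.0 * np.pi * t)        # hypothetical annual incidence
t_a, t_b = truncation_times(t, z)
```

For this noise-free annual series the sketch finds 19 peaks, near years 1 through 19, so t_a ≈ 1 and t_b ≈ 19; peaks too close to either end of the series are discarded because their windows are incomplete.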

Fig 9.

Convergence of estimates of S₀ obtained using peak-to-peak iteration (PTPI).

S₀ was estimated by applying PTPI (25 iterations) to 1000 incidence time series (i.e., 1000 realizations of a reported incidence time series, scaled by ). An initial guess for S₀ was taken to be or 4 times the true (data-generating) value. For each initial guess, this process generated 1000 sequences of 26 estimates of S₀ (the initial guess plus one estimate per iteration). Plotted are the median [black lines] and 5th–95th percentile range [grey bands] of the estimate of S₀ at each iteration, for the first 10 iterations. The vertical axis measures (on a logarithmic scale) the ratio of the estimated to the true value of S₀, so a ratio converging to 1 [dashed green line] indicates estimates converging to the true value. [Details] One thousand reported incidence time series (Δt = 1 week, n = 1042) were simulated with environmental noise in transmission (ϵ = 0.5), demographic stochasticity, and random under-reporting of cases (p_rep = 0.25), using reference values (Table 1) for the remaining parameters, including S₀ (hence S₀ was the same in all simulations). True incidence was estimated from reported incidence via Eq (26a) (with reporting parameters p_rep and t_rep correctly specified), yielding 1000 time series of estimated incidence. Corresponding mock (constant) birth and natural mortality time series were created (with vital rates ν_c and μ_c correctly specified), and these data (estimated incidence, births, natural mortality) were passed to the PTPI algorithm, allowing for iterative re-estimation of S₀.
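
The summary plotted in this figure can be sketched as per-iteration percentiles of the ratio of estimated to true S₀. Because the logarithm is monotone, percentiles of the ratio are the same quantities whether or not the axis is logarithmic. The input sequences below are synthetic, shrinking a factor-of-4 initial error geometrically with noise; they are not PTPI output, and the function name and values are hypothetical.

```python
import numpy as np

def convergence_summary(estimates, true_value):
    """Median and 5th/95th percentiles, at each iteration, of the ratio of
    estimated to true S0. `estimates` has shape (n_series, n_iterations + 1)."""
    ratio = np.asarray(estimates, dtype=float) / true_value
    median = np.median(ratio, axis=0)
    lo, hi = np.percentile(ratio, [5, 95], axis=0)
    return median, lo, hi

# Synthetic sequences (NOT output of PTPI) that shrink an initial factor-of-4
# error geometrically, with noise, just to exercise the summary function.
rng = np.random.default_rng(5)
s0_true = 1.0e5                                  # hypothetical true S0
k = np.arange(26)                                # initial guess + 25 iterations
noise = np.exp(0.02 * rng.standard_normal((1000, 26)))
estimates = s0_true * (1.0 + 3.0 * 0.5 ** k) * noise
median, lo, hi = convergence_summary(estimates, s0_true)
```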
