Table 1.
Unless otherwise stated, simulations of reported incidence time series use the reference values listed here. If a symbol is to be interpreted differently in relation to disease incidence and disease mortality data, then the correct definition is indicated by (I) and (M), respectively.
Fig 1.
Example of S(t) and β(t) estimation using the FC, S, and SI methods.
Plotted are the susceptible population size S(t) and seasonally forced transmission rate β(t) (Eq (27)) underlying 20 years of weekly reported incidence, together with time series estimates Sk and βk obtained from the data by the FC [blue], S [green], and SI [red] methods. The reported incidence time series (Δt = 1 week, n = ⌊20 × 365/7⌋ = 1042) was simulated without process or observation error (ϵ = 0, prep = 1), using reference values (Table 1) for all other data-generating parameters. The three estimation methods were applied without input error, i.e., all input parameters were assigned their true (data-generating) values. [Panel A] S(t) scaled by 1/N0, describing the number of susceptibles as a proportion of the initial population size. Grey lines show that the absolute error in the FC method estimate of S(t) increases linearly as μc 〈S〉t, where μc is the constant per capita natural mortality rate and 〈S〉 is the continuous-time average of S(t). [Panel B] β(t) scaled by 1/〈β〉, describing the transmission rate relative to its mean. RRMSE (Eq (33)) in the βk time series generated by the (FC, S, SI) method is roughly (0.3355, 0.0240, 0.0021).
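The series length and error metric quoted in this caption can be checked with a short sketch. It is illustrative only: Eq (33) is not reproduced here, so `rrmse` assumes a common definition (the root mean square of the pointwise relative error), and both function names are hypothetical.

```python
import math
import numpy as np

def n_weekly(years, days_per_year=365, days_per_week=7):
    """Number of whole weekly observation intervals in `years` years."""
    return math.floor(years * days_per_year / days_per_week)

def rrmse(estimate, truth):
    """ASSUMED form of RRMSE: sqrt(mean(((estimate - truth) / truth)**2))."""
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean(((estimate - truth) / truth) ** 2)))

print(n_weekly(20))  # 1042, the 20-year series length used here
print(n_weekly(3))   # 156, the 3-year series length used in Fig 2
```

Under this assumed definition, an RRMSE of 0.0021 corresponds to a typical relative error of about 0.2%.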
Fig 2.
Effects of process and observation error on the S and SI methods.
Plotted are the estimates [Row A] Zk, [Row B] Ik, and [Row C] βk of true incidence Z(t), prevalence I(t), and the seasonally forced transmission rate β(t) (Eq (27)) obtained by applying the [Left] S and [Right] SI methods without input error to each of four simulated reported incidence time series (indicated by the legend; Δt = 1 week, n = ⌊3 × 365/7⌋ = 156). The first simulation was purely deterministic [dark grey] (ϵ = 0, prep = 1), while the remaining three accounted for (i) environmental stochasticity [ES, light grey] (ϵ = 0.5, prep = 1), (ii) ES and demographic stochasticity [ES+DS, blue] (ϵ = 0.5, prep = 1), or (iii) ES, DS, and observation error [ES+DS+OE, red] (ϵ = 0.5, prep = 0.25). Reference values (Table 1) were assigned to all other data-generating parameters, in all four simulations. The left and right panels in Row A are identical, because the S and SI methods compute Zk identically (compare Eqs (25a) and (26a)). RRMSE in the βk time series is (0.0239, 0.0375, 0.1126, 0.1432) with the S method and (0.0021, 0.0153, 0.0494, 0.0591) with the SI method (order follows the legend). Note that the underlying β(t) was the same in all simulations; it is not plotted in Row C, but is close to perfectly represented by the dark grey curve in the right panel (RRMSE ≈ 0.2%). Due to process error, the underlying Z(t) and I(t) (also not shown) varied between the deterministic, ES, and ES+DS simulations.
Fig 3.
Bias and variance in 1-year cycles embedded in three estimates of a seasonally forced β(t).
[Panel A] In black, the seasonally forced β(t) (Eq (27)) underlying 1000 years of simulated reported incidence data. In (transparent) colour, raw estimates βk obtained from the data by the S [green] and SI [red] methods, both applied without input error. Only the first 10 of 1000 years are shown. [Panels B and C] In black, the true 1-year cycle in the seasonally forced β(t). In light (transparent) colour, the 1000 1-year cycles embedded in the linear interpolant βint(t) of βk. In dark colour, the average 1-year cycle (Eq (22a)) in βint(t). Results are shown for both the S [Panel B, green] and SI [Panel C, red] methods. [Panel D] Like Panel C, except for a smooth loess curve βloess(t; q) (q = 53) fit to βk, instead of the interpolant βint(t). [Details] A reported incidence time series with 1000 years of weekly observations (Δt = 1 week, n = 52153) was simulated with environmental noise in transmission (ϵ = 0.5), demographic stochasticity, and random under-reporting of cases (prep = 0.25), using reference values (Table 1) for the remaining parameters.
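One way to picture the average 1-year cycle in Panels B and C is sketched below. Eq (22a) is not reproduced in this caption, so the sketch assumes that the average cycle is the mean, across complete years, of the linear interpolant evaluated at a common set of phases. The function name, the 53-point phase grid, and the sinusoidal test signal are hypothetical choices, not the forcing function of Eq (27).

```python
import numpy as np

def average_annual_cycle(t, beta_k, period=365.0, n_phase=53):
    """Average the linear interpolant of (t, beta_k) over complete periods.

    t is in days. Returns the phases in [0, period) and the mean cycle.
    """
    t = np.asarray(t, dtype=float)
    beta_k = np.asarray(beta_k, dtype=float)
    n_cycles = int((t[-1] - t[0]) // period)  # complete cycles only
    if n_cycles < 1:
        raise ValueError("need at least one complete cycle")
    phase = np.linspace(0.0, period, n_phase, endpoint=False)
    # One row per cycle: the interpolant evaluated at the same phases each year.
    cycles = np.array([
        np.interp(t[0] + c * period + phase, t, beta_k)
        for c in range(n_cycles)
    ])
    return phase, cycles.mean(axis=0)

# Hypothetical example: ten years of weekly values of a noise-free sinusoid.
t = np.arange(0.0, 10 * 365 + 1, 7.0)
beta_k = 1.0 + 0.08 * np.cos(2 * np.pi * t / 365)
phase, cycle = average_annual_cycle(t, beta_k)
```

With noisy estimates, averaging across many years reduces the variance of the recovered cycle but not any bias shared by all years, which is the distinction the panels illustrate.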
Fig 4.
Reduction in β(t) estimation error with optimal loess smoothing.
The horizontal axis measures the case reporting probability prep, for which 41 values equally spaced on a logarithmic scale between 0.01 and 1 were considered. Using each value of prep and reference values (Table 1) for all other parameters, 100 reported incidence time series (Δt = 1 week, n = 1042) were simulated, accounting for environmental noise in transmission (ϵ = 0.5), demographic stochasticity, and random under-reporting of cases (measured by prep). The underlying seasonally forced β(t) (Eq (27)) was estimated from reported incidence using the S and SI methods, both applied without input error, yielding two raw estimates βk per simulation. Smooth loess curves βloess(t; q) (q = 10, …, 110; cf. §2.2.6) were fit to each βk time series. The optimal q for a given time series, denoted by qopt, was defined as the value that minimized RRMSE (Eq (33)) in βloess(tk; q). Overall, for each value of prep and each β(t) estimation method (S and SI), 100 values of qopt were obtained, one for each of the 100 βk time series. Plotted on the vertical axis as functions of prep are [Panel A] RRMSE in the raw estimates βk [dashed lines] and optimal loess estimates βloess(tk; qopt) [solid lines] and [Panel B] qopt. Lines and bands indicate the median and 5th–95th percentile range, respectively. Results for the S and SI methods are shown in green and red, respectively.
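The grid of reporting probabilities and the search for qopt can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of §2.2.6: a centred moving average stands in for loess so that the sketch needs only NumPy, its window width is only loosely analogous to the loess parameter q, `rrmse` assumes the root mean square of the relative error because Eq (33) is not reproduced here, and the noisy sinusoid is a hypothetical test signal.

```python
import numpy as np

# 41 reporting probabilities equally spaced on a log scale between 0.01 and 1.
p_rep_grid = np.logspace(-2, 0, 41)

def rrmse(estimate, truth):
    """ASSUMED form of RRMSE: root mean square of the relative error."""
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean(((estimate - truth) / truth) ** 2)))

def smooth(x, q):
    """Stand-in smoother (NOT loess): centred moving average over about q points."""
    half = q // 2
    padded = np.pad(np.asarray(x, dtype=float), half, mode="reflect")
    kernel = np.ones(2 * half + 1) / (2 * half + 1)
    return np.convolve(padded, kernel, mode="valid")  # same length as x

def optimal_q(beta_k, beta_true, q_values=range(10, 111)):
    """Return the q in q_values that minimizes RRMSE of the smoothed series."""
    q_values = list(q_values)
    errors = [rrmse(smooth(beta_k, q), beta_true) for q in q_values]
    return q_values[int(np.argmin(errors))]

# Hypothetical example: a sinusoidal rate observed weekly with multiplicative noise.
rng = np.random.default_rng(1)
t = np.arange(1042)  # weeks
beta_true = 1.0 + 0.08 * np.cos(2 * np.pi * t * 7 / 365)
beta_k = beta_true * rng.lognormal(sigma=0.1, size=t.size)
q_opt = optimal_q(beta_k, beta_true)
```

Because the search uses the true β(t), qopt is available only in simulation studies such as this one, where the data-generating transmission rate is known.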
Fig 5.
Sensitivity of β(t) estimation error to the mean 〈β〉 and amplitude α of seasonal forcing.
Contained in each panel are heatmaps of median RRMSE (Eq (33)) in estimates of a seasonally forced β(t) (Eq (27)) from simulated reported incidence time series, as a bivariate function of the mean 〈β〉 and amplitude α of seasonal forcing. The 〈β〉 axis has been scaled to measure the basic reproduction number (Eq (2)). When simulating reported incidence, reference values (Table 1) were assigned to all data-generating parameters except 〈β〉 and α. A grid of pairs of the basic reproduction number and α was considered, with α = 0, 0.01, …, 0.2 and with 〈β〉 defined for each level of the basic reproduction number via Eq (2). For each parametrization, 1000 simulations were performed with environmental stochasticity [ES] (ϵ = 0.5) and with or without demographic stochasticity [DS] and observation error [OE], as indicated by row: [Row A] without DS or OE (prep = 1, trep = 0 weeks), [Row B] with DS but without OE (prep = 1, trep = 0 weeks), or [Row C] with DS and OE (prep = 0.25, trep = 2 weeks). Corresponding mock birth and natural mortality time series were created, then β(t) was estimated from the data using [Left] the S method and [Right] the SI method, all without input error. For each set of estimates of β(t) (1000 estimates per parametrization, per simulation method, per estimation method), the median RRMSE was calculated (after smoothing with fixed q; see Eq (59)) and displayed as one point in the appropriate heatmap, coloured according to the logarithmic scale on the right. The darkest blue indicates median RRMSE less than 0.01.
Fig 6.
Sensitivity of β(t) estimation error to data-generating parameters other than 〈β〉 and α.
Plotted in each panel is the median RRMSE (Eq (33)) in estimates of a seasonally forced β(t) (Eq (27)) from simulated reported incidence time series (Δt = 1 week, n = 1042), as a univariate function of each of 5 or 6 data-generating parameters (indicated by the legend). When simulating reported incidence, reference values (Table 1) were assigned to all but the focal parameter, which was assigned 41 values logarithmically spaced between and 4 times its reference value. The horizontal axis (logarithmic scale) measures the ratio of the focal parameter’s true value to its reference value, so that commensurate deviations from the reference case can be compared across parameters. For each parametrization, 1000 simulations were performed with environmental stochasticity [ES] (ϵ = 0.5) and with or without demographic stochasticity [DS] and observation error [OE], as indicated by row: [Row A] without DS or OE (prep = 1, trep = 0 weeks), [Row B] with DS but without OE (prep = 1, trep = 0 weeks), or [Row C] with DS and OE (prep = 0.25 except when prep is the focal parameter, trep = 2 weeks). Corresponding mock birth and natural mortality time series were created, then β(t) was estimated from the data using [Left] the S method and [Right] the SI method, all without input error. For each set of estimates of β(t) (1000 estimates per parametrization, per simulation method, per estimation method), the median RRMSE was calculated (after smoothing with fixed q; see Eq (59)) and displayed as one point in the appropriate panel and graph.
Fig 7.
Sensitivity of β(t) estimation error to the user-specified values of input parameters.
[Panel A] Median RRMSE (Eq (33)) in estimates of β(t) from simulated reported incidence time series (Δt = 1 week, n = 1042), as a univariate function of the factor by which an input parameter was mis-specified. One thousand simulations were performed using fixed values (Table 1) for all data-generating parameters. The simulations accounted for environmental stochasticity [ES] (ϵ = 0.5), demographic stochasticity [DS], and observation error [OE] (prep = 0.25, trep = 2 weeks). For each simulation, corresponding mock birth and natural mortality time series were created, and β(t) was estimated from the data using the SI method. True (data-generating) values were specified for all input parameters except the focal parameter (indicated by the legend), for which 41 values logarithmically spaced between and 4 times the true value were specified in turn. Each input parametrization yielded 1000 estimates of β(t), whose median RRMSE was calculated (after smoothing with fixed q; see Eq (59)) and displayed as one point in the appropriate graph. [Panel B] Result of repeating the analysis from Panel A in which S0 was specified with varying amounts of error, but with the initially erroneous value of S0 updated using the method of peak-to-peak iteration (PTPI; 25 iterations) prior to β(t) estimation. The original result, obtained without PTPI, is presented for comparison.
Fig 8.
Example of S(t) and β(t) reconstruction with an overestimate of S0 corrected by peak-to-peak iteration (PTPI).
[Panel A] Truncation step of PTPI (Box 5). Plotted is a reconstruction of true incidence Z(t) from a simulated reported incidence time series, before [Zk, black] and after [yellow] smoothing with a 13-point central moving average. Vertical lines indicate peaks in the smoothed series. The times of the first peak in the smoothed series and the last peak occurring at the same phase of the cycle (in this case, the last peak) are denoted by ta and tb. [Panel B] Iteration step of PTPI (Box 6), where the initial estimates of both S0 = S(0) and S(ta) were taken to be 4 times the true (data-generating) value of S0. Plotted in grey are successive reconstructions of S(t) between times ta and tb, generated by updating the estimate of S(ta) with the estimate of S(tb) obtained in the previous iteration. Dashed continuations to the left of ta display estimation of S0 backwards in time from estimates of S(ta). Plotted in black is the result of reconstructing S(t) starting from the final estimate of S0, which was obtained after 25 iterations and had a relative error of roughly 1.4% (compared to 300% in the initial estimate). [Panel C] The sequence of reconstructions of β(t) corresponding to the estimates of S0 shown in Panel B. [Details] Twenty years of weekly reported incidence (Δt = 1 week, n = 1042) were simulated with environmental noise in transmission (ϵ = 0.5), demographic stochasticity, and random under-reporting of cases (prep = 0.25), using reference values (Table 1) for the remaining parameters. Z(t), S(t) and β(t) were reconstructed from reported incidence using the SI method without input error (apart from mis-specification of S0).
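The smoothing and peak-finding part of Panel A can be sketched as below. This covers only the 13-point central moving average and the location of local maxima. The truncation and iteration rules of Boxes 5 and 6 are not reproduced in this caption and are not implemented here. The function names and the cosine test series are hypothetical.

```python
import numpy as np

def central_moving_average(z, width=13):
    """Centred moving average; the output is shorter than z by width - 1 points."""
    kernel = np.ones(width) / width
    return np.convolve(np.asarray(z, dtype=float), kernel, mode="valid")

def peak_indices(x):
    """Indices of strict interior local maxima of x."""
    x = np.asarray(x, dtype=float)
    interior = (x[1:-1] > x[:-2]) & (x[1:-1] > x[2:])
    return np.flatnonzero(interior) + 1

# Hypothetical example: five 52-week cycles of a smooth incidence-like series.
k = np.arange(260)
z_k = 100.0 * (1.0 + np.cos(2 * np.pi * k / 52))
z_smooth = central_moving_average(z_k, width=13)
# Shift by width // 2 to express peak positions as indices into z_k.
peaks = peak_indices(z_smooth) + 13 // 2
```

In this example the first and last detected peaks play the roles of ta and tb, since all peaks occur at the same phase of the cycle.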
Fig 9.
Convergence of estimates of S0 obtained using peak-to-peak iteration (PTPI).
S0 was estimated by applying PTPI (25 iterations) to 1000 incidence time series (i.e., 1000 realizations of a reported incidence time series, rescaled to estimate true incidence via Eq (26a)). An initial guess for S0 was taken to be
or 4 times the true (data-generating) value. For each initial guess, this process generated 1000 sequences of 26 estimates of S0. Plotted are the median [black lines] and 5th–95th percentile range [grey bands] of the estimate of S0 at each iteration, for the first 10 iterations. The vertical axis measures (on a logarithmic scale) the ratio of the estimated and true values of S0, hence convergence close to 1 [dashed green line] represents convergence of the estimates close to the true value. [Details] One thousand reported incidence time series (Δt = 1 week, n = 1042) were simulated with environmental noise in transmission (ϵ = 0.5), demographic stochasticity, and random under-reporting of cases (prep = 0.25), using reference values (Table 1) for the remaining parameters, including S0 (hence S0 was the same in all simulations). True incidence was estimated from reported incidence via Eq (26a) (with reporting parameters prep and trep correctly specified), yielding 1000 time series of estimated incidence. Corresponding mock (constant) birth and natural mortality time series were created (with vital rates νc and μc correctly specified), and these data (estimated incidence, births, natural mortality) were passed to the PTPI algorithm, allowing for iterative re-estimation of S0.
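The summary plotted here (median and 5th–95th percentile range of the ratio of estimated to true S0 at each iteration) can be computed as in the sketch below. The input array is a placeholder: its values come from an invented contraction toward the true value with random scatter, not from PTPI, and serve only to show the shape of the calculation. The function name is hypothetical.

```python
import numpy as np

def summarize_ratio(estimates, true_value):
    """Percentile summary of estimate / true_value at each iteration.

    estimates has shape (n_series, n_iterations + 1); column 0 is the initial guess.
    Returns the 5th percentile, median, and 95th percentile per iteration.
    """
    ratio = np.asarray(estimates, dtype=float) / true_value
    lower, median, upper = np.percentile(ratio, [5, 50, 95], axis=0)
    return lower, median, upper

# Placeholder data (NOT PTPI output): 1000 sequences of 26 estimates, all
# starting from 4 times the true value and contracting toward it with scatter.
rng = np.random.default_rng(0)
true_s0 = 1000.0
n_series, n_iter = 1000, 25
log_ratio = np.log(4.0) * 0.5 ** np.arange(n_iter + 1)
noise = rng.normal(scale=0.05, size=(n_series, n_iter + 1))
noise[:, 0] = 0.0  # every series shares the same initial guess
estimates = true_s0 * np.exp(log_ratio + noise)

lower, median, upper = summarize_ratio(estimates, true_s0)
```

Plotting `median` with a band between `lower` and `upper` on a logarithmic vertical axis gives a figure of the same form as this one.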