Fig 1.
This figure illustrates our individual-based simulation model, following a hypothetical population for three consecutive (annual) time steps (Time = t, t+1, t+2). All infected individuals carry a single strain of TB (A, B, or C in this example). At each time step, three processes are modeled: 1. Transmission: upon successful contact, actively infected individuals can transmit the disease (marked by their strain type) to other people in the population. 2. Progression: other TB states are updated as shown in the left panel, including stabilization of latency, re-infection, diagnosis, treatment, and relapse. Individuals who are diagnosed have their strain type recorded for analysis as they move from the active to the recovered state. 3. DNA fingerprint replacement: a random number of individuals in the late latency state are selected to carry new and unique fingerprints (strains), to maintain genetic diversity and account for processes such as mutation, migration, and infection from outside the population.
Table 1.
Model parameters.
Fig 2.
Absolute bias in estimates of the TB recent transmission proportion, comparing the ‘n-1’ method to novel regression-based tools in the derivation set (top) and validation set (bottom).
The y-axis presents the absolute estimation bias [|estimated value – true value| × 100] in the proportion of incident active TB due to recent transmission (“recent transmission proportion”), and the x-axis denotes the recent transmission proportion (simulated) in each set of simulations. Estimates from the ‘n-1’ method are shown in red, and those from the simple and comprehensive regression tools are shown in green and blue, respectively. Boxes show the interquartile range of values from all simulations, and “whiskers” show the 95% confidence intervals, such that narrower boxes correspond to more precise (reproducible) estimates. The ‘n-1’ method tends to provide less accurate and less precise estimates of the recent transmission proportion (wide red boxes) across all settings than the simple and comprehensive regression-based models.
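As a minimal sketch of the quantities plotted here, the absolute bias and a box-and-whisker summary could be computed as follows. The function names are hypothetical, and taking the whiskers as the 2.5th and 97.5th percentiles across simulations is an assumption made for illustration; it is not taken from the authors' code.

```python
from statistics import quantiles


def absolute_bias(estimated, true):
    """Absolute estimation bias in percentage points: |estimated - true| * 100.

    `estimated` and `true` are equal-length sequences of proportions (0 to 1),
    one pair per simulation.
    """
    return [abs(e - t) * 100 for e, t in zip(estimated, true)]


def box_summary(values):
    """Return (2.5th, 25th, 50th, 75th, 97.5th) percentiles of `values`.

    Assumption: whiskers are the central 95% of simulated values.
    """
    # n=40 yields 39 cut points spaced 2.5 percentage points apart.
    cuts = quantiles(values, n=40, method="inclusive")
    return cuts[0], cuts[9], cuts[19], cuts[29], cuts[38]
```

A narrower gap between the 25th and 75th percentiles returned by `box_summary` corresponds to the narrower boxes described in the legend.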
Fig 3.
Absolute estimation bias in validation set, comparing the ‘n-1’ method to the regression-based models.
The x-axis denotes the absolute level of estimation bias [|estimated value – true value| × 100] in the proportion of incident active TB due to recent transmission (“recent transmission proportion”) across the validation set of simulations. The y-axis denotes the cumulative proportion of simulations with estimation bias greater than the threshold shown on the x-axis. For example, the vertical dotted line shows the proportion of simulations under each method that resulted in an absolute estimation bias of >10%: the ‘n-1’ method exceeded this level of bias in 35% of all simulations (red line), compared to 2% of all simulations with the simple regression model (green) and 1% of all simulations with the comprehensive regression model (blue).
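The curve described in this legend can be sketched as an exceedance proportion evaluated over a grid of thresholds. The function names below are hypothetical, and the strict inequality follows the y-axis description ("greater than the threshold"); this is an illustrative sketch rather than the authors' implementation.

```python
def exceedance_proportion(abs_bias, threshold):
    """Proportion of simulations whose absolute bias is greater than `threshold`.

    `abs_bias` holds one absolute bias value (in percentage points) per simulation.
    """
    if not abs_bias:
        raise ValueError("abs_bias must contain at least one simulation")
    return sum(b > threshold for b in abs_bias) / len(abs_bias)


def exceedance_curve(abs_bias, thresholds):
    """Return (threshold, proportion exceeding it) pairs, one per threshold."""
    return [(t, exceedance_proportion(abs_bias, t)) for t in thresholds]
```

Evaluating `exceedance_proportion` at a threshold of 10 for each method's simulated biases would give the values read off at the vertical dotted line.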
Fig 4.
Estimation bias at different levels of study duration (top) and coverage (bottom), comparing the ‘n-1’ method to novel regression-based tools.
The y-axis presents the (non-absolute) estimation bias [(estimated value – true value) × 100] in the proportion of incident active TB due to recent transmission (“recent transmission proportion”). Estimates from the ‘n-1’ method are shown in red, and those from the simple and comprehensive regression tools are shown in green and blue, respectively. Boxes show the interquartile range of values from all simulations, and “whiskers” show the 95% confidence intervals, such that narrower boxes correspond to more precise (reproducible) estimates. The ‘n-1’ method tends to underestimate the recent transmission proportion at low levels of sample coverage (<50%) and study duration (<10 years), and begins to overestimate the recent transmission proportion as coverage and duration increase. The regression models are fairly robust to variation in study characteristics, providing more accurate and precise estimates of the recent transmission proportion, especially in settings of incomplete coverage and short study duration.
Fig 5.
Estimation bias in the TB recent transmission proportion in an illustrative high burden case study, comparing the ‘n-1’ method to the regression-based models at different levels of sampling coverage.
The top panel compares the estimation bias resulting from each model at a fixed sampling coverage of 73% (as estimated assuming a 90% case detection proportion, referred to as “Scenario 1” in the text), while varying the duration of data collection. The bottom panel presents the results at a fixed study duration of 6 years (as reported in the original study), while varying the coverage of molecular fingerprints at the population level. The asterisk denotes the baseline scenario as reported in reference [23], at which all three methods accurately estimate the recent transmission proportion to within 5%. However, as study duration and population coverage decline, the performance of the ‘n-1’ method falls dramatically. At either a two-year study duration or a 20% population coverage of molecular data, the ‘n-1’ method underestimates the recent transmission proportion by 17% (second column of each panel), whereas both regression tools continue to estimate the recent transmission proportion with a bias of 7% or less. Boxes show the interquartile range of values from all simulations, and “whiskers” show the 95% confidence intervals, such that narrower boxes correspond to more precise (reproducible) estimates. Note that the three boxes are jittered at each level of coverage/duration for clarity, but all three methods are evaluated under the same conditions.