Village-scale persistence and elimination of gambiense human African trypanosomiasis

Gambiense human African trypanosomiasis (gHAT) is one of several neglected tropical diseases that is targeted for elimination by the World Health Organization. Recent years have seen a substantial decline in the number of globally reported cases, largely driven by an intensive process of screening and treatment. However, this infection is highly focal, continuing to persist at low prevalence even in small populations. Regional elimination, and ultimately global eradication, rests on understanding the dynamics and persistence of this infection at the local population scale. Here we develop a stochastic model of gHAT dynamics, which is underpinned by screening and reporting data from one of the highest gHAT incidence regions, Kwilu Province, in the Democratic Republic of Congo. We use this model to explore the persistence of gHAT in villages of different population sizes and subject to different patterns of screening. Our models demonstrate that infection is expected to persist for long periods even in relatively small isolated populations. We further use the model to assess the risk of recrudescence following local elimination and consider how failing to detect cases during active screening events informs the probability of elimination. These quantitative results provide insights for public health policy in the region, particularly highlighting the difficulties in achieving and measuring the 2030 elimination goal.


Model Model equations
We introduce a compartmental infection model for HAT. The outputs of the model are the number of humans and the proportion of the total number of vectors in each class. For humans, the classes are: susceptible, exposed (but not infectious), Stage 1 infection, Stage 2 infection and hospitalised (or recovering at home). These variables are denoted by S Hj (t), E Hj (t), I 1Hj (t), I 2Hj (t) and R Hj (t), where j = 1, 2, with j = 1 for low risk individuals, randomly participating in active screening, and j = 2 for high risk individuals, never participating in active screening, each at time t > 0. The population size is assumed to be constant (see Fig 7 for different assumptions on the populations) and thus N Hj = S Hj (t) + E Hj (t) + I 1Hj (t) + I 2Hj (t) + R Hj (t), j = 1, 2.
The vectors (tsetse) are given by the variables S V , E 1V , E 2V , E 3V , I V and G V , which correspond to teneral (unfed) tsetse, infected tsetse in their extrinsic incubation period (there are three classes to create a gamma distributed period), infectious tsetse, and non-teneral (fed) but uninfected tsetse. The compartments of the model and possible transitions between them are shown graphically in Fig  1. We assume that there is natural mortality from all compartments, which leads to replacement of that individual as a susceptible in the population.
Active screening has the potential to detect infection and so move individuals from the exposed or infected classes directly to hospitalised. This is simulated by randomly selecting from the low risk population and if these individuals are exposed or infected, they become hospitalised with 91% probability (the sensitivity of the screening algorithm), as this is the probability of a positive test result, given the person has the infection.
For m = m eff /p H and p H > 0, Table 1 defines the transition rates of the Markov-chain model {(S H1 (t), E H1 (t), I 1H1 (t), I 2H1 (t), R H1 (t), S H2 (t), E H2 (t), I 1H2 (t), I 2H2 (t), R H2 (t), S V (t), E 1V (t), E 2V (t), E 3V (t), I V (t), G V (t)) : t > 0} from state (s H1 , e H1 , i 1H1 , i 2H1 , r H1 , s H2 , e H2 , i 1H2 , i 2H2 , r H2 , s V , e 1V , e 2V , e 3V , i V , g V ). For m → ∞ and p H → 0 (when the vector-to-host ratio, V:H → ∞), and tsetse. Humans can become exposed, E H , on a bite from an infective tsetse and progress through the infection stages, I H1 and I H2 , before moving to the non-infectious class R H due to hospitalisation. The birth rates B Hi are given by µ H N Hi for i = 1, 2 and B V is equal to µ V N H . Active screening moves exposed or infected people directly to the hospitalised class. Unfed tsetse can become exposed and infected, E V and I V , on consumption of a blood-meal, of which a proportion are taken on humans. Alternatively, first blood-meals not resulting in exposure means tsetse are less susceptible to trypanosomes in future meals, . The transmission of infection between humans and tsetse is shown by grey paths. Infected animals are not considered.
the transition rates remain the same in the human population, but now, since we are considering an infinite number of vectors, an ordinary differential equation approximation is used to describe the vectors. In simulating the model, the tau-leaping algorithm is used, whereby the states and rates are updated with a fixed time step at which the ordinary differential equations are also updated.
In our analysis, the endemic disease equilibrium is calculated as the steady state of the deterministic model, excluding control measures other than basic passive surveillance [8]. Initial conditions for individual villages are taken by sampling from a binomial distribution, where the probability of being selected in a class is given by the proportion in that class in the steady state of the model. Using random initial conditions with the mean at endemic equilibrium removes the effects of rounding the initial conditions to the same integer values for each simulation.
A sample realisation of the infection dynamics in a population with three active screening events, where randomly sampled infected low risk individuals are moved directly into the hospitalised class, is shown in Figure 2. Table 2 defines the parameters used in the model for HAT infection dynamics. These parameter values are taken from Rock et al. [8], which were either sourced from literature, where well-defined, or otherwise (in the case of m eff , R 1 , R 2 , r and u) taken as the median of the distribution obtained by model fitting using a Metropolis-Hastings MCMC algorithm that matched the deterministic version of the model to incidence data from the WHO HAT Atlas.

Parameters
Since the human population is partitioned into low and high risk individuals, we also denote to give the proportion of blood meals on each of the two risk groups, which sum to the total proportion of blood meals tsetse take on humans, f H . Furthermore, we note that the effective density of tsetse is given by the product of the relative tsetse density and the probability of human infection per single infective bite, m eff = mp H .

Risk structure
The risk structure in the human population is included, since there is evidence for differences in screening attendance and tsetse exposure within the population [5]. Furthermore, using deviance information criterion (DIC) as the method of model selection, there was strong evidence for the inclusion of risk structure [8]. Figure 3 shows the comparison of the models with the reported incidence data. The marginal improvement of the fit when an animal reservoir was added to the model meant animals were not considered here. The DIC was calculated by the following: where LL is the log-likelihood of unknown parameters θ, and whereθ is the expectation of these parameters.
3σ V e 1V Death in exposed 1 3σ V e 2V Death in exposed 2 A single realisation of the simulated infection dynamics of the model. A population of 514 is used with 404, 430 and 52 of the low risk population randomly selected for active screening after three, four and nine years, respectively. Note that the number of hospitalised individuals increases at the first two active screening events, but no infected individuals are randomly selected and detected in the third screening event. We only plot those infected (or affected) by HAT, although we note that the vast majority of both populations are susceptible.

Tsetse density
One fundamental parameter of the model, the effective density of tsetse, is equal to the product of the vector-to-host ratio and the probability of human infection per single infective bite. In the original deterministic framework, these constituent terms did not need to be known, as it was only the product that was important; therefore they could not be inferred separately. However, since the stochastic model explicitly captures the number of tsetse, the two parameters are now required. Given the limited data to estimate either component parameter, we choose to explore the potential range of parameters: from very high probability of infection and hence low vector-to-host ratio, to very low probability of infection and large vector-to-host ratio such that the dynamics of tsetse can be modelled deterministically.
Considering the simplest case in the absence of disease-control measures, we find that different values for the vector-to-host ratio have a relatively limited impact on predictions of disease persistence. Unsurprisingly, large tsetse populations, which can be approximated by deterministic dynamics, lead to the greatest persistence (Fig 4); thus, throughout this paper, we utilise this worst-case scenario, although other assumptions do not change the qualitative findings.

Isolated populations
For the majority of results in the main paper, we assume that importations of infection between villages do not play a significant role and so treat them independently. In many cases, such as community-level persistence, this approach is justified, as we wish to understand the stochastic dynamics in the absence of confounding imports. We also quantify the level of infectious imports by considering the probability of a village having infection present on the first screening and assuming this represents a long-term equilibrium solution, since this is the infection level before regular active screening began. By matching the WHO HAT Atlas data to the output of the model with an additional per person importation rate parameter (δ) we find that, using least-squares fitting, δ ≈ 3.4 × 10 −6 days −1 . This rate is small and so can be reasonably ignored on the studied time scales, particularly as there is some probability that this importation will not cause any subsequent transmission. In addition, it is reasonable to assume that this importation rate will decay over time, since the total case numbers in the DRC is reducing [6]. By fitting the reduction in total cases in the DRC to an exponential decay function, we get a decay rate of 0.1071, which we apply to the import rate (Fig 5). As a validation of our assumption that importations can be safely ignored over the time-scales of this study, we re-examine the match between active-screenings that fail to detect cases and model results including imports (Fig 6). This is a counter-point to Fig 2 in the main paper. We find that the inclusion of imports has an extremely small difference on the model predictions: slightly improving the fit between model and data (Fig 6A). However, under this new model, there is now an additional village where the model significantly predicts fewer zero-detections than observed, decreasing the percentage of villages within the model prediction intervals to 91.4% (Fig 6B).

Population dynamics
In all model simulations, we have assumed that the population size remains constant, such that when a death occurs, the individual is replaced by a new susceptible individual immediately. We have analysed this assumption by comparing this with different population dynamics (Fig 7). We consider four additional population assumptions, all where births and deaths have been decoupled. These are: births and deaths are proportional to the number of individuals of that infection status; deaths are proportional, but births are at the constant rate from the initial equilibrium; and the same two mechanisms, but where the birth rate is increased by the population growth of 2.6%. As the populations change in size, we keep the effective tsetse density constant.
We see that the probability of gHAT persistence as time progresses is very similar for all five assumptions, while it does diverge slightly with time (Fig 7). For a population size of 2,000, after 15 years, the probability of gHAT persistence using all five assumptions gives results less than 0.01 apart and the confidence intervals all overlap. For a smaller population size, there is a bit more variation and the confidence intervals do not all overlap. The assumptions where the population size grows by 2.6% each year has a higher probability of gHAT persistence, although the maximum difference is still only approximately 0.012 and this represents a significant increase in the village population over the time span. Where there is no population growth the confidence intervals all overlap.
Thus, we can use the assumption of constant population size, since it does not significantly change the results, excluding large increases in population size for small populations, and is significantly easier to implement. Keeping the population size constant also decouples the stochastic extinction of the gHAT infection from any underlying population dynamics. If we were to introduce more elaborate demographics into the model structure, it would be unclear if the extinction of infection was due to the stochastic process of transmission or the fluctuations in population size. Furthermore, we can associate each simulation of a village with a single population size, and can hence show probability of gHAT persistence as a function of population size.

Tau-leaping
Since we use a fixed step size for the tau-leaping algorithm of 0.1 days, Figure 8 shows that the results for persistence of HAT after 15 years are qualitatively the same for time steps of 0.01 and 1 days, so we are confident that a time step of 0.1 days is not introducing numerical errors. 95% confidence intervals are all small, in the range (0.0044, 0.0196) about the plotted mean values.

Data Active screening data
In the main manuscript, we note that not all settlements undergo the same levels of screening. In fact, larger village populations tend to have more frequent screening, but with a lower proportion of the population tested; small populations tend to have high coverage but infrequent screens (Fig 9A and  B). The smaller villages were also not typically screened in 2000-2008, while villages with populations greater than 500 were screened over the whole data period 2000-2012 (Fig 9C). We do not account for these differences in our analysis but can use these trends to help explain any discrepancies between models and the data.
In addition, the active screening data from the WHO HAT Atlas were used to calculate the mean annual screening coverage as 21%. Population sizes were taken from the census data within the WHO HAT Atlas and then modified by assuming an annual population growth of 2.6%. Coverages were then calculated as the number of people screened divided by the estimated population size, assuming a maximum coverage equal to that of the entire the low risk population size (92.4%), as described in the main manuscript. The mean annual coverage was then obtained as the mean of these coverages, where active screening occurred, as well as the 0% coverages, where no active screening was recorded. This value is thus highly dependent on the quality of the census information, as differences in population sizes will change this estimate of mean coverage. It is therefore important to have good census data for making predictions on the effect of different screening coverages.

Persistence and elimination
Local disease persistence As a control strategy, we find that randomly selecting the screening coverage from all reported coverages slightly out-performs screening annually at the mean coverage (21%), kept fixed across all years. High levels of coverage in any one year are more likely to interrupt transmission (Fig 10A). In addition, long-term persistence requires continued transmission and so the probability of persistence decreases with time (Fig 10B and C).

Re-invasion due to tsetse
We consider the dynamics of re-invasion, given different initial conditions to Figure 4 in the main manuscript. In a population without human infections but with endemic levels of tsetse infection, the chance that any human cases will be generated increases with settlement size (Fig 11A). This is due to tsetse being able to interact with fewer humans over their short lifespan, while new transmission events caused by human infection are less dependent on population size, as the infection will be present in the population for much longer. The probability of new transmission events is further increased by the addition of an infected human to endemic levels of tsetse infection (Fig 11B).

Proportion of positive tests in a screening
Comparing the data on the proportion of tests that result in a positive detection of HAT to the output of the model demonstrates similar results to comparing the screenings with no detected cases, as in Figure 2 from the main manuscript (Fig 12). The difference in coverages of an active screening event mean the results have more variation than simply not detecting any new cases, yet the model is similarly capturing the behaviour seen in the data; there are a few more settlements that stand out as significantly different from the model (69 settlements, 12.3%), but these are similarly clustered where expected.

Definition of a full screen
In calculating the probability of infection on detecting no cases, a zero-detection event is different for different screening percentages. We use 20% of the population screened to define a 'full screening'; however, similar results are achieved for the screening regime of Yasa-Bonga and Mosango 2000-2012 when 10% is used. If 50% is used, and so all screenings are of reasonable quality, there is more confidence that the same number of consecutive zero screenings are a proxy for no infection in the population (Fig 13). When screening coverage is higher and no infection is detected, there is more certainty that no infection is present.