Age trends in asymptomatic and symptomatic Leishmania donovani infection in the Indian subcontinent: A review and analysis of data from diagnostic and epidemiological studies

Background Age patterns in asymptomatic and symptomatic infection with Leishmania donovani, the causative agent of visceral leishmaniasis (VL) in the Indian subcontinent (ISC), are currently poorly understood. Age-stratified serology and infection incidence have been used to assess transmission levels of other diseases, which suggests that they may also be of use for monitoring and targeting control programmes to achieve elimination of VL and should be included in VL transmission dynamic models. We therefore analysed available age-stratified data on both disease incidence and prevalence of immune markers with the aim of collating the currently available data, estimating rates of infection, and informing modelling and future data collection.
Methodology/Principal findings A systematic literature search yielded 13 infection prevalence and 7 VL incidence studies meeting the inclusion criteria. Statistical tests were performed to identify trends by age, and according to diagnostic cut-off. Simple reversible catalytic models with age-independent and age-dependent infection rates were fitted to the prevalence data to estimate infection and reversion rates, and to test different hypotheses about the origin of variation in these rates. Most of the studies showed an increase in infection prevalence with age: from ≲10% seroprevalence (<20% Leishmanin skin test (LST) positivity) for 0-10-year-olds to >10% seroprevalence (>20% LST-positivity) for 30-40-year-olds, but overall prevalence varied considerably between studies. VL incidence was lower amongst 0-5-year-olds than older age groups in most studies, with most showing a peak in incidence between ages 5 and 20. The age-independent catalytic model provided the best overall fit to the infection prevalence data, but the estimated rates for the less parsimonious age-dependent model were much closer to estimates from longitudinal studies, suggesting that infection rates may increase with age.
Conclusions/Significance Age patterns in asymptomatic infection prevalence and VL incidence in the ISC vary considerably with geographical location and time period. The increase in infection prevalence with age and peaked age-VL-incidence distribution may be due to lower exposure to infectious sandfly bites in young children, but also suggest that acquired immunity to the parasite increases with age. However, poor standardisation of serological tests makes it difficult to compare data from different studies and draw firm conclusions about drivers of variation in observed age patterns.


Reversible catalytic model
The general form of the reversible catalytic model fitted to the data is as follows. The transmission dynamics are assumed to be in equilibrium, such that the prevalence of infection (as determined by sero-/LST-/PCR-positivity) only varies with age $a$ and not time. Sero-/LST-/PCR-negative individuals, whose prevalence is denoted by $s$, become sero-/LST-/PCR-positive (prevalence $p$) at a certain, potentially study-, test- and age-dependent, rate $\lambda_i(a)$ (which we refer to as the rate of infection (ROI)), and revert to sero-/LST-/PCR-negativity at an age-independent, but potentially study- and test-dependent, rate $\gamma_i$, where $i \in \{1, \dots, N_s\}$ or $i \in \{\mathrm{DAT}, \mathrm{rK39}, \mathrm{LST}, \mathrm{PCR}\}$ and $N_s$ is the number of studies, i.e.

\[ \frac{ds}{da} = -\lambda_i(a)\,s + \gamma_i\,p, \tag{1} \]
\[ \frac{dp}{da} = \lambda_i(a)\,s - \gamma_i\,p, \tag{2} \]
\[ s(0) = 1, \quad p(0) = 0. \tag{3} \]

Equations (1)-(3) can be reduced to an initial value problem for $p$ since $s + p = 1$,

\[ \frac{dp}{da} = \lambda_i(a)\,(1 - p) - \gamma_i\,p, \quad p(0) = 0. \tag{4} \]

Data from Hasker et al [1] suggests that seroconversion rate increases with age. To test whether the conversion rate is age-dependent we consider different forms of $\lambda_i(a)$:

• Constant (age-independent) ROI: $\lambda_i(a) = b_{0,i}$. For this form, the solution of (4) is
\[ p(a) = \frac{b_{0,i}}{b_{0,i} + \gamma_i}\left(1 - e^{-(b_{0,i} + \gamma_i)a}\right). \]
• Age-dependent ROI: We assume that the rate of conversion to sero-/LST-/PCR-positivity increases linearly with age, based on the data in [1] (see below),
\[ \lambda_i(a) = b_{0,i} + b_{1,i}\,a, \]
where $b_{1,i} \geq 0$ is the rate at which the conversion rate increases with age. The initial value problem for $p$ does not have a simple closed form solution in this case.
Since the conversion and reversion rates may also vary with the location and time period in which the study was performed, and/or the test used, we compare the model fit under different assumptions about the study- and test-dependence as shown in Table 1. We note that the age-independent models are nested inside the age-dependent models (they are obtained by setting $b_{1,i} = 0$).
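The two forms of the ROI can be compared numerically with a short sketch. This is not the authors' MATLAB code: it is an illustrative Python version that assumes the model reduces to dp/da = λ(a)(1 − p) − γp with everyone negative at birth, p(0) = 0. The function names, the fixed-step Runge-Kutta integrator and the parameter values in the usage note are all assumptions made for illustration.

```python
import math


def prevalence_constant_roi(age, b0, gamma):
    """Closed-form prevalence for a constant ROI b0, assuming p(0) = 0."""
    total = b0 + gamma
    return b0 / total * (1.0 - math.exp(-total * age))


def prevalence_linear_roi(age, b0, b1, gamma, steps=1000):
    """Prevalence for a linearly increasing ROI, lambda(a) = b0 + b1 * a.

    Integrates dp/da = lambda(a) * (1 - p) - gamma * p from p(0) = 0
    with a fixed-step fourth-order Runge-Kutta scheme.
    """
    def dp_da(a, p):
        return (b0 + b1 * a) * (1.0 - p) - gamma * p

    h = age / steps
    a, p = 0.0, 0.0
    for _ in range(steps):
        k1 = dp_da(a, p)
        k2 = dp_da(a + h / 2, p + h * k1 / 2)
        k3 = dp_da(a + h / 2, p + h * k2 / 2)
        k4 = dp_da(a + h, p + h * k3)
        p += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        a += h
    return p
```

Setting `b1 = 0` in `prevalence_linear_roi` should reproduce `prevalence_constant_roi` to within integration error, which mirrors the nesting of the age-independent models inside the age-dependent ones. For example, `prevalence_linear_roi(30, 0.02, 0.0, 0.1)` and `prevalence_constant_roi(30, 0.02, 0.1)` use purely illustrative rates, not estimates from the paper.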

Parameter estimation
The catalytic model was fitted to the infection prevalence data from the studies in Table 3 of the main text to estimate the ROIs (the baseline ROIs, $b_0 = (b_{0,i})_{i=1,\dots,N_s}$, and rates of increase with age, $b_1 = (b_{1,i})_{i=1,\dots,N_s}$, where applicable) and reversion rates ($\gamma = (\gamma_i)_{i=1,\dots,N_s}$) using maximum likelihood estimation. The overall binomial likelihood is given by

\[ L = \prod_{i=1}^{N_s} \prod_{j=1}^{N_i} \binom{n_{i,j}}{k_{i,j}}\, p_{i,j}^{\,k_{i,j}}\, (1 - p_{i,j})^{\,n_{i,j} - k_{i,j}}, \]

where $N_i$ is the number of age groups in study $i$, $p_{i,j} = p(a_{i,j}; \lambda_i(a_{i,j}), \gamma_i)$ is the proportion positive in age group $j$ according to the model (equation (4)) (with $a_{i,j}$ taken as the mid-point of the $j$th age group in study $i$), and $n_{i,j}$ and $k_{i,j}$ are the total number of individuals and the number that tested positive in age group $j$ in study $i$.
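The binomial likelihood described above can be sketched as follows. This is an illustrative Python version, not the authors' MATLAB implementation. It assumes study-specific constant ROIs and the corresponding closed-form prevalence with p(0) = 0, and it works on the log scale for numerical stability. The data layout (a list of `(midpoint_age, n, k)` tuples per study) and all function names are hypothetical.

```python
import math


def log_binom_coeff(n, k):
    """Log of the binomial coefficient, computed via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def prevalence(age, b0, gamma):
    """Model prevalence at a given age for a constant ROI, assuming p(0) = 0."""
    total = b0 + gamma
    return b0 / total * (1.0 - math.exp(-total * age))


def binomial_log_likelihood(studies, params):
    """Overall binomial log-likelihood, summed over studies and age groups.

    studies: one list per study of (midpoint_age, n_tested, k_positive) tuples.
    params:  one (b0, gamma) pair per study, in the same order.
    """
    ll = 0.0
    for groups, (b0, gamma) in zip(studies, params):
        for age, n, k in groups:
            p = prevalence(age, b0, gamma)
            ll += (log_binom_coeff(n, k)
                   + k * math.log(p)
                   + (n - k) * math.log(1.0 - p))
    return ll
```

Maximising this function over the `(b0, gamma)` pairs, for instance by passing its negative to a general-purpose numerical optimiser, would give maximum likelihood estimates under these assumptions. Age-group mid-points must be strictly positive so that the model prevalence is non-zero.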
We also fitted the model with a constant ROI and with an age-dependent ROI to the DAT seroprevalence and seroconversion data from Hasker et al [1] to confirm that the age-dependent conversion rate provides a better fit to this data. Given the form of the model, the number of seroconversions in age group $j$ is Poisson distributed with rate parameter $\lambda(a_j)\,s(a_j)\,m_j$ (where $m_j$ is the number of individuals in age group $j$), so the overall log-likelihood of the data is the sum of the binomial log-likelihood for the seroprevalence data and the Poisson log-likelihood for the seroconversion data.

All code was developed in MATLAB R2016b [2] and is freely available at https://github.com/LloydChapman/VLageTrendsAnalysis. The mle function in MATLAB's Statistics and Machine Learning Toolbox was used to find the maximum likelihood estimates (MLEs) and, where appropriate, calculate their approximate 95% confidence intervals (CIs) using the Hessian of the log-likelihood surface at the MLE, approximating the surface as Normal (confidence intervals were not calculated where the likelihood surface was non-Normal).
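The Poisson contribution from the seroconversion data can be sketched as below. This is an illustrative Python version rather than the authors' MATLAB code. It takes the rate parameter λ(a_j)s(a_j)m_j from the text; the symbol `c` for the observed number of seroconversions in an age group, the data layout and the function name are assumptions introduced here.

```python
import math


def poisson_log_likelihood(groups, roi, seronegative):
    """Poisson log-likelihood of seroconversion counts by age group.

    groups:       list of (age, m, c) tuples, where m is the number of
                  individuals in the age group and c is the observed number
                  of seroconversions (c is a hypothetical symbol).
    roi:          function a -> lambda(a), the rate of infection at age a.
    seronegative: function a -> s(a), the model seronegative prevalence.
    """
    ll = 0.0
    for age, m, c in groups:
        mu = roi(age) * seronegative(age) * m  # expected seroconversions
        ll += c * math.log(mu) - mu - math.lgamma(c + 1)
    return ll
```

For a linearly increasing ROI, `roi` could be `lambda a: b0 + b1 * a` and `seronegative` could be one minus a numerically integrated prevalence. Adding the result to the binomial log-likelihood for the seroprevalence data would give a combined log-likelihood for both data types under these assumptions.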

Model comparison
The different models in Table 1 were compared in terms of their ability to fit the data using the Akaike information criterion (AIC), calculated from the overall likelihood $L$ as:

\[ \mathrm{AIC} = -2 \log L + 2 N_p, \]

where $N_p$ is the total number of parameters in the model. The model with the lowest AIC was selected as the best-fitting model.
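The model comparison step amounts to a small calculation, sketched here in Python using the standard definition AIC = −2 log L + 2 N_p. The function names and the numbers in the usage note are illustrative assumptions and are not values from the paper.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion from a maximised log-likelihood."""
    return -2.0 * log_likelihood + 2.0 * n_params


def delta_aic(aic_by_model):
    """Difference between each model's AIC and the lowest AIC in the set."""
    best = min(aic_by_model.values())
    return {name: value - best for name, value in aic_by_model.items()}
```

For instance, `delta_aic({"A": aic(-105.0, 2), "B": aic(-100.0, 3)})` assigns zero to the model with the lowest AIC and positive differences to the others, which is how the ∆AIC values between models can be read.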

Hasker et al (2013) data
The results of fitting the age-independent and age-dependent ROI models to just the seroprevalence and seroconversion data from Hasker et al [1] are shown in Figure 1. The age-dependent model (bottom) has a much lower AIC (∆AIC = 312.9) and is clearly a much better fit to the data than the age-independent model (top). Thus, it made sense to test whether the age-dependent ROI model was a better fit across all the datasets than the age-independent model.

Figure 1. Fits of the age-independent (top) and age-dependent (bottom) ROI models to data on DAT prevalence (left) and seroconversion incidence (right) from Hasker et al [1]. Vertical lines show binomial and Poisson confidence intervals for the prevalence and seroconversion incidence in each age group.

Model selection
The overall AICs for the different models (Table 1) fitted to the data from all the infection prevalence studies in Table 3 in the main text are presented in Table 2. The best-fitting model is Model 6, the age-independent, study-specific conversion and reversion rate model. This model provides a far better fit to the data than any of the other age-independent or age-dependent models with other combinations of test- and/or study-specific conversion and reversion rates, except for Model 6a, the age-dependent model with study-specific rates, for which ∆AIC = 6.5. The fact that the models with study-specific rates fit the data best suggests that the main source of variation in the rate estimates is the study that the data comes from, as opposed to the type of test used. This is likely to be due to genuine differences in the infection rate between different locations and time periods, e.g. with differences in clinical VL incidence, but may also reflect differences in test standardisation and protocols between studies. Although it is not obvious why the reversion rate should depend on the study, the study may be a proxy for other factors that affect the reversion rate, such as previous exposure and time since infection.

Table 2. Akaike information criterion (AIC) values for the different models fitted to the data and the number of fitted parameters in each model.

Parameter estimates
The ROI and reversion rate estimates and corresponding AIC value for each of the studies/study combinations for the different models are shown in Tables 3-4. Figure 3 shows the fits of the best-fitting model, Model 6 (the age-independent model with study-specific rates), for the individual studies. The fits are reasonable: for nearly all the studies, the prevalence estimated from the model lies within the confidence intervals of the prevalence point estimates for all the age groups in the data. However, the model does not provide a good description of the trends in some studies, e.g. Bern et al, 2007 [3].
Although Model 6 is a better fit than Model 6a (its age-dependent equivalent) overall, the differences in the AIC values are small for most of the studies (∆AIC ≤ 2). For 4 of the larger studies (Hasker et al [1] DAT and rK39, Singh et al [4] DAT, and Bern et al [5] LST) the AIC is actually lower for Model 6a, and Model 6a has a higher likelihood. Given this and the fact that the rate estimates from the age-dependent model agree more closely with those from longitudinal data (see Figure 2 and main text), we cannot discount the possibility that the ROI increases with age. However, there are several other potentially important factors, such as spatial variation in the ROI, that have not been taken into account.