Proposing a Compartmental Model for Leprosy and Parameterizing Using Regional Incidence in Brazil

Hansen’s disease (HD), or leprosy, is still considered a public health risk in much of Brazil. Understanding the dynamics of the infection at a regional level can aid in identification of targets to improve control. A compartmental continuous-time model for leprosy dynamics was designed based on understanding of the biology of the infection. The transmission coefficients for the model and the rate of detection were fit for each region using Approximate Bayesian Computation applied to paucibacillary and multibacillary incidence data over the period of 2000 to 2010, and model fit was validated on incidence data from 2011 to 2012. Regional variation was noted in detection rate, with cases in the Midwest estimated to be infectious for 10 years prior to detection compared to 5 years for most other regions. Posterior predictions for the model estimated that elimination of leprosy as a public health risk would require, on average, 44–45 years in the three regions with the highest prevalence. The model is easily adaptable to other settings, and can be studied to determine the efficacy of improved case finding on leprosy control.


Introduction
Hansen's disease (HD, or leprosy) is a chronic progressive disease caused in Brazil by infection with Mycobacterium leprae. Transmission is most likely through nasal droplets [1], and is associated with socioeconomic status [2]. While leprosy is curable through chemotherapy [3], detection is often delayed [1], leading to more serious sequelae (including disfigurement and disability). The World Health Organization (WHO) has set a goal for elimination of leprosy as a public health problem, defined as a prevalence of <1/10,000 [4]. Several countries, including Brazil, have failed to meet that goal [5].
The Brazilian leprosy control program has been successful in decreasing the incidence of leprosy, but the prevalence remains high in 2 regions [5]. Movement towards elimination seems to have stagnated in these regions, possibly due to a downgrading of the importance of case finding [6]. Treatment of leprosy has been decentralized, so regional differences in case detection, and disease progression are to be expected [7]. Infection hotspots have also been noted in Brazil [8], leading to regional and sub-regional differences in transmission rates [5]. These may be related to socioeconomic factors, as a systematic review has found that socioeconomic inequalities associated with leprosy were large [9]. Prediction models must take these regional differences into account in order to accurately represent these differences and identify possible control points.
A number of models of leprosy have been proposed [10][11][12][13][14][15][16][17][18][19][20][21], and 6 of the base models from these studies were recently fitted to regional data from Brazil [22]. However, only one model takes into account much of the recent research on leprosy susceptibility [10,18], and it is an agent-based model that relies on specific population structures; the results of this model are quite useful on a regional level [23,24], but have not been applied to national-level results. The goal of this research is to produce a compartmental model that represents current understanding of leprosy susceptibility and pathogenesis, but that is also easily adaptable to different populations. Unknown parameters for this model will be fitted to regional incidence data from Brazil and analyzed to determine differences in control efficacy and their effect on the elimination target and long-term control.

Methods
All human data was anonymized at the source before usage [25]. As these data were publicly available and fully anonymized, no institutional review board approval was required. A deterministic compartment model of leprosy (Fig 1) was designed to take into account current understanding of the disease. Briefly, individuals are divided into 3 categories: resistant (R, with probability q S = 1-p s ), susceptible to paucibacillary infection (S P , with probability p s Ã p p ), and susceptible to multibacillary infection (S M , with probability p s Ã q p where q P = 1-p p ).
Resistance, q s , is meant to convey both genetic resistance and socioeconomic protective factors [13,18]. Resistance to multibacillary infection, p p , is meant to convey genetic resistance [18]; this value is higher than the observed proportion of new cases that are PB (0.8 vs. 0.54), but the discrepancy is explained by the high rate of self-cure among PB cases (α PN ). Resistant individuals (R) enter and leave the population without infection. Individuals with susceptibility to leprosy but genetic resistance to MB disease enter the paucibacillary (PB) track as susceptible (S P ). They may be exposed (E P ) at rate λ and eventually develop symptomatic PB disease (N P ) at rate γ P . Paucibacillary disease either self-heals at rate α PN or is detected and leads to treatment (T P ) at rate φ P , either of which results in recovery (R P ) at rate α PT . Recovered individuals may relapse to refractory disease (A P ) at rate σ P , from which they can be detected and return to treatment at rate φ P . Individuals with a genetic susceptibility to MB disease enter the population as susceptible (S M ) and may become exposed (E M ) at rate λ. Exposed individuals develop multibacillary disease (N M ) at rate γ M and are diagnosed and entered into treatment (T M ) at rate φ M . Treated individuals recover or leave treatment (R M ) at rate α M and may relapse to refractory disease (A M ) at rate σ M , from which they may be detected and return to treatment at rate φ M . Multibacillary individuals are subject to a death rate that is proportionately higher than the general population, at ν M for untreated individuals and ν MT for treated individuals. The force of infection, λ, is calculated as for the density-dependent model, where N is the total population size, β P is the contribution of PB individuals to the force of infection, β M is the contribution of MB individuals to the force of infection, θ 1 and θ 2 are the proportional decreases in infectiousness due to treatment of PB and MB individuals, respectively. Total population size (N) is calculated as the sum of all the compartments, and varies over time as births and deaths occur. For the sake of simplification, it is assumed that treated individuals are quickly rendered non-infectious [3] and therefore that θ 1 = θ 2 = 0. Thus, the force of infection becomes in the density-dependent model. Good estimates were available in the literature for most system parameters (see Table 1). However, estimates were unavailable for the transmission coefficients (β P and β M ) and the true case detection rates (φ P and φ M ), as these are likely to vary by locality and can be difficult to measure directly. These parameters were therefore estimated using the Sequential Monte Carlo approximate Bayesian Computation (SMC ABC) algorithm [26,27] applied to Brazilian incidence data, as described previously [22]. Briefly, the annual incidence of PB leprosy, inc PB obs ðyÞ, and the annual incidence of MB leprosy,inc MB obs ðyÞ, were obtained for each of the 5 regions of Brazil between 2000 and 2012 [28]. For each region, an initial parameter set (n = 100,000) was sampled from the prior distributions of the estimated parameters (a uniform distribution with a non-informative range, Table 1), and in subsequent SMC particles (rounds), parameter sets were sampled from the immediately previous particle with a perturbation kernel. Each parameter set was used to simulate the incidence values between 2000 and 2010 (allowing 2011-2012 to be used for unconstrained validation), where incidence was assumed to be new individuals entering the treated category (T P or T M ) from the untreated category (N P or N M ). A distance function was calculated using the equation where inc PB pred ðyÞ was assumed to be equal to φ P N P (y), or the number of new PB cases entering treatment in year y, and inc MB pred ðyÞ was assumed to be equal to φ M N M (y), or the number of new MB cases entering treatment in year y. It was assumed that recurrent infections (from A P or A M ) were not included in the observed incidence. A parameter set was accepted if d<τ, where τ was set equal to the 75 th percentile of d in the previous particle. In each particle, the algorithm was repeated until 100,000 parameter sets were accepted; 10 particles were produced, with the 10 th particle used to form the posterior distribution. The perturbation kernel was set to be a uniform distribution with a range limited by ± the variance of each parameter in the previous particle. Initial values in each of the compartments were determined analytically based on the parameters and observed prevalence. As the duration of treatment in MB disease is twice the duration of treatment in PB disease under MDT, the observed prevalence was assumed to be divided between T M (0) and T P (0) with the ratio 2 Ã incidenceðMBÞ incidenceðPBÞ . The ratio of N i (0): were set empirically in a multi-step process similar to that described previously [22]. Briefly, the ratios were adjusted manually for each region such that the model, simulated with the assumed values in Table 1, predicted the incidence of both PB and MB cases in that region with less than 10% deviation from the observed values in 2000 (the first year of observation) and 2002 (the year of peak incidence in most regions). The model was then fitted with the initial population distribution determined by these ratios, and the median of the estimated distribution for each fitted parameter was used to predict incidence of both PB and MB cases in each region. If the predicted incidences in 2000 or 2002 deviated from observed values in any region by more than 10%, the ratios were again adjusted manually to correct the deviation and the model was re-fitted. This process repeated until the median of the estimated distribution for each fitted parameter was able to predict incidence of PB and MB cases in each region with less than 10% deviation from observed values in both 2000 and 2002.
As density-dependent transmission was assumed, but is known to be a simplification of true human contact rates [32], the above process was repeated for a frequency-dependent transmission model. In this model, the force of infection λ f becomes where b f P and b f M are adjusted from the density-dependent model to account for population size. The results of the frequency and density dependent models were compared using Bayes factor analysis, in which the Bayes factor was the ratio of the summed distance in all regions, corrected for differences in regional population size, across a weighted sample of 1,000 posterior parameter sets.
The results of the best-fitting regional model (frequency or density dependent) were examined for similarity between distributions, and 3 hierarchical fittings were considered: transmission parameters (β M , β P ) shared across regions (V1), transition parameters (φ M , φ P ) shared across regions (V2), all parameters (β M , β P , φ M , φ P ) shared across regions (V3), and sharing no parameters (the regional model described above, V4). In the hierarchical models, the distance function was altered to where r represents the region and N r is the population of region r in 2000. Hierarchical models were compared to each other and the regional model using Bayes factor analysis, in which the Bayes factor was the ratio of the summed distance in all regions, corrected for differences in regional population size, across a sample of 1,000 posterior parameter sets weighted by the inverse of their summed regional distances (Eq 5).
In order to check the consistency of the model results, data were simulated for each region using the median of the best fitted value from the preferred hierarchical model. These data were then used to repeat the full model selection and parameterization process, including hierarchical model selection and parameterization. Results were compared to the simulated input values.
Posterior predictions were produced using a weighted sample of 1,000 parameter sets from the posterior distribution of both the preferred hierarchical model and the regional model, and outcomes of interest were predicted from this sample. Outcomes were the time to elimination in years (t elim ) and the predicted incidence overall and of MB and PB cases in the year 2050 (i 2050 , iM 2050 , and iP 2050 , respectively).
All models and fitting were performed in R 3.0.3, [33] which was accessed through the Revolution R Analytics interface (copyright 2014 Revolution Analytics, Inc.).

Results
The median and range of each parameter for the each of the model fits are shown in Table 2. Transmission and transition parameters were similar between the density-dependent and frequency-dependent models. Bayes factor analysis identified the frequency-dependent model as having the lowest summed deviance from the observed incidence. As a result, the frequencydependent model was used for hierarchical model fitting.
Bayes factor analysis identified the regional version as having the lowest summed deviance from the observed incidence, although the preference for the regional version was not strong (Bayes Factor of 1.2 to 2.1, compared to the hierarchical models). Transmission coefficients were estimated to be similar in all regions even in the regional model, but transition rates had high variability between regions. Transition rates were lower in the Midwest for all individuals, higher in the South and Southeast for MB individuals, and higher in the North for PB individuals.
The final distribution of the initial population, as determined by the empirical process, is shown in Fig 2 for each region and model fit. The number of latent and undetected individuals was the most variable across models, with the density-dependent model requiring higher numbers of latent individuals to reproduce the initial and peak incidence in each region.
Posterior predictions for the preferred hierarchical model and the regional model are shown in Fig 3 and Table 3. The results show that the fit underestimated PB incidence in the North and Midwest and MB incidence in most regions in later years. The South and Southeast reported incidences below the elimination threshold in 2001, and this was also predicted to be possible by the model, although the average time to elimination was predicted to be 2002 and 2007, respectively. On average, the North and Midwest were predicted to reach the elimination threshold by 2045, while the Northeast was predicted to reach the elimination threshold by 2044. However, the ranges of values were wide, indicating that the Northeast could require up to the year 2053 to reach the elimination threshold.
Simulation results (Table 4) show that the model was able to predict the simulated values in most cases. The exceptions were the values of φ M and φ P , which tended to overestimate the true value, and the values of β M and β P in the density-dependent model, which tended to underestimate the true value.

Discussion
This study presents a compartmental model for Hansen's Disease that takes into account the current understanding of the disease but that is computationally simple and easy to adapt. The structure of this model differs from that of Meima et al. [11] in 2 ways. First, this model assumes that the 90% of people who never develop leprosy are inherently resistant, rather than self-healing. Second, this model assumes that the 80% of susceptibles who will only develop PB disease are again inherently resistant to MB disease. That allows us to separate the relapsed cases appropriately, such that only those recovered from MB disease relapse to MB disease. These assumptions also allow for including differences of susceptibility in a compartmental model, which is less computationally intensive, easier to fit to data, and easy to adapt to other populations or across larger and more diverse regions. However, Ridley-Jopling classification allows for borderline cases which can cross between PB and MB groups during relapse. [34] While it would be advantageous to capture the full diversity of leprosy presentation in future models, the parameterization of such models will require improved reporting; at present, Brazil reports only the PB and MB classifications of detected cases. [25] Many studies at the national or regional level will report a "case detection rate". This, however, is not the true case finding rate represented by φ M and φ P in this model. It is, instead, the annual observed incidence, and uses the population as the denominator, not the delay in diagnosis. This is an important difference: a high "case detection rate" could indicate an outbreak rather than fast detection, while the true case finding rate only represents the time necessary to Table 2. Posterior distribution median and 95% prediction intervals determined by ABC fitting of Approximate Bayesian Computation models for Hansen's Disease to data from the 5 regions of Brazil. Version 4 consisted of fitting the regional best-fit model to each region's observed data separately with both frequency and density-dependent transmission assumptions; all other versions used a hierarchical structure with density-dependent transmission in which at least some parameters were shared across regions, and fitting was done simultaneously across all 5 regions. Mean error refers to the average value of d per iteration of each version, based on a sample of 1,000 iterations, with confidence intervals based on 100 samples of 100 iterations each.    identify a clinical case. We found that the range of potential case finding rates was fairly similar across most regions, but that cases were estimated to be infectious for 5 years in most cases before treatment was initiated. This agrees with the findings of multiple studies [35,36], where people delayed seeking treatment partly from an assumption that the symptoms were not serious and potentially from a worry of the stigma attached to a diagnosis [37]. It also falls into the range assumed by previous models within Brazil for this time period [24]. In a modeling study, Table 3. Posterior predictions (mean and range) from the regional model for Hansen's Disease fit to regional data from Brazil. Incidence of Hansen's Disease in the year 2050 is reported overall (i 2050 ) and for multibacillary (iM 2050 ) and paucibacillary (iP 2050 ). The time to elimination (t elim ) was calculated as the year in which overall incidence was 1/10,000, starting from 2001.  Fischer et al. [10] found that contact tracing was important to avoid the diagnostic delay; contact tracing was decreased in India in order to meet WHO case detection rate goals, and the result was treatment delays [38], which would produce a long-term effect of higher incidence and, eventually, higher case detection rates. Importantly, our model predicted that cases in the Midwest were infectious for an average of 100 years before detection, an unrealistic value indicating that the true detection rate of cases is too low to estimate properly. If true, this could indicate a public health failing that should be addressed. This difference in detection rate between regions could be socio-economic in origin, as the three regions predicted to have low detection rates (Northeast, North, and Midwest) also have lower GDP per capita. Similar regional health disparities have been noted for ischaemic heart disease [39] and laryngeal cancer [40] mortality, suggesting regional disparities in health care [41].
The decision was made to compare density-and frequency-dependent models, despite the fact that leprosy is considered to be a disease of close contact and therefore would be classically considered to have density-dependent transmission. This is due to the limitations of a model such as this, caused by the homogeneous mixing assumption, in capturing the limited number of close contacts any individual is likely to have. Thus, while a disease may be truthfully density-dependent, it may behave mathematically as a frequency-dependent disease. The results of this study show that the density-dependent model was slightly preferred to the frequencydependent model. This question would not arise with an individual-based model, such as SIM-COLEP [42], but those models are not as easy to adapt as they must rely on population-specific characteristics in their design, which may require parameters that are not locally available. The goal of this study was to provide an adaptable model that was still able to capture the regional dynamics of leprosy spread. The preference of the regional model supported this decision, but the preference was not strong, indicating that some national-level models may be as informative as the regional models.
Several models assume that PB individuals are non-infectious [11]. We observed that PB individuals did contribute to the force of infection, although with roughly half to two-thirds the strength of MB individuals. In other mycobacterial diseases, less infectious individuals have been found to be potentially important in maintaining the endemicity of the infection [43,44]. This highlights the importance of diagnosing and treating all cases; although the PB cases do not have as serious sequelae as the MB cases, they may serve to maintain the infection in a region. With regards to the intra-regional variation in transmission parameters, we found that higher transmission parameters were predicted in regions with higher incidence. This is to be expected, and highlights the ability of the model fitting to identify regional differences.
The results of the scenario analysis show that all regions are well on-track to eliminate leprosy, with the South and Southeast, which have the lowest incidence, likely to eliminate leprosy first. The posterior prediction plots (Fig 2) show that the fitted model estimated the observed decrease in PB incidence fairly accurately in most regions, but slightly overestimated the decrease in incidence of MB cases in all but the Midwest. It may be assumed, therefore, that these results are best-case scenarios for those regions. The predicted incidence in the Northern region is higher than has recently been predicted for Para State, which is within that region, but the time to the elimination target generally agrees between the two models [24]. All regions observed an increase in incidence up to 2003, followed by a slow decrease. This is likely due to the slow impact of control programs, rather than a change in case detection rates; chronic diseases with long latent periods, like leprosy, will require consistent control over long periods of time to reverse incidence trends.
It is important to note that the incidence of MB disease was predicted to be in the range of 13,983 to 14,913 new MB cases in Brazil in 2015. However, the North, Northeast, and Midwest are likely to require a much longer period to reach official elimination than the South and Southeast. The best way to decrease the time to elimination would be increasing the case finding rate [45]. This would also improve the level of disability in new cases, as delay in onset of treatment is a major cause of disability. Our results, therefore, indicate that the North, Northeast, and Midwest regions of Brazil would benefit from improving the true case finding rate, which we have estimated to be slow.