Different Mechanisms for Heterogeneity in Leprosy Susceptibility Can Explain Disease Clustering within Households

The epidemiology of leprosy is characterized by heterogeneity in susceptibility and clustering of disease within households. We aim to assess the extent to which different mechanisms for heterogeneity in leprosy susceptibility can explain household clustering as observed in a large study among contacts of leprosy patients. We used a microsimulation model, parameterizing it with data from over 20,000 contacts of leprosy patients in Bangladesh. We simulated six mechanisms producing heterogeneity in susceptibility: (1) susceptibility was allocated at random to persons (i.e. no additional mechanism), (2) a household factor, (3, 4) a genetic factor (dominant or recessive), or (5, 6) half a household factor and half genetic. We further assumed that a fraction of 5%, 10%, and 20% of the population was susceptible, leading to a total of 18 scenarios to be fitted to the data. We obtained an acceptable fit for each of the six mechanisms, thereby excluding none of the possible underlying mechanisms for heterogeneity of susceptibility to leprosy. However, the distribution of leprosy among contacts did differ between mechanisms, and predicted trends in the declining leprosy case detection were dependent on the assumed mechanism, with genetic-based susceptibility showing the slowest decline. Clustering of leprosy within households is partially caused by an increased transmission within households independent of the leprosy susceptibility mechanism. Even a large and detailed data set on contacts of leprosy patients could not unequivocally reveal the mechanism most likely responsible for heterogeneity in leprosy susceptibility.


Background
Leprosy is a disease caused by infection with the bacterium Mycobacterium leprae.
Leprosy evolves in a spectrum between two poles (tuberculoid and lepromatous leprosy). The infection is eventually cleared for those with tuberculoid leprosy, while the lepromatous form is chronic. Not all people are susceptible to leprosy, and a marked heterogeneity exists in this 20 susceptibility. This may be because of a resistance against infection or a sufficiently fast clearance of the infection to prevent disease 1 . For those developing leprosy, the incubation period of the disease is long, with 4 to 11 years depending on the type of leprosy 2 . Especially contacts of known leprosy patients are at risk of developing leprosy. These individuals have a higher exposure, but could also be more likely to be susceptible than people in the general 25 population due to shared environment -household-or familial relationship to the patient.
Modeling can aid to extrapolate trial outcomes of one study to whole populations or from a short time frame to longer time periods. This provides a way to compare different control strategies. Modeling is also a way of getting insight into underlying natural mechanisms, e.g.
the aforementioned heterogeneity in susceptibility. Models for leprosy have up to now not 30 taken into account the household structure of a population, and explicit genetic mechanisms 3,4 . We aim to assess dynamics on a household level, thus the household structure of population needs to be incorporated. As leprosy has a long incubation period 1 the timescale at which the disease evolves and the timescale at which households changes are comparable.
This means that a household often does not have the same composition at the end of the 35 infection as it was at the moment of infection. Individuals including those infected, can have moved out or new (possibly infected) individuals can have moved into the household. We take up the challenge to explicitly model the formation and change of household 5 . The model will be parameterized for northwest Bangladesh 6 and fitted to the detailed disease data of a trial in the same area 7 . 40 Microsimulation SIMCOLEP simulates leprosy transmission in a population structured by households that form and dissolve during the simulation. The model is a microsimulation or a stochastic individual-based model 8,9 . The model simulates the life history of fictitious individuals, including the household formation, and the natural history of infection with M. leprae. The 45 state of an individual changes during events that are scheduled in continuous time. The timing of events is determined by probability distributions, which is determined by the current state and history of an individual. The model is divided into two modules: a population module, and a disease module. The population module describes processes unrelated to disease or infection, such as birth, death and marriage. The disease module simulates processes of 50 infection and disease, including interventions.
A computer program was written in JAVA 1.5.0 12 to make the calculations of the model using a similar structure as STDSIM 13 . To explicitly simulate an infectious disease requires the simulation of many interacting individuals, and can become computationally demanding.
Reliable simulation of a relatively rare infectious disease, such as leprosy, requires a large 55 population. To keep computation time within reasonable limits, we used the MUSIDH method 14 with a setting of 50 disease histories to 1 life history. In short this method implies that every demographic life history (birth, death etc.) is used as if 50 individual have exactly the same demographic life history, while disease events differ between these 50 individuals.
This prevents the simulation of many demographic life histories. 60

Population module
The population grows with a time-dependent growth rate. In total we recognized three population growth phases (see Figure S-1) and choose to model population growth with exponential growth during these three phases. For the population before 1800, we assumed a constant population (i.e. a growth rate of 0 y -1 ). The second phase of slow growth from 1800 65 until 1950 occurred with a rate of 0.007 y -1 . From 1950 onwards the population grows with a rate of 0.0235 y -1 . The population growth-curve after 1800 was obtained from extrapolations based on census data 10,15 . The population size is kept at the required size by replacing deaths by births, and population growth is accomplished by additional births. We assumed a closed At birth, a new individual is created and the age of death is determined by a sex-dependent survival curve, which changes with calendar time (see Figure S-2). We used the available survival data from 1961 until 2000 10,15 . Survival data previous to 1961 were not available, and therefore we used the survival curve of 1961 for all years previous to that. 75 The newly created individual is placed into a household in which a married female is available as mother. The actual mother is randomly selected from all married females weighed by her age. The age-weighed selection of a mother is based upon age specific birth rates 15 . The birth rates for 1995 are shown in Figure S Unmarried males and females can be coupled during wedding events, which are scheduled 80 such that the proportion of married people in each age group matches census data (Figure S-4 15 ). After the death of a married person, the surviving spouse is again a candidate for marriage again. At marriage, 25% of couples create a new household; for the other couples the female will become member of the household of the male. In the latter case, the household will split up with a rate of 0.083 y -1 (i.e. after 12 years), and the married couple and their possible 85 children will create a new household. Data to directly quantify the parameters for the model of movement of people are unavailable, 95 and therefore the above-mentioned values were obtained by calibrating the model to mimic the distribution of household sizes in Nilphamari district in Bangladesh and the percentage of people that moved during a 2-year period. The observed average household size was 4.6 (ranging from 3.9 to 5.9 between villages), and 3.1% of the population moved per year (ranging from 2.0% to 3.6% between villages) 6 . The calibration of parameters gave an 100 average household size of 4.3 and the movement rate was 2.9% per year. Simulated household sizes were slightly larger than observed, but the household size distribution did not differ significantly from the data 6 (χ 2 test, p = 0.25, Figure S

Disease module 105
The disease module exists of four separate, but interacting components: transmission, natural history of infection, allocation of susceptibility and type of leprosy, and interventions.

Transmission
Transmission occurs during events in which an infectious individual has contact with a susceptible individual. We modeled two transmission processes (1) in the general population 110 and (2)  transmission process is called frequency dependent transmission (or mass action) 17 , which means that the number of contact events per individual per time unit (i.e. year) is independent of the population size.
Additional to these infections, a within-household transmission process is modeled. A 130 susceptible living in a household with one or more infectious individuals can be infected within the household. The within-household transmission process is modeled by density dependent transmission (or pseudo mass action) 17   The self-healing type is never infectious 2 . The duration of the asymptomatic state is gamma distributed with mean 4.2 years and a standard deviation of 1.9 years 1,2 . In the symptomatic state, the self-healing type is detectable during examination and will be treated immediately after infection. The self-healing type is uninfected, and recovered without symptoms at the moment of self-healing. The time until self-healing from onset of symptoms 160 is exponentially distributed with rate 0.2 (i.e. mean duration of 5 years). The self-healing type is assumed never to be infectious.
The chronic infection has an asymptomatic period with mean 11.1 years and standard deviation of 5.0 years, and will be symptomatic until treatment or death of the individual.
During the asymptomatic period the infectivity of an individual, i.e. the probability of 165 infecting during a sufficiently close enough contact, increases linear to one at first symptoms. Treatment is given directly at detection and makes an infectious individual immediately non-infectious. Relapse of disease after treatment for both chronic and self-healing infections occurs with a rate depending on calendar time. Between 1970 and 1990 dapsone monotherapy is given, and relapses occur with a rate of 0.015 y -1 , and after full implementation of multi-170 drug therapy (MDT) in 1990 the relapse rate is 0.001 y -1 19 . Of all treated cases including those of the self-healing type, 90% will relapse as a chronic infection, and 10% as a selfhealing infection 20 .

Allocation of susceptibility and type of leprosy
The susceptibility of an individual is determined by one of six mechanisms of allocation of 175 susceptibility and of the type of leprosy (self-healing or chronic infection):  followed up yearly for 3 consecutive annual visits at each of them 90% will be examined.
Contacts can be diagnosed as "no disease" for individuals that are uninfected and in the asymptomatic states; furthermore 10% symptomatic cases are missed during examination and thus incorrectly given the diagnosis "no disease". The remaining 90% of symptomatic cases is diagnosed as mild disease for self-healing infections or severe disease for chronic infections. 220 BCG, a vaccine used against tuberculosis, has a protective effect against leprosy. In this study, we choose a life-long protective effect of 60% against infection with M. leprae 25,26 .

Estimation of contact rate parameters, c pop and c hh
The two contact rates, c pop and c hh , were estimated by fitting the model to data from the 230 DBLM registers in Nilphamari, and a study among contacts of leprosy patients by Moet et al 7 .
The model was fitted to three aspects of the data: (1) new case detection in 2003, (2) prevalence among contacts by 6 household size categories, and (3)   Eq. S-5 The overall log-likelihood was used to determine the fit of the combination of contact 260 rates. The combination of contact rates with the highest log-likelihood is the best model quantification. The fit to all datasets are combined in the log-likelihood function S-6.
Constant C is the sum of the parts of Equations S-3 to S-5 that do not depend on simulation outcomes, only on the data, and are therefore equal for any assumed combination of c pop and c hh . This constant can be ignored for the maximization. The log-likelihood ratio is the difference between the values of Equation S-6 for two different parameters sets obtained from two different models. The log-likelihood ratio times -2 is approximately χ 2 -distributed, which can be used to test whether two models are significantly 270 different.

Metamodel
We use a regression model as metamodel 28 The regression model Equation S-7 was estimated for all possible combinations of linear and log-transformed outcomes and parameters. For each scenario, the regression model with 280 the transformations yielding the highest adjusted R 2 -coefficient of determination-was used as metamodel. The metamodels were used to determine the best fitting parameter combination. The metamodel is used to find the optimal parameter values and the 95% confidence around such an optimum.