Modeling the persistence of Opisthorchis viverrini worm burden after mass-drug administration and education campaigns with systematic adherence

Opisthorchis viverrini is a parasitic liver fluke contracted by consumption of raw fish, which affects over 10 million people in Southeast Asia despite sustained control efforts. Chronic infections are a risk factor for the often fatal bile duct cancer, cholangiocarcinoma. Previous modeling predicted rapid elimination of O. viverrini following yearly mass drug administration (MDA) campaigns. However, field data collected in affected populations shows persistence of infection, including heavy worm burden, after many years of repeated interventions. A plausible explanation for this observation is systematic adherence of individuals in health campaigns, such as MDA and education, with some individuals consistently missing treatment. We developed an agent-based model of O. viverrini which allows us to introduce various heterogeneities including systematic adherence to MDA and education campaigns at the individual level. We validate the agent-based model by comparing it to a previously published population-based model. We estimate the degree of systematic adherence to MDA and education campaigns indirectly, using epidemiological data collected in Lao PDR before and after 5 years of repeated MDA, education and sanitation improvement campaigns. We predict the impact of interventions deployed singly and in combination, with and without the estimated systematic adherence. We show how systematic adherence can substantially increase the time required to achieve reductions in worm burden. However, we predict that yearly MDA campaigns alone can result in a strong reduction of moderate and heavy worm burden, even under systematic adherence. We predict latrines and education campaigns to be particularly important for the reduction in overall prevalence, and therefore, ultimately, elimination. Our findings show how systematic adherence can explain the observed persistence of worm burden; while emphasizing the benefit of interventions for the entire population, even under systematic adherence. At the same time, the results highlight the substantial opportunity to further reduce worm burden if patterns of systematic adherence can be overcome.


A.2 Entities, state variables and scales
Fig A shows a schematic of the entities contained in the ABM.There are N human individuals and the environment consisting of reservoir hosts that contribute to the transmission of O. viverrini.We choose to model worm burden in reservoir hosts at an aggregate level because the scientific questions we want to answer with the model revolve around the heterogeneity of worm burden in humans, because of limited data availability on reservoir host worm burden and to reduce computational costs.The state variables associated with the entities are listed in Table B. Transmission rate to cats from fish 1/(Day × Animals) Table C. Transmission parameters of the ABM.The function of the parameters is described in more detail in Section A. 7. Parameter values are determined by fitting to field data, as described in the main text.
community and we assume the proximity between humans at a small scale to be irrelevant for transmission given the life cycle of O. viverrini.Time t is measured in days.The time step size dt is constant over a simulation run is calculated by dividing 365 by the input parameter timesteps per year.We set timesteps per year to 120 for analyses in the main paper resulting in a dt of 3.042 days.The model does currently not support a time step smaller than one day.

A.3 Process overview and scheduling
Submodules are executed in the order listed below at each time step t to calculate the value of the state variables at time t + dt.This section provides an overview of the submodules with further details on each submodule given in Section A.7 and a description of parameters given in Tables D and E. to False, and beta multiplier (t+dt) i to 1.
6. "Reservoir hosts submodule": Calculates cats (t+dt) and dogs (t+dt) .Subtracts random number of worms from cats (t) and dogs (t) according to parasite mortality rates µ pc and µ pd .Adds random number of worms to cats (t) and dogs (t) depending on number of currently infected fish fish (t) and animal worm acquiring rates β cf and β df .
Calculates the eggs per gram as a nonlinear function of worms for all definitive hosts.Adds random number of infected snails to snails (t) depending on the sum of calculated eggs per gram (EPG) over definitive hosts and transmission rates β sh , β sc and β sd .Subtracts random number of infected snails from snails (t) according to snail mortality rate µ s .Adds random number of infected fish to fish (t) depending on number of infected snails snails (t) and transmission parameter β f s .Subtracts random number of infected fish from fish (t) according to mortality rate of fish µ f .8. "Summary statistics submodule": Calculates various summary statistics over the human population and stores them in a array together with the environment variables described in Table B.

A.4 Design concepts
Emergent behavior : The model does not exhibit emergent behavior.The pivotal characteristic of heterogeneous worm burden in individuals is imposed by the distribution of individual worm acquiring parameters as described in further detail in Section A.5.1.
Stochasticity: Stochasticity is introduced by using pseudorandom numbers in every submodule as well as during initialization.Stochasticity does play an important role for stochastic extinction.
Observation: The state variables of the environment are stored for every time step over the simulation period.For the population of human individuals, various summary statistics are calculated and stored for time t + dt after completing all the manipulations in time step t by the summary statistics submodule described in Section A.7.8.The entire population of individuals can be stored for all time steps as well but this option is usually turned off because of memory requirements.
The model does not make use of the further concepts described by ODD which are adaption, objectives, learning, prediction, sensing and interaction at the level of individuals as well as collectives.

A.5 Initialization
In the initialization phase, two 2-dimensional arrays are created.The first array, humans, contains an array for each human which tracks the variables listed in the upper part of Table B. The second array, pop, is initialized to contain as many vectors as there will be time steps throughout the model run.Each secondary array in pop stores the environment variables listed in the lower part of summary statistics that are calculated over the population of humans, for example the mean worm burden.This array is returned after a model run and used to produce the time series plots.The first element of this array will be populated with values as described in the following parts of this section.

A.5.1 Initialization of individuals in the humans array
The variable sex i is set randomly to 0 or 1, where the value 1 represents a male and is assigned with a probability given by parameter proportion male.Given sex i , each individual is assigned age (0) i in days with probabilities given by vector parameters population distribution male and population distribution female.Individuals aged above minimum age for worm infection are assigned random worm counts worms (0) i as specified by the vector parameters initial worm distribution probabilities and initial worm distribution values.Individuals aged above minimum age for worm infection have eating is set to 0 or 1 with probability latrine coverage.We do currently not differentiate between latrine availability and latrine use and therefore set latrine The mechanism by which β hf,i is distributed among individuals is specified by the parameter beta hf distribution, with the options being gamma and empirical worms.Because of the pivotal role of this distribution, we say that these options imply different transmission models.Each of the transmission models requires specific parameters: • gamma: The β hf,i of individuals is drawn from a gamma distribution parameterized by beta hf mean and beta hf variance.
• empirical worms: The β hf,i of individuals is drawn from a distribution determined by the vector parameters initial worm distribution probabilities and initial worm distribution values.First a worm count wi is drawn for each individual from the worm counts given by these two vectors.Subsequently, the individual's β hf,i is derived from the drawn wi using the relation log 10 (β hf,i ) = beta empirical worms a If the option align beta hf to initial worm burden is set to True, the β hf,i across individuals is distributed such that individuals with higher worms (t) i are assigned a higher β hf,i with the intention of reaching equilibrium quicker.

A.5.2 Initialization of reservoir hosts
The number of worms in dogs and cats are initialized as and where the resulting values are rounded to the closest integer.Similarly, the number of infected fish and infected snails are initialized as February 7, 2024 5/22 and where the resulting values are again rounded to the closest integer.

A.6 Input data
Input data is passed to the model as model parameters, which are described in the following sections and listed in Tables D and E. The parameters controlling the transmission to humans and for interventions are described in the respective submodule subsections and are different from the input data specified here in that they are fitted to data as described in the main text.We set the population size of humans N to 15, 000 corresponding to the population size at the site of data collection.We use 2015 data of the United Nations (UN) Demographic Statistics for Laos [3] to derive the values of human demography parameters proportion male, population distribution male, population distribution female, mortality male, and mortality female.We choose to use national demographic data instead of the demographic characteristics from study participants due to a lack of detailed demographic data for the region.The UN data is processed in the same manner for males and females in which we apply slight modifications to achieve a monotonically decreasing size of age groups with increasing age.This involves evenly distributing the population up to an age of twenty years as well as a slight rearrangement of the population aged over 90 years under the assumption that individuals can attain an age of at most 100 years.This results in vectors population distribution male and population distribution female with 100 × 365 elements giving the proportion of the population having a specific age in days.From these vectors, we calculate the age-dependent daily probability of death required in order to keep the age distribution steady over model runs with mortality male i = 1 − population distribution male i+1 population distribution male i (6 for i < (365 × 100) − 1 and mortality male (365×100)−1 = 1.Age-dependent probabilities of death for females are obtained analogously.The age distribution and daily mortality probability for females is depicted in Fig F. The frequency of pregnancy, which leads to exclusions from MDA campaigns, is provided by the parameters pregnancy age bins and pregnancy proportions, which contain the values listed in Table G.

A.6.2 Transformation worms to eggs per gram
We use a power law derived from multiple purging and autopsy studies to transform EPG to worm counts in individuals and vice versa [9] [5].
In absence of data for animals, we assume that a parasite produces the same number of EPG in cats and dogs.

A.6.3 Worm distribution in humans and feces quantities
The number of worms present in the initial population parameterized by initial worm distribution probabilities and initial worm distribution values are obtained by transforming the EPG data collected in the 2012 survey using the inverse ρ −1 of the function given in equation (7).We set the maximum number of worms a human can harbor max worms per person to 10 5 and the minimum age for worm infection to 2 × 365 days assuming younger children do not consume potentially contaminated dishes.We use WHO recommendations to categorize the intensity of O. viverrini infection as light (1 − 999 EPG), moderate (1, 000 − 9, 999 EPG) or heavy (≥ 10, 000 EPG) [10].This classification scheme is passed to the model in the form of the vector parameter epg classification bin edges containing binning edges.We set feces gram per day human to 100 grams.Based on the estimation that a dog weighs one sixth of the human and stool production is proportional to body weight, we set feces gram per day dogto 16.6 grams.Assuming cats have 5% of human body weight, we set feces gram per day cat to 5 grams.

A.6.4 Further parameters
We set latrine coverage to 0.4406, which equals the coverage over the entire study population in 2012.

A.6.5 Reservoir host population parameters and worm mortality
We parameterize the number of dogs N dogs and the number of cats N cats to be one sixth of the human population size N based on questioning individuals in the study population [6].We calculate initial worms per dog and initial worms per cat from EPG data collected for cats and dogs at the study site in 2012.Transforming the EPG count from individual animals using ρ −1 and taking the average results in 0.53 initial worms per dog and 5.63 initial worms per cat.
We set the population size of snails N snails to be 100 times the population size of humans N and the population size of fish N fish to be 10 times that of humans N.
These numbers are not based on data but on rough guesses and likely far from the true values.However, we expect misestimates of these numbers not to have an effect on the system's dynamics when the transmission parameter values are fitted to data.We set the initial prevalence snails to 0.0029 and initial prevalence fish to 0.2691 based on samples of fish and snails taken at the study site in 2012.
We set the worm mortality µ ph in humans based on a life expectancy of the parasite of 10 years to 1 10×365 .For other hosts, the life expectancy of the host as opposed to the life expectancy of the parasite is the limiting factor.We set the mortality parameters analogously assuming life expectancies of 4 years for cats and dogs, 2.5 years for fish and 1 year for snails [8].

A.7.1 MDA submodule
The MDA submodule checks if an MDA round is scheduled within the time interval [t, t + dt) with the timings of MDA campaigns specified by the vector parameter MDA timing.Only one round of MDA can be conducted in a time step.
If an MDA round is scheduled in the current time step, individuals eligible for MDA treatment are determined.These are all individuals aged above MDA minimum age who are not pregnant.Pregnancy is randomly assigned in each MDA round to female individuals according to parameters pregnancy age bins and pregnancy prortions, which specify the proportion of pregnant women for different age groups.This randomly assigned pregnancy is only a temporary variable used to determine MDA eligibility and is not stored at the individual level.Therefore the model is not well-suited for modelling MDA campaigns taking place more frequently than once a year.The MDA coverage levels specified through MDA coverage apply to the population of individuals eligible for MDA.Subsequently, it is determined which eligible individuals receive MDA depending on the parameter MDA strategy, for which there are two choices: • "random" assigns MDA randomly to individuals with the probability of the coverage specified for the current MDA round in the vector parameter MDA coverage, which contains the coverage of each MDA round listed in MDA timing.
• "gaussian coppula" assigns individuals a probability p m,i of adhering to an MDA campaign.The probabilities are determined at the moment when the first MDA campaign is carried out and stored in the vector MDA adherence probabilities.Individuals keep their p m,i campaign over all following MDA campaigns.p m,i is drawn from a beta distribution with a mean specified by the parameter MDA coverage and variance specified by the parameter MDA variance.Correlation between p m,i and the individual transmission parameters, β hf,i , is introduced through a Gaussian copula with the following steps: -The β hf,i are mapped to the interval [0,1] using the cumulative distribution function of the gamma distribution that generated the β hf,i , resulting in a vector of values X U .
-X U is then mapped to X N with the standard normal quantile function.
-A new set of normally distributed random values Y N is generated which are correlated with X N using the function February 7, 2024 11/22 where ρ is the correlation coefficient between the latent Gaussian variables in the copula and is specified through the parameter MDA copula correlation and ε are random draws from a standard normal distribution.
-Y N is mapped to Y U on the interval [0,1] using the standard normal cumulative density function.
-The p m,i are generated by transforming Y U using the quantile function of the beta distribution with mean and variance specified by parameters MDA coverage and MDA variance.
In each round of MDA, individuals receive MDA with probability p m,i , provided they are eligible for MDA as described above.
This option currently only works with the gamma-distributed β hf,i but could also be implemented with the empirical model.
For individuals receiving MDA, worms i is set to 0 with immediate effect on transmission and MDA treatment i is increased by 1.

A.7.2 Education submodule
Checks if an education round is scheduled within the time interval [t, t + dt) with education timings specified in days by vector parameter education timing.Only one round of education can be conducted in a time step.
If an education round is scheduled in the current time step, individuals eligible for education are determined.These are all individuals aged above education minimum age.There are two ways in which individuals that eventually receive education are determined depending on the parameter education strategy: • "random" selects individuals randomly with the probability of the coverage specified for the current education round in the vector parameter education coverage, which contains the coverage of each education round listed in education timing.For individuals receiving education, beta multiplier i is multiplied with the education efficacy of the current education round specified in the vector parameter educ efficacy with an immediate effect on transmission.
• "gaussian coppula" assigns individuals an education factor e i that determines the effect an education campaign has on them.The e i are determined when the first education campaign is carried out and are stored in the vector edcucation adherences.The factors then stay fixed over time for individuals.They are drawn from a beta distribution with a mean specified by the parameter education efficacy mean and variance specified by the parameter education efficacy variance.Correlation between e i and the individual transmission parameters, β hf,i , is introduced through a Gaussian copula with the following steps: -The β hf,i are mapped to the interval [0,1] using the cumulative distribution function of the gamma distribution that generated the β hf,i , resulting in a vector of values X U .
-X U is then mapped to X N with the standard normal quantile function.
-A new set of random values Y N is generated which are correlated with X N using the function February 7, 2024 12/22 where ρ is the correlation coefficient between the latent Gaussian variables in the copula and is specified through the parameter edcuation copula correlation and ε are random draws from a standard normal distribution.
-Y N is mapped to Y U on the unit interval [0,1] using the standard normal cumulative density function.
-The e i are generated by transforming Y U using the quantile function of the beta distribution with mean and variance specified by parameters education efficacy mean and education efficacy variance.
In each round of education, individuals eligible for education have their beta multiplier i multiplied with their individual education factor e i with an immediate effect on transmission.
When considering waning effects of the education campaign, we model a waning effect of the education campaign factor by updating each individual's beta multiplier (t) i at each time step using the formula beta multiplier which results in an exponential-like decay of the effect of the education campaigns.We choose an arbitrary value of 0.97 for educ decay for the analysis in the main paper (in absence of any data on this).The results of the analysis are not changed if a different value for educ decay is chosen as long as it is greater than 0.9.

A.7.3 Latrines submodule
Checks if a change in latrine coverage is scheduled within the time interval [t, t + dt) with latrine change timings specified in days by vector parameter latrine change timing.If this is the case, latrine coverage is changed according the scheme specified by the parameter latrine change plan and latrine change coverage.There are two choices latrine change plan: random assigns latrines randomly among all individuals in the population with coverage latrine change coverage.This means some individuals can lose access to latrines; addition calculates the difference in coverage between latrine change coverage and the current coverage, and randomly distributes latrines within the population that currently has no latrines such that the population has the latrine coverage specified by latrine change coverage after the intervention.Removing latrines with the option addition is currently not supported.

A.7.4 Human worms submodule
Checks which individuals reach the minimum age for worm infection within the time interval [t, t + dt) and randomly sets their eating (t) i state variable to 0 or 1 with an immediate effect on transmission.
Calculates the number of worms

A.7.6 Reservoir hosts submodule
The total number of worms in dogs updated according to where the mortality of worms in dogs is given by and acquisition of new worms in dogs is given by Analogously, the number of worms in cats is updated according to where the mortality of worms in cats is given by and acquisition of new worms in cats is given by The number of infected snails is now updated with where the mortality of infected snails is given by and the emergence of newly infected snails is given by N cats × N snails − snails (t)  .
(25) Finally, the number of infected fish is determined by where the mortality of infected fish is given by and the emergence of newly infected fish is given by A.7.8 Summary statistics submodule Stores summary statistics for the population of humans.This includes the mean age of humans, the latrine coverage and usage, the number of people consuming raw or undercooked fish, the number of people above minimum age for worm infection, the total number of worms in humans, the number of humans currently infected, the maximum number of worms any human carries in the population and the variance of number of worms humans carry.For EPG in individuals, the mean, the variance and maximum values are stored.Additionally, the number of individuals having light, moderate or high worm burden as specified by the parameter epg classification bin edges and the mean EPG among individuals within each of those groups are stored.

D Model variant with only humans as definitive hosts: Additional figures
This section contains the last two plot of the main text for the model runs without animals as definitive hosts.Animals were removed from the system by setting the transmission parameter from fish to non-human animals to zero and refitting the other transmission parameters.All the qualitative conclusions from the main text are not affected by this removal, although the exact times until the goals are achieved are slightly reduced.The parameter controlling the mean effect of the education campaign, µ e , has mostly a strong impact on the prevalence in humans.Systematic adherence to the education campaign, controlled through the parameter σ 2 e , affects prevalence, prevalence of heavy worm burden, and mean worm burden, though for each one of these outcomes, one of the other two studied parameters is of greater importance.Systematic adherence to the MDA campaign, controlled through the parameter σ 2 m , has a strong effect on the prevalence of moderate and heavy worm worm burden and consequently on the mean worm burden in the population.
Fig F. Age distribution of females in modified 2015 UN data and inferred mortality rates.

Fig
Fig K. First-order effects measured in the global sensitivity analysis of fitted intervention parameters (x-axis) on goodness of fit of the model to the objective summary statistics in the 2018 data for the model without non-human animals.The parameter controlling the mean effect of the education campaign, µ e , has mostly a strong impact on the prevalence in humans.Systematic adherence to the education campaign, controlled through the parameter σ 2e , affects prevalence, prevalence of heavy worm burden, and mean worm burden, though for each one of these outcomes, one of the other two studied parameters is of greater importance.Systematic adherence to the MDA campaign, controlled through the parameter σ 2 m , has a strong effect on the prevalence of moderate and heavy worm worm burden and consequently on the mean worm burden in the population.

February
Fig L. Time series for various summary statistics as predicted by the ABM without non-human animals as definitive hosts with fitted systematic adherence where all three interventions are implemented together or in isolation.

Table B .
1, ..., N State variables tracked in the ABM.The time t at which a variable is measured is denoted by a superscript (t) for variables that change over time, the individual i for which a variable is measured is indicated by a subscript if applicable.The dynamics of the model are governed by various transmission parameters β which are also depicted in Fig A and listed in Table C. Spatial factors are not included in the model, as the scope is restricted to a single Latrine submodule": Checks if a change in latrine coverage is scheduled within the time interval [t, t + dt).If this is the case, latrine coverage is changed according the scheme specified by the parameter latrine change coverage.. Randomly assigns death according to the assigned mortality rates.Replaces individuals that were assigned death with newborn individuals by resetting these individuals' state variables age 1. "MDA submodule": Checks if an MDA round is scheduled within the time interval [t, t + dt).If this is the case, MDA is distributed according to the scheme specified by the parameter MDA plan and MDA coverage among non-pregnant individuals aged over MDA minimum age.For individuals receiving MDA, worms (t) i is set to 0 with an immediate effect on transmission.2. "Education submodule": Checks if an education round is scheduled within the time interval [t, t + dt).If this is the case, eating education is assigned according to the scheme specified by the parameter education strategy and education coverage among individuals aged over education minimum age.For individuals receiving education, beta multiplier i for each individual depending on the number of currently infected fish fish (t) and the individuals' worm acquiring rate β hf,i .i and sex i .Adds time step size dt to each age

Table D .
Vector of size 100 × 365 Proportion of males having the age in days corresponding to the vector index Complete list of input parameters.
Vector of size 100 × 365 Proportion of females having the age in days corresponding to the vector index United Nations [3]/interpolation mortality male Vector of size 100 × 365 Mortality rate of males of the age in days corresponding to the vector index United Nations [3]/interpolation mortality female Vector of size 100 × 365 Mortality rate of females of the age in days corresponding to the vector index

Table E .
Complete list of input parameters.

Table G .
: Proportion of females pregnant at any given point in time per age group derived from fertility rates from 2005 Poisson dt × β hf,i × beta multiplier ph,i equals 0. Additionally, this module integrates the mean number of worms over time in order to track the worm days of individuals:The probability of death in the current time step for each individual is read from vector parameters mortality male and mortality female depending on age .If the time step dt is greater than one, the mortality rates which are initially supplied as daily values, get converted to mortality rates per time step.A Bernoulli random variable based on the individuals' probability is drawn determining whether they die in the current time step.In an intermittent step, dt is added to each individual's age age .All individuals dying in the current time step are replaced by newborn individuals by resetting state variables age i This section contains two plots with additional variables for the model runs with non-systematic and systematic adherence shown in the main text.Time series for various summary statistics as predicted by the ABM with fitted systematic adherence where all three interventions are implemented together or in isolation.