CystiHuman: A model of human neurocysticercosis

Introduction. The Taenia solium tapeworm is responsible for cysticercosis, a neglected tropical disease presenting as larvae in the body of a host following Taenia egg ingestion. Neurocysticercosis (NCC), the name of the disease when it affects the human central nervous system, is a major cause of epilepsy in developing countries, and can also cause intracranial hypertension, hydrocephalus and death. Simulation models can help identify the most cost-effective interventions before their implementation. Modelling NCC should enable the comparison of a broad range of interventions, from treatment of human taeniasis (presence of an adult Taenia worm in the human intestine) to NCC mitigation. It also allows a focus on the actual impact of the disease, rather than using proxies as is the case for other models.
Methods. This agent-based model is the first model that simulates human NCC and associated pathologies. It uses the output of another model, CystiAgent, which simulates the evolution of pig cysticercosis and human taeniasis, adding human and cyst agents, including a model of cyst location and stage, human symptoms, and treatment. CystiHuman also accounts for delays in the appearance of NCC-related symptoms. It comprises three modules detailing cyst development, seizure probability and timing, and intracranial hypertension/hydrocephalus, respectively. It has been implemented in Java MASON and calibrated in three endemic villages in Peru, then applied to another village (Rica Playa) to compare simulation results with field data in that village.
Results and discussion. Despite limitations in available field data, parameter values found through calibration are plausible, and simulated outcomes in Rica Playa are close to actual values for NCC prevalence, the way it increases with age, and the share of cases with single lesions. Initial simulations further suggest that short-term interventions followed by a rapid increase in taeniasis prevalence back to original levels may have limited impacts on NCC prevalence.


Introduction
Cysticercosis is a neglected tropical disease affecting humans and pigs, and a major cause of epilepsy in developing countries [1,2]. Eating undercooked meat from pigs infected with cysticercosis can lead to human intestinal infection with the adult Taenia solium parasite; this infection is called taeniasis. Meanwhile, pigs eating Taenia eggs or proglottids can develop the larval stage of the parasite in the body where it forms cysts, leading to what is called cysticercosis. Open defecation and free roaming of pigs promote contacts between pigs and Taenia eggs/proglottids, hence the spread of the parasite. Humans can also accidentally ingest Taenia eggs through the fecal-oral route, which may result in human cysticercosis. Human cysticercosis can have significant health effects, particularly if cysts develop in the central nervous system (leading to neurocysticercosis, or NCC), which can lead to multiple presentations including epilepsy, migraine, intracranial hypertension (ICH), hydrocephalus and even death [3][4][5], for an estimated 2.8 million disability-adjusted life years (DALYs) lost [6].
The World Health Organization has increasingly called for interventions to control or eliminate T. solium transmission [2]. While evidence on the effectiveness of interventions in reducing transmission between humans and pigs has been building [7][8][9][10], key information needed for policy development remains largely unavailable, including the effect of interventions on reducing the burden of NCC in the population. This gap is largely due to the costs and timeframes associated with neuroimaging needed to measure this burden over a large scale and over the many years needed before significant reductions in NCC are seen. In this context, computer simulations of the disease (based on models requiring neuroimaging data only for calibration/validation), combined with existing field tests, can provide more confidence regarding the best interventions to implement in different contexts. So far, multiple papers [11][12][13][14][15][16] have modelled the Taenia solium transmission cycle and some have modelled the number of NCC cases [11,14,15]. However, none have modelled the course of the disease or its symptoms.
We developed an agent-based model (ABM), CystiHuman, to address this gap. The novel features of this model include: 1) simulation of ICH/hydrocephalus (grouped within a single category for the simulation of both symptoms and treatment, for simplicity purposes) and epilepsy, including prevalence, treatment, and mortality, 2) differentiating model outputs according to the location of the lesion: parenchymal or extra-parenchymal (the latter being generally associated with the most severe symptoms) and 3) accounting for the time lag between infection and the appearance of symptoms/treatment, as well as time with the disease. Further, with CystiHuman, it will be possible to compare the cost-effectiveness of a large array of interventions, from taeniasis or cysticercosis treatment, to interventions to mitigate NCC through improved diagnosis and symptom management, which cannot be simulated using transmission models.
The objective of this paper is to describe this new model, including its purpose, scope, processes, and the information used to guide its development. We also explore the extent to which the model can be calibrated to real-world data collected from a variety of sources, and then apply the model to an endemic village in Northwestern Peru [17] to compare model outputs to observed data from that village. Model development is an iterative process that responds to both new data and knowledge from the real world, as well as to increased understanding of the model itself and of its performance. We consider this the first iteration of CystiHuman. Future versions will include more in-depth analysis of the behavior of the model, analysis of the impact of interventions on disease prevalence and their economic and DALY costs/benefits, and results from field studies that should provide data to refine the model and/or validate certain of its aspects.

Purpose
We developed CystiHuman with the long-term goal of informing decision-making through a cost-benefit analysis of different interventions to address neurocysticercosis. This paper focuses on how we model the prevalence and symptoms of human NCC, treatment likelihood, and symptom evolution. The primary model outputs are average NCC prevalence, person-weeks with different symptoms or treatments, and the number of surgeries and deaths.
Data are currently insufficient to validate the model, but comparison of model simulations with available field data can nevertheless be useful while awaiting the results of planned field studies. For that purpose, we applied the model to an endemic village in Northwestern Peru, Rica Playa [17], and compared simulated and actual values in this village.

Prerequisites
CystiHuman requires input from a separate model of the human taeniasis-pig cysticercosis cycle that gives information on the evolution of taeniasis and Taenia egg density in the community. We have chosen an adaptation of the CystiAgent model [16] for this purpose, as it provides the inputs needed and was validated in Peru, where our team has been working. Eggs in the environment, originally represented as a number in [16], are now represented as a density. Information regarding adaptations to CystiAgent relevant to CystiHuman since the original paper [16,18] is provided in S1 Text.
CystiHuman also integrates demographic movements from CystiAgent. These include short-term mobility (travels to and from other villages), and long-term 'mobility' (emigration, deaths, immigration and births by age range), with immigrants differentiated according to their origin (high risk endemic area or low risk area). Integration of such movements is necessary, as Peruvian society is highly mobile, with migrations into and out of districts in the target region estimated at over 3% of the district population per year [19][20][21].
Finally, CystiHuman uses the same human and household allocation as CystiAgent, which reflects the actual situation in the target villages.

State variables and processes
The model works at multiple levels: each village is a collection of households, which contain human individuals, who may host NCC lesions.
NCC lesions are individual agents located within human hosts. Multiple NCC lesions can be simultaneously present in a given host, modelled as independent agents. Each NCC lesion has seven state variables. Lesion-related processes include: 1) change in the stage/substage of the lesion (based on its age and in some cases treatment type), and 2) association with symptoms, which updates the state variables 'time since last seizure' and 'association with ICH/hydrocephalus'. In the model code, cyst-related processes are implemented before human-related processes.
Humans are the second class of agents. They may harbor any number of NCC lesions, and are located within a village household (except for emigrants). They have nine state variables. In the current form of the model, there is no distinction between sexes (behaviors may differ by sex, but field data suggest that men's and women's taeniasis rates are similar, reducing the relevance of sex-disaggregation) and immunity to cyst development has been neglected, though these are features that could be added at a later stage of the model. Human-related processes include: 1) infection and cyst initiation through egg ingestion, 2) symptom development, 3) treatment, including type, delay and success and 4) travel, emigration and deaths.
Households belong to a specific village in which they have a fixed location in line with their actual location in the field. All households contain at least one human agent, while villages are open systems with in and out movement of humans, but a fixed number of households. Human agents do not change household within the village. Household related processes are limited to the replacement of departing humans with immigrants or newborns so as to keep overall population size constant, and to the welcoming of short-term travelers from villages outside of the simulation. Replacement of departing humans is implemented as humans emigrate or die naturally or through disease (details and justifications in S3 Text).
NCC lesions directly affect human hosts as their state determines the host's disease and symptoms e.g., likelihood of seizure. Hosts may affect NCC lesions through treatment (e.g., surgical removal of a cyst). Humans interact with one another: an infected cook may affect household members through the preparation of contaminated food; while humans indirectly affect others from the same or other households in the village through environmental contamination. This indirect interaction between households is the only such interaction represented within CystiHuman. Meanwhile, in the absence of sufficient data to support the existence of interactions between lesions or represent them, we make the simplifying assumption that NCC lesions do not interact with one another. This assumption may be adjusted if more data become available in the future. The model does not include adaptive responses beyond humans' choice to get treated or not after symptomatic disease appears.
The model has a spatial structure. In addition to interactions of humans within households, the underlying model of human taeniasis (CystiAgent) and some of the interventions (e.g., ring strategy) we want to assess are spatial in nature. Spatial location is modelled through a discrete location variable (latitude and longitude) on a square lattice. When the model is initiated i.e., at the beginning of the burn-in period, humans have no cyst in their encephalus. When cysts are created, they are immediately allocated a location, values for τ 1 and τ 2 , immature stage, and association with no symptom. The immature stage of a lesion has a fixed duration, but the duration of other stages and all other processes are stochastic, as stochasticity is key to capture the time spread in symptom emergence. Finally, the model is divided into three modules: module 1 simulates NCC prevalence and cyst stage; module 2 epilepsy/seizures; and module 3 ICH/hydrocephalus. The modules are calibrated successively. For the calibration of modules 1 & 2, extra-parenchymal lesions are ignored because they are rare among all lesions and epilepsy cases (see S1 Data). For module 3, they are included as they drive most ICH/hydrocephalus cases.
The description of state variables, their meaning, possible values and initial values are provided in Table 1.

Sub-models
The following section details the three main modules of CystiHuman: prevalence, epileptic symptoms, and ICH/hydrocephalus symptoms.

Module 1: NCC prevalence (infection risk and cyst stages)
Module 1 concentrates on disease prevalence, ignoring disease symptomatology. It assumes that extra-parenchymal lesions represent a small enough share of all lesions (when asymptomatic parenchymal lesions are included) to be ignored for the purpose of the module. NCC prevalence is determined by two different processes, infection risk and cyst stages.
Infection risk. The likelihood of developing cysts in the encephalus is determined by three main drivers:
• Self-infection risk: the likelihood of self-infection of a person with T. solium taeniasis, in the absence of protective hygiene practices (e.g., hand washing), is characterized by χ.
• Infection of household members by a person with T. solium taeniasis: this happens primarily if the person responsible for food preparation has taeniasis and limited hygiene. Household members of a person with taeniasis identified as a cook have an added infection risk noted a χ, with a ≤ 1 (assuming that the risk of contamination when eating food prepared by an infected cook is equal to or lower than the self-contamination risk when one has taeniasis).
• Risk of contamination through eggs disseminated in the overall environment: if E is the average density of eggs in the environment, assumed to be uniform at village level, the likelihood of environmental infection when hygiene is poor is noted σ E. For large communities, this assumption will likely no longer be valid and we may use a function σ f(E(location)), with E(location) the egg density at a given location and f a bell-shaped function centered on the individual's household and reflecting the places where s/he is typically present.
All infection risks are modulated by hygiene, represented by a multiplicative term h (h = 1 if there is no hygiene, h = 0 if hygiene is fully protective).
Several cysts may develop simultaneously. To account for this, the number k of cysts developing in a given week follows a Poisson distribution with λ = h (σ E + χ) for a person with taeniasis, λ = h (σ E + a χ) for non-infected household members of an infected cook, and λ = h σ E otherwise.
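The three cases above can be sketched in a few lines of Python. This is a minimal illustration, not the Java MASON implementation: the function names are hypothetical, the numeric constants are the calibrated values reported in the Calibration results section, and the egg density passed in the usage lines is an arbitrary placeholder.

```python
import math
import random

# Calibrated weekly values reported in the Calibration results section.
H_SIGMA = 0.00215  # h * sigma, multiplies the egg density E
H_CHI = 0.0180     # h * chi, self-infection term
A = 0.0714         # a <= 1, relative risk for members of an infected cook's household


def weekly_rate(has_taeniasis, cook_infected, egg_density):
    """Poisson rate (lambda) for the number of new cysts in one week."""
    rate = H_SIGMA * egg_density
    if has_taeniasis:
        rate += H_CHI          # lambda = h (sigma E + chi)
    elif cook_infected:
        rate += A * H_CHI      # lambda = h (sigma E + a chi)
    return rate                # otherwise lambda = h sigma E


def sample_new_cysts(rate, rng):
    """Draw k ~ Poisson(rate) using Knuth's multiplication method."""
    threshold = math.exp(-rate)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


# Usage: egg_density=0.1 is a hypothetical value, not taken from the paper.
rng = random.Random(42)
rate = weekly_rate(has_taeniasis=True, cook_infected=False, egg_density=0.1)
new_cysts = sample_new_cysts(rate, rng)
```

Because the weekly rates are small, most draws return zero cysts; the Poisson form only matters in the rare weeks where two or more cysts appear at once.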
The way infection risk is modelled in CystiHuman and how this links to the outputs of CystiAgent (taeniasis cases and eggs in the environment) is detailed in Fig 1.
Cyst timeline. Once a cyst develops in the encephalus, it goes through multiple stages: starting off as an immature cyst for duration τ 0 = 3 months [34], it continues as a mature non-calcified lesion, then either calcifies with probability p calc or disappears. The mature non-calcified stage is divided into two substages, a first asymptomatic substage of duration τ 1 , and a second substage of duration τ 2 that may be symptomatic. These substages are not directly related to changes in what is seen on imaging but to symptomatology, even though the two may coincide (see Modules 2 and 3). Table 2 provides the detail of known parameters in all three modules.
There are three unknown parameters in module 1 that will be determined through the calibration process: h σ, h χ and a. h was introduced to highlight the contribution of behavioral drivers (h) vs. biological drivers (χ, a & σ), but does not need to be separated from these to accurately simulate NCC in the village. S1 Text details how cyst lifecycle indicators and indicators associated with disease prevalence and cyst number were computed.

Module 2: epilepsy symptoms, treatment likelihood and risk of death
Module 2 simulates epilepsy symptoms, treatment, and risk of death. It only includes parenchymal lesions, as the large majority of epilepsy cases are associated with such lesions. The rationale for all figures is detailed in S2 Text while mortality data are provided in S3 Text.

PLOS COMPUTATIONAL BIOLOGY

Table 2 (excerpt). Known parameters.
• Period from cyst maturity to lesion disappearance or calcification (τ 1 + τ 2 , driven by a 2-part distribution), parenchymal lesions only: τ 1 (duration of the 1st substage) is given by a Gamma distribution with all times in years and α = 2.94, β = 0.83, then taking the maximum of the resulting τ 1 and 1 week to avoid 0 or negative values. τ 1 derives from the analysis of estimated timeframes from infection to first symptoms in [35]†. τ 2 (duration of the 2nd substage, from the end of the 1st substage to cyst death) is given by an exponential distribution whose parameter is the average weekly death rate of a cyst: 2.6% (plausible range: [...]).
• Weekly likelihood for unsuccessfully treated or untreated calcified parenchymal lesions associated with epilepsy to be associated with a seizure: ω = 0.05 (see explanation of the computation of this figure in S2 Text).
• Delay from cyst maturity to first symptoms, for extra-parenchymal lesions: τ 1 (duration of the 1st substage) is given by a Gamma distribution [...].
• Delay from first ICH symptoms to treatment (t delay): 37% of cases delay treatment by less than 1 month, 36% by 1-6 months, 10% by 6-12 months, and 19% by over 1 year (see [49-53]† and S2 Text).
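As an illustration of the two-part distribution for parenchymal lesions, the sketch below draws τ 1 and τ 2 with Python's standard library. Several points are assumptions rather than statements from the paper: β is read here as a rate per year (the source does not say whether it is a rate or a scale), τ 0 = 3 months is rounded to 13 weeks, and the function name is hypothetical.

```python
import random

WEEKS_PER_YEAR = 52
TAU0_WEEKS = 13             # immature stage: 3 months, rounded to 13 weeks (assumption)
ALPHA, BETA = 2.94, 0.83    # Gamma parameters for tau1, times in years
WEEKLY_DEATH_RATE = 0.026   # parameter of the exponential distribution for tau2


def sample_parenchymal_durations(rng):
    """Return (tau0, tau1, tau2) in weeks for one parenchymal lesion."""
    # ASSUMPTION: BETA is a rate. random.gammavariate takes a scale, hence 1 / BETA.
    tau1_years = rng.gammavariate(ALPHA, 1.0 / BETA)
    # Floor at 1 week to avoid zero durations, as described in Table 2.
    tau1_weeks = max(tau1_years * WEEKS_PER_YEAR, 1.0)
    # Exponential waiting time until the cyst dies (calcifies or disappears).
    tau2_weeks = rng.expovariate(WEEKLY_DEATH_RATE)
    return TAU0_WEEKS, tau1_weeks, tau2_weeks


rng = random.Random(2024)
tau0, tau1, tau2 = sample_parenchymal_durations(rng)
weeks_to_possible_symptoms = tau0 + tau1
```

Under the rate interpretation the mean of τ 1 is α/β, roughly 3.5 years; under a scale interpretation it would be α·β, roughly 2.4 years, so the choice should be checked against S1 Text before reusing these numbers.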
Likelihood of symptoms associated with a parenchymal lesion. Parenchymal lesions may be associated with incident epileptic seizures at the beginning of the second substage of the mature non-calcified stage (with probability π e ) or at the beginning of the calcified stage (with probability π ec ). When cysts that were associated with seizures prior to calcification reach the calcified stage, associated seizures may stop or continue (the corresponding probability is noted π ae ). Meanwhile, the model defines a probability of seizure in any given week for calcified lesions associated with active epilepsy in patients that have never been treated (or have been unsuccessfully treated) as ω. This representation simplifies active epilepsy as it does not represent individuals with highly irregularly spaced seizures.
Finally, if lesions that have already been associated with epileptic seizures disappear, further seizures may take place before waning: the corresponding probability is s = 10%. In such cases, it is assumed, as a simplification, that new seizures happen at the moment of lesion disappearance but stop afterward.
Human symptoms. Symptoms for a human host are derived from the symptoms associated with individual brain lesions: if any of the host's brain lesions in the model has been formally associated with epilepsy (time since last seizure ≥ 0), the host has epilepsy. Humans have active epilepsy if the most recent seizure associated with any of the host's lesions happened less than T a = 5 years or 261 weeks ago. This value was chosen in line with practices in the region of Peru to which the model has been applied.
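The rule linking lesion state to host status can be written as a small function. This is a sketch under stated assumptions: the function name and the three status labels are hypothetical, and a lesion never associated with a seizure is encoded here as -1, since the text only specifies that a non-negative 'time since last seizure' marks an association with epilepsy.

```python
ACTIVE_WINDOW_WEEKS = 261  # T_a = 5 years


def epilepsy_status(weeks_since_last_seizure):
    """Classify a host from its lesions' 'time since last seizure' values.

    weeks_since_last_seizure: one value per lesion; a negative value
    (hypothetical encoding) means the lesion was never associated with a seizure.
    """
    times = [t for t in weeks_since_last_seizure if t >= 0]
    if not times:
        return "no epilepsy"
    # The most recent seizure across all lesions decides whether epilepsy is active.
    if min(times) < ACTIVE_WINDOW_WEEKS:
        return "active epilepsy"
    return "inactive epilepsy"


status_a = epilepsy_status([-1, 300, 12])  # one lesion with a seizure 12 weeks ago
status_b = epilepsy_status([400, -1])      # last seizure more than 5 years ago
```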
Treatment. Treatment with anti-epileptic medication may be undertaken if the individual has epileptic seizures. The model uses the estimated probability of treatment in endemic communities in the target region of Peru. It assumes that treatment, once initiated, continues for a duration T treat after the last seizure. Based on feedback from Peruvian colleagues, we used T treat = 2 years, but leave space for other options as needed.
Treatment is deemed "successful" if seizures stop while the patient is being treated (no "breakthrough seizures"). Treatment success leads to the end of treatment after the patient has remained seizure free for at least two years. Treatment success is most relevant at the calcified stage as drugs are normally not discontinued during the shorter non-calcified stage [4]. The probability of success (cessation of seizures) when calcified lesions are treated with anti-epileptic medication for two years has been estimated at around 47% [47]. In the model, patients that have been successfully treated no longer experience seizures, even after treatment is discontinued. When treatment is not successful, patients are modelled as having continued seizures both during and after treatment at frequency ω. We do not model the variety of situations among "unsuccessfully treated" patients, some of whom may have significantly reduced seizure frequency.
Mortality. Deaths from active epilepsy are rare. These are computed based on Peru's data. For simplicity, individuals that die from active epilepsy are replaced by immigrants or births in the same household, in line with the rules of the demographic model.
Table 2 footnotes: * Ignoring extra-parenchymal lesions that are asymptomatic during their whole lifespan. † Value not directly taken from the reference but obtained through the process explained in S1 Text, S2 Text or using data in S1 Data.

Module 3: ICH/hydrocephalus symptoms, treatment and risk of death
Symptoms. All extra-parenchymal lesions represented in the model are associated with ICH/hydrocephalus, starting at time τ 0 + τ 1 . Parenchymal lesions are rarely associated with ICH/hydrocephalus; when they are, this also happens after τ 0 + τ 1 and only at the non-calcified stage. The associated probability is noted π i .
ICH/hydrocephalus treatment is very rare in the context of endemic Peruvian villages, though the exact share that gets treated is unknown. In addition to people that never get treated, many of those who ultimately consult a doctor delay care-seeking. Treatment, when it takes place, may be non-surgical (primarily anthelminthic) or surgical (shunt placement, cyst excision, etc.). Death rates are elevated, and primarily known for individuals that do seek treatment. It can be assumed that they are higher for those that do not. Table 2 provides module parameter values; how they were obtained is detailed in S2 Text (for symptoms) and S3 Text (for deaths).

Programming and implementation
The model was programmed in Java MASON and the corresponding code (as well as the code for CystiAgents) is available at https://github.com/oflixs/CystiAgents. The model is implemented using a burn-in period (during which statistics are not recorded) of 3,500 weeks (67 years), or roughly a human lifespan, to allow for the accumulation of calcified lesions in all individuals in the modelled community. Statistics are then accumulated over 10,000 weeks (192 years) to produce baseline figures. We expect that the impact of control interventions will generally be assessed over one or several decades. NCC symptoms evolve over months and years, but taeniasis infections and environmental contamination with eggs, which drive NCC infections, evolve over the course of weeks and months, hence the model has a discrete time step of one week.

Calibration methods
Model calibration serves to identify the values of unknown parameters (called "tuning" or "calibration" parameters) that lead to model outputs that best fit observable data. CystiHuman was tuned using multi-stage approximate Bayesian computation calibration. To do so, CystiAgent, the transmission model that has been chosen to provide inputs to CystiHuman, first needs to be calibrated. However, so far, there are no villages in which local contemporaneous data sufficient to calibrate both CystiHuman and CystiAgent are available. Field studies are planned to gather a comprehensive set of data for both models in the same community. In the meantime, we have chosen to tune CystiHuman in the same 3 endemic villages in North-West Peru, denoted as 515, 566 and 567, in which CystiAgent was calibrated [unpublished results], because data on village demographics, human taeniasis and pig cysticercosis are sufficient to fully calibrate CystiAgent there.
CystiHuman was tuned in three steps, corresponding to each of the three modules: calibration of h σ, h χ and a, which allows for the simulation of disease prevalence and stage (module 1), calibration of π e , π ec and π ae , related to epilepsy and seizure symptoms (module 2), and calibration of π i and ξ, associated with ICH/hydrocephalus symptomatology (module 3). For the calibration of modules 1 & 2, extra-parenchymal lesions are ignored because they are rare among all lesions and epilepsy cases (see S1 Data). For module 3, they are included as they drive most ICH/hydrocephalus cases.
The observables chosen to calibrate the model are presented in Table 3. The table provides the geographic origin of the data/proxies, while S1 Text (for module 1) and S2 Text (for modules 2 and 3) detail how the corresponding values were obtained. The second and third observables in the table (share of individuals with NCC that have a single lesion, and share having two lesions) are based on global averages as a review of literature has revealed that these shares were largely stable across countries and communities (see S1 Text, Fig A and Table A in S1 Text).
The three modules were calibrated successively. For the first two modules, the program explores a series of parameter sets at each stage of calibration. Multiple runs are simulated for each set, and outcomes averaged over all runs. The number of runs was chosen to limit uncertainty on the average found for each parameter set, while the number of parameter sets was defined to optimize the speed of convergence of the calibration, given the number of runs. A threshold below which a parameter set is accepted was also defined. This threshold was set so that around 20 parameter sets were accepted at each calibration stage. Table 4 provides details for the first two modules. The two unknown parameters in the third module can be calibrated successively, allowing us to use manual calibration. However, given the small number of cases in any single village, we had to average outcomes over 64 runs for each parameter set.
The 'best' parameter set is the one whose output is closest to the observed values used to calibrate the model. The distance D between simulated and observed values has been defined as [...]. Once actual values for NCC prevalence in the villages in which the calibration is undertaken become available, the calibration will seek to approximate these values for each village separately. The main results of the calibration process are described in the next section while further details are available in S4 Text.
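The calibration loop described above (sample parameter sets, average several runs per set, keep the sets closest to the observables, then narrow the ranges for the next stage) can be sketched as follows. This is a generic rejection-style approximate Bayesian computation sketch, not the authors' code: all function names are hypothetical, uniform sampling of the parameter ranges is an assumption, and the distance used here (root of summed squared relative errors) is a stand-in because the paper's definition of D is not available in this text.

```python
import random


def distance(simulated, observed):
    # ASSUMPTION: relative Euclidean distance; the paper's own D may differ.
    return sum(((s - o) / o) ** 2 for s, o in zip(simulated, observed)) ** 0.5


def calibrate_stage(simulate, observed, ranges, n_sets, n_runs, n_accept, rng):
    """One calibration stage: return the n_accept closest (distance, params) pairs."""
    scored = []
    for _ in range(n_sets):
        params = [rng.uniform(low, high) for low, high in ranges]
        runs = [simulate(params, rng) for _ in range(n_runs)]
        # Average each observable over all runs of this parameter set.
        mean_output = [sum(column) / n_runs for column in zip(*runs)]
        scored.append((distance(mean_output, observed), params))
    scored.sort(key=lambda pair: pair[0])
    return scored[:n_accept]


def narrow_ranges(accepted):
    """Parameter ranges for the next stage, spanned by the accepted sets."""
    columns = zip(*(params for _, params in accepted))
    return [(min(column), max(column)) for column in columns]
```

Keeping a fixed number of closest sets is equivalent to choosing the acceptance threshold so that about that many sets pass, which matches the description of roughly 20 accepted sets per stage.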

Results
This section includes a description of calibration results for villages 515, 566 and 567 in the Piura region of Peru. It also describes initial model outcomes (e.g., age-related patterns) and applies the model to a fourth village: Rica Playa. The purpose of this section is to: 1) demonstrate the feasibility of model calibration using available data, and 2) provide and discuss initial model outcomes based on this calibration. Field studies are planned to improve model observables hence calibration results and obtain data from further villages for validation. In-depth discussion of the model, description of the model of the costs of the disease and analysis of the impact of interventions, will be done in separate papers.

Calibration results
The calibration significantly narrowed the range of plausible values for all parameters. Calibrated values are: h σ = 0.00215, h χ = 0.0180, a = 0.0714, π e = 0.0050, π ec = 0.0132, π ae = 0.406, ξ = 0.0108 and π i = 0.00083. Simulations using these values yield outputs that are very close to the targets (see details in S4 Text).
The calibrated value for a is small, corresponding to a strength of self-infection (h χ) substantially higher than that of contamination through cooks (h a χ): after a year with taeniasis, the risk of contamination is 61%, while after a year eating food prepared by a household member with taeniasis, the risk is 6%. There is also a 1.4% yearly contamination risk from environmental sources (a continuous, unavoidable source of infection, the magnitude of which is reflected by h σ). Over 50 years, this corresponds to one chance in two of being contaminated. Overall, using the selected calibration parameters, 42% of NCC cysts come from environmental contamination sources.
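The yearly figures quoted above can be recovered from the weekly calibrated values with a short calculation. The sketch assumes that the yearly risk is the probability of at least one cyst over 52 weeks of Poisson-distributed weekly counts, and it takes 0.0714 as the multiplier applied to h χ for household members of an infected cook; the function name is hypothetical.

```python
import math

WEEKS_PER_YEAR = 52


def cumulative_risk(weekly_rate, weeks):
    """P(at least one cyst) when weekly cyst counts are Poisson(weekly_rate)."""
    return 1.0 - math.exp(-weekly_rate * weeks)


# Self-infection: h chi = 0.0180 per week.
self_risk = cumulative_risk(0.0180, WEEKS_PER_YEAR)           # about 0.61

# Infected cook: a * h chi = 0.0714 * 0.0180 per week.
cook_risk = cumulative_risk(0.0714 * 0.0180, WEEKS_PER_YEAR)  # about 0.06

# Environmental risk of 1.4% per year, compounded over 50 years.
fifty_year_risk = 1.0 - (1.0 - 0.014) ** 50                   # about 0.51
```

The first two results round to the 61% and 6% stated in the text, and the third to the "one chance in two" over 50 years.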
Meanwhile, the calibrated parameters for module 2 suggest that 0.50% of all lesions will be associated with epilepsy starting at the non-calcified (viable or degenerating) stage, while 0.42% (π ec × p calc ) will be associated with epilepsy starting at the calcified stage. Close to one in two lesions associated with seizures at the non-calcified stage continue being associated with seizures after calcification. In addition, when running the simulations, we find that close to 20% of all epilepsy cases caused by NCC are individuals who, on imaging, will not have any lesion: these epilepsy cases derive from a past infection in which the lesion has cleared. Once treatment likelihood is included, the model suggests that, for individuals in the target villages or who contracted the disease while there, over 10,000 weeks of simulation, 3.5 people on average will die from active epilepsy, 22.4 will die from ICH/hydrocephalus, and 2.6 will undergo treatment/surgery. The burdens of active epilepsy, treated active epilepsy and ICH/hydrocephalus over that period are 1231, 146 and 1731 person-years spent with the respective condition, assuming that the treatment gap is 95% for ICH/hydrocephalus and the death rate 36% for untreated individuals (the lowest estimate for that figure).

Initial model outcomes and application to an additional village
In this section, we present some additional model outputs of interest. Fig 3 shows the evolution of NCC prevalence for the calibrated model for one run in one of the villages, and how this relates to oscillations in human taeniasis prevalence (in the absence of interventions). These patterns are similar for different runs and villages. Short peaks in taeniasis rapidly lead to an increase in NCC prevalence (A), followed by a slow decrease (B), over decades. Further, a rapid succession of peaks and lows in taeniasis prevalence is not associated with a substantial decrease in NCC prevalence (C), which suggests that short-term interventions, if they are followed (as has been the case so far [9, 10]) by a rapid increase in taeniasis back to original levels (also associated with a rapid increase in pig cysticercosis), may have limited impact on NCC rates.
Data availability so far is insufficient to validate CystiHuman. However, CystiHuman can be applied to the Peruvian village of Rica Playa [17] and model outcomes compared with a number of field measures. Further, it is possible to compare model outcomes for the three calibration villages with a number of actual figures from other communities/contexts for which data are insufficient to apply the model, keeping in mind that CystiHuman results may not be fully transferable to these new contexts.
Age at first symptom. In model simulations, across the three villages, seizures are expected to begin at 32 years old while ICH/hydrocephalus symptoms should begin at 41 years old. Clinical data suggest that there is indeed a difference between these ages: in a review of 38 cases, the average age of patients with ICH/hydrocephalus was 39.6 while that of patients with epilepsy and no ICH/hydrocephalus was 33.3, a 6.3-year difference [54]. Further, multiple studies have compared the age of symptomatic patients with parenchymal vs. extra-parenchymal lesions (the first group mostly presented with epilepsy and the second with ICH/hydrocephalus): in these studies, patients with parenchymal lesions were on average 7.2 [70], 3.5 [69] and 2.3 [72] years younger than those with extra-parenchymal lesions.
We applied the calibrated CystiAgent and CystiHuman models to Rica Playa [17], simulating taeniasis, pig cysticercosis, human NCC prevalence and NCC symptoms. We defined a confidence interval in which 95% of experimental measures would fall, should the whole adult population be sampled. These results were compared with field study results, based on CT scans of 86% of the adult population. Results are available in Table 5. The only significant difference between simulations and actual figures relates to the number of cases with 11 or more lesions. There is good coherence between all other outputs and field measures. Note that the pattern of increase in NCC prevalence with age found in both simulations and field data from Rica Playa is also generally observed in other endemic communities, e.g., in Mexico [32] and Ecuador [28]. Age-related increases derive from the accumulation of infection risks over time, and are likely to be influenced by phenomena such as acquired immunity and historical changes in pig raising practices or hygiene levels.

Comparison of model outcomes with field data in Rica Playa
It is important to note that Rica Playa [17] is one of the villages included, alongside other communities, in the computation of some of the proxies used when calibrating CystiHuman. This increases the likelihood that projections for this village would align with reality, though coherence is not a given. For example, the use of a global average for the shares of NCC cases with a single or two lesions in the three villages of calibration was premised on the assumption (supported by community data, as shown in S1 Text) that these shares are very stable across countries and communities. However, using these to calibrate CystiHuman did not guarantee that the model would lead to stable figures across communities, and it was reassuring to find that model simulations in Rica Playa were coherent with field data. To validate the model, however, there will be a need to apply it to a set of entirely new villages, something we are working toward through plans for new field studies.
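As a rough illustration of the comparison described above, the sketch below computes an interval covering 95% of simulated outcomes and checks whether a field measure falls inside it. It assumes the interval is taken from the empirical 2.5th and 97.5th percentiles across simulation runs, which the text does not specify. The function names and the numbers in the usage example are hypothetical and are not CystiHuman outputs.

```python
import math
from typing import Optional, Sequence, Tuple


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile of pre-sorted values, with q in [0, 1]."""
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = q * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def simulation_interval(
    run_outcomes: Sequence[float], coverage: float = 0.95
) -> Tuple[float, float]:
    """Central interval containing `coverage` of the simulated outcomes.

    Assumption: an empirical percentile interval over simulation runs.
    """
    tail = (1.0 - coverage) / 2.0
    ordered = sorted(run_outcomes)
    return percentile(ordered, tail), percentile(ordered, 1.0 - tail)


def falls_within(field_value: float, interval: Tuple[float, float]) -> bool:
    """True when the field measure lies inside the simulated interval."""
    low, high = interval
    return low <= field_value <= high


if __name__ == "__main__":
    # Made-up NCC prevalence values from ten hypothetical simulation runs.
    simulated = [0.15, 0.17, 0.16, 0.19, 0.18, 0.20, 0.14, 0.17, 0.16, 0.18]
    interval = simulation_interval(simulated)
    observed = 0.17  # made-up field measure
    print(interval, falls_within(observed, interval))
```

A field measure outside the interval, as reported for cases with 11 or more lesions, would be flagged by `falls_within` returning `False`.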

Discussion
The primary objective of this paper is to present CystiHuman, an ABM that simulates human NCC in the endemic community setting. This ABM represents an important first step toward filling a critical gap in the field of T. solium control and elimination, namely a tool that can simulate the prevalence and incidence of NCC, associated disease manifestations such as epilepsy and ICH/hydrocephalus, and their cost, to inform policy decisions. In this paper, we demonstrated that a functional ABM of NCC can be developed based on current understanding of the processes involved and on existing data sources. Furthermore, we showed that this model can be calibrated successfully to reproduce observed patterns of NCC in endemic villages in northwestern Peru, such as age-prevalence increases, despite employing calibration targets that were not specific to these villages. Finally, we showed that using these calibrated parameters, CystiHuman adequately reproduces real-life data observed in another rural village in northern Peru. However, more work is needed to achieve the goal of a credible model that can be used for policy decisions.
One of the main challenges in developing CystiHuman was the general paucity of real-world data to inform parameter estimates and processes used in the model. As is the case for most neglected tropical diseases, literature on NCC is limited due to a historic lack of attention to, and funding for, the disease. Poor accessibility to neuroimaging in endemic regions, suboptimal performance of diagnostic assays [73], and lack of standard approaches to screening and diagnosis further limit the scope and quality of the literature base. In general, available studies had small sample sizes [25,27,74-76], were cross-sectional in nature or had short-term or incomplete follow-up, and were biased towards enrollment and/or follow-up of symptomatic cases (among papers with data on the number of lesions in NCC cases, 18 focused on symptomatic cases vs. 6 on asymptomatic or all cases; these were used in S1 Text). Further, studies employed a variety of diagnostic methods including CT scan, serology or mixed criteria [77], complicating efforts to synthesize results and often leading to wide confidence intervals for parameter estimates. For some potentially important processes, such as human immunity, which may differ by exposure, age, gender or genetic background, currently available data were insufficient to include these processes in the current iteration of CystiHuman. Nonetheless, we found sufficient data to build and successfully calibrate the ABM using a combination of global and regional data. Some target values for calibration (e.g., proportion of NCC cases with a single lesion) were remarkably stable across many studies, while others (e.g., share of NCC cases with parenchymal lesions or with ICH) could be improved with additional high-quality field studies.
We were also challenged to find adequate data sources for validation purposes, as studies measuring NCC, epilepsy and ICH, along with taeniasis and porcine cysticercosis, were rare. We were able to conduct an initial cursory comparison using observed data from the small village of Rica Playa and modeled values from CystiAgent. It is important to note that Rica Playa [17] is one of the villages included, alongside other communities, in the computation of some of the proxies used when calibrating CystiHuman. This increases the likelihood that projections for this village would align with reality, but coherence is not a given. For example, the use of a global average for the shares of NCC cases with a single or two lesions in the three villages of calibration was premised on the assumption (supported by community data, as shown in S1 Text) that these shares are very stable across countries and communities. However, using these to calibrate CystiHuman did not guarantee that the model would lead to stable figures across communities, and it was reassuring to find that CystiHuman reproduced this share as well as most other observed values in Rica Playa using the set of tuning parameter values that were not specific to this village. The one element CystiHuman did not reproduce is the high share of cases with large numbers of lesions. Though this discrepancy should not affect the ability of the model to simulate disease prevalence, it suggests that some real-life phenomenon has not been accounted for. This may be, for example, an additional modality of infection affecting a minority of cases, or perhaps, inborn and/or acquired immunity. We are currently planning additional community studies in northern Peru to collect the comprehensive datasets needed to conduct more in-depth analyses of the model's behavior, and for full validation against a set of entirely new villages. 
These studies should include some behavioral data (e.g., social networks, open defecation or cooking roles or practices), linked to demographic data (sex, age), which may help assess how much these may influence heterogeneities in infection risk.
Further refinements to the processes included in this model should also be considered. For example, model performance might be improved by allowing some lesions to die rapidly, leading to early calcification and symptoms, rather than requiring these to always go through a viable stage [34]. Data from India [35] suggest that few of the UK soldiers that contracted NCC in India immediately developed symptoms, but it has been suggested that disease manifestations may differ in Latin America, possibly because of mechanisms of immunity [78]. Hence, developing a plausible model of human immunity, albeit very complex, would be an important next step for CystiHuman. Other possible refinements include a model of contamination through communal eating places, interactions with extended family, or a more detailed representation of contamination through dispersed eggs in the environment (the present model assumes uniform risk at village level). As further data become available regarding associations between NCC and chronic severe headache, cognitive impairment, and other manifestations [79-81], these could be added to the model to more accurately capture the burden of disease.
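One way the first refinement mentioned above could be expressed is sketched below: a newly established lesion either calcifies early with some probability or enters the viable stage, from which it dies with a weekly probability. This is a hypothetical illustration, not CystiHuman's implementation (which is written in Java MASON). The stage names, function names and probability parameters are all assumptions, and degeneration is collapsed into a single transition for brevity.

```python
import random
from typing import Optional

# Hypothetical stage labels; the actual model may distinguish more stages.
VIABLE = "viable"
CALCIFIED = "calcified"


def initial_stage(rng: random.Random, p_early_death: float) -> str:
    """Stage of a newly established lesion.

    With probability p_early_death (an assumed parameter) the lesion dies
    rapidly and is treated as calcified without passing through a viable stage.
    """
    return CALCIFIED if rng.random() < p_early_death else VIABLE


def weekly_step(stage: str, rng: random.Random, p_weekly_death: float) -> str:
    """Advance one week: a viable lesion dies with probability p_weekly_death."""
    if stage == VIABLE and rng.random() < p_weekly_death:
        return CALCIFIED
    return stage


def weeks_until_calcified(
    rng: random.Random,
    p_early_death: float,
    p_weekly_death: float,
    max_weeks: int = 5200,
) -> Optional[int]:
    """Number of weeks before calcification, or None if still viable at the end."""
    stage = initial_stage(rng, p_early_death)
    for week in range(max_weeks + 1):
        if stage == CALCIFIED:
            return week
        stage = weekly_step(stage, rng, p_weekly_death)
    return None


if __name__ == "__main__":
    rng = random.Random(42)
    # Illustrative values only; these are not calibrated parameters.
    delays = [weeks_until_calcified(rng, 0.1, 0.005) for _ in range(1000)]
    early = sum(1 for d in delays if d == 0)
    print(f"{early} of {len(delays)} simulated lesions calcified without a viable stage")
```

Setting `p_early_death` to zero recovers the behaviour in which every lesion passes through a viable stage, so the two variants can be compared under the same calibration targets.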
Once CystiHuman has been validated in the context for which it has been developed, assessing how well it can be transferred to other settings will be essential. One of the strengths of the model is that some of its parameters and observables are expected to be valid in multiple contexts. Its calibrated parameters are mostly dependent on the biology of the disease and on hygiene practices at village level. These may therefore be transferable to other endemic communities in Northwest Peru, where human, pig and worm genetics, as well as hygiene practices, are expected to be similar. The observables used for calibration are also mostly non-local, reducing the measurement effort needed to re-calibrate the model to new contexts. The primary exception is NCC prevalence, which is village-specific. Its measurement can be costly and logistically difficult in certain communities, particularly if we wish to transfer the model to poorer country contexts. This argues for increased efforts to improve biological markers of the disease, which are cheaper and easier to implement than CT scans and MRIs.
We developed CystiHuman because we believe that such a model could add new insights to those brought by transmission models focusing solely on human taeniasis and pig cysticercosis. CystiHuman will have the ability to model the impact of a broader array of interventions than transmission models, e.g., free supply of anti-epileptic drugs. More and better simulations will require the development of estimates of the economic and DALY costs of the disease (with some economic estimates coming from planned field studies) and testing of new model elements.
In conclusion, CystiHuman presents an important first step toward accurate modelling of human NCC, which could bring useful insights into the relative effectiveness and cost of different interventions to address the disease. In addition to providing a different perspective on interventions that can be modelled through transmission models, CystiHuman also has the ability to include new interventions focused on NCC mitigation. More field studies and further model development and testing are planned to ensure that CystiHuman provides a fully reliable tool to study the disease.
Supporting information
S1 Text. Inputs to the model of lesion stages and neurocysticercosis prevalence. Table A in S1 Text: Distribution of NCC cases per number of lesions. Table B in S1 Text: Weekly probability of death of an NCC lesion after the beginning of symptoms.