Conceived and designed the experiments: MZL NMB VK JGCdC ECB RHG CB. Performed the experiments: MZL NMB VK ECB. Analyzed the data: MZL DSS DAV. Wrote the paper: MZL DSS JBP. Conceived and designed the analysis: MZL DSS JBP.

The authors have declared that no competing interests exist.

Vector-borne transmission of Chagas disease has become an urban problem in the city of Arequipa, Peru, yet the debilitating symptoms that can occur in the chronic stage of the disease are rarely seen in hospitals in the city. The lack of obvious clinical disease in Arequipa has led to speculation that the local strain of the etiologic agent,

Chagas disease has become an urban problem in the city of Arequipa, Peru, yet there are very few people exhibiting severe symptoms of the disease. Severe symptoms often do not appear until decades after infection. To determine why so few people were exhibiting severe symptoms, we used a new method, epicenter regression, to trace the history of Chagas disease transmission in a community of Arequipa, Peru. Our findings suggest that transmission in Arequipa occurred through a series of small epidemics, the oldest of which began only around 20 years ago. These micro-epidemics infected nearly 5% of the community before the insect that carries Chagas disease,

Chagas disease, responsible for more deaths in the Americas than any other parasitic disease

The vast majority of the 8–10 million individuals infected with

The long asymptomatic period of Chagas disease leads us to an alternative hypothesis for the absence of clinical cases in Arequipa: transmission in the city may be so recent that most infected individuals have yet to progress to late-stage disease. To evaluate this hypothesis, it is necessary to elucidate the timing of

Traditionally, analyses of infectious diseases have aimed either to describe risk factors for infection at a static moment in transmission, using statistical methods to smooth heterogeneities in exposure between individuals created by the agent's spread

We consider models with a single site of introduction of

We conducted cross-sectional entomologic and serologic surveys in one recent settlement (

Epicenter regression makes inference on where and when a disease agent was introduced into a community, as well as the effect of household-level covariates on the risk of infection given exposure. We begin with a simple model:

$$\text{Probability of infection} = 1 - e^{-\text{Risk given exposure} \times \text{Duration of exposure}}$$

This equation is known as the catalytic model, because it also describes the probability of a change in state of molecules exposed to a constant bombardment of a catalyst.
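The catalytic model above can be sketched numerically as follows. This is a minimal illustration, not the authors' code: the function name and the example risk value are assumptions chosen only to show how the probability of infection grows with the duration of exposure.

```python
import math


def infection_probability(risk_given_exposure: float, duration_of_exposure: float) -> float:
    """Catalytic model: probability of infection under a constant risk.

    Both arguments are illustrative names; risk is per unit time (e.g. per year)
    and duration is in the same time unit.
    """
    if risk_given_exposure < 0 or duration_of_exposure < 0:
        raise ValueError("risk and duration must be non-negative")
    # -expm1(-x) equals 1 - exp(-x) but is more accurate when x is small
    return -math.expm1(-risk_given_exposure * duration_of_exposure)


# Hypothetical yearly risk, used only for illustration
for years in (0, 5, 10, 20):
    print(years, infection_probability(0.005, years))
```

The probability is zero with no exposure and rises toward one as either the risk or the duration increases, which is why estimates of exposure time matter so much for the expected prevalence.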

We expand this framework into a biologically plausible model by allowing ‘risk given exposure’ and ‘duration of exposure’ to vary among individuals depending on observed covariates, and estimate the effect of the covariates from the observed data. We estimate the risk of exposed individual, _{i} = _{i}

For each individual we estimated the period of time over which he or she had been exposed to _{jk}/r_{jk}/r_{jk}

A steep hilltop separated households in the study area. We calculated the distance between the households going around this hilltop. As in the related proportional hazards model

We expanded our epicenter regression framework to consider the possibility that _{k}_{k} – (d_{jk}/r)
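One possible reading of the exposure-time term above can be sketched as follows. The sketch assumes that d_{jk} is the distance (in meters) from household j to epicenter k, that r is the rate of spread (in meters per year), and that the quantity preceding "– (d_{jk}/r)" is the number of years since the parasite was introduced at epicenter k. It further assumes that, with several epicenters, a household's exposure starts when the parasite first arrives from any of them, and that an individual's exposure cannot be negative or exceed his or her age. All names are hypothetical.

```python
def exposure_duration(age, distances_m, years_since_intro, rate_m_per_year):
    """Years of exposure for one individual (illustrative sketch).

    distances_m[k]       : assumed distance from the household to epicenter k
    years_since_intro[k] : assumed years since introduction at epicenter k
    rate_m_per_year      : assumed constant rate of spread of the parasite
    """
    if rate_m_per_year <= 0:
        raise ValueError("rate of spread must be positive")
    if len(distances_m) != len(years_since_intro) or not distances_m:
        raise ValueError("need one distance per epicenter")
    # Time since the parasite reached the household from each epicenter;
    # the earliest arrival gives the longest exposure (assumption).
    household_exposure = max(
        t_k - d_jk / rate_m_per_year
        for d_jk, t_k in zip(distances_m, years_since_intro)
    )
    # Exposure cannot be negative and cannot exceed the individual's age (assumption).
    return min(age, max(0.0, household_exposure))


# One epicenter introduced 20 years ago, household 170 m away, spread at 17 m/year
print(exposure_duration(35, [170.0], [20.0], 17.0))
```

Under these assumptions, a household the parasite has not yet reached contributes zero exposure, and a young child in a long-exposed household is exposed only for his or her lifetime.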

For comparison purposes, we fit an endemic model, in which we assume that each individual's time of exposure was equivalent to their age. The infection probability of individual

Bayesian analysis of epicenter regression begins with what is known about the epidemic before testing people in the community, the “priors” of the model ^{3}) on the effect of household-level covariates (the _{k}

Providing an explicit prior on the speed of spread of the pathogen allows us to better tailor our analysis to the particulars of

We fit epicenter regression models using Bayesian methods and Markov chain Monte Carlo (MCMC). We updated the chains using the Metropolis and Metropolis-Hastings algorithms ^{th} estimate in the remainder of a chain to diminish autocorrelation among the estimates. For each pair of models compared, we estimated the Bayes factor as the average, over the 50 pairs of chains, of the ratio of the harmonic means of the posterior likelihood for the models
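To make the sampling step concrete, here is a minimal random-walk Metropolis sketch for a much simpler problem than the full epicenter regression: estimating a single yearly risk under the catalytic model when exposure durations are known. The simulated data, the normal prior on the log risk, the burn-in length, the step size, and the thinning interval are all assumptions chosen for illustration and are not taken from the study.

```python
import math
import random


def log_likelihood(log_risk, durations, infected):
    """Bernoulli log-likelihood with catalytic infection probabilities."""
    risk = math.exp(log_risk)
    total = 0.0
    for t, y in zip(durations, infected):
        p = -math.expm1(-risk * t)              # 1 - exp(-risk * t)
        p = min(max(p, 1e-12), 1.0 - 1e-12)     # guard against log(0)
        total += math.log(p) if y else math.log1p(-p)
    return total


def log_prior(log_risk, mean=math.log(0.005), sd=2.0):
    """Hypothetical normal prior on the log of the yearly risk."""
    return -0.5 * ((log_risk - mean) / sd) ** 2


def metropolis(durations, infected, n_iter=6000, burn_in=1000, thin=10,
               step=0.3, seed=1):
    rng = random.Random(seed)
    current = math.log(0.005)
    current_lp = log_likelihood(current, durations, infected) + log_prior(current)
    kept = []
    for i in range(n_iter):
        # Symmetric Gaussian proposal, so the Metropolis acceptance ratio applies
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_likelihood(proposal, durations, infected) + log_prior(proposal)
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        # Discard burn-in, then keep every `thin`-th estimate to reduce autocorrelation
        if i >= burn_in and (i - burn_in) % thin == 0:
            kept.append(math.exp(current))
    return kept


# Simulated data: 300 individuals, true yearly risk 0.005 (illustrative only)
data_rng = random.Random(0)
durations = [data_rng.uniform(1, 20) for _ in range(300)]
infected = [data_rng.random() < -math.expm1(-0.005 * t) for t in durations]

samples = sorted(metropolis(durations, infected))
print("posterior median yearly risk:", samples[len(samples) // 2])
```

Sampling on the log scale keeps the risk positive without needing to reject proposals at a boundary; the full model would update the epicenter locations, introduction times, and rate of spread in the same loop.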

Models describing ^{4} (

Shown are the mean and standard error of the estimated Bayes factors comparing each model to the one-epicenter model. The dotted line marks the threshold for strong support relative to the one-epicenter model (Bayes factor >10).
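The Bayes factor summary described here (a ratio of harmonic means of the likelihood, averaged over pairs of chains and reported with a standard error) could be computed along the following lines. This is a sketch under the assumption that each chain stores the log-likelihood of every retained posterior sample; the function names are hypothetical.

```python
import math


def log_harmonic_mean_likelihood(log_likelihoods):
    """Log of the harmonic mean of the likelihoods, computed stably.

    harmonic mean = N / sum(1 / L_i), so its log is log N - logsumexp(-log L_i).
    """
    neg = [-ll for ll in log_likelihoods]
    m = max(neg)
    log_sum = m + math.log(sum(math.exp(v - m) for v in neg))
    return math.log(len(neg)) - log_sum


def bayes_factor(log_liks_model, log_liks_reference):
    """Ratio of harmonic-mean likelihoods: model versus reference model."""
    return math.exp(
        log_harmonic_mean_likelihood(log_liks_model)
        - log_harmonic_mean_likelihood(log_liks_reference)
    )


def mean_bayes_factor(chain_pairs):
    """Mean and standard error of the Bayes factor over pairs of chains."""
    ratios = [bayes_factor(model, ref) for model, ref in chain_pairs]
    n = len(ratios)
    mean = sum(ratios) / n
    variance = sum((x - mean) ** 2 for x in ratios) / (n - 1) if n > 1 else 0.0
    return mean, math.sqrt(variance / n)


# Toy log-likelihood traces (illustrative values, not study output)
pairs = [([-100.0, -100.5, -101.0], [-103.0, -103.5, -104.0])] * 3
print(mean_bayes_factor(pairs))
```

Working with log-likelihoods and a log-sum-exp avoids numerical underflow, since the raw likelihoods of a data set of this size are far too small to represent directly.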

Under the four-epicenter model, the parasite was first introduced into Guadalupe about 20 years ago. When we tabulated the exposure time and risk of infection of individuals in the population under the four-epicenter model, we found that around half of infections occurred in the 5 years before insecticide application disrupted transmission, and 90% of infections occurred over a period of 12 years (

A. The expectation for models with 2, 4, 6, 8 and 10 epicenters; lines are shaded according to the number of epicenters (2 epicenters = light grey, 10 epicenters = black). B. A boxplot showing the median and credible intervals for the posterior estimates from the best-fit four-epicenter regression model. Chagas disease is a lifelong infection; infected individuals are assumed to remain seropositive throughout their lifetimes.

Spatially, the first introduction in the four-epicenter model occurred in the southwest of the community (

The probability that each household was the first (hexagons), second (triangles), third (squares) or fourth (pentagons) site of introduction under the four-epicenter model is shown. Larger shapes correspond to higher probabilities. Households with cases of human disease are shown as red circles, and those with vectors carrying

Temporally, the four-epicenter model captured the relationship between age and prevalence observed in the data (

The histogram represents the observed data; a smoothed spline, weighted by the number of observations at each age, is fit to these data (black curve). Model estimates of the relationship between age and prevalence were calculated by determining the probability of infection for each individual derived from the posterior predictions of the epicenter regression model with four epicenters. The spline fit to the median posterior predictions is surrounded by a region bounded by splines fit to predictions from the 2.5% and 97.5% quantiles of the posterior (light grey, shaded).

| Parameter | Endemic model | Single epidemic model | Four micro-epidemic model |
| --- | --- | --- | --- |
| | Median [2.5%, 97.5% credible interval] | Median [2.5%, 97.5% credible interval] | Median [2.5%, 97.5% credible interval] |
| Rate of spread of the parasite (m/year) | - | 29.18 [17.52–44.69] | 17.35 [8.87–32.55] |
| Baseline yearly risk of infection | 0.0014 [0.0009–0.0020] | 0.0032 [0.0018–0.0058] | 0.0042 [0.0022–0.0086] |
| Relative risk per domiciliary insect captured | 1.018 [1.004–1.028] | 1.014 [1.000–1.024] | 1.014 [1.000–1.025] |
| Relative risk in households with animals sleeping inside | 1.34 [0.66–2.56] | 1.53 [0.76–2.93] | 1.42 [0.70–2.72] |
| Years since the first introduction of the parasite | - | 20.31 [12.71–33.25] | 19.98 [10.92–34.65] |

Credible intervals are the 2.5^{th} and 97.5^{th} quantiles of the posterior samples.

Corresponds to an average of 15.01 [5.07, 34.26] households exposed by a single infected household in a fully-susceptible population.

Corresponds to an average of 4.950 [1.01, 18.77] households exposed by a single infected household in a fully-susceptible population.
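As a worked illustration of how the tabulated medians might combine, the sketch below assumes that the relative risks multiply the baseline yearly risk and that the resulting risk enters the catalytic model. The function names are hypothetical, and plugging in posterior medians ignores the uncertainty expressed by the credible intervals.

```python
import math

# Posterior medians for the four micro-epidemic model, taken from the table
BASELINE_YEARLY_RISK = 0.0042
RR_PER_INSECT = 1.014
RR_ANIMALS_INSIDE = 1.42


def yearly_risk(n_insects: int, animals_inside: bool) -> float:
    """Yearly risk of infection given exposure, assuming multiplicative relative risks."""
    risk = BASELINE_YEARLY_RISK * RR_PER_INSECT ** n_insects
    if animals_inside:
        risk *= RR_ANIMALS_INSIDE
    return risk


def household_infection_probability(n_insects: int, animals_inside: bool,
                                    years_exposed: float) -> float:
    """Catalytic-model probability of infection after `years_exposed` years."""
    return -math.expm1(-yearly_risk(n_insects, animals_inside) * years_exposed)


# Hypothetical individuals, each exposed for 10 years
print(household_infection_probability(0, False, 10))   # no insects, no animals inside
print(household_infection_probability(10, True, 10))   # 10 insects, animals sleeping inside
```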

When we combined our posterior model estimates of the timing of infection with

Top row: three alternative models to describe the probability of onset of late-stage disease as a function of time (see text). Histograms represent the posterior expected number of late-stage cases under the four-epicenter model (middle row) and one-epicenter model (bottom row).

Transmission of

Our findings do not disprove the hypothesis that the parasites circulating in Arequipa are less pathogenic than other strains. Previous studies suggest that

Clinically, our findings may contribute to a re-evaluation of treatment guidelines for indeterminate Chagas disease that is currently underway

Geographically, our finding that

Generally, our approach is applicable to any situation in which the expected observation of an organism at sampling locations at a certain time is a function of the (unknown) site or sites of introduction of the organism into the system. The functional relationship between the expectation at a sampling site and the site(s) of introduction can be a simple function of distance, as we have used here, or can include information about the habitat between the sampling and introduction sites. The method can be informative when some prior information, on the advance of the disease agent or the likely sites of introduction, is available. The method is not likely to be informative in the absence of both.

Our application of epicenter regression to

Our study focused on a single peri-urban community. Since the completion of our study we have observed similar patterns of micro-epidemics of

The traditional, endemic patterns of transmission of

We thank Michelle Kaplinski and Alicia Hedron, who oversaw the hospital-based studies discussed in the supplemental material; Vitaliana Cama and Jenny Ancca-Juárez, who typed