Integration of animal health and public health surveillance sources to exhaustively inform the risk of zoonosis: An application to echinococcosis in Rio Negro, Argentina

The analysis of zoonotic disease risk requires the consideration of both human and animal geo-referenced disease incidence data. Here we show an application of joint Bayesian analyses to the study of echinococcosis granulosus (EG) in the province of Rio Negro, Argentina. We focus on merging passive and active surveillance data sources of animal and human EG cases using joint Bayesian spatial and spatio-temporal models. While similar spatial clustering and temporal trending was apparent, there appears to be limited lagged dependence between animal and human outcomes. Beyond the data quality issues relating to missingness at different times, we were able to identify relations between dog and human data and the highest ‘at risk’ areas for echinococcosis within the province.

Introduction Cystic echinococcosis (CE) is caused by Echinococcus granulosus (EG), a cestode of the family Taeniidae whose hosts are herbivore and carnivore animals (sheep and dogs being the most important involved in the transmission to man). It is a parasitic zoonosis endemic in South America, especially in Argentina, Chile, Peru, Uruguay and southern Brazil [1].
Among the factors that contribute to the transmission of EG, the type of sheep production in subsistence economies and the practice of feeding sheep offal to dogs appear most relevant. Other factors, such as temperature and humidity, impact the survival and dispersion of the parasite's eggs to the environment [2].
In the province of Rio Negro, south of Argentina, CE is the most important zoonosis. In 1980, when the province's control program against CE was launched, passive surveillance detected 146 human cases (incidence rate: 73 cases per 100,000). The location of the province of Rio Negro is highlighted in Fig 1A and its 18 subregions in Fig 1B. Rio Negro is one of the six provinces that make up the Argentine Patagonia. The CE endemic area is in the south-west of the province, in an area of 120,013 km2 with a population density of 0.88 inhabitants/ km2. The CE program control area has 13 hospitals and a network of 80 Primary Health Care Centers (PHCCs), normally manned by a sanitary agent or nurse in rural areas, and a general practitioner in urban PHCCs.
The geographic areas of greatest risk are located to the west and center of the province [3][4][5] including the towns of Bolsón and Bariloche in the Cordillera region and those of Comallo, Pilcaniyeu, Ñ orquinco and Ingeniero Jacobacci and their rural areas in the Patagonian plateau region, where the ecological conditions favor the survival of Echinococcus granulosus eggs and the social, cultural and economic conditions, with subsistence sheep farming and the persistent practice of feeding sheep offal to the dogs, generate an epidemiological environment that favors the sustenance of the transmission cycle [6].
The control program, which mainly consisted of regular administration of praziquantel to dogs, contributed to the reduction of CE incidence to 29 cases per 100,000 in 1997, and to 7 cases per 100,000 by 2016. Surveillance is essential to evaluate the effectiveness of the control efforts. In addition to the clinical cases detected via passive surveillance, active surveillance in the form of ultrasound screening (US) of all 7 to 14 years old children attending school was implemented [7]. In 1984, the first US campaign showed a prevalence of 5.6% [7]. Subsequent screening in 1997 showed a prevalence of 1.1%, and of 0.2% in 2016 [7,8].
Surveys of sheep farms monitoring environmental exposure were also conducted, specifically targeting lambs and resident dogs to monitor recent infection. CoproELISA was used as a screening method in dogs with confirmation by Western Blot (WB) in fecal samples obtained from the environment [9,10]. ELISA was used as a screening test on sera from sheep with confirmation by WB [11].
Disease risk mapping is important for the understanding of the spatial epidemiology of infectious diseases. Integration of data and analyses, whether of population or health related variables, has been shown to improve risk classification accuracy and the efficacy of detection [12]. However, in most cases, even for multi-host diseases such as zoonoses, risk estimation has been conducted in an univariate fashion based on human case data alone. Specifically, for zoonotic diseases, knowledge of spatial and temporal patterns of the animal host could inform incidence in humans [13]. Linked models have been widely used in several diseases such as tularemia [13], and respiratory diseases [14]. Bayesian approaches have been used extensively to model disease counts at the small area level [15,16], and to facilitate the joint modeling of animal and human data [17].
For CE, Bayesian and spatio-temporal analysis models have been developed widely (e.g. in China [18,19], Iran [20] and Kyrgyzstan [21]). However, none of these studies merged multiple surveillance sources and simultaneously combined animal and human case data. The aim of this study is to exhaustively analyze CE surveillance data for the identification of joint patterns of disease in humans and animals, and the achievement of more precise CE risk estimates in the province of Rio Negro, Argentina.

Materials
Animal data. Sheep farms, selected following a simple random sampling scheme across the entire province, constituted our epidemiological unit (EU) of interest. Samples of canine faecal matter were collected from the soil of sheep farms in close proximity to the houses. Samples were processed by coproELISA and, if positive, confirmed with WB. In parallel, blood samples from lambs at the same EU were processed by ELISA with confirmation by WB. A positive sample classified the EU as having recent transmission [10,22].
Two data sets were considered: a. Lamb and dog annual case data for the period 2003-06: It includes data on the total number of farms sampled and the total number of farms with recent transmission (at least one dog or one lamb positive).
b. Dog case data for the year 2010: It includes data on the number of farms sampled and the number of farms with recent transmission (at least one dog positive).
Human data. Annual clinical cases detected by passive surveillance, were classified in children up to 14 years of age and adults, in the period 1997-2016. We chose this classification of clinical cases to allow comparisons of recent infections in children vs. old exposure in adults. Information was also collected from the cases diagnosed in US surveys of asymptomatic school children from 6 to 14 years old (active surveillance). All cases were georeferenced to their health program area [7].
Both animal and human data were aggregated at the health program area level (m = 18) in Rio Negro (Fig 1).

Methods
Spatial analysis of animal data. In this first analysis, we consider the previously described animal data and fit a spatial model with the aim of estimating the risk of recent transmission in each health program area for the two periods 2003-06 and 2010.
Model formulation. Let y i represent the number of farms with recent transmission out of those that have been sampled (f i ) in the ith health program area. At the first level of the model hierarchy, we assume that the number of diseased farms y i are described by a binomial distribution with p i the probability of a farm having recent transmission in area i.
At the second level of the model hierarchy, the logit of the probability, p i , is decomposed in additive components representing spatial effects. In particular, we assume that the logit (p i ) = α 0 +u i +v i , where α 0 represents the overall risk of disease in the study region (here Rio Negro), and u i ,v i are, respectively, spatially correlated and uncorrelated random effects. The prior distributions for these parameters are defined as zero mean Gaussian for the intercept and uncorrelated random effect, and a ICAR correlation model for ν i [15]. Conventional prior distributions are assumed for the standard deviations of these effects. In particular, we assume a uniform distribution on a fixed range. This type of model is frequently used to provide an optimal description of the spatial variation of disease risk in spatial epidemiology studies [16].
Information was missing on both the number of cases and the number of farms sampled for certain years. We modelled this missingness by means of a Poisson distribution with parameter λ i which has a Gamma prior distribution defined to mirror the sampled population. The average number of farms sampled per area, 20.14, was used to set the mean of the Gamma prior distribution.

Spatio-temporal analysis of animal data
In this section, we consider the annual data and fit a spatio-temporal model with the aim of estimating the risk of recent transmission in each area and its evolution over time.
Model formulation. As in the spatial analysis, we assume that the number of farms with recent transmission out of those that have been sampled in the ith small area at time j (f ij ) follow a binomial distribution with p ij the probability of a farm having recent transmission.
At the second level of the model hierarchy, the logit of the probability (p ij ) is decomposed in additive components representing spatial and temporal effects: where α 0 , u i and v i are defined as those above, γ j is the temporal random walk component representing a common trend in the study region, and δ ij is uncorrelated space-time interaction random effect. As done in the spatial models for missing data, we assume that f ij has a Poisson distribution with a parameter which has a suitably scaled gamma prior distribution. This prior sets the mean number of sampled farms as 15, which is similar to the number of farms sampled in the areas with complete information.

Human case analysis
We jointly modelled active and passive surveillance data. In one scenario (see below), we split the active surveillance data into adult and child cases.
Model formulation. For Model 1 (and variants 1.1, 1.1.a, 1.1.b, and 1.2) we let cases(all) ij represent the number of passive surveillance cases observed for both children and adults together, e(all) ij represents the expected case count, n.screened ij the total number of children screened (active surveillance), and pos.screens ij the number of screened children with a positive test. The passive surveillance model is set up as a Poisson data model for all cases with mean e (all) ij � θ ij where, θ ij represents the relative risk of surveillance cases in area i at time j. The screening model on the other hand assumes that positive screening findings have a binomial distribution with probability p ij . Due to missingness in the screening data, we assume that we could use the screening observed in non-missing areas as a basis for imputation for missing areas. Missing n.screened data is imputed using a Pois(λ ij ) distribution with λ ij having a gamma prior distribution that reflects the mean number of screened individuals for those health program areas with data (approximately 200). θ ij and p ij were modeled with log and logit link functions respectively with varying degrees of linkage between the models. The full list of models is given in S1 Table. Model variants include v and v1 terms which are both uncorrelated error terms independent from one another. Each have Gaussian prior distributions. Spatio-temporal error terms, ψ and ψ1, are also included in some models with Gaussian prior distributions. The common spatial error and temporal error terms will allow us to determine if the screening and passive surveillance case data are behaving similarly across regions and time. Each of the modification factors has a unique Gaussian prior distribution.
Model 1.0 uses effects unique for each of the outcomes. That is, none of the terms modeling θ are used in modeling p. Model 1.1 allows the spatial correlation random effect (u) to be common to both models with a factor (φ) modifying the proportion of positive cases' spatial effect. This common spatial effect allow us to link the outcomes we see in the surveillance and screening data with a common spatial trend. Model 1.1a uses the same common spatial trend, and incorporates a common temporal term (γ) trend with a factor (χ) modifying the effect. Model 1.1b uses common spatial, temporal, and spatio-temporal error terms with a modification factor ρ introduced for the spatio-temporal term. Model 1.2 uses all common random effect terms but a different intercept, trying to completely connect the two sets of data, just scaling by a different intercept.
Model 2 scenario (with variants 2.0, 2.1, 2.1.a) splits the passive surveillance data into children and adults, where child cases have a Poisson distribution with mean e(child) ij � θ.c ij and adult cases with mean e(adult) ij � θ.a ij . All terms are defined similarly to when data was aggregated across age groups. In this case, age-based population data was used to develop the expected rates for children and adults, respectively. Also, the models for child and adult surveillance are linked by a common spatio-temporal term, δ, that has a zero mean Gaussian prior distribution. S2 Table shows the models tested using the split passive surveillance data. Models 2.0, 2.1, and 2.1a mirror the relationships shown in Models 1.0, 1.1, and 1.1a; however, instead of modeling θ as the relative risk of disease aggregated for both adults and children, we split children and adults into two separate models with two separate θs. Model 2.0 uses separate random effect terms for each of the 3 models (θ.c, θ.a, and p). Model 2.1 uses a common spatially correlated random effect (u) across all 3 models with modification factors φ and φ1 scaling the effect from θ.c to θ.a and p respectively. Model 2.1a uses the common u term with its modification factors and also uses a common temporal random effect term (γ) with modification factors χ1 and χ2.

Human and animal analysis
In this section, we used the animal data as a covariate in our analysis of the human data. We evaluated the animal data in two separate ways. For Model 1.3a, we looked at aggregated data In Model 1.3a we added β 2006i and β 2010i as coefficients for the effect of the covariates in the surveillance model. The regression coefficients were allowed to have spatially structured prior distributions so that different areas could respond differently to the effect of predictors. We assumed that the regression parameters (β� i ) have spatially correlated ICAR prior distributions while the precisions have uniform prior distributions.
We also used the same βs for the screening model but with modification factors ω1 and ω2 to link the two models. In Model 1.3b we excluded β 2010i for both surveillance and screening models (not shown).

Model fitting procedure
All models were fit using WinBUGS, with a burn-in of 100,000 iterations and a posterior sample of between 5,000 and 25,000 iterations. We checked convergence based on the Brooks Gelman Rubin (BGR) diagnostic statistic for deviance first, and then for each of the model parameters being estimated. Models were compared using the Deviance Information Criterion (DIC) and effective parameters (pD) to compare the models' fits.

Ethics statement
The data used in this study consists of counts of diseased animals and humans from small areas. The human data arises from passive and active surveillance. All cases are de-identified and only the aggregated counts of cases are used in the analysis. Confidentiality of the human cases is assured by this de-identification and the aggregation ensures that the addresses are anonymized and hence not reported. Ethical approval was received for use of the data from the Departo Zoonosis, Ministerio Salud, Provincia de Rio Negro. For 2003-06 data, no strong spatial patterns emerge, but we see that El cuy, Sierra Grande, and Valcheta have over 15% of sampled farms with transmission (Bariloche has 100%; however, only 1 farm was sampled).

Spatial analysis of animal data
For 2010 dog data, again, no spatial patterns seem to emerge, but El bolson, Jacobacci, Maquinchao, and Niorquinco have over 15% of sampled farms with recent transmission. Because the affected areas have clearly changed over the years, a spatio-temporal analysis may be more revealing.

Spatio-temporal analysis of animal data
S3 Table shows  Here we see that in 2005 almost all regions showed >15% of sampled farms with transmission.

Human case analysis
Our next goal was to analyze human case data to identify when and where disease rates spike.  S4 Fig and S5 Fig represent rates when adult and child cases are taken separately. The child data is fairly sparse, and the human data mimics the overall data, with perhaps a greater skew to cases in the central areas.
Model comparison. Our models held different assumptions on the existence and strength of the association between human and animal data. We want to compare our model paradigms to investigate which is the best fit for our outcome data, providing some insight to the relationship between animals and humans. Because Model 1 aggregates all cases together and Model 2 splits the outcome into discrete categories of adult and child, the models cannot be compared using overall DIC. Within Model 1.0, 1.1, 1.1a, and 1.2 all DIC and pD values can be compared. Similarly, all DIC and pD values can be compared across models 2.0 and 2.1. Model 2.1a has a negative pD for outcome y2 (adult surveillance cases), which suggests a poor model fit and hence model mis-specification. Model 1.1a is the best model, as its overall DIC is lowest and convergence was strong. It models passive surveillance data together (child and human), incorporates a common correlated spatial random effect and a common temporal random effect across both passive surveillance and screening data, modifying the effects by universal scaling factors φ and χ, respectively.
The modification factors φ and χ are estimated at close to 1 (0.668 and 1.286, respectively); however, their standard deviations are quite large (5.4 and 9.9, respectively). This suggests that the two surveillance streams (active and passive) are behaving similarly over space and time. In Fig 4, the correlated spatial error term (u) is shown in one map and suggest clustering of risk, and it is constant over all years. The southwest region again shows higher relative risk. Fig 5 provides probabilities for program areas exceeding a RR of 1, that is P(RR>1), based on Model 1.1a. The figure splits regions into three categories: probability greater than 0.8, probability less than 0.2, and somewhere in between. These maps demonstrate where extreme risk estimates are located. These again show highest rates in the southwest, but the central regions showing higher values in more recent years. Note that any RR greater than 5 was set to 5 for legibility; many of the relative risks were much higher than 5 for these regions. displays the posterior mean estimates of risk and it is also evident that the south western areas are highest in disease risk.

Human and animal analysis
Finally, we see results of using animal data as covariates in the human model. Fig 7 and

Discussion
The analysis of screening data and passive surveillance data showed that the most appropriate model was that with a common spatial and temporal trend for both types of data, indicating that both surveillance systems are capturing similar disease patterns across regions and timespans. This suggests there may not be a strong difference between the observed adult and child incidence of echinococcosis. The analysis of the human data while using the animal data as a covariate did not show any conclusive relationship for the passive and screening data models; however, the areas of most elevated human risk (southwest), also showed the strongest association with animal case data. S1 The southwest area's elevated human rates and higher impact of animal disease may merit a finer scale case study of the transmission of disease between animals and humans, to account for the likely heterogeneity within health program areas.
In general, the geographical areas identified as being most at risk are those located in the west and center of the province, including Comallo, Pilcaniyeu, Ñ orquinco and Ingeniero Jacobacci and their rural areas, in the Patagonian plateau region, coinciding with previous reports [10,22].
The CE control program in Rio Negro has maintained a regular activity since its inception in 1980, however with interruptions in the deployment of some surveillance streams throughout the years. As a result of this situation, the initial occurrence of CE in people, including children from 6 to 14 years old, has persistently decreased, but the transmission is maintained in some pockets. In this context, adjustments to the epidemiological surveillance system towards detection of such defined areas are warranted. Our analyses aim to provide the evidence to support such adjustments. Likewise, it is essential to identify the effectiveness of each of the surveillance systems applied, including their correlation and / or enhancement, in order to eventually reduce the operating costs of the system. Surveillance in dogs, for its part, is the only tool to identify present transmission. With the advent of coproELISAs in the 1990s, laboratory testing of dog faeces was possible on a large scale and copro tests were used for surveillance in many programs, including Rio Negro [23].

Conclusion
Due to the complexity of parasitic cycles such as that of EG (passage through the definitive host, environment, time of evolution in the human host and time of detection of infection) the association of animal-human surveillance sources becomes more difficult to characterise.
Added to this is the difficulty in obtaining adequate data on animal surveillance in time and space, due to the geographical extension of control programs, the inherent associated logistical challenges, and limited operational capacity. Conducting studies at finer spatial scales and in smaller areas with high case rates where animal surveillance can be intensified may be an option for future research. Still, we were able to identify some relationship between animal and human data, specifically with the 2010 dog data. These results support a continued study of the predictive capabilities of animal disease data in understanding human risk.

Limitations
While our study has exploited the available data across surveys and passive surveillance, and used state of the art methods to inform the transmission behavior of CE, there are a number of limitations. Primarily, the fragmented scope of the animal and human data has led to reduced ability to find equivalent data frames. This has also limited the ability to detect lagged effects. Differences in data quality also inevitably impact the inferential ability of any methods. In addition, the sparseness in terms of low or zero counts in some areas can lead to less certainty in estimation of rates. While CE outbreaks are affected by environmental factors such as humidity, temperature and rainfall, we have endeavored to assess the animal-human linkage by accounting for confounding partially generated by these factors. It may be that inclusion of these factors could also help to elucidate the animal-human linkage. This we would examine in future work.
Supporting information S1 Table. Human spatio-temporal analysis models using aggregated surveillance data.