Impact of the dog population and household environment for the maintenance of natural foci of Leishmania infantum transmission to human and animal hosts in endemic areas for visceral leishmaniasis in Sao Paulo state, Brazil

When it comes to visceral leishmaniasis (VL) in Brazil, one of the main targets of public health policies of surveillance is the control of domestic canine reservoirs of Leishmania infantum. This paper aims to evaluate the effect of the dog population and household environment for the maintenance of natural foci in the transmission to human and animal hosts in an endemic city for VL, Bauru, in Brazil. We collected 6,578 blood samples of dogs living in 3,916 households from Nov.2019 to Mar.2020 and applied geospatial models to predict the disease risk based on the canine population. We used Kernel density estimation, cluster analysis, geostatistics, and Generalized Additive Models (GAM). To validate our models, we used cross-validation and created a receiver operating characteristic (ROC) curve. We found an overall canine VL (CVL) seroprevalence of 5.6% for the sampled dogs, while for the households, the positivity rate was 8.7%. Odds ratios (OR) for CVL increased progressively according to the number of canines for >2 dogs (OR 2.70); households that already had CVL in the past increased the chances for CVL currently (OR 2.73); and the cases of CVL increase the chances for human VL cases (OR 1.16). Our models were statistically significant and demonstrated a spatial association between canine and human disease cases, mainly in VL foci that remain endemic. Although the Kernel density ratio map had the best performance (AUC = 82), all the models showed high risk in the city’s northwest area. Canine population dynamics must be considered in public policies, and geospatial methods may help target priority areas and planning VL surveillance in low and middle-income countries.

This study aimed to calculate the impact of the household environment and canine population for visceral leishmaniasis risk through geospatial methods. We hypothesize that: a) in endemic areas for VL, a higher number of dogs in the households increases human or canine VL cases. b) the urban area is stratified by different geographical profiles that allow the remaining endemicity, needing targeted strategies as control measures. Using spatial analysis and statistical approaches, we constructed a space framework based on a large serosurvey conducted between November 2019 and March 2020 in the urban area of Bauru.

Materials and methods
Bauru is a central municipality in the state of Sao Paulo (22˚18'52" S, 49˚03'31" W). It is crossed by important highways: SP-300 Marechal Cândido Rondon Highway, SP-294 Comandante João Ribeiro de Barros Highway, SP-321 Cezário José de Castilho Highway, and SP-225 Engenheiro João Batista Cabral Rennó Highway, giving access to the countryside cities of the state and the capital Sao Paulo (Fig 1).

Study design
The canine serosurvey was conducted from December 2019 to March 2020. Agents of the Center for Zoonoses Control visited 3,916 households to collect the blood samples of 6,578 canines. In addition, a short survey was applied (S1 Fig) to the dog's guardian to identify the previous presence of an infected dog in the household (Fig 2A). We tested and mapped the samples (Fig 2B and 2C), and then we used spatial analysis to prepare data for creating thematic maps ( Fig 2F) and statistical models (Fig 2D-2I).
We applied a framework of starting with a statistical to select variables most influential in running the spatial models. The chosen method of statistics was the binary logistic regression, used to associate the size of the canine population related to the odds ratio of having VL cases. As the results of the statistical model were statistically significant, we ran the spatial models. The generalized additive model-GAM, geostatistic model, and Kernel density ratio were chosen to identify the spatial dependence of the cases and their spatial association with the number of the canine population, emphasizing hotspots of CVL. We finally validated our model using cross-validation (Fig 2F and 2J). The methodology performed here will be detailed in the following sections.

Definition of cases.
Dual-Path Platform rapid test (TR-DPP, Biomanguinhos1, Rio de Janeiro, Brazil,) is used by the Brazilian VLCP to test the samples. The TR-DPP1 is a test for Leishmania infantum based on the reaction of IgG to the antigen K28. Enzyme Linked Immunosorbent Assay ELISA-Biomanguinhos1 is used to confirm the positive diagnoses. ELISA is characterized by the reaction of soluble and purified antigens of Leishmania promastigotes, obtained from cultures and adsorbed in microtiter wells with Leishmania-specific antibodies present in serum samples. The diagnostic was run in a Multiscan spectrophotometer using a 450 nm filter and cutoff values ("Cutoff" = CO): CO = average negative controls x 2. The diagnoses were performed according to the manufacturer's instructions and the directions of the VLCP.
A combination of TR DPP1 and ELISA reagent was considered a positive result according to the Brazilian VLCP recommendations for canine diagnoses, routinely used by the Centers for Zoonoses Control in Sao Paulo [20]. TR DPP1 non-reagent was considered a negative result- Fig  2A. The prevalence was calculated based on the outcome, being a proportion of a dog found positive for CVL divided by the sampled dog population. The guardian of the dog provided consent for sample collection involving domestic dogs in the areas surveyed. All serosurvey was supervised by the veterinary group of the Center for Zoonoses Control in Bauru municipality. Households without dogs, closed or that refused to give consent were excluded from the analysis.
The human laboratory diagnoses are based mainly on serological methods and microscopic diagnoses (parasitological). When amastigotes are identified, it is considered a certainty diagnostic. Patients with clinical manifestation and reagent rapid immunochromatographic test rK39 and/or Indirect immunofluorescence with titers equal to or greater than 80 are considered positive for VL [20]  cartographic street map. A lower score of geocoding was topologically adjusted to ensure the correction of georeferencing. Point features were plotted in a Geographic Information Systems (GIS) ArcGIS 10.2.2 (ESRI, Imagem).
Canine data and the households were mapped as point data; however, they have different cartographic scales once the household (mapped by address) is the boundary, besides dogs are also represented by geocoded addresses. It means that more than one dog in a household is visualized as one point, but there are overlapped(s) dog(s) (points) in there. Bearing this in mind, the dog layer was transformed into the household layer with the number of dogs (Fig 2C and  2D) and categorized as negative or positive for VL in each survey. Fig 3 shows the mapped data.
To run the Generalized Additive Model (GAM), we created a fishnet grid of 1000x1000 of 50m containing the estimated number of dogs per domicile in each coordinated. We calculated the number of dogs based on the human population by census tracts [21], in a ratio of 1:4 dogs/inhabitants [19]. The mean number of dogs was divided by the mean number of households in each census tract. We then created a centroid and calculated the Inverse distance weighted (IDW) interpolation, considering the mean number of dogs per domicile. The grid values were extracted from the raster surface generated by the IDW (S7 and S8 Figs).
To analyze the area of influence of households with infected dogs in the environment, we created buffer zones of 100m (Fig 3). We then calculated the number of dogs, negative dogs, and positive dogs using spatial analysis tools. Finally, we aggregated features of point data into polygons, using the census tracts database, to stratify the prevalence spatially.

Statistical and spatial analysis
For all the performed calculations, we considered a significant value at p�0.05. We used the geographic information system (GIS) ArcGIS 10.2.2 and R language, with several packages described in the sections below.
The number of dogs per household was categorized as binary to verify each group, for instance, households that have only one dog (1 = 1 dog and 0 > 2 dogs); two dogs (1 = 2 dogs; 0 = 1 or > 2 dogs); and two or more dogs (1 = > 2 and 0 = 1 or 2 dogs). Households that already had a positive dog or a human case were also categorized as binary, e.g., 1 for cases and 0 for non-cases.

Pearson's correlation.
We calculated Pearson's correlation to identify a possible association between the number of cases of CVL and: i) the number of investigated samples or ii) the number of households that already had an infected dog, or iii) the number of households that already had and currently have an infected dog/dogs.

Binary logistic regression.
We tested if the households with an infected dog (outcome = 0 for a household with no infected dog/dogs or outcome = 1 for a household with an infected dog/dogs) or an area of influence of household (outcome = 0 for areas of influence of household with no infected dog/dogs or outcome = 1 for an area of influence of household with infected dog/dogs) could increase the chances to have cases of the disease. We used 'oddsratio' package in RStudio (4.0.0).

K-function.
Being aware of spatial dependence of CVL promoting different risks or protection, we evaluate, locally, the spatial interactions in the urban neighborhoods. Ripley's Kfunction with 999 permutations was applied to identify households' spatial patterns at distances [22]. In this function, K(t) is the number of events within a distance of an arbitrary event, divided by the overall density of events. We plotted maximum and minimal envelopes of K(t) simulated values, giving the statistical significance for clustered or dispersed patterns (S2 Fig).

Cluster analysis.
We used cluster analysis to detect significant concentrations of CVL within Confidence Intervals (CI) of 90%, 95%, and 99%-S5 Fig. Clusters were calculated using the Getis-Ord Gi statistic, which identifies features with high or low values of a spatial cluster. The pattern can be expressed by clustered, dispersed, or random features representing a measurable spatial aggregation unit.

Kernel density.
Using the K-function dependence, we chose the minimal distance of concentration of our data, 0.5 km, to set the bandwidth. We used the quartic kernel function [23], which is given by where: A Kernel density ratio map was then performed (CLV: samples), which gives a visualization of the risk for the disease.
2.2.6. Geostatistical approach. According to the number of dogs, a geostatistical approach was performed to predict the higher risk areas for CVL. We used the Ordinary Kriging method and select two datasets: cases of CVL and number of dogs. We adjusted data in a stable model in a semivariogram, in which for a set of experimental values z(x) and Z (x1+h), separated by h distance, is defined by the Eq 2: Where,

N(h) is the number of experimental pairs;
h is the regular interval that separates z(xi) e z(x i +h) Geostatistics parameters were adjusted as follow: number maximum and minimal neighbours = 5 and 2, respectively; lags = 12; lag size = 0.64; nugget = 0.46; range = 3.8974; sill = 0.062 and 45 degrees- S6 Fig and S1 File.
2.2.7. Generalized additive model. We run a GAM according to an approach reported for case-control data [24,25], in which we considered Y i = 1 (cases) and Y i = 0 (non-cases), d is the number of dogs in location i, and P(Y i = 1|d i S i ) is calculated according to Eq 3: where, a is the ratio of cases to non-cases, b is the coefficient for the number of dogs per household, S i is a function of the residual spatial variation after accounting for the effect of the number of dogs.
We model S i by a locally estimated scatterplot smoothing (LOESS) regression smoother against the Universal Transverse Mercator coordinates. We choose the optimal smoother parameter of the models based on Akaike's Information Criterion (AIC) [24] after testing multiple bandwidths (S2 File). We predicted the adjusted log odds for each location and omitted the covariate and smoothing terms through a null model. GAM was run in RStudio using the 'gam' package.

Cross-validation
For further analysis, we validated our models using cross-validation. We created random samples in ArcGIS and split our database into training (75%, 2,937 points) and testing (25%, 979 points). Spatial models were created using the training dataset to predict the risk for the testing dataset. For each model, the best threshold was chosen, and we calculated specificity, sensitivity, and accuracy for correctly predicting the observed value of a case or non-case at the testing coordinates. To sum up, we calculated the area under the receiver operating characteristic (ROC) curve (AUC) with 95% confidence interval, which plots the true positive rate versus false positive rate, allowing identifying the performance of the models. We used the 'pROC' and 'ggplot2' packages in RStudio.

Results
We investigated 6,578 dogs, in which we found Anti-Leishmania spp. antibodies in 8.1% of TR DPP1 (535/6,578) and 5.6% in both TR DPP1 and ELISA (369/6,578). We found different spatial prevalence in the city, ranging from 0 to 50%. The mean prevalence was 2.67%, and the higher prevalence (>7.5%) was regularly distributed in the sampled area (Fig 4).

Cluster analysis
We identified a clustered pattern of households with CVL with statistical significance from approximately 0.5 to 6.5 km, and a clustered pattern of human cases from 0.  We found spatial clusters of high values (hot spots) in the west, north, east, south, northeast, southwest, southeast, and in Tibiriçá, a municipality district (Fig 4).

Kernel's map
The ratio of cases per sample concentration represented in the Kernel map shows high-risk areas in the Pq. Jaraguá, Pq. Santa Edwiges and Vila Nipônica neighborhoods (Fig 5). Other high-risk concentrations in Kernel's map represent the border effect.

Pearson correlation
Pearson's correlation was positive and moderate considering the number of infected dogs and the households investigated (0.508, p-value = 0.000); positive low for infected dogs and the households that already had an infected dog/dogs (0.240, p-value = 0.000); and positive and low for infected dogs and the households that already had and currently have a dog/dogs with VL (0.129, p-value = 0.000). All conditions were statistically significant.

Binary logistic regression for visceral leishmaniasis
We investigated 3,916 households, in which 16.7% (656/3,916) already had a positive dogindependently when it was (Table 1) The maximum number of dogs per household was 17, but the mode was 1, and the mean was 1.67, with a standard deviation of 1.08. In an area of influence of a household (a = 31,374m2), the maximum number of dogs was 58, the mode 11, mean 16, and the standard deviation 11.37. Households that contained only one dog represent almost 60% of the domiciles, two dogs 27.6%, and more than three 14%.
Analyzing the census tracts, the odds ratio (OR) for the number of CVL and the examined dogs was 1.37 (Table 2). OR for CVL increased proportionally to the number of dogs. For households that contained only one dog was 0.40 and increased 242% for those with two dogs (OR = 1.39); and 97% when more than two dogs (OR = 2.70). For households that already had a positive dog, the OR was 2.73.
Similarly, OR for the area of influence of household (buffer of 100m) also increased according to the dogs' population. From 10 to 20 dogs, OR was 1.25 and increased 120% for 21 to 30 dogs (OR = 2.76). The influence area of households with more than 30 dogs increased by more than 150% (OR>7). In an area of influence of household, households that already had a positive dog/dogs with VL increase the chances of CVL 299% (OR = 2.99), analogous to the condition of households that already had dog/dogs with VL, in which the OR was 2.73.
Considering the human cases in an area of influence of household, the number of dogs increased the chances 102%, and the number of positive dogs 116%, demonstrating the association between canine and human VL. The number of dogs increased the chances 261% for more than 40 dogs.

PLOS ONE
Impact of the dog population and household environment for visceral leishmaniasis

Spatial risk
Considering high OR for CVL according to the number of dogs, we created the spatial models using dogs as a predictor. Fig 6 shows that both models (geostatistical and GAM) were considerable commonality in the spatial pattern regarding the higher odds ratio areas. The higher risk is in the city's borders, especially in the northwest and in the southeast. The last one can be a border effect. Overall, both models are spatially consistent with the Kernel density ratio map (Fig 5).

Cross-validation
We plotted the Kernel density ratio, Geostatistical, and GAM models in a receiver operating characteristic (ROC) curve (Fig 7). The Kernel density ratio map presented the best threshold of 0.059, a sensitivity of 88%, a specificity of 62%, and an accuracy of 64%. The geostatistical model presented the best threshold of 0.075, a sensitivity of 65%, a specificity of 52%, and an accuracy of 53%. The GAM model presented the best threshold of 0.076, a sensitivity of 88%, a specificity of 19%, and an accuracy of 25%. The first had an AUC of 0.81 (CI 0.76-0.85), the second of 0.59 (CI 0.53-0.66) and the third of 0.54 (CI 0.47-0.60). The Kernel density ratio map presented the best performance in the ROC curve (Fig 7).

Discussion
In the current study, we found a CVL TR DPP1 sero-reaction rate of 8.1% (535/6578) and 5.6% in TR DPP1 confirmed by ELISA (369/6578), likely consistent with an endemic area of Sao Paulo state, Araçatuba, where the average prevalence between 2010 and 2015 was 6.8%  [29]. Particularly, prevalence can reveal bias once it may not represent the real number of canines. Overall, the serosurveys are directed to human case areas or areas of a suspect or identified CVL case [6,20]. Historically, in Bauru, some neighborhoods have never performed a serosurvey before this study. On the other hand, some neighborhoods were investigated more

PLOS ONE
than once since the first human case appearance, recognized by its recurrence of CVL. Our study planned the serosurvey to collect a large number of dog samples in different neighborhoods, giving a panorama of VL's endemicity and spatial epidemiological profile in a short time. Nevertheless, sampled canines comprised less than 7% of the estimated dog population (6,578/99,815 dogs).
In the present study, our scale is the household instead of only the dogs, identifying spatial characteristics regarding the domiciles and dog population. We highlight that on the household scale, the positivity rate of domiciles that contains infected dogs (8.7%) is superior to the sampled prevalence of CVL (5.6%), which emphasizes the importance of the household environment in the disease context. Clusters of households that already had CVL can point out the areas that remain a source of infection and are unnoticed. Almost all investigated areas had these clusters. Additionally, asymptomatic dogs can be highly competent [33] and remain a source of infection without being identified. They contribute to the silent endemic areas. It can turn out into highly endemic areas or possibly a human case site.
The recent expansion of VL to new endemic areas has been attributed to the adaptation of L. longipalpis (sandflies) to naïve ecological niches. The risk of expansion of VL increases in areas identified as migratory poles of attraction. Moreover, CVL has been highlighted as the primary cause of outbreaks [34]. In these areas, canine enzootic disease precedes the appearance of human cases. In our study, human cases can increase 102% according to the dog population and 116% by the CVL, demonstrating the spatial association between canine and human VL. Other studies found that the risk increased substantially for individuals when the presence of seropositive dogs [35] or previous cases of CVL in the household [36,37]. Furthermore, we identified a clustered pattern for both human and canine cases from approximately 500m.
As we identified, households that contain CVL and the dog population can increase the odds ratio for VL. They may influence the natural foci of Leishmania infantum transmission to human and animal hosts, which urges specific public policies focused on education in animal health, especially in areas target as critical. We found the same mean number of dogs per household (1.6), as reported in previous research [19]. However, about 15% of the domiciles contain more than three dogs. They should be monitored and investigated as possible infection sites. In the neighborhoods where the canine population is large, animal health assistance is required. Therefore, canine population dynamics must be considered in public policies.
Our results revealed that the risk of increasing CVL or human cases oscillated by areas. Of note, kernel maps studies have used the total number of cases or applied a constant [30]. Our study used the number of cases and samples, which gives a visualization of the risk. In accordance, a study [29] found a similar pattern of critical areas in the city's borders. This peripheral spatial pattern seems to be expected in small and medium-sized cities of similar urbanization processes in low and middle-income countries where VL is endemic. The Kernel density ratio map was the best in the ROC curve, showing the potential of spatial analysis tools. Studies that use spatial models predicting disease risk are promising for decision-making in the control of VL.
Regarding leishmaniasis, such studies use machine learning for cutaneous leishmaniasis vectors prediction [38] or human cases prediction [39]. Nonetheless, studies using a machine learning-based approach of CVL risk prediction considering the number of dogs are still incipient. Bi et al. stress that future research about VL should focus on spatial simulation and agentbased simulation [40]. Machine learning is a novel approach that allows the forecast of disease risk. It can anticipate disease transmission dynamics and identify disease control strategies to fight endemic and emerging diseases [40]. Our models bring new insights for thinking VL through dogs from a social perspective, which has been one of the most debatable points of control programs and tends to be of low priority in the context of general public health.
It is a time of changing public policies in relation to VL. The general principles that guided the past control programs are now questionable. Guided by the VLCP, Brazilian municipalities have presented operational difficulties in executing VL control strategies [41]. Additionally, we emphasize the unavailability of the proven effectiveness of technical alternatives for laboratory diagnosis, identification, and elimination or protection of reservoirs [42].
In Brazilian cities, culling dogs has been recommended as a control measuring to reduce VL [20,43], which creates a dog stigma. Nonetheless, culling dogs is highly controversial, considering the time between diagnosis and action. Also, the rapid replacement of euthanized dogs, entrance of new animals into the households [44][45][46], and the persistent disease spread challenges public policies [13,42,47,48]. On the other hand, other strategies, such as the use of dog collars with insecticides for sand flies, or vaccination [49][50][51], have been highly encouraged due to their effectiveness in reducing the population of vector, parasitic load, and potentially the VL transmission [52][53][54][55].
Furthermore, vast territorial areas should be treated by priority order, emphasizing different profiles of VL. The decision-making should be supported by an integrated approach, considering the genetic diversity of vectors [56,57] and the protozoa [58,59] but integrated with education, health, and environment, including vectors, causal agents, canines, households, population density, urbanization, industries, and environmental factors, such as vegetation, water bodies, temperature, and precipitation. Animal health needs to be discussed in public policies without its stigma. Furthermore, VL should be addressed in the context of One Health [42].
To conclude, this paper had several limitations that should be recognized. Firstly, we had to use the census tract information based on the human population to calculate the dog population grid because of the lack of animal information. This could be solved with an updated canine census, hardly achieved in low and medium-income countries. Secondly, the performance of our spatial models had medium and low accuracy, although the critical areas being commonly similar to the Kernel density ratio map of higher performance. A better model's performance could be improved with an updated census and adding real-world covariates when data become available.
Yet, there are research gaps concerning VL, and many areas of study remain unexplored. It remains the question of balancing the effectiveness and costs involved in such a VL control plan [40]. As future work, the next step of our study is to analyze the canines' role with new insights of controlling VL, for instance, the canine cohort of insecticide-impregnated collars, vaccination, and treatment in different areas of this endemic site, as an individual and collective measure in the environment.

Conclusions
As a rule of thumb, one can say that the number of dogs and the households impact the risk for maintenance of natural foci of Leishmania infantum transmission to human and animal hosts in endemic areas for VL. Overall, this investigation serves as a case study for regional and global applications. It reveals the importance of canines on the household scale in low and middleincome countries. It is time for changing VL public policies using a targeted plan of priority through spatial analysis. This statement invites further investigations regarding VL characteristics involving socioeconomic and environmental variables in the context of One Health. This is an example of a cluster map for the households that have CVL currently. For each category, cluster maps were created: i)households that have CVL; ii) households that already had CVL; iii) households that already had and currently have CVL. The coldspots and the non-significant data were excluded in the final cartographic representation (Fig 4).