Environmental Niche Modelling of Phlebotomine Sand Flies and Cutaneous Leishmaniasis Identifies Lutzomyia intermedia as the Main Vector Species in Southeastern Brazil

Cutaneous leishmaniasis (CL) is caused by a protozoan of the genus Leishmania and is transmitted by sand flies. The state of Espírito Santo (ES), an endemic area in southeast Brazil, has shown a considerably high prevalence in recent decades. Environmental niche modelling (ENM) is a useful tool for predicting potential disease risk. In this study, ENM was applied to sand fly species and CL cases in ES to identify the principal vector and risk areas of the disease. Sand flies were collected in 466 rural localities between 1997 and 2013 using active and passive capture. Insects were identified to the species level, and the localities were georeferenced. Twenty-one bioclimatic variables were selected from WorldClim. Maxent was used to construct models projecting the potential distribution for five Lutzomyia species and CL cases. ENMTools was used to overlap the species and the CL case models. The Kruskal–Wallis test was performed, adopting a 5% significance level. Approximately 250,000 specimens were captured, belonging to 43 species. The area under the curve (AUC) was considered acceptable for all models. The slope was considered relevant to the construction of the models for all the species identified. The overlay test identified Lutzomyia intermedia as the main vector of CL in southeast Brazil. ENM tools enable an analysis of the association among environmental variables, vector distributions and CL cases, which can be used to support epidemiologic and entomological vigilance actions to control the expansion of CL in vulnerable areas.


Introduction
Geographical location of the state of Espírito Santo, southeastern Brazil, South America and the geo-climatic zones. By definition, a hot zone has a minimum average temperature of 11.8-18.0˚C and a maximum of 30.7-34.0˚C. A mild zone has a minimum average temperature of 9.4-11.8˚C and a maximum of 27.8-30.7˚C. A cold zone has a minimum average temperature of 7.3-9.4˚C and a maximum of 25.3-27.8˚C. A steep slope zone has a slope above 8%, and a plain relief occurs when the slope is below 8%. A wet zone has < 4 dry months per year, a wet/dry zone has 4-6 dry months, and a dry zone has > 6 dry months [21]. Datum: SIRGAS 2000. doi:10.1371/journal.pone.0164580.g001 Environmental Niche Modelling and Cutaneous Leishmaniasis southern part of the Central Atlantic Forest Corridor, which is one of the main stretches of the dense ombrophilous forest found in this biome, presenting a high degree of endemism and species diversity [25].

Entomological and epidemiological data
The occurrence data of phlebotomine sand fly species were obtained from samples taken in several rural areas of 78 municipalities of ES every year between 1997 and 2013. Each collection occurred during the first 3 hours after sunset through active and passive capture using a manual Castro-type suction tube and Centers for Disease Control and Prevention CO2 trap (CDC) and Shannon light traps installed in the areas surrounding homes. During active surveillance, insects were collected from the internal and external walls of homes, home additions, animal shelters, and tree trunks. Sampling was performed by well-trained technicians, regardless of the occurrence of disease outbreak. The species identification and nomenclature were in accordance with Young and Duncan [26].
The localities and epidemiological data of the CL cases in ES were extracted from the medical records of patients cared for at Hospital Universitário Cassiano Antônio de Moraes (HUCAM) from 1978-when an effort began to initiate a careful and accurate identification of infection, with characteristics being detailed-to 2013. HUCAM is a reference hospital used in CL diagnosis and treatment for the municipalities of ES, centralizing every patient attendance, which verifies that the sample is representative of CL occurrence in that region. The type of information extracted from the medical records included geographical coordinates of the locality where the patient was infected.

Environmental analysis
Environmental variables from the WorldClim database were used for the modelling, with an accuracy to within approximately 1 km [24]. This database comprises 19 bioclimatic variables derived from temperature and precipitation, as well as terrain elevation, and calculated the percent slope that is a measure of terrain inclination relative to the horizontal plan.
The data for these bioclimatic variables were generated through interpolations of climate information from 1950 to 2000, obtained from approximately 50,000 weather stations distributed around the world [24]. The elevation data were obtained from the Shuttle Radar Topography Mission (SRTM) of the National Aeronautics and Space Administration (NASA). All variables had a 30-arc second (~1 km at the Equator) spatial resolution [24].
Spatial procedures of this study were performed using the Geographic Information System (GIS) ArcGIS ver. 10.1 (ESRI, Redlands, CA, USA) with Geocentric Reference System for the Americas (SIRGAS, 2000) data. All studied locations were georeferenced with a Global Positioning System (GPS).

Environmental niche model of Phlebotominae species and CL
A maximum-entropy modelling approach using Maxent version 3.3.3k (https://www.cs. princeton.edu/~schapire/maxent/) was used to build potential distribution models of the five most common species, namely, L. intermedia, L. migonei, L. whitmani, L. choti, and L. lenti and the distribution of CL cases. Maxent is an a-based model from presence-background data that estimates a target probability distribution by finding the probability of distribution of maximum entropy. Maxent generates a "background" or "pseudo-absence" sample of the occurrence points. By default, 10,000 pseudo-absences are randomly selected from the entire rectangular study area [27]. Recently, this method has been widely used in species conservation studies. Similarly, these tools have been used in healthcare studies, presenting considerable development over the last 10 years [28-38].
The occurrence data for the species were separated into two sets, as follows: one set for the model calibration/training (75% of the occurrence localities) and the other set for the model evaluation/test (25% of the occurrence localities). The percentage contribution of each variable to the final model based on how much the variable contributed to the model dependent upon the path selected for a particular model run was provided by Maxent; in this heuristic approach, the contribution values are determined by the increase in gain in the model provided by each variable [27,39,40]. The model training and test procedure were replicated 10 times for each species, and the mean was calculated. The sampling technique used was Bootstrap, in which the training data are selected by sampling with replacement from the presence points, with the number of samples equalling the total number of presence points [40].
ENMtools version 1.3.3 (http://enmtools.blogspot.com.br/) was used to evaluate which species among those analysed in this study are more associated with the occurrence of the disease by overlapping the distribution of the vectors with the distribution of the CL cases. ENMtools overlaps the number of replicas generated by the Maxent software between individual species and cases of the disease using Schoener's D index calculation, which evaluates the similarity between these models. Schoener's D index was calculated with the construction of 10 replicates for each evaluated model, and therefore, 100 values of this index were obtained by analysing the overlap. The mean of this index was also compared to assess which species present a greater degree of niche overlap within the area of CL occurrence in the state of ES [41].

Statistical analysis
The validation of the models was performed using the receiver operating characteristic (ROC) curve, in which the true positive fractions are plotted against the false positive fractions. The area under the curve (AUC) is used as a measure of accuracy of the model [39,42].
The AUC values range from 0.5 to 1.0. The classification system for the AUC provided by Hosmer and Lemeshow was used in the current study. That system classifies the AUC values as follows: 0.5-0.6 = no discrimination, 0.6-0.7 = poor discrimination, 0.7-0.8 = acceptable discrimination, 0.8-0.9 = excellent discrimination, and 0.9-1.0 = outstanding discrimination [43].
The Kolmogorov-Smirnov test was performed to determine the normality for Schoener's D data and the variables' percentage contribution distribution. The Kruskal-Wallis test with posthoc Dunn's test was used to compare the medians of this indicator between species. The Kruskal-Wallis test followed by the Student-Newman-Keuls test was used to compare the percentage contribution of the five most important variables in the model construction. The significance level adopted was 5%.

Ethics statements
The study was approved by the Human Research Ethics Committee of the Health Sciences Centre of the Federal University of Espirito Santo (UFES) (opinion no. 494.029 from 13 December 2013). Patient information was anonymized and de-identified prior to analysis. The insects were collected on private lands with the permission of the landowners. No endangered species or protected areas were involved.

Descriptive analysis
In total, 249,783 specimens belonging to 43 species of phlebotomine sand flies were collected at 466 localities during the period between 1997 and 2013, covering all geo-climatic zones of the state of ES (Fig 2, S1 Table) [23].

ENM of the main species of phlebotomine sand flies
For the construction of the environmental niche models of the main vectors of the disease, the five species of phlebotomine sand flies most commonly reported in the state of ES were selected: L. intermedia, L. migonei, L. choti, L. lenti, and L. whitmani, excluding L. longipalpis, which is a vector of visceral leishmaniasis (Fig 3).
The AUCs were greater than 0.80, indicating excellent discrimination for L. migonei, L. choti, L. lenti, and L. whitmani ( Table 1). The model was also considered acceptable for L. intermedia; however, the average AUC was lower than the averages of the other species (0.78).
According to the analysis of the variables using the percent contribution heuristic test, all species showed high sensitivity to temperature, precipitation, elevation, and slope. Therefore, slope was an important predictor and contributed 37.85%, 33.62%, 20.32%, 15.61%, and 11.28% for the species L. intermedia, L. whitmani, L. migonei, L. choti, and L. lenti, respectively (S2 Table).
Regarding the other variables, L. intermedia and L. migonei showed similarities in the construction of their models. Slope, BIO13, BIO12, BIO5, and elevation contributed 71.71% to the construction of the model for L. intermedia, whereas slope, BIO13, BIO16, elevation, and BIO12 contributed 59.31% to the construction of the model for L. migonei.
Lutzomyia lenti and L. whitmani also presented similarities. BIO15, slope, BIO13, BIO16, and BIO11 contributed 69.27% to the construction of the model for L. lenti, whereas slope, BIO4, BIO13, BIO15, and BIO5 contributed 79.36% to the construction of the model for L. whitmani. Except for slope, L. choti did not present similarities with the other species. Slope, BIO4, BIO18, elevation, and BIO17 contributed 66.05% to the construction of the model for this species ( Table 2).

ENM of CL occurrence area
In total, 1,472 autochthonous human cases of CL were reported between 1978 and 2013. Among these cases, 1,264 (85.9%) presented cutaneous lesions, 49 (3.3%) presented mucocutaneous lesions, and 159 (10.8%) presented mucosal lesions. The model constructed in the current study shows that this disease occurs in the south-central region of the state of ES and is limited to areas below 1,119 metres; however, only 17 (4.76%) of 357 occurrence localities were related in elevation above 850 metres (Fig 4).
The model was considered excellent, with an AUC of 0.817 ± 0.020. The variables BIO4, BIO12, slope, BIO14, and BIO5 contributed 70.31% of the model construction (Table 3, S2  Table). Overlap between the distribution of human CL cases and Phlebotominae species Values of Schoener's D range from 0 (no overlap) to 1 (perfect overlap). This indicator was compared between species to evaluate which one is most involved in disease transmission ( Table 4). The Kruskal-Wallis and Dunn tests (multiple comparisons) were used to define the importance of each species. In the state of ES, the L. intermedia model showed significantly higher overlap, indicating that it is more associated with the disease transmission than secondary vectors such as L. migonei and L. whitmani (p < 0.05), with no difference between the latter two species (p > 0.05), followed by L. lenti and L. choti, most likely without epidemiological importance in the transmission.

Discussion
This is the first study to build and associate environmental niche models using algorithms between vector species and autochthonous CL cases in endemic areas of south-eastern Brazil.
Environmental  The current study revealed that L. intermedia was the species most associated with the occurrence of CL in the state of ES, followed by the secondary vectors L. migonei and L. whitmani. However, the species L. lenti and L. choti were not considered important vectors of the disease in that region.
Lutzomyia intermedia has been indicated as the main vector of CL in the state of ES [9,10]. This species is abundant in endemic areas of southeastern Brazil, and specimens have been naturally infected by Leishmania (Viannia) braziliensis Vianna, 1911 [5,11,[44][45][46]. This species is Environmental Niche Modelling and Cutaneous Leishmaniasis more associated with domiciles and peridomiciles and is strongly attracted to domestic dogs and horses [8][9]10,47]. Studies suggest that this species is adapted to habitats modified by human activity and presents a high degree of anthropophily and endophily [45,47].
Lutzomyia migonei has been considered a secondary vector of CL in anthropic environments, with occurrence at altitudes above 750 m in ES [11]. This species has an affinity for humans and domestic dogs. A study conducted in the municipality of Afonso Cláudio (midwestern region of the state) revealed that L. migonei is the most common species found inside the domicile [45,[47][48].
Lutzomyia whitmani has also been considered a secondary vector of CL, with high density in the endemic areas of the states of São Paulo, Minas Gerais, and ES [11,[49][50]. This species is more abundant in protected forest areas. Recent studies have identified this species in peridomiciliary areas, especially in domestic animal shelters [9,11,45,47]. This species has been implicated as being responsible for the link between the wild environment and areas surrounding homes; however, it has a low propensity to invade households.
Rocha et al. [51] suggested that a greater genetic diversity of Leishmania braziliensis is observed in areas of remaining forests due to the larger number of reservoirs and vector species, mainly L. intermedia and L. whitmani. However, in peri-urban areas, L. intermedia is the only vector, and domestic dogs are the reservoirs, a situation that reduces the diversity of the protozoa in this area. This reinforces the idea that L. whitmani is a connecting vector between the preserved and anthropic environments.
Lutzomyia lenti and L. choti have no association with the occurrence of CL. Despite its occurrence in all regions of Brazil, with a wide distribution and abundance, L. lenti has not been reported as an important vector in endemic areas of the state of ES. This species has been found in the mid-western region of the country in areas surrounding houses in the state of Mato Grosso do Sul and Goiás, mainly in domestic animal shelters [52][53][54][55]. A positive association between this species and cases of CL was observed in the state of Goiás [53]. Lutzomyia choti is a species with heterogeneous habitats and is present in both wild environments and   [56] suggested that this species may be associated with the occurrence of CL cases in Pernambuco, northeast Brazil, due to its predominance in the Zona da Mata area of the state.
Regarding the ecological niches constructed in this study, slope was a relevant variable in the occurrence of all species of phlebotomine sand flies. Areas with steep slopes have dimples, which allow a greater variability of water and organic matter accumulation, as well as a reduction in sunlight and wind exposure, thereby creating a greater diversity of habitats; this diversity in habitats contributes to the maintenance of food and shelter conditions for these insects [33].
Lutzomyia whitmani is limited to the western region of the state, which has lower precipitation rates, higher elevations, steep slopes, and low temperatures. Lutzomyia intermedia is found in warmer areas with lower slope gradients and lower elevations. In addition, this species is less susceptible to temperature and rainfall variations. Lutzomyia migonei presents a distribution between L. intermedia and L. whitmani and is concentrated in the mid-western region of the state of ES.
Elevation appears to be inversely proportional to the abundance of phlebotomine sand flies [9,11,51,57]. In the current study, L. intermedia was collected at 1,123 m above sea level but at low density, suggesting the importance of vector density in the occurrence of disease cases in a particular region.
The precipitation and maximum temperature of the warmest month were also relevant to the construction of the models. Temperature and humidity have positive effects on the activity and abundance of phlebotomine sand flies, which depends on the species [47,58]. In the neighbouring state of Rio de Janeiro, L. whitmani was more abundant in colder and drier months (June, July, and August), whereas L. intermedia was more abundant in the warmer months of the year (December, January, and February). These data are in agreement with previous studies performed by Souza et al. [47] and Costa et al. [49].
The variables of precipitation seasonality and temperature seasonality were considered important in the occurrence of L. whitmani, with the latter being more tolerant to variations in temperature and precipitation [18,49].
The temporal mismatch between the species distribution data (1997-2013), the cases data (1978-2013), and climate data  did not significantly affect the modelling results because during the last 50 years, there have been no important environmental changes in peridomiciles in endemic areas of CL in ES. Furthermore, the geographic expansion of the disease occurred from the migration of rural populations to the outskirts of the cities in the 1980s, carrying infected dogs from the western area to east of ES, where vector species were already adapted to the peridomicile [12,14].
A low sandfly frequency in some areas of the state did not influence the species environmental modelling because this method can provide a measure of potential species occurrence in areas not covered by biological surveys and consequently, has become an interesting tool for healthy planning [27].
Maxent presented a strong performance compared with different methods because it performed well and remained stable with respect to the prediction accuracy and the total area predicted, indicating that Maxent can compensate for small species occurrence data sets [59].
The results of the current study indicate that modelling and geoprocessing tools enable a reliable analysis of the association between geo-climatic variables, geographic distribution of vectors, and CL cases in the state of ES. The definition of areas at potential risk for CL transmission allows us to make available relatively accurate information at the regional level that can guide entomological and epidemiological surveillance activities to control the geographic expansion of this endemic disease in vulnerable locations.
Supporting Information S1