Potential distribution of mosquito vector species in a primary malaria endemic region of Colombia

Rapid transformation of natural ecosystems changes ecological conditions for important human disease vector species; therefore, an essential task is to identify and understand the variables that shape distributions of these species to optimize efforts toward control and mitigation. Ecological niche modeling was used to estimate the potential distribution and to assess hypotheses of niche similarity among the three main malaria vector species in northern Colombia: Anopheles nuneztovari, An. albimanus, and An. darlingi. Georeferenced point collection data and remotely sensed, fine-resolution satellite imagery were integrated across the Urabá –Bajo Cauca–Alto Sinú malaria endemic area using a maximum entropy algorithm. Results showed that An. nuneztovari has the widest geographic distribution, occupying almost the entire study region; this niche breadth is probably related to the ability of this species to colonize both, natural and disturbed environments. The model for An. darlingi showed that most suitable localities for this species in Bajo Cauca were along the Cauca and Nechí river. The riparian ecosystems in this region and the potential for rapid adaptation by this species to novel environments, may favor the establishment of populations of this species. Apparently, the three main Colombian Anopheles vector species in this endemic area do not occupy environments either with high seasonality, or with low seasonality and high NDVI values. Estimated overlap in geographic space between An. nuneztovari and An. albimanus indicated broad spatial and environmental similarity between these species. An. nuneztovari has a broader niche and potential distribution. Dispersal ability of these species and their ability to occupy diverse environmental situations may facilitate sympatry across many environmental and geographic contexts. These model results may be useful for the design and implementation of malaria species-specific vector control interventions optimized for this important malaria region.


Occurrence data
The endemic malaria region Urabá Bajo-Cauca Alto-Sinú (UCS) is located at northwest Colombian (Fig 1), has an estimated population at-risk of malaria equivalent to 2,500,000 people, distributed in the departments of Antioquia and Córdoba [17]. The predominant economic activities are mining, the banana agro-industry, cattle production and the timber industry; some of these activities have been linked to malaria transmission [3]. Twenty municipalities of UCS were sampled (Fig 1) between December 2012 and March 2015, using human landing-catches, under an informed consent agreement and collection protocol reviewed and approved by the Universidad de Antioquia Institutional Review Board (Bioethics Committee, Facultad Nacional de Salud Pública-Universidad de Antioquia, Acta 063). In light of the risk that these vectors represent for the transmission of the disease to humans, collections were performed in or near human residences. Two localities were selected per municipality (Table 1), based on criteria such as high numbers of reported malaria cases in 2012 and accessibility. Collections were conducted during three nights at each locality, by a team of four people, two persons sampling one site each night, for a total of six sites by locality, from 18:00 to 24:00 h, both indoors and outdoors. Adult mosquitoes were identified using a morphological key [18]; species assignation for a percentage of the specimens and all those presenting problems in morphological identification were confirmed by PCR-RFLP-ITS2 [19][20][21] and COI barcoding [22][23][24].

Environmental variables
To characterize environmental variation across the study region, the Normalized Difference Vegetation Index (NDVI) was used. NDVI is a measure of photosynthetic activity; its variation reflects the spatial and temporal dynamics of vegetation that indirectly influence mosquito reproduction and development [25]. NDVI data were obtained in the form of imagery from the Moderate Resolution Imaging Spectroradiometer (MODIS) Terra satellite, at 250 m spatial resolution and 16-day temporal resolution (https://reverb.echo.nasa.gov/reverb/). In all, 69 images from 2012 to 2014 were used. Images were reprojected to the MAGNA-SIRGAS coordinate system using the MODIS Reprojection Tool [26]. To reduce inter-correlation among data layers, a principal components analysis (PCA) was performed using all 69 NDVI images as variables [27]. Climatic information was not included, owing to the coarse spatial resolution of such data sets, which would not permit fine-grained predictions. Regardless, rainfall is indeed reflected in the NDVI data via vegetational responses and consequently increased or decreased photosynthetic mass [28].

Ecological niche modeling
To evaluate potential distributions of the vector species, ENMs were developed using a maximum entropy approach implemented in Maxent, version 3.3.3k [29]. Maxent assesses suitability for species by integrating occurrence records with relevant environmental predictors across a defined geographic space [29]. Maxent attempts to estimate a target probability distribution by finding the probability distribution of maximum entropy (i.e., that which is most spread out, or closest to uniform), subject to a set of constraints that represent the incomplete nature of information about the target distribution [30]. A sampling bias layer was designed to improve niche and range estimates with Maxent and to point process models by integrating spatially explicit information [30], with our sampling locations defined as 1; the remaining pixels that were not sampled had "no data" in the grid. Given the broad potential distribution previously reported for An. nuneztovari, An. albimanus, and An. darlingi in South America [15,31,32] and the known distribution of the species across the country [3,7,18,21], a very broad hypothesis of accessible area (M) was used, considering the entire endemic region as accessible [33].
To assess how robust and predictive the models were, occurrence data for each species were split into training (50%) and testing (50%) subsets; this random splitting was done five times. No clamping or extrapolation was permitted in MaxEnt; remaining parameters were left as default. Logistic output formats were used. ENMs were calibrated for each species, with 10 bootstrapped replicates; the median across replicates was used as a basis for further analysis. All maps were converted to binary via a conservative least presence thresholding approach, consisting of the lowest predicted value corresponding to any occurrence record of the species in the calibration data set [34,35].
The partial receiver operating characteristic (ROC) approach was used to assess models performance [36]; the evaluation dataset was bootstrapped, and probabilities obtained by direct count of area under the curve (AUC) ratios falling 1 via a Visual Basic script (N. Barve, pers. comm.; https://kuscholarworks.ku.edu), with 100 iterations. Model performance was tested for different combinations of principal components (the first 5, 10, 15,20,25,30,35,40,45,50, and 55 components). The omission rate and partial ROC AUC ratios were used as criteria to select optimal environmental data sets for each species. Final models were produced for each species using all available data [37]. We inspected the loading values of each raw variable (16-day composite NDVI) on each of the first two principal components and how they related to monthly rainfall averages in the study area to have a better understanding of how vegetation dynamics reflected in NDVI related to suitability for each species. We also compared available (i.e., within M) and occupied environments in terms of NDVI values across the endemic area. The occupied niche was represented in two ways by: (1) considering NDVI values over the study period for each occurrence coordinate, and (2) taking only NDVI values matching the date of the mosquito's capture.

Niche overlap
To evaluate a hypothesis of niche similarity or difference among the three species within this endemic area, we used a background similarity test implemented in ENMTools v. 1.4.3 [38]. The test generates a similarity measure by overlaying predictions of the two species compared [38]; niche overlap values were calculated using the Schoener's D metric, with values ranging from 0 (no overlap) to 1 (complete overlap). The background similarity test uses the original predictions for each species and compares each of them with overlaps generated from models based on random occurrences selected from across the accesible area for each species. Observed similarity was compared with the distribution of similarities between the focal species and the suite of null models. Probabilities were determined by direct count, and two comparisons were made for each pair of species, with each species serving in turn as the focal species. In all, 100 pseudoreplicates were generated for each pair [38,39]. Finally NicheA was used to visualize overall overlap based on minimum volume ellipsoids for the species in three PCA dimensions [40]; the Jaccard index (I J ) was used as a numerical estimation of environmental overlap among species [41]. random expectations (all p<0.01; S1 Table). According to omission rate and ROC partial values, optimal principal component combinations were, An. nuneztovari first 25 PCAs, An. darlingi first 35 PCAs, and An. albimanus first 25 PCAs.
Potential distributions estimated for the three species indicate that the area with highest environmental suitability for An. darlingi was Bajo Cauca (39% suitable ; Fig 2A). In contrast, the broadest potential distribution for An. nuneztovari was in parts of Bajo Cauca and Alto Sinú (48.3% suitable; Fig 2B). The most suitable areas for An. albimanus were in coastal areas of Urabá and Alto Sinú and some areas of Bajo Cauca (31% suitable; Fig 2C).
The relationship between the original variables (NDVI values) and the principal components showed that the first principal component (PC1) was positively associated with NDVI for all dates throughout the three years, while PC2 exhibited positive and negative associations with NDVI in each of the three years. Both PC1 and PC2 exhibited temporal trends associated with NDVI variation, reflecting seasonality and corresponding leaf phenology, partially in response to fluctuation of rain (Fig 3), which is evident in two areas that are unoccupied and that present contrasting seasonality, was well as overall levels of NDVI (Fig 4). Areas with highest PC1 values had high NDVI values and relatively little seasonal fluctuation, whereas areas with the highest PC2 values had clear fluctuations in both years. PC1 and PC2 explained 39.7% and 11.0% of the total variance, respectively. Anopheles albimanus and An. darlingi occupied areas with lower NDVI values throughout the year (Fig 5) as compered than An. nuneztovari. All species were apparently absent from areas with high NDVI values throughout the year (evergreen forests) and areas with very pronounced NDVI fluctuations.

Fig 3. Loading plot from a principal component analysis using NDVI values for the Urabá-Bajo Cauca
Assessment of both geographic and environmental dimensions showed high overlap among the three species (D > 0.35). Nonetheless, observed overlap between An. nuneztovari and An. albimanus was significantly lower than expected from a null distribution (p = 0.032 in the background similarity tests, S1 Fig). NicheA visualization revealed broad overlap among the niches of the three species, and showed that niche breadth was greater for An. nuneztovari than for the other two species (Fig 6). The Jaccard index indicated that environmental overlap between An. nuneztovari and An. darlingi was 0.63, that for An. nuneztovari versus An. albimanus was  0.36, and that for An. albimanus versus An. darlingi was 0.13. These results are consistent with Schoener's D metric.

Discussion
The three main Colombian Anopheles vector species, An. nuneztovari, An. albimanus, and An. darlingi, occur in sympatry in the Urabá-Bajo Cauca and Alto Sinú region [3,7,42]. Using remotely sensed environmental data with high spatial and temporal resolution, we were able to identify small but meaningful differences in environments occupied by these three species. These details can improve the utility of ENMs with the aim of improving the knowledge of the biology of these important species, and guiding integrated vector management efforts [43]. Deforestation and changes in land use are occurring at a higher speed than changes in climate; as a result, use of environmental variables indicative of changes in vegetation cover will be a useful input in the prediction and prevention of vector-borne disease transmission.
These results agree with previous efforts to map Anopheles species' distributions in the Americas, in which models indicated their presence across our study region [12,15,31,32]. At difference from those studies, we used environmental layers with higher spatial and temporal resolution and thus greater information content at finer scales, allowing documentation of more specific and detailed distributional patterns. The three main Colombian vectors An. darlingi, An. nuneztovari and An. albimanus have an important role in malaria transmission in the UCS region [3,6,7]. Anopheles nuneztovari and An. darlingi, have been reported naturally infected by Plasmodium spp. in UCS [8], and reports have indicated that these two species are able to maintain transmission even at low abundances [4]. While An. albimanus has a predominant coastal distribution and in the Colombian Pacific Coast was detected infected with Plasmodium vivax and Plasmodium falciparum [6]. Our final results showed that An. nuneztovari has the widest geographic distribution, occupying almost the entire study region; this breadth is probably related to the ability of this species to colonize both natural and disturbed environments [3,44]. Anopheles nuneztovari is the most frequently detected species in some localities of the Bajo Cauca and Alto Sinú region [3,7]. Anopheles darlingi is also an important malaria vector in other regions of Colombia [45,46], it has been associated with breeding sites in riverine or gallery forest [47-50]. Our model for An. darlingi showed that most suitable localities for this species are in Bajo Cauca; specifically, along the Cauca and Nechí rivers, that conform an important aquifer system in this region [51]. The riparian ecosystems in this subregion and the potential for rapid adaptation by this species to novel environments [49,52,53] may favor the establishment of populations of this species. Particularly, An. darlingi is associated with riparian ecosystems in the upper Orinoco River in southern Venezuela, where overflow of the river creates lagoons that constitute suitable larval habitats [54]. Therefore, the results of the present study suggest the importance of authorities enforcing regulations on deforestation for the Cauca and Nechí rivers basins, where illegal activities such as gold mining may be increasing human contact exposition to An. darlingi, as has been previously suggested [55].
Finally, in Colombia, An. albimanus is a species with a mostly coastal distribution [7,56,57]. The model identified suitable areas for An. albimanus in Urabá and Bajo Cauca. Absence of this species from some areas modeled as suitable may relate to factors such as competitive exclusion, existing vector control measures, or rapid changes in land use as a result of agricultural activities and mining, as previously suggested for this species in Mesoamerica and the Caribbean region [32].
Previous studies have shown that distributions of mosquito species are partly related to land use factors such as presence or absence of wetlands, type of surrounding vegetation, and agricultural practices [58]. Various studies on species modeling are using NDVI instead of land use factors, given that this index is a measure of photosynthetic activity; thus, allows an indirect approximation to the suitable conditions for mosquito development [25]. Our result concerning the relationship between precipitation and NDVI patterns in different time periods showed that PC1 was more indicative of overall NDVI values (ie; forest versus non-forest), whereas PC2 was more indicative of the magnitude of seasonality in NDVI. Apparently, the three main Colombian Anopheles vector species in Urabá-Bajo Cauca and Alto Sinú do not occupy environments either with high seasonality, or with low seasonality and high NDVI. Because NDVI has been used to estimate vegetation characteristics [59], the relation between NDVI and species' occurrence suggests similar affinities of the three species for particular vegetation types; An. albimanus tended to occupy places with relatively low NDVI values, representing ecosystems with bare soils and low forest cover [60]. Previous reports indicate that An. albimanus is associated with larval habitats exposed to sun [48], which as in the present study, are related to scarce forest cover, and also its larvae can tolerate some salinity [61]. Places with low seasonality and high NDVI values probably represent residual forests unoccupied by people.
The background similarity tests and NicheA analysis showed high overlaps in the geographic and environmental conditions occupied by An. darlingi, An. nuneztovari and An. albimanus. The dispersal capacity of these species and their ability to colonize different ecosystems are well documented [3,53,62,63]. These aspects may facilitate their sympatry across areas presenting diverse environmental conditions.
The models developed in this study have important applications, since they could be projected or replicated for different eco-epidemiological malaria zones of Colombia, as has been done elsewhere in the world [64]. In other applications of these approaches, models of the potential geographic distribution of Ebola and Marburg viruses in Africa [65] were able to anticipate the potential for Marburg outbreaks in Angola [66]. Variation in NDVI in relation to the potential distribution of the vector species may be reflected in patterns of malaria transmission in Urabá-Bajo Cauca and Alto Sinú region. Historically and until recently, this region reported the highest numbers of malaria cases in Colombia [ 3]. The main vectors An. nuneztovari and An. darlingi have been detected naturally infected by Plasmodium spp. in UCS [7], and recent reports indicate that these species are the most important malaria vectors in some localities of the region [3]. Vector distribution and malaria transmission might be highly dynamic because of antrophic or natural causes; thus a good characterization of environmental variation, through the use of NDVI at high spatio-temporal resolution [58,67,68] to predict mosquito spreading, will provide baseline information that can be used to reduce malaria risk in this region.
Supporting information S1

Author Contributions
Conceptualization: MA MMC.