Biogeographic origin and phylogenetic relationships of Mepraia (Hemiptera, Reduviidae) on islands of northern Chile

Chagas disease is one of the main zoonoses mediated by vectors in America. The etiological agent is the protozoan Trypanosoma cruzi, transmitted mainly by hematophagous insects of the subfamily Triatominae. Mepraia species are triatomines endemic to Chile that play an important role in T. cruzi transmission in the wild cycle and are potential vectors for humans. In addition to the continental distribution, populations of Mepraia genus have been reported inhabiting islands of northern Chile. The presence of individuals of Mepraia in insular areas might be explained through passive dispersion by marine birds or by vicariance of an ancestral widespread population. To clarify the biogeographic origin and phylogenetic relationships of island individuals of Mepraia, mitochondrial COI and cyt b genes were sequenced in individuals from island and continental areas. Gene sequences were used to estimate phylogenetic relationships, divergence dates and migration rates between insular and continental populations. The dates of divergence estimates are congruent with sea level and tectonic changes that originated the islands during Pleistocene. Migration rates suggest symmetric historical island-continent gene flow. We suggest that the origin of island triatomines can be explained by both vicariance and dispersion. Phylogenetic relationships show that individuals from Santa María Island and the continent clustered in a clade different from those previously reported, indicating a new lineage of Mepraia genus. This study will contribute to understand the origin of the T. cruzi infection in coastal islands of northern Chile.


Introduction
Phylogeographic and biogeographic historical processes of hosts, vectors and pathogens are crucial to understand the current epidemiological status of zoonotic diseases and provide critical information for control management [1,2]. Chagas disease is one of the main zoonotic diseases mediated by vectors in America. The etiologic agent is the protozoan Trypanosoma cruzi, (Inca N = 9) and San Felipe (SF N = 4, Table 1). Five additional localities were explored (Chañaral Island, Damas Island, Choros Island, Gaviota Island and Punta Choros), but triatomines were not detected (Table 1 and Fig 1). Insects were passively collected by trained people as described in Campos-Soto et al [20] and actively by lifting stones in rock piles and nests. Samples of Triatoma eratyrusiformis and Triatoma breyeri were used as outgroup for phylogenetic analyses.

Ethics statement
Fields research were authorized by the Corporacion Nacional Forestal (CONAF) from Atacama Region, Chile (permit N˚049/2017) and CONAF from Coquimbo Region (permit N˚22/2019) Chile. The research project that includes this study was approved by the Bioethic Committee of the Pontificia Universidad Católica de Valparaíso (permit # BIOEPUCV-A98b-2017).

DNA extraction, mitochondrial amplification and sequencing
DNA was extracted from legs of bugs using the DNeasy1 Blood & Tissue kit. A gene segment of 636 base pairs (bp) for cytochrome oxidase subunit I (COI) and a segment of 682 bp for cytochrome b (cyt b) were amplified by the Polymerase Chain Reaction (PCR) using the polymerase SapphireAmp1 fast PCR Master Mix. We used the primers described in Folmer et al. (1994) [22] for the COI gene, and the primers described in Monteiro et al. (2003) [23] for the cyt b gene. The same conditions were used to amplify the cyt b and COI genes: initial denaturation at 94˚C for 3 min, 30 cycles of 1 min at 94˚C for denaturation, 45˚C for 1 min of annealing, and 72˚C for 1 min of extension, followed by a final extension of 10 min. Amplification was verified by 1% agarose gel electrophoresis. Sequencing of both strands was performed by Macrogen Inc. (South Korea) using the same PCR primers. Sequences were edited to obtain consensus sequences using Bioedit 7.0.4.1 [24] and aligned using Clustal W [25] as implemented in Bioedit. After alignment, sites that showed nucleotide substitutions were re-examined by visual inspection of each individual's raw chromatogram (see S1 Data). COI and cyt b gene sequences were deposited in GenBank with accession numbers MN117859-MN117888 (S1 Data). Additional GenBank sequences of both COI and cyt b genes were also included (accession numbers available in S1 Data [7]). Outgroup samples were sequenced for the COI gene, while cyt b gene sequences were extracted from GenBank. Sequences of both genes of  each individual were manually concatenated (S1 Data), and the haplotype data file was obtained using DnaSP6 [26]. The best-fitting model of nucleotide substitution (HKY + G for COI and TN93 for cyt b) was selected with the Bayesian information criterion (BIC) using Smart Model Selection [27] in the ATGC bioinformatics platform [28].

Estimates of Time to the Most Recent Common Ancestor (TMRCA)
Haplotype sequences were partitioned by gene and codon position and linked using BEAUti, implemented in BEAST v.2.5.2 [29]. TMRCA were estimated through the Bayesian Markov Chain Monte Carlo (BMCMC) method available in BEAST v.2.5.2 [29]. The coalescent Bayesian skyline was used as tree prior, and run using both strict and uncorrelated lognormal relaxed molecular clocks. The best molecular clock was estimated using model comparison by AICM [30] available in Tracer v.1.6 [31]. The resulting best estimate was the strict molecular clock. We assumed a fixed mean substitution rate of 0.016 sub/site/my estimated for COI gene in hemipterans [32], and 0.023 sub/site/my for cyt b gene in Triatominae [23] Two separate MCMC runs of 5 × 10 8 generations, sampling every 50000 generations, were run using the BEAST program. Mixing and convergence on the basis of the effective sample size (ESS) values of each run were verified in Tracer v.1.6 and combined in LogCombiner; only ESSs >200 were accepted. We discarded 10% burn-in, and the statistical uncertainty was depicted in values of the 95% Highest Probability Density (HPD) with mean node heights selected in TreeAnnotator. A consensus tree was calculated with TreeAnnotator and visualized in FigTree v1.4.3. The Median Joining method [33] implemented in PopART software [34] was used to visualize haplotype distribution of island and continent samples.

Migration rate estimates analysis
Migration rates were estimated by Migrate-n v.3.6.11 software [35]. The software estimates migration rates and effective population sizes between two populations using genetic data, under either maximum likelihood or Bayesian frameworks [36]. These analyses were done with a concatenated matrix using all the individual sequences from each population (Table 1). We ran two independent analyses, first comparing SM-I and SM-C, and then PA-I and PA-C populations. Analyses were run under a full model using Bayesian inference strategy. The parameter M ij defines the proportion of genes of population j that comes from population i per generation and with size Ne. M i is calculated as the immigration rate per generation (m i ) divided by mutation rate per site per generation (μ), consequently M i = m/μ. The parameter θ i is defined as θ i = xNeμ where x is the inheritance parameter equivalent to 1 for mtDNA data [36]. Effective number of migrants (Nm) was estimated as Nm = M ij θ i . M ij and θ i parameters were based on F ST calculations. Four Metropolis-Coupled Chains (5 million visited genealogies, 1,000 recorded steps) were run, with a burn-in of 10,000 iterations. We used an adaptive heating scheme with 4 chains with temperatures set by default (swapping interval = 1) to increase the efficiency of the MCMC search. A heatmap was constructed with the migration rates estimated from Migrate-n using R package ggplot2 [37]. Genetic differences between populations were estimated with F ST (pairwise difference) and its significance with a test of 1000 permutations using Arlequin v.3.5.2.2 [38].

Phylogeographic relationship and estimates of TMRCA
The sample sizes collected for each locality and their geographic locations are detailed in Table 1 and  2). One haplotype is shared between PA-I and PA-C localities (Fig 3).

Migration rate estimates
The migrate-n analyses showed good full model performance. The posterior distribution of all parameters resulted in convergence and no warning was recorded during the run. The analyses

Biogeographic origin of island triatomines of northern Chile
Processes that determine the origin of populations occurring in insular areas are traditionally explained by dispersal, vicariance, and/or a combination of both. Particularly for zoonotic diseases, understanding the origin and maintenance of the pathogen in its wild cycle has acquired more relevance in the last decades [2,39]. Our study shows that both dispersal and vicariance hypotheses are not exclusive and have played major roles in the biogeographic origin of individuals of Mepraia in islands of northern Chile. Sagua et al. [10] suggested that the occurrence of individuals of M. parapatrica in Pan de Azúcar Island could be explained through passive transport by marine birds. Mepraia species have opportunistic habits, being able to feed on birds both in the continent [40,41] and in the island [10]. Triatomines hidden in cracks may be attracted to birds and make their way between the feathers to feed; they may attach to the feathers and the stylet would pricks the skin. If the stylet remains inserted during feeding, some triatomines would stay attached to their prey. This behavior has been observed under laboratory conditions, when individuals (of the three species of Mepraia genus) were feeding on rodents (Campos-Soto, observation).
Pan de Azúcar and Santa María Islands are separated from the continent by about 1.5 km, a distance that marine birds are able to reach without major difficulty. Birds have been suggested to be passive carriers of triatomines and are one of the dispersal mechanisms of triatomines over large areas [42,43]. Alternative dispersal mechanisms of triatomines include people (park rangers and poachers), who sail to the islands carrying insects or eggs in their clothes or backpacks. Vectors and pathogens have been historically transported due to human migration, such as T. infestans and other triatomines [42][43][44]. Therefore, if T. cruzi has infected the transported kissing bug, the likelihood of spreading the pathogen to new geographic areas exists.
Parapatric speciation was proposed as the main mechanism for the appearance of M. parapatrica, and its occurrence on Pan de Azúcar Island by vicariance [6]. The latter suggests that insular and continental populations were connected and formerly part of an ancestral population that inhabited a wider area. Our molecular analyses showed that the estimations of divergence dates between populations from Pan de Azúcar Island and Pan de Azúcar continent are not recent (0.249 Mya 95% HPD = 0.139-0.355, Fig 2). However, a frequent shared haplotype (H20, see Fig 3), low genetic difference (F ST = 0.091 P<0.016) and high migration rates (Table 2, Fig 4) suggest that historical dispersal has been a major driver for the occurrence of M. parapatrica on this island. Despite finding a shared haplotype, our results showed that there are also several derived haplotypes found only in Pan de Azúcar island (Fig 3). This result suggests that both vicariance and dispersal processes have played a major role in the occurrence of triatomines on Pan de Azúcar Island.
We found more recent divergence (0.09 Mya 95% HPD = 0.032-0.162, Fig 2) between Santa María Island and Santa María continent populations compared to that of Pan de Azúcar Island and the continent. Also, absence of shared haplotypes (Fig 3), high genetic differences (F ST = 0.904 P<0.001) and low migration rates (Table 2, Fig 4) were found between populations of Santa María Island and Santa María continent, suggesting a weak role of dispersal and a probable major role of vicariance.
Geological data shows that the continuous tectonic and sea level changes in marine terrace formation during the Pleistocene-Holocene [12][13][14][15][16] formed both islands, resulting in allopatric splitting of terrestrial populations. Therefore, if enough time has passed to prevent the flow of kissing bug migrants and passive dispersal was not strong, divergent haplotypes between islands and continental populations are expected, and estimations of old divergence dates. According to the International Chronostratigraphic Chart [45], our estimated dates of divergence between Pan de Azúcar Island and Pan de Azúcar continent (0.249 Mya 95% HPD = 0.139-0.355, Fig 2) and between Santa María Island and Santa María continent (0.09 Mya 95% HPD = 0.032-0.162, Fig 2) are in the middle-upper Pleistocene. These ages are congruent with estimated dates (0.03 to 0.7 Mya) of tectonic and sea level changes in marine terraces that impacted the coast in this zone [12][13][14][15][16]. A previous study with scorpions suggested that tectonic changes shaped their current population structure, and that coastal desert species adapted to habitats that have had continuous sea level changes [46].
Intensive sampling in other islands of northern Chile (Damas, Choros, Gaviota and Chañaral) did not result in capture of individuals of Mepraia (Table 1, Fig 1). These islands are within the geographic distribution of M. spinolai [7,8]. M. spinolai has colonized only some coastal areas (28˚S to 30˚44´S) different from M. gajardoi and M. parapatrica, in which the colonization route has been reported mainly from north to south along the coast [6]. We hypothesize that the current absence of individuals of Mepraia in those islands may be explained by the absence of this kissing bug when the islands were formed or that ancient allopatric populations were extinguished. Later, the low occurrence of kissing bug populations in continental coastal areas close to those islands (e.g. Punta Choros, Table 1), coupled with the distance from the continent, might prevent their colonization by passive (e.g. bird) transport. Other factors not yet evaluated may also be influencing the absence of kissing bugs in those islands such as food (host) availability.

Biogeography and phylogeny of Mepraia species
The origin and dispersal routes of Mepraia genus are still under discussion. Mepraia species and Triatoma eratyrusiformis would have originated from an ancestral population that diverged due to the uplift of the Andes Range [6,9,47,48]. When Frías [6] described M. parapatrica, he proposed an origin and latitudinal dispersion of the genus from north to south, suggesting a model of parapatric speciation. M. gajardoi would be the northernmost and most ancestral species. followed by M. parapatrica and M. spinolai as the most recent species. The first molecular systematic study in Mepraia genus found that with nuclear markers, the individuals that inhabit in M. parapatrica areas clustered with the M. spinolai clade, while with a mitochondrial marker were included in M. gajardoi clade. This incongruence was explained as a result of mitochondrial introgression due to past hybridization [49], suggesting a possible hybrid origin of M. parapatrica. A posterior study using two mitochondrial genes found three supported clades congruent with the described species and estimated that M. spinolai is the most ancestral species of the genus. Consequently, the dispersion and subsequent population differentiation and speciation would have occurred from south to north [7]. Populations from Pan de Azúcar Island and continent were included with significant support in the M. parapatrica clade (green branch in Fig 2). We expected this result because these populations occur in the range where M. parapatrica occurs [6,7,8,50]. The topology of our phylogenetic tree slightly differs from that previously reported by [7], which can be explained due to the nature of our current analyses that involves adding new samples, new criterion for the phylogeographic inferences, and the uncertainty associated with low-support of bootstrap values of some clades reported in the previous phylogenetic tree of   [7]. Although the Santa María Island and Santa María continental populations are located between the distributions of M. parapatrica and M. gajardoi, our results showed that samples from these populations were grouped with high support in a clade different from all previously reported species (yellow branch in Fig 2). However, the phylogenetic relationship of this new lineage remains unclear due to the low support (PP = 0.46, Fig 2) in the phylogenetic tree. Additional molecular markers coupled with morphological information may help to solve this uncertainty.

Insights to understand T. cruzi infection in coastal islands of Chile
The T. cruzi life cycle alternates between triatomines and mammalian host species, while birds and reptiles are refractory to infection [51,52]. It has been reported that individuals of M. parapatriica feed mainly on seabirds and reptiles in Pan de Azúcar Island [10]; however, hosts have not been studied in Santa María Island due to the recent description of the occurrence of kissing bugs [11]. Strikingly, triatomines infected with T. cruzi were reported both in Pan de Azúcar and Santa María Islands, raising the question of how the T. cruzi life cycle is completed with the high abundance of hosts refractory to infection. We speculate that infected triatomines were likely present in the islands since they were formed and historical events of migration carrying infected triatomines from the continent to the islands may have contributed to the current detection of infected kissing bugs. Maintenance of the T. cruzi life cycle requires mammal hosts, therefore we do not rule out the presence of small mammals inhabiting these islands. Additional studies including captures and surveys of mammals as hosts of T. cruzi will help to elucidate this pattern.

Conclusions
In this study we show that the biogeographic processes of vicariance and dispersal may have contributed to the origin of island triatomines in the north of Chile. Also, using phylogenetic inference we identified a new lineage of Mepraia that includes haplotypes occurring both in the northern island populations of Chile and the near continental areas. The identification of biogeographic processes leading to the occurrence of kissing bugs on islands and the new lineage of Mepraia genus will provide new insights to understand the T. cruzi life cycle in insular areas of northern Chile.