Macrogeographic genetic structure of Lutzomyia longipalpis complex populations using Next Generation Sequencing

Lutzomyia longipalpis is the main vector of Leishmania infantum, the causative agent of visceral leishmaniasis in the Neotropical realm. Its taxonomic status has been widely discussed once it encompasses a complex of species. The knowledge about the genetic structure of insect vector populations helps the elucidation of components and interactions of the disease ecoepidemiology. Thus, the objective of this study was to genotypically analyze populations of the Lu. longipalpis complex from a macrogeographic perspective using Next Generation Sequencing. Polymorphism analysis of three molecular markers was used to access the levels of population genetic structure among nine different populations of sand flies. Illumina Amplicon Sequencing Protocol® was used to identify possible polymorphic sites. The library was sequenced on paired-end Illumina MiSeq platform. Significant macrogeographical population differentiation was observed among Lu. longipalpis populations via PCA and DAPC analyses. Our results revealed that populations of Lu. longipalpis from the nine municipalities were grouped into three clusters. In addition, it was observed that the levels of Lu. longipalpis population structure could be associated with distance isolation. This new sequencing method allowed us to study different molecular markers after a single sequencing run, and to evaluate population and inter-species differences on a macrogeographic scale.


Introduction
The history of visceral leishmaniasis (VL) in the Americas is closely related to the Lutzomyia longipalpis species, which was described by Lutz and Neiva [1] and identified as a Leishmania infantum vector based on consistent evidence from various studies on sand fly vector competence [2,3,4,5,6]. PLOS

Polymerase Chain Reaction-PCR
Illumina Amplicon Sequencing Protocol was used to amplify possible polymorphic sites present on the Lu. longipalpis from each locality, totaling 45 specimens. PCR was performed using three pairs of oligonucleotides to which we added the sequences 5' TCGTCGGCAGCGTCAG  ATGTGTATAAGAGACAG before the forward oligonucleotide, and 5' GTCTCGTGGGCTCGG AGATGTGTATAAGAGACAG before reverse oligonucleotide ( Table 2). These sequences are required for the hybridization of each amplicon in the Illumina flowcell. Sand fly mitochondrial 12S rDNA amplification and sequencing. PCR amplification of Lutzomyia sp. 12S rDNA mitochondrial region was performed with the primers T1B and T2A, according to Beati et al. [24].
Sand fly nuclear DNA amplification and sequencing. To analyze the polymorphisms in nuclear DNA, we used: Llcac and 5LIcac for the region IV6S cacophony [25] and 5L1per1 and 3L1per1 for the period region [18].

DNA sequencing
Five specimens from nine different municipalities were analyzed (5x9 = 45 specimens). Each specimen was amplified to the period gene, IVS6 from cacophony gene and 12SrDNA (45x3 = 135 PCR amplicon). The product of the 3 amplified from each individual was pooled in a single tube, totaling 45 pools. These samples containing the amplicons were purified with AMPure XP beads at 1.80x total volume. An index pair (P5 and P7) Nextera 1 Index Primers (Illumina, San Diego USA) was added for each sample through a PCR with limited cycles. The conditions for Nextera 1 PCR indexing were: initial denaturation at 95˚C for 3 minutes, followed by 12 cycles of denaturation at 95˚C for 30 seconds, annealing at 55˚C for 30 seconds and extension at 72˚C for 30 seconds, with a final extension at 72˚C for 5 minutes. A new purification with AMPure XP beads at 1.80X was performed. Then, the amplicons were quantified using Qubit 2.0 fluorometer following the manufacturer's recommendations for Qubit dsDNA HS (High Sensitivity, Invitrogen) kit.
Samples with distinct multiplexing indices were combined in equimolar ratios to compose a final library for sequencing. The library quantification was made with KAPA library quantification kit 1 in a qPCR reaction. The reaction was carried out in a thermal cycler as follows: initial denaturation at 95˚C for 5 minutes followed by 35 cycles of denaturation at 95˚C for 30 seconds and annealing/extension at 60˚C for 45 seconds.
The samples were pooled, normalized and denatured, and finally loaded on the Illumina reagent cartridge. One library was paired-end sequenced in 150-cycles in a Miseq Illumina 1 (Instituto de Biociências de Botucatu, Universidade Estadual Paulista "Júlio de Mesquita Filho").

Statistical and structural analyses
Briefly, all sequence reads were quality filtered using the default parameters. Then, each individual's sequence reads were aligned to the Lu. longipalpis reference genome using Bowtie with default parameters [26].
Genetic differentiation among subpopulations was estimated by the Wright fixation index (F ST pairwise) [27], using Genepop software version 4.2.
The two matrices (AB) of genetic diversity mean (FST pairwise) and geographic distance (Km) were tested for a linear correlation using the Mantel test [28,29,30,31,32]. P values were calculated using the correlation coefficient r (AB), estimated for 9,999 permutations. The Mantel test was performed using the Ade4 package of software R v.3.5.1. Geographic distances between locations were determined using the Google Earth Pro program.
In order to infer the number of subpopulations that best explains the analyzed dataset, we performed two approaches. These tools focus on the minimization of sources within the variation of the group. Therefore, we used Principal Component Analysis (PCA) and Discriminant Analysis of Principal Component (DAPC).
PCA is a statistical method used to simplify a multivariable dataset, with minimal loss of information. This technique uses Euclidean distance as a measure of dissimilarity and can be used to emphasize variation and bring out strong patterns in a dataset [33].
DAPC uses a k-means clustering algorithm and a Bayesian Inference Criterion (BIC) to determine the number of population groups (K), optimizing variation between groups and minimizing variation within groups [34].
Adegenet package in R software was used to perform PCA and DAPC.

Results
Forty-five DNA samples extracted from sand flies specimens representing all Brazilian regions were PCR amplified for three gene fragments (period gene, IVS6 from cacophony gene and 12SrDNA). In total, 103 loci of nine populations were evaluated (BioProject ID: PRJNA555630). After processing the sequencing data, we observed that in 3 specimens (1 from São Borja, 1 from Fortaleza and 1 from Palmas) we obtained very few sequences for at least one of the amplified fragments (period gene, IVS6 from cacophony gene and 12SrDNA), for this reason we decided to exclude them for further analysis.
Estimates of FST pairwise were not significant. However, these values revealed that the mean of genetic differentiation among populations ranged from 0 (0.000) to 5% (0.0500) ( Table 3). Although the values do not indicate significant genetic differentiation, populations of Lu. longipalpis from Belém (PA) and Jequié (BA) were genetically closer, whereas populations from São Borja (RS) and Fortaleza (CE) were most genetically distinct.
Statistically significant correlations were detected between genetic differentiation and geographic distances based on the Mantel test [r(AB) = 0.9381417; p < 0.00001 e alpha = 0.05]. The results suggested that geographic distance significantly contributes to the genetic differentiation observed in the Lu. longipalpis populations.
We obtained similar results depending on the approaches used to estimate the clusters, considering that PCA and DAPC revealed three clusters (Figs 2 and 3).

Discussion
Studies about the genetic structure of natural populations of vector insects have many applications in evolutionary biology and conservation, an essential aspect to understand the ecoepidemiology of the diseases [35]. Due to the importance of Lu. longipalpis in the epidemiological chain of visceral leishmaniasis, the knowledge about the taxonomic status of this species has increased considerably in recent years, producing several studies in Latin American countries [36].
Over the past 30 years, more than 100 articles have been published using molecular markers to relate or separate sand flies populations, species, groups or genera [36]. Different methodologies have been used, isolated or integrated, such as biochemical, morphological and molecular studies [37, 38,39,40]. Some molecular approaches have been used in order to investigate these intra and interspecific variations: Restriction Fragment Length Polymorphism (RFLP), DNA hybridization, Random Amplification of Polymorphic DNA (RAPD-PCR), Single Strand Conformation Polymorphism (SSCP), microsatellites, sequencing of the mitochondrial and nuclear genome [17,36,37,41,42,43,44,45].
The increase of the molecular studies has encouraged the enhancement of DNA sequencing technologies. One of the most relevant approaches within ecological and population genetics was the development of high-throughput genotyping (Next Generation Sequencing-NGS). This technique allows the analysis of large-scale molecular markers, as well as improving inferences about the genetic variability of populations [46], kinship attributions [47] and understanding of historical demographic patterns and introgression events [48].
There is much interest in applying NGS for sequencing directed to specific genes and in large numbers of individuals [49]. Considering the decreasing sequencing costs and increased molecular markers studies, Golczer and Arrivillaga [36] suggested that researchers should use more than one molecular marker to understand genetic and evolutionary issues greater robustness of the result. Therefore, we investigated genetic diversity in nine Lu. longipalpis populations, using the sequencing with the Illumina MiSeq platform, in three markers, simultaneously.
It is noteworthy that our sequences generated by the Illumina MiSeq platform of each individual are the product of more than a thousand direct and reverse sequences per fragment.  Thus, possible sequence interpretation errors could be detected and filtered, since thousands of sequences are considered. It was different if we used Sanger's sequencing when only a single sequence is present. Shokralla et al. [50] compared the investment cost and total time spent between the Sanger and Illumina Miseq processes. They observed a 27% reduction in total time and 79% reduction in labor costs using NGS. This cost reduction enables the development of a larger number of projects with the same value in a single survey. Estimating genetic differentiation among populations (FST) is fundamental in genetic studies to understand population demographic history [51]. Our results demonstrate low FST pairwise values (Table 3) indicating a low level of genetic differentiation among Lu. longipalpis subpopulations, according to the classification proposed by Wright [52], Hart and Clark [53].
Genetic differentiation results from several evolutionary processes, such as gene flow, natural selection, and isolation by geographic distance. The last one is a vicarious process capable of causing speciation in different populations of the same species. This event occurs because geographic space leads to environmental variation [54,55,56] and, with increasing distance, gene flow attenuates and tends to cease [57,58].
We used the Mantel test in order to evaluate the hypothesis that the levels of population structure of Lu. longipalpis could be associated with distance isolation. This test showed a https://doi.org/10.1371/journal.pone.0223277.g003 significant (p<0.00001) positive correlation (r(AB) = 0.9381417) between the genetic distances (FST) and geographic (Km). This result shows that, although there was little genetic differentiation between the individuals, there is a strong isolation by distance between the nine analyzed populations that covered a distance variation of up to 3,334 km.
Our results confirm a study of genotyping based on microsatellites, which showed Brazilian populations of Lu. longipalpis genetically divergent, consistent with geographic distance [37]. Based on the mitochondrial gene ND4, Soto et al.
[31] also observed a strong correlation between the geographic distances in the genetic differentiation among the Honduran populations of Lu. longipalpis, suggesting that this fact is due to the limited capacity of locomotion of the sandflies and caused by the innumerable geographic and climatic barriers that could limit or even prevent gene flow among populations. This hypothesis may explain the distance isolation found in our results, considering that the specimens were collected at geographic distances higher than the study by Soto et al. [31]. Furthermore, Brazil is a country with continental dimensions and the specimens were collected in different biomes and, consequently, in different ecological conditions. Evaluating the isolation by geographic distance is fundamental. This phenomenon causes, through the time, a new characteristic in one of the populations like, for example, a new sound of courtship, causing the break of the gene flow [59]. This event can lead to important epidemiological and ecological consequences, such as changes in vector capacity and competence, resistance to insecticides and limit/prevent/retard the speciation process [60].
Lutzomyia longipalpis populations analyzed in this study were submitted to PCA to verify possible clusters by similarity of frequencies of the examined alleles. Jombart, Devillard and Balloux [61] argue that this method does not have some essential characteristics to investigate the genetic structure of biological populations because it does not provide a group assessment and does not require an a priori definition of clusters to study population structures. However, PCA has been widely used for genetic analysis as an alternative to Bayesian clustering algorithms. In addition, PCA (Fig 2) provided a clear separation between the different Lu. longipalpis populations.
Due to the possibility of bias introduced by the absence of a priori cluster determination using PCA, we still use another approach to confirm possible clustering. Observing DAPC (Fig 3A, 3B and 3C), Lu. longipalpis populations were also grouped into three distinct clusters (k = 3). This analysis confirmed the clusters demonstrated by the PCA, validating its results. The use of different approaches to analyzing clusters in population genetics is extremely important, providing less biased data evaluation. An advantage of DAPC in relation to other clustering approaches is the possibility of generating a graphical representation of the relationship between inferred clusters [61], as observed in Fig 3C. Brazilian populations of Lu. longipalpis present morphological variation and differ in the number of spots present in the abdominal tergites, a characteristic observed by Mangabeira [11], when he was studying specimens collected in the States of Ceará and Pará. The specimens from the state of Pará (PA) had a pair of pale spots on the IV abdominal tergite (phenotype called one spot-'1S'), while those from the state of Ceará (CE) had two pairs of spots (the phenotype two spot-'2S'), one in the IV tergite and one in the tergite III. Mangabeira also mentioned that the two forms were found in different ecological conditions, speculating that they could represent different species or a variety of them. Subsequently, intermediate phenotypes (a pair of pale spots with a minor point on tergite III) were observed, indicating an intraspecific polymorphism [38,39].
In the DAPC (Fig 3), the populations analyzed in this study formed clusters similar to the phenotypes (morphotypes 1S and 2S). In the red cluster, Lu. longipalpis populations were grouped morphotype 1S of the North and Northeast of the country: Palmas (TO), Belém (PA), Jequié (BA) and Governador Valadares (MG). In the green cluster, Lu. longipalpis populations were grouped of the Northeast morphotype 2S of Recife and Fortaleza. Finally, in the blue cluster, Lu. longipalpis specimens were grouped of the Midwest and South of the country-Campo Grande (MS), Miranda (MS) and São Borja (RS). Lutzomyia longipalpis populations of Miranda (MS) and São Borja (RS) present morphotype 1S, while the population of Campo Grande was composed by specimens morphotypes 1S and 2S.
Molecular studies presuppose that Lu. longipalpis populations 1S (one spot) and 2S (two spot) are in recent process of speciation (0.22 to 1.02 million years ago) [62,63]. This period is believed that was sufficient to generate morphological diversity and to create the new species, Lu. pseudolongipalpis [64], which is possibly occurring among the different Lu. longipalpis populations.
In addition to phenotypic differences, a behavioral aspect studied is the production of "lovesongs" by Lu. longipalpis males during mating [65,66,67]. The reproductive isolation observed among Brazilian Lu. longipalpis populations are caused by failures during intercourse between specimens that produce different types of copulatory lovesongs. This acoustic signal may play an important role in species recognition, acting as a reproductive barrier and reducing gene flow [40,65,66]. These lovesongs are controlled by genes such as period (per) and cacophony (cac). These genic regions are analyzed in the present study, since they play an important role in species recognition.
Beyond the phenotypic and behavioral differences found in the Brazilian populations of Lu. longipalpis, there are also differences considering the main component of sex pheromones. In Brazil, the populations of Lu. longipalpis produce four different chemotypes (9-methylgermacrene-B, 3-methyl-α-himachalene, cembrene-1 and cembrene-2) [39,68]. This fact is relevant since there is evidence that members of the same species that produce different sex pheromones are reproductively isolated [68]. In Fig 3, we can observe that the distribution of the clusters in our results coincides with the location of the different sex pheromone chemotypes of Lutzomyia longipalpis in Brazil presented in the Spiegel et al. [39] research. Lu. longipalpis populations of Recife (PE) and Fortaleza (CE) produce cembrene-1, whereas the populations of São Borja (RS), Campo Grande (MS) and Miranda (MS) produce 9-methylgermacrene-B. This fact could justify these clusters found in our results and be the factor responsible for the reproductive isolation of the species.
Studying complex of species, such as Lu. longipalpis, we used genetic markers directly involved in the speciation process. Therefore, we chose to amplify and analyze two nuclear regions: the IVS6 region of the cacophony gene and the period gene. The IVS6 region of the cacophony gene encodes the α-1 subunit of a voltage-dependent calcium channel. While the gene period is involved in the circadian rhythm and controls the species-specific differences in locomotor and mating activities [18,19,21].
The results obtained in studies of Lu. longipalpis complex using the per gene are consistent with those obtained in studies of pheromones and copulatory lovesongs [12]. Bauzer et al. [18] identified results that pointed to the existence of a species complex, since polymorphisms of the per gene may result in reproductive isolation.
Lins et al. [25] demonstrated that the IVS6 region of the cac gene can be used as an excellent molecular marker in population genetics studies, because this genomic region presents an intron with high variability and divergence between species. Bottechia et al. [19] observed that this region shows greater variability of gene flow than the per gene, suggesting that the IVS6 region may be more affected by introgression.
Another target of studies has been the mtDNa, as the 12S region, to evaluate the differentiation of populations of the Lu. longipalpis complex. The mtDNA presents maternal origin, evolves rapidly and does not recombine, being an adequate target to trace genealogy and evolutionary history. The 12S region of mtDNA was used by Beati et al. [24] when analyzing genetically different species of the genus Lutzomyia of Colombia and Peru, and identified different species of the genus in those countries.

Conclusions
Lutzomyia longipalpis genetic structure showed similar patterns according to the approach used, since both PCA and DAPC identified three populations. Thus, the use of different approaches to analyze clusters in population genetics was useful to provide a less biased data evaluation.
Studies about the Lu. longipalpis complex genetic structures can provide details on population differentiation and contribute to understand the processes of divergence and speciation, mechanisms responsible for the heterogeneity of vector capacity and competence, as well as vector susceptibility to infectious agents or insecticides. Thus, the evaluation of the population genetics of this vector can help to plan control measures appropriate to the real conditions of each transmission area of this important endemic in public health.