Identification of French Guiana sand flies using MALDI-TOF mass spectrometry with a new mass spectra library

Phlebotomine sand flies are insects that are highly relevant in medicine, particularly as the sole proven vectors of leishmaniasis. Accurate identification of sand fly species is an essential prerequisite for eco-epidemiological studies aiming to better understand the disease. Traditional morphological identification is painstaking and time-consuming, and molecular methods for extensive screening remain expensive. Recent studies have shown that matrix-assisted laser desorption and ionization time-of-flight mass spectrometry (MALDI-TOF MS) is a promising tool for rapid and cost-effective identification of arthropod vectors, including sand flies. The aim of this study was to validate the use of MALDI-TOF MS for the identification of Northern Amazonian sand flies. We constituted a MALDI-TOF MS reference database comprising 29 species of sand flies that were field-collected in French Guiana, which are expected to cover many of the more common species of the Northern Amazonian region, including known vectors of leishmaniasis. Carrying out a blind test, all the sand flies tested (n = 157) with a log (score) threshold greater than 1.7 were correctly identified at the species level. We confirmed that MALDI-TOF MS protein profiling is a useful tool for the study of sand flies, including neotropical species, known for their great diversity. An application that includes the spectra generated here will be available to the scientific community in the near future via an online platform.


Introduction
Phlebotomine sand flies (Diptera: Psychodidae: Phlebotominae) are insects of great medical relevance because they are the most frequent vectors of leishmaniasis [1,2] and also transmit various other human pathogens, including bacteria and viruses [3]. Leishmaniases are a range of diseases caused by flagellated protozoans of the genus Leishmania (Kinetoplastida: Trypanosomatidae), transmitted through the bite of an infected female sand fly [2]. The epidemiology of leishmaniasis is complex because of the wide diversity of the Leishmania and sand fly species involved. In the Americas, 56 sand fly species are known to be potential vectors of 15 Leishmania species [2]. In French Guiana, Leishmania (Viannia) guyanensis is the most prevalent Leishmania species in humans and is mainly responsible for localized cutaneous leishmaniasis [4]. Other Leishmania species, such as L. braziliensis, can be more clinically debilitating, since they can cause mucocutaneous leishmaniasis (involving the nose, mouth and throat) or diffuse cutaneous leishmaniasis, requiring specific medical management [4,5].
On the Guiana Shield, the main vector of L. guyanensis is Nyssomyia umbratilis [6,7]. However, the sand fly species transmitting other medically important Leishmania species, such as those involved in the local transmission cycle of L. braziliensis, remain only partially identified or unidentified [2,7]. Additionally, vector ecology can change with environmental disturbances such as deforestation and urbanization [1,2,7]. Human activities in deforested areas may result in epidemics, as illustrated by the reported outbreaks of cutaneous leishmaniasis in French Guiana [5] and Argentina [8]. Therefore, identifying the sand fly species associated with transmission hotspots and better describing sand fly communities are essential to improve our understanding of leishmaniasis epidemiology [1,2,8].
To date, sand fly ecology studies have been limited by the complexity of species identification, which requires meticulous and time-consuming work by expert entomological taxonomists [9]. Molecular identification has been proposed as an alternative, and molecular reference databases have been made available [10,11]. However, cost analyses for large sample series show that extensive screening remains expensive. Protein profiling using matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS) has recently emerged as a potentially transformative tool for medical entomology [12]. Several studies have shown that MALDI-TOF MS is suitable for sand fly identification, and in-house databases have been constructed [13][14][15][16][17]. Only four South American sand fly species have been tested to date and included in an in-house MALDI-TOF MS database [13]. A disadvantage of in-house databases is that access is restricted to local users. As a solution, a centralized online platform for sand fly identification has been suggested [12,13].
The aim of this study was to create a MALDI-TOF MS reference database of French Guiana sand flies. Sand flies were captured at different locations in French Guiana, an Amazonian territory on the Atlantic coast of northern South America. A mass spectra library (MSL) was built from field-collected sand flies identified by molecular methods. The reliability of the MSL was then evaluated using a blind test. This MSL will be included in an online platform dedicated to phlebotomine sand fly identification, currently being developed at Sorbonne University.

Experimental design
To build and validate an MSL, collected sand flies were identified by DNA sequencing and divided into two panels: a construction panel and a validation panel (Table 1 and Fig 1). Specimens were allocated to the two panels sequentially. The first 206 sand flies analysed were assigned to the construction panel because they provided a sufficient number of species (n = 20) and of individuals per species (one to ten) to build an MSL. The remaining 199 specimens, which were distinct from those of the construction panel and had not been analysed before, were assigned to the validation panel. The construction panel comprised 260 sand flies collected from four study sites located in pristine forest (Nouragues reserve), secondary forest (Saint-Georges and Regina) and logged forest (Counami). The validation panel comprised 199 sand flies, including 93 individuals collected at the same study sites and periods as the construction panel and 106 collected at four additional study sites and periods. The additional sites were three peri-urban sites of secondary and edge forest (around Cayenne) and one pristine forest site (along the Approuague River at Saut Grand Machicou).

Field capture of sand flies
Between February 2017 and July 2017, male and female sand flies were collected from different types of forested habitats in French Guiana (Table 1, Fig 2). Sand flies were captured using Centers for Disease Control (CDC) miniature light traps with incandescent light (John W. Hock Company, Gainesville, FL, USA), set between 6 pm and 6 am. Back at the laboratory, the sand flies were rapidly killed by freezing at −20˚C and immediately dissected at room temperature into four parts (head, thorax with legs and wings, abdomen, and genitalia). After dissection, the thoraxes with legs and wings were stored dry-frozen at −80˚C, and the other body parts were stored in 70% ethanol at −80˚C. Thoraxes with legs and wings were set aside for MALDI-TOF MS analysis and abdomens for molecular identification. Heads and genitalia were kept for morphological examination in a future study (they were not analysed at the time of this study), in case of discordant or uninterpretable identification results. Before analysis, the period of dry-frozen storage at −80˚C ranged from 10 days to 7 months (mean, 4 months). The abdomens of all 677 sand flies were subjected to nucleic acid extraction using a Chelex protocol without boiling [18]. Sand fly DNA was amplified using the Ins16S_1 primers targeting the mitochondrial 16S rRNA gene (Ins16S_1-F: TRRGACGAGAAGACCCTATA; Ins16S_1-R: TCTTAATCCAACATCGAGGTC), as previously published (216-bp amplicon) [11,19]. PCR amplification was performed in 20-μl mixtures containing 2 μl of 1/10 diluted DNA template, 10 μl of AmpliTaq Gold PCR Master Mix (5U/μl; Applied Biosystems, Foster City, CA, USA), 2 μl of each primer (5 μM) and nuclease-free water (Promega, Madison, WI, USA). The PCR conditions were an initial denaturation at 95˚C (10 min), followed by 35 cycles of 30 s at 95˚C, 30 s at 50˚C and 30 s at 72˚C, and a final elongation step at 72˚C for 10 min.
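The reaction recipe above leaves the water volume and the final primer concentration implicit. The following is a minimal Python sketch of that arithmetic, assuming only the volumes and the 5 μM primer stock stated in the text; the helper names are illustrative and not part of any published protocol.

```python
from __future__ import annotations

# Volumes (in microlitres) taken from the PCR recipe described in the text.
TOTAL_UL = 20.0
PRIMER_STOCK_UM = 5.0  # concentration of each primer stock, in micromolar
COMPONENTS_UL = {
    "1/10 diluted DNA template": 2.0,
    "AmpliTaq Gold PCR Master Mix": 10.0,
    "Ins16S_1-F primer": 2.0,
    "Ins16S_1-R primer": 2.0,
}


def water_volume(total_ul: float, components: dict[str, float]) -> float:
    """Nuclease-free water needed to bring the mixture to its final volume."""
    remaining = total_ul - sum(components.values())
    if remaining < 0:
        raise ValueError("components exceed the reaction volume")
    return remaining


def final_concentration(stock: float, added_ul: float, total_ul: float) -> float:
    """Solve C1 * V1 = C2 * V2 for the final concentration C2."""
    return stock * added_ul / total_ul


water_ul = water_volume(TOTAL_UL, COMPONENTS_UL)
primer_final_um = final_concentration(
    PRIMER_STOCK_UM, COMPONENTS_UL["Ins16S_1-F primer"], TOTAL_UL
)
print(f"water: {water_ul} uL; each primer: {primer_final_um} uM")
```

Running it prints `water: 4.0 uL; each primer: 0.5 uM`, that is, 4 μl of water per reaction and a final concentration of 0.5 μM for each primer.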
When PCR amplification failed, an additional DNA purification was performed with the QIAamp DNA Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions, and PCR amplification was repeated, resulting in 482 successful PCR amplifications.
Sequence editing and multiple alignments. Sequence chromatograms were visually inspected, and consensus sequences were generated using Molecular Evolutionary Genetics Analysis (MEGA) software version 7.0 [20]. Multiple sequence alignment was performed using the Clustal W tool implemented in MEGA. Sequences were assigned to species in two steps. First, we performed a maximum likelihood analysis with PhyML [21], using the Akaike Information Criterion to select the most likely substitution model and the Shimodaira-Hasegawa approximate Likelihood Ratio Test [22] for branch support, after adding all our sequences to the 208 reference sequences (i.e. 40 sand fly species) of French Guiana sand flies (GenBank accession numbers: KU761816-KU761608) [11]. Our sequences (GenBank accession numbers: MH389256-MH389715) were assigned at the species level when they clustered within the clade of the species' reference sequences. Second, for sequences branching outside the reference clade, the pairwise genetic diversity index was calculated using Arlequin software version 3.5 [23]. If the clade diversity index remained below 3% after inclusion of the new sequence (a distance threshold suggested for species delimitation [24]), the specimen was assigned to the species' reference clade. Conversely, if including the new sequence raised intraspecific diversity above 3%, the sequence was discarded, as the likelihood of assignment error was considered too high. Within a species complex of a given genus (e.g. Genus 1), if the sequences to assign did not cluster within any species clade (e.g. clade 1 for species 1, clade 2 for species 2), the assignment remained at the supraspecific level (e.g. "Genus 1 species 1/species 2"; S1 Fig, S1 Table). Taxonomic classification followed the system recently described for American sand flies [25].
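The second assignment step can be sketched as follows. This is a simplified illustration that assumes the "pairwise genetic diversity index" behaves like the mean uncorrected pairwise distance (p-distance) within a clade; Arlequin's own estimator may differ, and the first, tree-based step (PhyML clustering) is not reproduced. Function names are hypothetical.

```python
from __future__ import annotations

from itertools import combinations
from typing import Optional

DIVERGENCE_THRESHOLD = 0.03  # 3% threshold suggested for species delimitation


def p_distance(a: str, b: str) -> float:
    """Uncorrected proportion of differing sites, skipping columns with a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def clade_diversity(seqs: list[str]) -> float:
    """Mean pairwise p-distance within a clade (0.0 for a single sequence)."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def assign(query: str, clade: list[str], species: str) -> Optional[str]:
    """Return the species if adding the query keeps clade diversity below 3%."""
    if clade_diversity(clade + [query]) < DIVERGENCE_THRESHOLD:
        return species
    return None  # discarded: risk of assignment error judged too high
```

For a clade of three identical 100-site sequences, a query differing at 2 sites raises the mean pairwise distance to 1% and is assigned, whereas a query differing at 10 sites raises it to 5% and is discarded.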

Sample preparation for MALDI-TOF MS analysis
Thoraxes with wings and legs were rinsed in 70% ethanol for 10 min in a 1.5-mL microcentrifuge tube. Tubes were centrifuged (13,000 rpm, 10 min) and the supernatant was discarded. After a second centrifugation (13,000 rpm, 2 min), the remaining ethanol was removed with a micropipette and residual ethanol was left to evaporate. Protein extraction consisted of adding 10 μL of 70% formic acid. After manual homogenization with a micropipette, the homogenate was incubated for 5 min. Then 10 μL of 100% acetonitrile was added and the mixture was incubated for another 5 min. The homogenate was centrifuged (13,000 rpm, 2 min) and 1 μL of the supernatant containing the protein extract of each sample was deposited onto a steel target plate (Bruker Daltonics, Wissembourg, France). Once dried, the deposits were covered with 1 μL of α-cyano-4-hydroxycinnamic acid (HCCA) matrix prepared in 50% acetonitrile and 2.5% trifluoroacetic acid. To ensure reproducibility, ten replicates were spotted for each isolate to be included in the MSL, and four replicates for each isolate of the panel to be tested.

Mass spectra acquisition
Mass spectra were acquired with a Microflex LT (Bruker France SAS) using the default acquisition parameters. Spectra were acquired in linear, positive-ion mode at a laser frequency of 60 Hz over a mass range of 2-20 kDa. Data were acquired automatically using AutoXecute in FlexControl v3.4 software (Bruker France SAS) and exported into MALDI Biotyper v4.1 software for data processing with the default parameters and for spectra analysis.

Reference database creation
MALDI-TOF MS spectra analysis. A main spectrum profile (MSP) was created for each sand fly specimen of the MSL construction panel. An MSP is an average spectrum computed from 10 raw spectra obtained from ten replicate spots of the same specimen; each sand fly thus yielded one MSP. Between one and 10 MSPs were included per species, resulting in an MSL composed of 89 MSPs. To assess the technical reproducibility and spectrum quality of each MSP, the spectra were examined visually and the log (score) (LS) values of each spectrum composing an MSP were checked. LS values were obtained by comparing each raw spectrum with its own MSP and were considered valid if greater than 2. To further assess reproducibility between spectra, a composite correlation index (CCI), which takes into account peak positions, peak intensity distribution and peak frequency, was computed with default settings (mass range, 3.0-12.0 kDa; resolution, 4; eight intervals; auto-correction off). The matrix of correlation indexes was represented as a heat map grid (index values from 0 to 1), in which reproducibility levels range from red (related spectra) to blue (incongruent spectra). To assess the relationships between MSPs, a cluster analysis (MSP dendrogram) based on protein mass profiles (m/z signals and intensities) was performed. Default settings were used: distance measured by correlation, average linkage, and score threshold values of 300 arbitrary units for a single organism and 0 arbitrary units for related organisms. The closeness of one sand fly spectrum to the others was reflected by an arbitrary distance level.
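To make the idea behind an interval-wise correlation index concrete, here is a simplified Python sketch. It assumes both spectra were already resampled onto the same m/z grid and mirrors only part of the stated settings (3.0-12.0 kDa window, eight intervals). The "resolution 4" parameter is not reproduced, and combining the per-interval correlations by a geometric mean is our assumption, not the Biotyper formula. Function names are hypothetical.

```python
from __future__ import annotations

import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is flat."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def composite_correlation(mz, s1, s2, lo=3000.0, hi=12000.0, n_intervals=8) -> float:
    """Interval-wise correlation of two spectra sampled on a shared m/z grid (Da).

    Each of the n_intervals windows in [lo, hi) is correlated separately;
    negative correlations are clipped to 0 and the per-interval values are
    combined with a geometric mean (an assumption, not Bruker's formula).
    """
    width = (hi - lo) / n_intervals
    per_interval = []
    for k in range(n_intervals):
        start, stop = lo + k * width, lo + (k + 1) * width
        idx = [i for i, m in enumerate(mz) if start <= m < stop]
        if len(idx) < 2:
            continue  # too few points in this window to correlate
        r = pearson([s1[i] for i in idx], [s2[i] for i in idx])
        per_interval.append(max(r, 0.0))
    if not per_interval or min(per_interval) == 0.0:
        return 0.0
    return math.exp(sum(math.log(r) for r in per_interval) / len(per_interval))
```

Computing this value for every pair of MSPs would yield a matrix comparable in spirit to the one drawn as a heat map: values near 1 indicate reproducible spectra, and values near 0 indicate unrelated ones.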
Blind test. The accuracy of the MSL for sand fly identification was evaluated in a blind test with the validation panel of 199 sand flies. Anonymized sand fly thoraxes with legs and wings were provided to the experimenter. Each of the four raw spectra obtained from each sand fly specimen was identified by comparison with the MSL. The analysis report for each spectrum lists candidate species ranked by LS value; the top-scoring match was taken as the identification result for that spectrum (S2 Table). As previously published [26], four replicates of each isolate were deposited, but only the replicate with the highest LS was retained, and the isolate identification corresponded to the one obtained for this replicate. The resulting identifications were compared with the molecular identification of every isolate. Specimens of species from the validation panel that were not represented in the construction panel were used as negative controls. The performance of the identification system was tested at LS thresholds from 0 to 3, in increments of 0.1.
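The decision rule and the threshold sweep can be sketched in Python, assuming that each spot's top Biotyper match and its LS have already been exported. The class and function names are hypothetical. Here "specificity" is taken as the fraction of negative controls left unidentified, which is our reading of the text.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Specimen:
    spots: list[tuple[str, float]]  # (top library match, LS) for each replicate spot
    molecular_id: str               # species assigned by 16S rDNA sequencing
    in_library: bool                # False for negative controls absent from the MSL


def identify(spots: list[tuple[str, float]], threshold: float) -> Optional[str]:
    """Keep the replicate with the highest LS; accept its match only at or above the threshold."""
    species, best_ls = max(spots, key=lambda spot: spot[1])
    return species if best_ls >= threshold else None


def evaluate(specimens: list[Specimen], threshold: float) -> dict[str, float]:
    correct = misidentified = rejected_controls = 0
    for s in specimens:
        call = identify(s.spots, threshold)
        if call is None:
            if not s.in_library:
                rejected_controls += 1  # negative control correctly left unidentified
        elif call == s.molecular_id:
            correct += 1
        else:
            misidentified += 1
    n_lib = sum(s.in_library for s in specimens)
    n_ctrl = len(specimens) - n_lib
    return {
        "misidentified": misidentified,
        "sensitivity_all": correct / len(specimens),
        "sensitivity_in_msl": correct / n_lib if n_lib else float("nan"),
        "specificity": rejected_controls / n_ctrl if n_ctrl else float("nan"),
    }


def sweep(specimens: list[Specimen]) -> dict[float, dict[str, float]]:
    """Evaluate every LS cut-off from 0 to 3 in 0.1 steps."""
    return {round(0.1 * k, 1): evaluate(specimens, round(0.1 * k, 1)) for k in range(31)}
```

Lowering the cut-off lets more in-library specimens through but also lets negative controls acquire a (necessarily wrong) match. The sweep exposes this trade-off, from which a threshold such as 1.7 can be chosen.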

Implementation of reference database
All specimens from the validation panel with a valid MALDI-TOF MS-based identification and adequate mass spectrum quality were subsequently added to the reference database. For species of the blind test that were absent from the MSL or represented by fewer than ten specimens, additional specimens were spotted in ten replicates until ten references per species were reached, using the same method as for the MSL construction.

Taxonomic assignments
The intra-clade genetic distance calculated on the 208 reference sequences, before and after the addition of our sequences, was less than 3% for all species. As expected, because of the insufficient resolution of this marker, 16S rDNA sequencing failed to discriminate between the morphologically closely related species Trichopygomyia trichopyga and Trichopygomyia depaquiti, as well as between three species of the genus Nyssomyia: N. umbratilis, N. yuilli and N. antunesi (S2 Table). Ten individuals did not cluster with any species among the available reference sequences.

Species composition of panels
The MSL was composed of 20 field-collected sand fly species. Between one and ten specimens of each MSL species were included, for a total of 89 specimens. The validation panel was composed of 199 specimens belonging to 24 sand fly species, each represented by one to 45 specimens. Details of the sand fly species in each panel are given in Table 2.

Mass spectra protein profiles
The spectra had good resolution and intensity over a wide m/z range, from about 2 to 10 kDa (Fig 3A, 3B and 3C). Spectra obtained from different protein extract deposits of a single specimen were highly homogeneous and reproducible (Fig 3A). Variability in mass spectrum protein profiles was observed between different specimens of a single species, including specimens of the same sex (Fig 3B).
When comparing spectra within a species complex or between different species, mass spectrum protein profiles were heterogeneous (Fig 3C).

Reproducibility of mass spectra
In the heat map grid of the CCI matrix values (Fig 4), the coloured squares along the central diagonal reflect the reproducibility of each specimen's mass spectra when compared with itself; hot colours indicate high reproducibility. Around the central diagonal, spectra from different specimens of the same species were compared; hot colours showed a high level of intraspecific reproducibility of mass spectra, grouped into species clusters (squares). A mosaic of hot colours within a single-species cluster indicated heterogeneity of mass spectrum protein profiles between specimens and reflected intraspecific diversity. Away from the diagonal, where spectra of different species were compared, colder colours revealed lower CCI values, with very little between-species MSP correlation, confirming the high species specificity of the mass spectra. The matrix of CCI values of the sand fly MSL is available in the supplementary data (S3 Table).

Relationship between mass spectra
Cluster analysis (dendrogram, Fig 5) showed that all specimens of the same species, whether male or female and regardless of collection site in French Guiana, clustered on the same branch. This result attests to the species specificity of MALDI-TOF MS sand fly protein profiles and to their consistency with molecular identification. In agreement with the molecular results, T. trichopyga and T. depaquiti were grouped together in a single cluster of mass spectra. Nyssomyia specimens were separated into two different branches of the dendrogram, whereas the three Nyssomyia species formed a single monophyletic group in the maximum likelihood tree based on 16S rDNA (S1 Fig).

MALDI-TOF MS-based identification of sand flies
Based on the distribution of LS values, an interpretable identification result was defined as the best match among the four spots with an LS value ≥1.7 (Fig 6). Of all the sand flies tested by MALDI-TOF MS in the blind test, 79% (157/199) gave interpretable MALDI-TOF MS-based identification results. A total of 37 specimens belonged to molecularly identified species that were absent from the MSL. When a corresponding reference spectrum was available in the MSL, 97% (157/162) of the MALDI-TOF MS-based identification results were interpretable. For specimens without a corresponding reference spectrum in the MSL, all 37 identifications were uninterpretable, because the best match of the four spots did not reach the 1.7 threshold (mean LS value = 1.36±0.095).
With the ≥1.7 threshold, no misidentification was observed. All 157 sand flies with interpretable identification results were correctly classified at the species level, with best-match LS values ranging from 1.7 to 2.6 (mean LS value = 2.23±0.19). Overall sensitivity was 79% when considering all the sand flies tested and 97% when considering only species with a corresponding reference in the MSL. Specificity was 100%.
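The sensitivity and specificity figures follow directly from the counts reported above. The calculation below takes specificity as the proportion of the 37 negative-control specimens that were correctly left unidentified, which is our reading of how the 100% figure was obtained:

```latex
\begin{aligned}
\mathrm{Se}_{\text{all}} &= \frac{157}{199} \approx 0.789 \quad (79\%) \\
\mathrm{Se}_{\text{MSL}} &= \frac{157}{157 + 5} = \frac{157}{162} \approx 0.969 \quad (97\%) \\
\mathrm{Sp} &= \frac{37}{37 + 0} = 1 \quad (100\%)
\end{aligned}
```

Here 162 = 199 − 37 specimens had a reference in the MSL, 5 of which fell below the threshold, and no negative control reached the threshold.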
Five sand flies with a corresponding reference in the MSL had an LS value <1.7 and could not be identified. Three of them, a P. hirsutus (LS value = 1.5), a T. trichopyga/T.  Table).

Implementation of the MSL
The MSL was expanded with the mass spectra of nine additional sand fly species. Mass spectra of additional specimens of species already present in the MSL were also added to increase the diversity of mass spectra. The resulting reference database comprised 282 specimens of 29 sand fly species (Table 2).

Discussion
The newly generated MALDI-TOF MS reference database comprises 29 sand fly species collected in the field from eight forest sites representing various ecological niches of French Guiana. This work is the first to validate MALDI-TOF MS for the identification of sand flies from Northern Amazonia, a region hosting a great diversity of invertebrates and a vast range of ecological niches. The use of DNA barcoding in entomological investigations has highlighted the presence of both cryptic species and species complexes that may complicate taxonomic identification [11,28,29]. Few previous studies have used MALDI-TOF MS for sand fly identification, using the thorax with legs and wings [13][14][15][16][17]. Morphological identification is usually regarded as the gold standard for building entomological MALDI-TOF MS reference databases. The originality of this study is that all the sand flies were identified beforehand by DNA sequencing rather than morphologically. This approach was possible thanks to a previously published molecular database of 40 French Guiana sand fly species [11]. Although we were limited in identifying closely related sand fly species, such as those of the Nyssomyia complex, the availability of molecular reference sequences and a stringent molecular assignment methodology allowed us to rapidly and accurately identify most of the sand fly species in our sample and to avoid morphological misidentifications. Nevertheless, of the 482 field-collected sand flies subjected to DNA sequencing, ten sequences did not match any sequence in the molecular database. These sequences may correspond to sand fly species missing from the molecular database and/or not previously described, which would require morphological identification before inclusion in the MSL.
In this regard, despite the extensive molecular database, morphological identification is usually still required, together with extensive long-term fieldwork to fill gaps for rare sand fly species. Indeed, sand fly fauna in forested environments is usually represented by a few dominant species and a large number of species with few individuals [30].
Additional trapping methods must also be considered, since light traps have sampling limitations [31] and multi-trapping approaches have been shown to yield more representative samples of the local species community [6]. However, a previous study [16] showed that the sampling method must be taken into account because it can considerably affect the quality of protein spectra, especially when sticky traps are used. Compared with CDC light traps, the quality of the spectra obtained from sand fly specimens collected by sticky traps was always lower and did not allow correct identification.
Given the diversity of species encountered on the Guiana Shield, a high number of sand flies (n = 677) were captured from multiple environmentally distinct collection sites (n = 8) to build the MSL. Storage at −80˚C and the sample preparation protocol provided high-quality mass spectra. A great diversity of mass spectra was observed, reinforcing the need to include a high number of specimens for each species. Differences were observed between the spectra of different specimens of the same species, including specimens of the same sex, in contrast to a previously published study of sand flies in Algeria that observed sex-specific patterns of mass spectrum protein profiles [15]. The intraspecific diversity of the mass spectra observed in the present study suggests a great variability of protein content, related to the genetic diversity of sand flies. Environmental heterogeneity, adaptive evolutionary history, demographic history and genetic drift could all be involved. The phenotypic plasticity of sand flies has previously been highlighted by intraspecific morphometric variations, as shown, for instance, in Phlebotomus ariasi subjected to different environmental pressures [32]. This heterogeneity of mass spectra has also been reported when comparing mosquito and sand fly spectra from various geographical origins, and between reared and field-collected mosquito spectra, although storage conditions rather than phenotypic differences were strongly implicated [13,14,33]. The validation panel included a reasonable number of sand flies (n = 199) from the same sites as those used for the MSL as well as from additional sites. This increased the probability of capturing more of the genetic diversity of sand flies in the panel, as well as additional sand fly species.
Overall sensitivity was 79% and specificity was 100% with LS ≥1.7, using an MSL composed of 20 reference species and despite sampling differences between the construction and validation panels. Thirty-seven sand flies of the validation panel belonged to species that were not included in the construction panel; as expected, these could not be identified. When considering only sand fly species represented in the MSL, sensitivity was 97%. No misidentifications were observed, confirming the high specificity of the mass spectra.
The five identification failures were attributed to poor mass spectrum quality or to an insufficient number of mass spectra in the library. Comparable results were observed in previous sand fly MALDI-TOF MS studies [13,15]. However, higher LS values were obtained in a study of six sand fly species from Algeria [15], in which an LS threshold of 1.9 was defined. Given the high reproducibility of the mass spectra, the disparities in LS values may be attributed to the greater number of species in our library, which statistically increases the variability of LS values. Similarly, Yssouf et al. [34] found lower LS values, comparable to ours, when testing a database of 20 mosquito species with an LS threshold of 1.8. We showed that a lower LS threshold of 1.7 correctly identified sand flies, confirming the high quality of the mass spectra. Comparison of the distributions of best log (score) values when spotting one, two, three and four replicates revealed a significant difference between one spot and three or four spots (see S4 Table, S2 Fig). The best log (score) increased with the number of replicates, but the distributions did not differ significantly between two, three and four spots. Since sand flies are precious samples and protein extraction and deposition cannot be repeated, we recommend that future users of the MSL deposit four replicates of the protein extract from the thorax with legs and wings to ensure the best identification results.
Using clustering analysis, the groups of spectra were consistent with the molecular data analysis and maximum likelihood branching. Nevertheless, we found that both MALDI-TOF MS and the mitochondrial 16S rRNA marker may lack taxonomic resolution for cryptic and/or closely related species. For MALDI-TOF MS, this phenomenon was previously shown for mosquito identification [33][34][35]. For the 16S rRNA marker, it was observed in the description of French Guiana sand flies by Kocher et al. [11]. Indeed, the 16S rRNA marker failed to distinguish the morphologically distinct but closely related species T. trichopyga and T. depaquiti, and the MALDI-TOF MS spectra of these specimens were likewise clustered together. Conversely, whereas some species of the Nyssomyia genus, including N. umbratilis, N. antunesi and N. yuilli pajoti (recently raised to species level as N. pajoti [25]), formed a monophyletic group in the molecular analysis of the 16S rRNA gene, they were grouped into two clusters of spectra in the MALDI-TOF MS dendrogram. Based on morphological similarity, we can assume that one cluster of spectra corresponds to N. umbratilis and the other to N. antunesi and N. yuilli pajoti. This suggests that MALDI-TOF MS may sometimes have a higher resolution than DNA barcoding. In the context of this study, morphological identification combined with higher-resolution molecular markers, such as the mitochondrial cytochrome oxidase I gene, should be performed for more relevant taxonomic assignment and for inclusion of Nyssomyia specimens in the database at the species level. Nevertheless, northern Amazonian sand fly studies using higher-resolution markers, such as 1,181-bp and 663-bp barcodes of the mitochondrial cytochrome oxidase I gene, showed that N. umbratilis is a species complex that is difficult to describe accurately in taxonomic terms [26,36]. In addition, they revealed that N. umbratilis is closely related to Nyssomyia anduzei, a species for which no sequence was available for comparison in the molecular database of Kocher et al. [11]. Few sand fly studies have clearly combined MALDI-TOF MS, morphology and DNA barcoding with high-resolution markers [13,15], and in those studies genetic analyses were not applied to the entire sample. A recent study revealed a novel spectrum for a morphologically identified specimen of Phlebotomus perfiliewi, which was further identified as belonging to the Phlebotomus perfiliewi complex using cytochrome b and cytochrome oxidase I markers [13]. The existence of a species complex may explain variability in mass spectra and suggests that MALDI-TOF MS can reveal specific protein patterns within a species complex, as envisaged for some species of the Nyssomyia complex. The lack of resolution of MALDI-TOF MS for closely related species has also been observed for Leishmania species [37] and for molds such as dermatophytes [26]. A protein-profiling approach has been developed to discriminate cryptic species of the Anopheles gambiae complex, including separation of the M and S molecular forms of A. gambiae sensu stricto, but it was conducted using laboratory colonies and does not seem promising for field-collected specimens [35]. The resolution of MALDI-TOF MS must be improved to better discriminate cryptic species and elucidate taxonomic relationships, using combinations of morphology-based and DNA-based identifications.
After the addition of mass spectra from the validation panel, the MALDI-TOF MS reference database comprises 29 species (282 mass spectra), accounting for about 36% of the 81 sand fly species previously described in French Guiana. It may cover the most common species [38–40], including those that dominate communities in a wide diversity of forested habitats subject to different levels of environmental stress.
The MALDI-TOF MS reference database covers the major vectors of Leishmania species involved in human cutaneous leishmaniasis in the Guiana Shield, such as species of the genus Nyssomyia, P. squamiventris maripaensis, T. ubiquitalis and B. flaviscutellata [6,7].
Considering the species composition of this database, the MALDI-TOF MS identification tool may be useful for large-scale inventories of sand fly species. It may facilitate the description of sand fly communities in the Guiana Shield and vector investigations in emerging leishmaniasis foci. There are few sand fly entomology experts, and most of them specialize in a specific geographical area. The conventional approach to sand fly species identification usually requires mounting each specimen's head and genitalia, which bear the key diagnostic characters. Both slide preparation and species identification are laborious and time-consuming and require skill and expertise [14]. Thus, there is a need to develop other sand fly identification methods. Using the present molecular protocol, we estimate that 500 specimens can be analysed by a single person in 2 weeks, at a cost of $8-12 per sample. In contrast with molecular methods, which require several steps from DNA extraction to sequence editing and assignment, MALDI-TOF MS analyses are completed within a few hours. We estimate that once the reference database is created, 500 specimens can be analysed by a single person in 1 week (with four replicates per specimen). Once the MALDI-TOF MS instrument, which is expensive and therefore a major investment, has been acquired, the method requires only inexpensive consumables, at an estimated cost of $1-2 per sample. Therefore, compared with molecular methods at the same level of taxonomic resolution, MALDI-TOF MS should be the best-suited method for eco-epidemiological studies in areas where entomological experts may not be available.
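The throughput and cost estimates above can be set side by side with simple arithmetic. The sketch below encodes only the figures stated in the text: 2 weeks and $8-12 per sample for the molecular protocol, and 1 week, four replicates and $1-2 per sample for MALDI-TOF MS. The instrument price is not given here, so the break-even function takes it as a parameter. It also ignores maintenance, staff time and other overheads, which is a simplifying assumption. The `Workflow` class and `break_even_specimens` function are hypothetical names introduced for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    name: str
    weeks_per_500: float  # time for one person to process 500 specimens
    cost_low: float       # lower per-sample consumables estimate (USD)
    cost_high: float      # upper per-sample consumables estimate (USD)

    def specimens_per_week(self) -> float:
        return 500 / self.weeks_per_500

    def cost_range(self, n_specimens: int) -> tuple[float, float]:
        return n_specimens * self.cost_low, n_specimens * self.cost_high


# Figures as estimated in the text; the instrument purchase itself is excluded.
MOLECULAR = Workflow("DNA barcoding", weeks_per_500=2, cost_low=8, cost_high=12)
MALDI = Workflow("MALDI-TOF MS", weeks_per_500=1, cost_low=1, cost_high=2)
REPLICATES_PER_SPECIMEN = 4


def break_even_specimens(instrument_price: float) -> tuple[int, int]:
    """Specimens needed for per-sample savings to offset the instrument price.

    Returns (optimistic, pessimistic) counts from the extremes of the quoted
    ranges; maintenance, staff and other overheads are deliberately ignored.
    """
    best_saving = MOLECULAR.cost_high - MALDI.cost_low   # 12 - 1 = 11 USD
    worst_saving = MOLECULAR.cost_low - MALDI.cost_high  # 8 - 2 = 6 USD
    return (math.ceil(instrument_price / best_saving),
            math.ceil(instrument_price / worst_saving))


if __name__ == "__main__":
    for wf in (MOLECULAR, MALDI):
        low, high = wf.cost_range(500)
        print(f"{wf.name}: {wf.specimens_per_week():.0f} specimens/week, "
              f"${low:.0f}-{high:.0f} for 500 specimens")
    print("MALDI-TOF spectra for 500 specimens:", 500 * REPLICATES_PER_SPECIMEN)
```

For any instrument price P, this simple model places the break-even point between roughly P/11 and P/6 specimens, using the extremes of the quoted per-sample ranges.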

Conclusion
We confirm that MALDI-TOF MS protein profiling is well suited to the identification of sand fly species, including in neotropical areas, which are known for their great diversity of sand fly species. MALDI-TOF MS can be a useful tool for rapid, inexpensive and accurate identification of sand flies but, as with molecular methods, better access to reference libraries for the scientific community would extend its utility. In the near future, this Northern Amazonian sand fly spectral database will be included in an online platform dedicated to phlebotomine sand fly identification, as has already been done successfully for the identification of fungi and Leishmania of medical interest [26,37]. Recent studies have shown that MALDI-TOF MS is also accurate for the detection of Rickettsia spp. [41] and Borrelia crocidurae [42] in ticks, Plasmodium spp. in Anopheles mosquitoes [43] and Bartonella spp. in fleas [44], because infected and uninfected arthropods generate distinct protein mass spectra profiles. The possibility of identifying sand flies to the species level, as well as their infection status with Leishmania parasites, using MALDI-TOF MS would offer a significant opportunity for sand fly eco-epidemiology studies.