Molecular Taxonomy Provides New Insights into Anopheles Species of the Neotropical Arribalzagia Series

Phylogenetic analysis of partial mitochondrial cytochrome oxidase c subunit I (COI) and nuclear internal transcribed spacer 2 (ITS2) sequences were used to evaluate initial identification and to investigate phylogenetic relationships of seven Anopheles morphospecies of the Arribalzagia Series from Colombia. Phylogenetic trees recovered highly supported clades for An. punctimaculas.s., An. calderoni, An. malefactor s.l., An. neomaculipalpus, An. apicimacula s.l., An. mattogrossensis and An. peryassui. This study provides the first molecular confirmation of An. malefactorfrom Colombia and discovered conflicting patterns of divergence for the molecular markers among specimens from northeast and northern Colombia suggesting the presence of two previously unrecognized Molecular Operational Taxonomic Units (MOTUs). Furthermore, two highly differentiated An. apicimacula MOTUs previously found in Panama were detected. Overall, the combined molecular dataset facilitated the detection of known and new Colombian evolutionary lineages, and constitutes the baseline for future research on their bionomics, ecology and potential role as malaria vectors.


Introduction
Malaria elimination remains a goal in Colombia where 64,309 malaria cases were reported in 2012 [1]. After Brazil, Colombia consistently has the highest number of annual malaria cases in Latin America [2], and underreporting is common [3]. Vector control remains one of the most effective measures to prevent malaria transmission [4,5] and for this, accurate Anopheles species identification is an essential part of targeted control strategies [5]. However, in Colombia several species in the subgenus Anopheles, including some potential malaria vectors, are relatively understudied.
The Anopheles subgenus comprises 187 valid species of which 56 are reported in the New World; 24 of these species are in the Neotropical Arribalzagia Series [6]. All species included in under a protocol and written informed consent agreement approved by a University of Antioquia Institutional Review Board (Comité de Bioética, Sede de Investigación Universitaria, CBEIH-SIU, approval number 07-41-082). In addition, some An. malefactor fourth stage larvae were collected and reared to adults. Selected larval exuviae and male genitalia were mounted on microscope slides using Euparal, and at least one voucher specimen per species was deposited in the collection of the Laboratorio de Microbiología Molecular, Universidad de Antioquia, Colombia. Morphospecies were identified using the keys of González & Carrejo [8] and Wilkerson & Strickman [26]. Genomic DNA was extracted from abdomens using a salt precipitation protocol [38].  Table 1

Barcode region
The COI gene region was amplified using the LCO and HCO universal primers [39] and modified PCR conditions [40]. PCR products were subjected to bidirectional sequencing. All the sequences were translated to amino acids to detect stop codons and potential shifts in reading frame as a test for possible nuclear mitochondrial pseudogenes (Numts). The COI protein sequence published for Anopheles gambiae (GI: 5834913) was used as a reference to indicate positions of amino acid changes. Potential contamination was explored using BLAST searches [41] (http://blast.ncbi.nlm.nih.gov/Blast.cgi). Additional COI sequences from GenBank were included for comparison with the original dataset ( Table 2). COI sequences obtained in this work were deposited in GenBank under accession numbers KF698801-KF698878.

ITS2 region
The rDNA ITS2 was amplified using the primers ITS2-F (5'-TGAACTGCAGGACACAT-GAAC-3') and ITS2-R (5'-ATGCTTAAATTTAGGGGGTAGTC-3'), and analyzed by a RFLP assay [25,34,36]. Available confirmed specimens from the Arribalzagia Series species were used as controls for amplification and RFLP. The in silico restriction digestion for each species was predicted using Webcutter 2.0 tool available at http://rna.lundberg.gu.se/cutter2. PCR products were cloned using the CloneJET PCR Cloning Kit (Thermo Fisher Scientific Inc., PA, USA), and three to five clones per specimen were selected for sequencing. The ITS2 sequences were deposited in GenBank under accession numbers KF698879-KF698921 and KM262754-KM262760.

COI and ITS2 analyses
The COI and ITS2 sequences were edited in Geneious version 6.0.6 [42]. For ITS2, the flanking 5.8S and 28S regions were identified using the Diptera model through the ITS2 annotation tool [43,44] and excluded before ITS2 analysis. Analyses of intragenomic, intra-and interspecific ITS2 variation were performed in at least two specimens per morphospecies using DAMBE [45]. Mean uncorrected pairwise distances and standard errors were calculated with MEGA 5.0 [46]. The presence of interspersed and tandem repeat sequences was explored using the bioinformatics software Spectral Repeat Finder (SRF) [47] and Tandem Repeat Occurrence Locator (TROLL) [48], respectively.
For interspecific comparison, ITS2 sequences of Anopheles species belonging to Arribalzagia Series retrieved from GenBank also were analyzed (Table 2). Manual editing and multiple sequence alignment were performed with Geneious version 6.0.6 under default parameters. Sequences were checked for insertions or deletions.

Phylogenetic analysis
Sequences were aligned and gaps were treated as missing data. Because saturation in substitutions can have negative effects on phylogenetic inference, saturation levels were tested in DAMBE [49]. The best-fit model of DNA substitution and the parameter estimates used for tree construction were chosen according to the Akaike Information Criterion (AIC) as implemented in jModeltest version 2.1.4 [50]. The results provided TrN+I+G and TIM3+G as the best-fit models for the COI and ITS2 datasets, respectively. Phylogenetic trees based on ITS2, COI or the concatenated (COI+ITS2) datasets were constructed using methods of Neighbor-Joining (NJ) in MEGA 5.0 [46], Maximum Likelihood (ML) in PhyML 3.0 [51] and Bayesian (BI) analysis in MrBayes [52]. To construct the NJ tree we chose Kimura's (1980) two-parameter model (K2P), typically used in the DNA barcode strategy [53]. ML and BI were run with the parameters inferred from jModeltest. Although all ITS2 variants were included in the ITS2 phylogeny, only the most frequent variant of each specimen was included in the concatenated analysis. For Bayesian inference, the analyses were initiated with random starting trees and the Markov chain Monte Carlo search was run with four chains for five million (ITS2 and ITS2+COI) for ten million generations (COI), sampling every 1,000 generations and discarding and average of 25% of each run as burn-in. Bootstrap sampling (1,000 replicates) was performed to test inferred phylogenies. Anopheles (Anopheles) pseudopunctipennis was used as the outgroup [54].

DNA-based approaches for species recognition
Molecular Operational Taxonomic Units (MOTUs) were identified according to the reciprocal monophyly criterion for the different phylogenetic approaches using individual markers (COI or ITS2) or concatenated (COI+ITS2) trees. The species delimitation plugin for Geneious [55] was used to calculate Rosenberg's P AB value, a test for taxonomic distinctiveness based on the null hypothesis that the observed monophyly was found by chance alone [56]. Correspondence between morphospecies and MOTUs in the gene trees was evaluated.
Criteria for assessing and comparing the COI barcode for specimens of the seven Arribalzagia Series species, included Best Match (BM), Best Close Match (BCM) and All Species Barcodes (ASB), as performed in TaxonDNA. These three algorithms are used to test identification success [57]. The presence or absence of the "barcode gap", or the result of a smaller intraspecific divergence with respect to interspecific divergence, was evaluated. In addition, identification success was determined based on the 1% standard threshold cut-off value suggested by The Barcode of Life Data System [58], using SPIDER [59]. Automatic Barcode Gap Discovery (ABGD) allowed the partitioning of the DNA sequence datasets into clusters of like taxa setting a range of maximum values of intraspecific divergence (P) without an a priori species hypothesis [60].
The PCR-RFLP-ITS2 assay was used as an initial approach for species confirmation. The PCR-ITS2 yielded a different product size for each species (Table 2), ranging from 393 bp for An. punctimacula to 481 bp for An. apicimacula s.l. Interestingly, there were two slightly different PCR product sizes for Anopheles malefactor Dyar and Knab: 396 bp (specimens from Antioquia) and 401 bp (specimen from Norte de Santander). The species An. punctimacula s.s., An. apicimacula, An. neomaculipalpus could be differentiated by their PCR-RFLP-ITS2 patterns (Table 3). Furthermore, An. mattogrossensis and An. peryassui yielded similar banding patterns with slight differences in band sizes, which could not be discriminated on the electrophoresis gel. Two restriction patterns were detected for An. malefactor. Specimens from Antioquia and Córdoba Departments yielded a similar restriction pattern to An. punctimacula s.s. However, the ITS2 fragment corresponding to the Norte de Santander specimen was not cut by the enzyme, a result confirmed by bioinformatic analysis.

ITS2 sequence characterization
The Arribalzagia Series species had a range of ITS2 size from 265 bp in An. punctimacula s.s. to 353 bp in An. apicimacula from the Colombian Caribbean or lineage C. Overall, ITS2 size was conserved at the intraspecific level, except for a few specimens within some species (S1 Table). At the individual level, low intragenomic ITS2 variation was detected in all species and was restricted to a few mutations. The highest ITS2 intragenomic mean uncorrected p-distance of 1.23% was detected for one An. calderoni specimen from Risaralda. Anopheles mattogrossensis specimens had two ITS2 variants that differed by one bp (336 and 337 bp); the mean uncorrected p-distance between them was 1.19%. Comparison of both ITS2 variants with those available from GenBank found that the 336 bp ITS2 variant had the same deletion as a sequence reported for An. mattogrossensis from the state of Rondônia in western Brazil (AF461754). Likewise, the 337 bp variant was similar to the one reported for An. mattogrossensis from southern Colombia (JX198307). Further details about the specimens, number of clones and the mean uncorrected p-distance among ITS2 sequences can be found in S1 Table. At the intraspecific level, two ITS2 variants were detected in An. malefactor. The ITS2 variant detected in Norte de Santander Department, NE Colombia, was identical to a previously  [22], were also found in those from Colombia. Two additional nucleotide substitutions were detected, one transversion at position 108 of the ITS2 alignment (T!A) in some sequences from Antioquia and La Guajira, and a unique sequence with a transition (G!A) at position 169 from La Guajira. The mean average ITS2 interspecific K2P distance was 48.8%. The highest ITS2 genetic distance was between An. peryassui and An. mattogrossensis (66.2±3%), and the lowest between An. punctimacula s.s. and An. malefactor (11.5±1.8%) (S2 Table). A common pentanucleotide tandem repeat (CACCT) 2 present in all ITS2 of the Arribalzagia Series species from Panama [22], was detected in five Colombian species, An. punctimacula s.s., An. calderoni, An. malefactor, An. neomaculipalpus and An. apicimacula s.l. Additionally, one hexanucleotide tandem repeat (TGCGCA) 2 was detected in An. calderoni.

DNA barcoding
The COI alignment was 611 bp and yielded 79 unique haplotypes. There were 180 polymorphic sites (29.46%), from which 160 were parsimoniously informative (26.18%). Nucleotide changes mainly occurred at the third-codon positions and were silent. However, some interspecific COI nucleotide differences led to non-synonymous amino acid substitutions and some single amino acid differences were fixed at the species level. Anopheles punctimacula s.s. had a substitution (serine to alanine) at position 171, and An. malefactor at position 191 (valine to isoleucine). The average genetic distance among all COI sequences was 9. The most frequent An. calderoni haplotype (KF698801 = 27.27%) was shared among Pacific region localities within 122 km (Buga-Pereira). For An. punctimacula s.s., the most frequent haplotype (KF698833 = 66.67%) was from localities in Antioquia, Córdoba and La Guajira Departments (maximum straight-line distance between the farthest localities = 495 km). Lastly, the localities of Antioquia and Córdoba contained the most frequent haplotype for An. neomaculipalpus (KF698843 = 20%). There was a high number of unique haplotypes in the analyzed taxa (49.6%). The overall mean nucleotide diversity for the barcode was 0.084.
Intertaxa COI genetic distances among the seven Arribalzagia Series members ranged from 9.3±1.3% between An. punctimacula s.s. and An. malefactor, to 14.7±0.8% between An. calderoni and An. peryassui (S3 Table). Each of the Anopheles members formed a monophyletic group in the NJ, ML and BI trees with high support (Fig. 2). Results were also supported by Rosenberg's P AB values that were significant for all MOTUs (p<0.05). The BM and BCM criteria yielded identical results with 100% "correct" identifications. Moreover, 97.87% and 2.13% of the COI sequences were assigned as "correct" and "ambiguous" using the ASB criterion, respectively. The problematic identifications corresponded only to COI sequences of An. apicimacula lineage C. A COI threshold value equal or greater than 1% provided a perfect species identification for the dataset, with the presence of a barcoding gap. The ABGD method consistently revealed eight groups using an a priori intraspecific genetic divergence 1.47% under Jukes and Cantor's model (JC69), supporting each morphospecies as a single species, except for An. apicimacula that encompassed two provisional MOTUs for all lower cut-offs.

Phylogenetic relationships
There was no saturation signal among the COI sequences (p < 0.05), validating the dataset for phylogenetic analyses. Phylogenetic trees based on NJ, ML and BI approaches with each marker (i.e. COI or ITS2) showed highly supported discrete clades for An. punctimacula s.s., An. calderoni, An. malefactor, An. neomaculipalpus, An. apicimacula Caribbean and Pacific lineages, An. mattogrossensis and An. peryassui. Two highly supported mitochondrial lineages (BPP: 0.98) were detected in all trees for An. apicimacula s.l. that corresponded exclusively to specimens from the Caribbean and Pacific regions of Colombia (Figs. 2-4). Bayesian trees derived from ITS2 and COI+ITS2 data showed very similar topologies, whereas species groups in the COI tree were less evident.

Discussion
This study confirms that previously recognized morphospecies of the Arribalzagia Series of Colombia constitute independent evolutionary lineages or MOTUs and reveals hidden lineages. Strong molecular evidence supports at least two geographically separated MOTUs of An. apicimacula in the Colombian Pacific and Caribbean regions, respectively. The level of COI intraspecific variation for An. apicimacula s.l. (4.44%), compared to the standard value usually found for Anopheles species (<2%) [61,62], and the fixed ITS2 sequence differences together support the hypothesis that An. apicimacula is a complex [22]. These lineages are distributed along the Chocó/Darien/Western Ecuador biodiversity hotspot [63]; a variety of factors in this region, including landscape heterogeneity, historical demographical processes and Pleistocene environmental changes might have driven divergence [64,65]. It will be interesting to include An. apicimacula specimens from the type locality (Livingston, Guatemala) [66], to determine whether either of these lineages constitutes An. apicimacula s.s. Further sampling of An. apicimacula s.l., and the application of integrative taxonomic analysis will assist new species delimitation and geographical range [67,68].
Low intragenomic ITS2 variation was detected for most of the Arribalzagia species (<1%), with values comparable to those for other Anopheles species such as African Anopheles arabiensis (0.07%), An. gambiae (0.43%) [69], Neotropical An. nuneztovari (<0.2%) [70], and members of the Albitarsis Complex (<0.57%) [71]. At the intraspecific level, an unexpected 4.5% divergence between the N and the NE An. malefactor ITS2 variants was higher than those reported among 21 species of the subgenus Nyssorhynchus (0-2.8%) [72] or members of the Albitarsis Complex (0.28-1.17%) [71]. Nonetheless, ITS2 divergence was not supported by COI analysis (K2P distance <1%). Although the greatest distance between the collection localities for the Colombian An. malefactor specimens was 517 km, geographical distance alone cannot explain the ITS2 differentiation. The Panamanian An. malefactor ITS2 sequences were identical to those of the Colombian NE, collected 550 km away, whereas the NW specimens collected 227 km from the Panamanian An. malefactor collection site differed. Other factors may be responsible for this divergence, e.g., although it is recognized that concerted evolution homogenizes the ITS2 at the species level [73,74], differences in population size or migration rates could also affect ITS2 evolution among populations [75]. Similar COI but highly divergent ITS2 sequences in An. malefactor is hypothesized to be the result of mitochondrial introgression or incomplete lineage sorting at the mitochondrial locus, as observed for other species such as An. cruzi [76][77][78][79] and it also suggests that An. malefactor comprises at least two MOTUs. An integrative taxonomic approach that includes analysis of additional molecular markers should provide details of population structure, demographic history, and the formation of evolutionarily independent lineages in An. malefactor [64,80]. Morphological keys to adult females were useful for initial species identification, but some An. apicimacula were misidentified as An. punctimacula or An. neomaculipalpus. Overall, some Arribalzagia Series species females differ in morphological characters such as wing spot pattern (S1 Fig.). For instance, the small and appressed dark scales on the cubital vein of An. apicimacula s.l., which differ in form in An. punctimacula and An. neomaculipalpus, [8,26] was not easily detected in some specimens, resulting in their misidentification. This character of An. apicimacula is only shared with An. intermedius, a species that has been historically reported in forested areas from Central and South America [81], but with just one doubtful early record in Villavicencio, Colombia [8,82]. In this study, An. malefactor s.l. was relatively easy to separate based on its entirely white hindtarsomere 5, which usually has at least one dark band in An. punctimacula and An. calderoni [8]. Misidentification of specimens could result from intraspecific variation in wing spots as documented for other anopheline species [14,83], the loss of thoracic scales due to the sampling technique or during the mosquito life cycle [84] or human error in identifying ambiguous or damaged field samples. It will be important to determine if An. malefactor and An. apicimacula lineages are truly morphologically cryptic [85].
The PCR-RFLP-ITS2 strategy facilitated the identification of most members of the Arribalzagia Series. However, for accurate identification of An. punctimacula s.s.-An. malefactor s.l. and An. mattogrossensis-An. peryassui, that showed similar restriction patterns, we recommend sequencing the ITS2 or COI barcode region. Nevertheless, the low cost and effort needed to implement PCR-RFLP protocols in the laboratory [25,34,86,87], suggest the importance of designing a PCR-RFLP strategy based on a useful marker for the rapid and accurate discrimination of species and lineages in the Arribalzagia Series.

Conclusions
Nuclear and mitochondrial markers recovered monophyletic morphospecies in the Arribalzagia Series and allowed updating records of these species in several localities of the country. This is the first work in Colombia providing molecular confirmation of An. apicimacula, An. punctimacula s.s. and An. malefactor s.l. The two An. apicimacula evolutionary lineages detected, Pacific and Caribbean, with fixed differences in the mitochondrial and nuclear loci, likely represent two species. A possible mitochondrial introgression event or incomplete lineage sorting during the phylogenetic history of An. malefactor is hypothesized. Information on accurate identification of Colombian Arribalzagia Series species constitutes the baseline for future studies on their bionomics and vectorial importance that could be used for targeted and effective control efforts.