Cryptic Genetic Diversity within the Anopheles nili group of Malaria Vectors in the Equatorial Forest Area of Cameroon (Central Africa)

Background The Anopheles nili group of mosquitoes includes important vectors of human malaria in equatorial forest and humid savannah regions of sub-Saharan Africa. However, it remains largely understudied, and data on its populations’ bionomics and genetic structure are crucially lacking. Here, we used a combination of nuclear (i.e. microsatellite and ribosomal DNA) and mitochondrial DNA markers to explore and compare the level of genetic polymorphism and divergence among populations and species of the group in the savannah and forested areas of Cameroon, Central Africa. Principal Findings All the markers provided support for the current classification within the An. nili group. However, they revealed high genetic heterogeneity within An. nili s.s. in deep equatorial forest environment. Nuclear markers showed the species to be composed of five highly divergent genetic lineages that differed by 1.8 to 12.9% of their Internal Transcribed Spacer 2 (ITS2) sequences, implying approximate divergence time of 0.82 to 5.86 million years. However, mitochondrial data only detected three major subdivisions, suggesting different evolutionary histories of the markers. Conclusions/Significance This study enlightened additional cryptic genetic diversity within An. nili s.s. in the deep equatorial forest environment of South Cameroon, reflecting a complex demographic history for this major vector of malaria in this environment. These preliminary results should be complemented by further studies which will shed light on the distribution, epidemiological importance and evolutionary history of this species group in the African rainforest, providing opportunities for in-depth comparative studies of local adaptation and speciation in major African malaria vectors.


Introduction
In the equatorial forest regions of Africa, malaria remains endemic despite significant efforts in treatment of the disease and vector control [1]. Part of explanation for this situation resides in the fact that a number of genetically, behaviorally and ecologically distinct species and populations of Anopheles mosquitoes can and do transmit Plasmodium, the malignant agent of the disease, simultaneously or replace each other seasonally insuring year-round transmission [2]. Unfortunately, many of these vectors might escape traditional vector control operations by Indoor Residual Spraying of insecticides (IRS) and Insecticide-Treated bed Nets (ITNs), which are directed against indoor biting and/or resting mosquitoes [3,4]. This is exemplified in the Anopheles nili group of malaria vectors.
The An. nili group includes four closely related species namely An. nili sensu stricto (thereafter, An. nili s.s.), An. somalicus, An. carnevalei and An. ovengensis which can be identified through slight morphological diagnostic characters present at the larval and/or adult stages [5][6][7], and by a species-specific PCR assay [8]. Their larvae typically develop in river networks with a marked preference for the edges of fast running streams and rivers exposed to light and containing vegetation and/or floating debris [9]. Anopheles nili s.s. is widespread and is found throughout sub-Saharan Africa. It is highly anthropophilic, biting man and resting indoors as well as outdoors [5,[10][11][12][13][14][15][16]. The other members of the group are more localized within the equatorial forest region of Central Africa, although their true distribution range remains largely unknown [9,17,18]. Anopheles somalicus is not involved in malaria transmission because of its preference for non-human blood and exophilic habits [2,19]. A comprehensive study in Cameroon confirmed that An. nili s.s. is the major malaria vector of the group and emphasized the exophagic behavior of An. ovengensis and An. carnevalei [20]. In Equatorial Guinea, sporozoïte rates in An. ovengensis can reach 4.1% (n = 74), which is higher than that of An. gambiae in the same area (3.3%, n = 603, [17]), confirming its major role in the epidemiology of malaria. Moreover, recent studies found the circulation of various Plasmodium species and lineages related to P. falciparum in great apes and infection of gorillas and guenons by P. falciparum in the central forest region of Cameroon and Gabon [21][22][23][24][25]. However, the potential mosquito vectors have not been identified yet. These discoveries further bring urgency to the study of the taxonomic status of members of the An. nili group and their possible role in transmission of various Plasmodium species.
A recent study using 11 microsatellite markers demonstrated a significant genetic differentiation of the forest An. nili s.s. population of Kenge in the Democratic Republic of Congo (DRC) when compared to other populations from humid savannah of Central and West Africa [18]. Both local adaptation and geographic isolation could cause this differentiation. Extensive allele sharing between populations and homogeneity across loci suggested that enhanced genetic drift rather than selection was responsible for the observed pattern. Fluorescence in situ hybridization (FISH) mapping of the microsatellite loci on a recently developed cytogenetic map for An. nili [26,27] demonstrated physical independence of the markers, strengthening the view that enhanced genetic drift, rather than selection, was responsible for reduced variability and increased differentiation of the Kenge population. These data strongly suggest existence of ecological and geographical barriers that affect gene flow among An. nili natural populations in equatorial Africa.
Here, we used nuclear (i.e. microsatellites and rDNA) and mitochondrial DNA markers to further explore the genetic structure of members of the An. nili group in South Cameroon, where all four members of the group are known to occur [7,8,20]. We document further cryptic genetic substructure within An. nili s.s., reflecting a complex demographic history. We advocate for further evaluation of the impact of this genetic heterogeneity and its relevance for malaria epidemiology and control.

Ethics Statement
This study received the approval of the National Ethics Committee of Cameroon (Ref: FWA IRB0001954). No other specific permits were required. No location was privately-owned or protected and field studies did not involve endangered or protected species. Consent was obtained from all volunteers participating in the study, and they were monitored for signs of fever and treated for all diagnosed malaria cases.

Mosquito Collection and Identification
Mosquitoes were collected from 2007 to 2009 in eight riverside villages situated in different ecological settings in Cameroon including Afan-Essokyé, Ako, Ekelemba, Kentzou, Moloundou, Mbébé, Nkolbisson, and Nyabessan ( Figure 1, Table 1). Fieldcollected larvae and adult An. nili s.l. mosquitoes were first sorted using morphological diagnostic characters [5][6][7]28]. Adult females were stored individually in numbered tubes containing a desiccant whereas larvae were preserved in tubes containing 95% ethanol. Then, genomic DNA was extracted using a standard protocol [29] and morphological identification was confirmed by PCR [8].

Microsatellite Genotyping and Analysis
Eleven microsatellite loci developed in An. nili s.s. [27,30] were used to amplify An. carnevalei, An. ovengensis and An. somalicus DNA. Genotypes were determined for individual mosquitoes as previously described [18]. Genetic diversity at each microsatellite locus within each sample was characterized by estimates of allelic richness (Rs) [31] and unbiased expected heterozygosity (He) under Hardy-Weinberg Equilibrium (HWE), using FSAT V2.9.3.2 [32]. This program was also used to test for conformance of genotype frequencies to HWE for each locus and each sample, and to explore Linkage Disequilibrium (LD) between microsatellite loci. The frequency of null alleles at each locus within each population was determined using GENEPOP V4.0 [33,34], and the allele and genotype frequencies were then adjusted accordingly in MICROCHECKER V2.2.3 [35]. The null allele adjusted dataset was compared to the original dataset to explore the effect of null alleles on estimates of genetic differentiation.
Heterozygosity tests were implemented in BOTTLENECK V1.0.02 [36] in order to assess deviation from Mutation-Drift Equilibrium (MDE). The tests were performed using the Stepwise Mutation Model (SMM) and the Two Phase Model (TPM), with 10-30% indels larger than one repeat unit. Statistical significance of the tests was assessed for each sample across all loci by the Wilcoxon signed-rank test available in BOTTLENECK.
Bayesian clustering analysis was implemented in STRUC-TURE V2.2 [37] to estimate the most probable number of discrete populations or genetic clusters (K) in the dataset, while minimizing Hardy-Weinberg and gametic phase disequilibrium between loci within clusters, with no a priori assumptions of population structure. The admixture model, which assumes that a fraction of the genome of each individual can come from one of each population, was used. Allele frequencies were allowed to be correlated assuming that the populations diverged from a single ancestral population at some point in the past, and the differences in their allele frequencies are the result of drift that has occurred since their divergence [37]. The method of Evanno et al. [38] was used to determine the most likely number of clusters. Differentiation among clusters was examined with Wright F-statistics [39] computed according to Weir and Cockerham [40] using FSTAT. For the calculation of Fst, monomorphic loci were excluded from the analysis and significance tests were conducted using a randomization approach as implemented in FSTAT.
The number of putative migrants (Nm) per generation between populations was estimated from pairwise Fst [41]. The contribution of geographical distance to the pattern of differentiation observed was investigated by the regression of Fst/(1-Fst) on geographic distance, assuming migration in one direction (i.e. along hydrographic networks) [42]. The significance of this regression was tested using a mantel test available in GENEPOP [43].The Bonferroni correction [44] was used to adjust the nominal significance level whenever multiple tests were performed.

DNA Sequencing and Phylogenetic Analysis
Sequencing was performed using the primers set and protocol described in Ndo et al. [18]. Ten specimens per geographical sample (but twenty from Moloundou) among those used for microsatellite analysis were sequenced for the Internal Transcribed Spacer 2 (ITS2) and domain-3 (D3) of nuclear 28S ribosomal DNA, and for Cytochrome Oxydase subunit II (COII) and NADH deshydrogenase subunit IV (ND4) of mitochondrial DNA.
Basic sequence polymorphism statistics and various population parameters were computed using DnaSP V5.10.01 [45] and MEGA V5.05 [46]. Genetic diversity was measured by assessing haplotype diversity (hd) and nucleotide diversity (p). MEGA was also used to compute genetic distances between haplotypes and/or populations [47]. The rate of the ITS2 clock of 2.2% per million years (myr), as assessed within the Drosophila melanogaster and D. virilis groups [48], was used to estimate divergence time between sequences. For mtDNA sequences, the conventional mtDNA rate of 2% per myr was used [49]. The Neighbour-Joining (NJ) method [50] performed in MEGA was used for the construction of phylogenetic trees, which were rooted using An. moucheti sequences ( [63]; Ndo, unpublished) as an outgroup.

Samples Composition
All mosquitoes from Ako, Ekelemba, Kentzou, Moloundou and Nkolbisson were morphologically and molecularly identified as An. nili s.s., although some specimens from Moloundou and Ekelemba failed to amplify even after repeated attempts, possibly due to poor DNA quality. All individuals from Nyabessan, Afan-Essokyé and Mbébé were An. ovengensis, An. carnevalei and An. somalicus respectively, but An. nili s.s is also known to be present in the two later localities [9]. We never collected An. somalicus on human baits owing to its zoophilic behavior. Similarly, attempts to collect An. nili s.s. on human baits in Kentzou were unsuccessful suggesting possible zoophilic behavior of the species in this locality.

Microsatellite Analysis
Genetic variability. A total of 363 individual mosquitoes were genotyped and there were marked differences in amplification profiles and polymorphism of loci among species, as well as between An. nili s.s. populations ( Table 2). All 11 loci readily amplified and were polymorphic in An. nili s.s. samples from Ako and Nkolbisson, whereas only 10 (90.9%) successfully amplified in Kentzou (9 polymorphic) and Moloundou (9 polymorphic), falling down to 8 (73%) in Ekelemba (with only 5 loci being polymorphic). Similarly, only 9 (82%) loci successfully amplified An. carnevalei and An. ovengensis DNA, among which 7 and 6 respectively were polymorphic. In An. somalicus, 8 (73%) loci were scored successfully, with only 5 showing more than one allele (Table 2). These differences in locus amplification and polymorphism resulted in high genetic variability within the An. nili s.s. populations from Ako and Nkolbisson (Rs: 8.33-8.93, He: 0.833-0.857), whereas polymorphism was much lower in the other samples (Rs: 1.77-6.46, He: 0.231-0.659).

Hardy-Weinberg Equilibrium and Linkage Disequilibrium
Only polymorphic loci were considered for HWE and LD analyses. A total of 21 (33.33%) tests out of 63 were found out of HWE (P,0.001) after Bonferroni correction for multiple testing ( Table 2). Within An. nili s.s., the sample from Moloundou had the highest number of deviations. In fact, all 9 polymorphic loci were out of HWE, associated with positive Fis values suggesting population substructure, although null alleles could also account for some deviation. By contrast, no more than 2 significant deviations were observed in all other An. nili s.s. samples. The number of significant deviations from HWE detected in the other members of the group varied from 1 (out of 5 polymorphic loci) in An. somalicus to 3 (out of 7 polymorphic loci) in An. carnevalei ( Table 2). The frequency of null alleles at those loci showing significant deviation from HWE ranged from 8.73 to 41.65% (Table S1), suggesting mutations in the primer-annealing sequences were probably involved. Nevertheless, nulls alleles did not significantly bias our subsequent interpretation of populations' differentiation.
Exact tests for LD within each An. nili s.s. population resulted in 5 (2.6%) significant values (P,0.05) out of 192 comparisons after correction by the Bonferroni procedure. All these deviations were detected in Moloundou where all polymorphic loci were found out of HWE, and involved loci 1D80 and F41, 2Ateta and F41, B115 and F41, 2Ateta and 1A27, and B115 and F56. Because all these loci are located on different chromosomal arms [27], this result further supports HWE analysis in pointing towards splitting of the gene pool in the Moloundou population (i.e. the Wahlund's effect). No significant LD between loci was detected in An. carnevalei, An. ovengensis and An. somalicus populations.

Genetic Differentiation and Gene Flow
The Bayesian clustering analysis using the software STRUC-TURE was implemented in the expectation that the genotypes would segregate into four main clusters corresponding to the four previously recognized species within the An. nili group. The optimal number of clusters detected based on the method of Evano et al. [38] was K = 8 ( Figure 1). Anopheles carnevalei, An. somalicus and An. ovengensis samples each formed a distinct and homogeneous genetic cluster, while five clusters were defined within An. nili s.s. Specimens sampled in savannah (Ako) and degraded forest (Nkolbisson) areas were merged together into a single genetic cluster, whereas four clusters were detected among populations sampled within the evergreen forest block: one in  Kentzou, one in Ekelemba and two in Moloundou, namely 'Moloundou A' and 'Moloundou B' in subsequent analyses. High levels of genetic differentiation were recorded between species. The overall Fst estimate across all loci was 0.373 and was highly significant (P,0.0001). Mean pairwise Fst over all loci varied from 0.201 to 0.659 (P,0.0001) with corresponding Nm values in the range of 0.14 to 1.16, suggesting very restricted gene flow between species (Table 3).
No significant correlation was detected between genetic and geographic distances (R 2 = 0.0178; P = 0.304) and there was no indication of recent population contraction in any of the populations tested (data not shown).

DNA Sequencing Analysis
rDNA polymorphism and divergence. The D3 domain of the 28S rDNA was amplified in 90 specimens (10 from each geographic population, and 20 from Moloundou) and the alignment length, including insertions/deletions (indels), was 382 bp ( Figure S1). No sequence polymorphism (p = 0) was detected within each geographical population, except in Moloundou where two highly divergent sequence variants were observed (p = 0.034), corresponding to the Moloundou A and Moloundou B microsatellites clusters. Haplotypes detected in An. carnevalei, An. ovengensis and An. somalicus populations, and in An. nili s.s. from Ako and Nkolbisson were identical to those published by Kengne et al. [8] and Ndo et al. [18] (EMBL accession numbers: AJ429053 for An. nili s.s., AJ429052 for An. ovengensis, AJ438690 for An. somalicus and AJ429051 for An. carnevalei). The four remaining An. nili s.s. haplotypes were new and found exclusively in deep forest populations (GenBank accession numbers: KC189970-KC189973). They differed from one another, and from the deposited An. nili s.s. haplotype by 13 to 16 fixed nucleotide substitutions and many indels. Genetic distances between species' haplotypes ranged from 0.3 to 5%. Similar genetic distances were also estimated between An. nili s.s. haplotypes, ranging from 0.5 to 5.9% (Table S2).
Likewise, there was no intra-population variation for the ITS2 region and the alignment length was 480 bp for An. carnevalei, 503 bp for An. ovengensis, 513 bp for An. somalicus, 451 bp for An. nili s.s. from Ako and Nkolbisson, 445 bp for An. nili s.s. from Kentzou and Moloundou A and 480 bp for sequences from Ekelemba and Moloundou B ( Figure S2). ITS2 sequences also segregated into eight haplotypes, in straight correspondence with the 8 microsatellite clusters and the D3 sequence analysis. Four of these haplotypes were identical to those described earlier (EMBL accession numbers: AJ429048 for An. nili s.s., AJ429049 for An. ovengensis, AJ438689 for An. somalicus and AJ429050 for An. carnevalei) [8,18]. The four new haplotypes were also observed in the deep forest An. nili s.s. populations (GenBank accession numbers: KC189966-KC189969). They differed from one another, and from the deposited An. nili s.s. haplotype by 30 to 95 nucleotide substitutions and many indels, suggesting high genetic divergence ( Figure S2). Genetic distances between species   (Table  S2). The topologies of the D3 and ITS2 phylogenetic trees were congruent, showing An. nili group to be monophyletic although relationship between species changed according to the marker. All the trees revealed three major groups among Anopheles nili s.s. specimens analyzed: 1) An. nili s.s. from savannah/degraded forest (Ako and Nkolbisson), 2) An. nili s.s. from deep forest group 1 (Kentzou and Moloundou A) and 3) An. nili s.s. from deep forest group 2 (Ekelemba and Moloundou B) (Figures 2 and 3). Interestingly, the two forest clusters appeared more closely related to An. ovengensis and other members of the group, than to An. nili s.s. from savannah/degraded forest areas.

Mitochondrial DNA Polymorphism and Divergence
Mitochondrial DNA sequences were determined for a minimum of 81 individual mosquitoes and summary statistics for genetic polymorphism are given in Table 4. Alignment length was 603 bp and 319 bp for the COII and ND4, respectively. Variations among sequences only included base substitutions. Absence of indels and heterozygous sites therefore suggests that there was no nuclear transposed copy.
In contrast to rDNA data, there was no strong support for 8 distinct genetic clusters, although the existence of distinct genetic lineages within An. nili s.s. populations was confirmed. All the phylogenetic trees were congruent, showing An. nili group as monophyletic, with six clades: 1-An. carnevalei, 2-An. ovengensis, 3-An. somalicus, 4-An. nili s.s. from Ako and Nkolbssion 5-An. nili s.s. from Kentzou and Mouloundou A (deep forest group 1), and 6-An. nili s.s. from Ekelemba and Moloundou B (deep forest group 2) (Figures 4 and 5). However, the relationship between these clades was not clear as their position changed according to the gene tree. As observed above, An. nili s.s. specimens sampled in the deep forest appeared more closely related to An. ovengensis than to savannah/degraded forest populations of An. nili s.s.
Sequence divergence between consensus haplotypes from each population ranged from 0.4 to 6.2% and from 0.5 to 11.2% for COII and ND4 genes, respectively (Table S3). Because the conventional mtDNA rate of molecular evolution of 2% per myr is used exclusively for lineages separated by less than 8% sequence divergence, only COII divergence time was estimated and found to vary approximately between 0.2 and 3 myr. All new mtDNA sequences have been deposited to GenBank (Accession numbers: KC189974 -KC190023).

Discussion
Genotypes at 11 microsatellite loci and sequence variation in two rDNA regions (D3, ITS2) and two mtDNA genes (COII, ND4) were used to assess the level of genetic variability and differentiation among species of the An. nili group, and between An. nili s.s. populations from different ecological settings in South Cameroon. All the markers provided support for the current classification in the group and revealed further cryptic genetic diversity within the major vector An. nili s.s., despite morphological similarities.
Heterologous microsatellite loci are commonly used across many insect species, particularly in Anopheles [51][52][53][54]. In fact, the use of primers originally designed to amplify DNA from a species in a different one can save time and resources, especially for microsatellite loci. However, depending on the genetic distance between species, microsatellite loci could either readily amplify, fail to amplify or show low polymorphism due to the presence of mismatches in priming sites within flanking sequences or absence of nucleotide repeat units variability [52,[55][56][57]. Therefore, it is advocated to screen heterologous loci in few samples to detect suitable markers, before their use in large population genetic studies.
In the present study, 11 An. nili s.s. loci were screened for use in An. carnevalei, An. somalicus and An. ovengensis and up to 70% crossamplified in all species. This provides a clear indication that the four species are closely related, since the probability of crossamplification of microsatellite loci is inversely related with genetic distance between taxa [55,56]. However, the large differences in microsatellite allele frequencies distribution and sequence profiles, resulting in high pairwise genetic differentiation indices and low migration rates fully support speciation between An. nili s.s., An. carnevalei, An. somalicus and An. ovengensis, in agreement with previous studies using allozyme and rDNA markers [8,58]. Besides, although a high proportion of An. nili s.s. loci amplified in the sibling species of the group, only 6 in both An. carnevalei and An. ovengensis, and 5 in An. somalicus showed the qualities that make them suitable genetic markers (e.g., lack of null alleles, high genetic variability, conformity to HWE, statistical independence) and could be used for population genetic studies in these species.
Previously unrecognized cryptic genetic diversity not correlated to geographical distance between sampling sites was evidenced within the nominal species of the group An. nili s.s., by detecting highly divergent lineages in the deep equatorial forest, despite morphological similarities of all specimens analyzed. This genetic substructure was detected by a range of molecular markers with different evolutionary dynamics, providing strong support for phylogenetic inferences. Indeed, rDNA haplotypes found in these populations differed from one another, and from the An. nili s.s. haplotypes from savannah/degraded forest areas by a large number of fixed mutations and indels leading to genetic distance estimates 4 to 8 fold higher than those commonly reported among cryptic Anopheles species [8,[59][60][61][62][63][64]. According to accepted rates of molecular evolution of 2.2% per myr for ITS2 [48] and 2% per myr for mtDNA [49], we estimated the divergence time of the new molecular forms (lineages) at about 0.8 to 6 Myr or 0.2 to 3 Myr, respectively. This high genetic divergence within the An. nili group suggests that these taxa might actually belong to distinct species groups. However, these results generated using a small sample size and on a limited sampling range should be considered as preliminary. Extending the study area throughout the equatorial forest block of Central Africa will allow improving knowledge on the taxonomic status and the geographic distribution of forest An. nili s.l. populations, their ecology and behaviour, and their epidemiological role in malaria parasite transmission. The evergreen forest block of Central Africa is known to be of low overall quality for the development of An. nili s.s. which is replaced in this area by the other members of the group, namely An. carnevalei and An. ovengensis [9,18,65]. Anopheles nili s.s. is rather more frequent at the savannah/degraded forest ecotone, where it is characterized by shallow population structure with a weak effect of geographical distance on genetic differentiation [18]. Violation of MDE in deep forest populations suggests unstable demographic history for the species in the evergreen forest block of Central Africa. The low dispersal of An. nili s.s. in this environment probably restricts opportunities for genetic exchange between geographically isolated populations, prompting for genetic divergence under the action of increased genetic drift [18,66]. This is in agreement with the low genetic variability observed in forest populations suggesting small effective population size, coupled with high levels of genetic differentiation corresponding to weak migration index estimated between populations. Moreover, the fact that many of the microsatellite markers that were polymorphic in samples from Ako and Nkolbisson did not amplify at all or were not polymorphic (i.e. only one allele could be detected) in deep forest populations is also consistent with the action of increased genetic drift in the later populations. Finally, by demonstrating physical independence between markers, recent mapping of microsatellite loci on the chromosomal arms of An. nili provided further support for a major role of genetic drift in shaping the population genetic structure of An. nili in Central Africa, reflected in genome-wide differentiation patterns [27]. The development of additional molecular markers is now needed to allow more finegrained analysis of the level of genetic polymorphism and its distribution along the genome of An. nili in search for signatures of selection in ecologically-relevant genes or genomic regions. Indeed, original information gained on the genetic structure, gene flow and species diversity within this major group of malaria vectors can further be used to explore global processes such as the genetics of ecological adaptation, speciation and susceptibility to Plasmodium, within a comparative framework that will use and complement information available for other major human malaria vectors such as mosquitoes from the An. gambiae complex and the An. funestus group.
Moreover, the recent findings of circulation of P. falciparum along with other Plasmodium species in great apes in Central African forest region [24,25] raise concerns about pathogen transfer between humans and primates and highlight the need to improve knowledge of forest malaria vectors. Although it is not yet known whether the gorilla carriers of P. falciparum acquired the parasite from humans (or from other primates), nor the pathogenicity of these infections in gorillas, the continuously increasing contact between humans and primate populations, mostly due to logging and deforestation, increases opportunities for transmission of new pathogens from primates to humans and vice versa [25]. The possibility that forest mosquitoes such as members of the An. nili group might contribute to the emergence of zoonosis in humans needs to be assessed.