Population Structure of the Malaria Vector Anopheles sinensis (Diptera: Culicidae) in China: Two Gene Pools Inferred by Microsatellites

Background Anopheles sinensis is a competent malaria vector in China. An understanding of vector population structure is important to the vector-based malaria control programs. However, there is no adequate data of A. sinensis population genetics available yet. Methodology/Principal Findings This study used 5 microsatellite loci to estimate population genetic diversity, genetic differentiation and demographic history of A. sinensis from 14 representative localities in China. All 5 microsatellite loci were highly polymorphic across populations, with high allelic richness and heterozygosity. Hardy–Weinberg disequilibrium was found in 12 populations associated with heterozygote deficits, which was likely caused by the presence of null allele and the Wahlund effect. Bayesian clustering analysis revealed two gene pools, grouping samples into two population clusters; one includes six and the other includes eight populations. Out of 14 samples, six samples were mixed with individuals from both gene pools, indicating the coexistence of two genetic units in the areas sampled. The overall differentiation between two genetic pools was moderate (F ST = 0.156). Pairwise differentiation between populations were lower within clusters (F ST = 0.008–0.028 in cluster I and F ST = 0.004–0.048 in cluster II) than between clusters (F ST = 0.120–0.201). A reduced gene flow (Nm = 1–1.7) was detected between clusters. No evidence of isolation by distance was detected among populations neither within nor between the two clusters. There are differences in effective population size (Ne = 14.3-infinite) across sampled populations. Conclusions/Significance Two genetic pools with moderate genetic differentiation were identified in the A. sinensis populations in China. The population divergence was not correlated with geographic distance or barrier in the range. Variable effective population size and other demographic effects of historical population perturbations could be the factors affecting the population differentiation. The structured populations may limit the migration of genes under pressures/selections, such as insecticides and immune genes against malaria.


Introduction
Anopheles sinensis Wiedemann 1828 is a widely distributed Oriental species [1,2,3]. In China, A. sinensis was incriminated as a competent malaria vector and was responsible for the transmission during the recurrence of vivax malaria in recent years [4]. Genetically based methods have been proposed for malaria vector control. These methods focus mainly in altering vectorial capacity through the genetic modification of natural vector populations by means of introducing refractoriness genes or by sterile insect technologies [5]. Knowledge of the genetic structure of vector species is, therefore, an essential requirement as it should contribute not only to predict the spread of genes of interest, such as insecticide resistance or refractory genes, but also to identify heterogeneities in disease transmission due to distinct vector populations [6]. A complete understanding of vector population structure and the processes responsible for the distribution of differentiation is important to vector-based malaria control programs and for identifying heterogeneity in disease transmission as a result of discrete vector populations [7]. Susceptibility to Plasmodium infection, survival and reproductive rates, degree of anthropophily, and the epidemiology of malaria in the human host may all be affected by genetic variation in vector populations [8].
A. sinensis exhibits variation in ecology [9], morphology [9,10], chromosomes [9,11], isozymes [9], mtDNA [12], random amplified polymorphic DNA (RAPDs) [13], and rDNA second internal transcribed spacer (ITS2) sequences [14]. Cytogenetic studies have revealed two karyotypic forms, A (XY1) and B (XY2), in A. sinensis [11], which have distinct ITS2 sequences [14]. Both forms exist in Thailand [11], and only form B occurs in China and Korea [14,15]. The susceptibility to malaria varies in different geographic areas. In Thailand, wild A. sinensis was poorly susceptible to Plasmodium vivax [16], so were the laboratory lines of forms A and B [17]. In China, A. sinensis is more susceptible to P. vivax than to P. falciparum, and therefore it is an important vector in the areas where no other vector species exist [22,23]. In Korea, A. sinensis was incriminated as a competent malaria vector [18,19] and was responsible for the transmission during the recurrence of vivax malaria in recent years [20,21]. In Japan, due to its abundance A. sinensis has long been suspected to be the most important vector of malaria in temperate Japan, including Okinawa and Hokkaido Islands [2,3].
Despite its significance in malaria transmission, only a few studies on population genetics have been conducted [12,13]. Microsatellites are highly polymorphic genetic markers that evolve much faster than mitochondrial or nuclear genes, and are particularly useful for resolving the structure of populations at a finer geographical and evolutionary scale. They have been extensively used for population studies of anophelines, such as A. darlingi [24,25,26], A. moucheti [27], A. gambiae [28] and A. nili [29]. Microsatellite DNA markers have been isolated from A. sinenesis [30,31]. In this study, we have used microsatellite markers to estimate levels of genetic differentiation among populations of A. sinensis to determine the population structure across its range in China. Figure 1. A schematic map of China showing sampling sites for A. sinensis. The population genetic affiliation to the two clusters in each locality was displayed by a pie chart with black as cluster I and white as cluster II (see Table 3 for details). doi:10.1371/journal.pone.0022219.g001

Population sampling and species identification
Fourteen samples of wild mosquitoes were collected from 20 locations in China ( Figure 1, Table 1). A total of 327 female A. sinensis were identified by a species diagnostic PCR assay [32]. Five samples, YUN, HUB, LIA, SHD and SIC, consisted of specimen pooled from two or three collections, as stated in Table 1.

Genetic variability within populations
Polymorphism at five microsatellite loci varied, with number of alleles (A) as 12 (ANS15), 17 (ANS1 and ANS6), 18 (ANS11) and 19 (ANS5), respectively ( Table 2). The average number of alleles per locus was in a range between 6.5 (ANS5) and 10.4 (ANS11). The minimum mean number of alleles of all loci was in LIA population (6.2), and the maximum in YUN (10.2). Allele distributions across populations were depicted in Figure S1. The average observed heterozygosity (H O ) across all samples ranged from 0.398 (ANS5) to 0.757 (ANS11), the minimum Ho was in GUD (0.433), the maximum Ho in SIC (0.760). To determine if the null alleles impacted the population genetic analyses, we performed these analyses both before and after the dataset were adjusted for estimated null allele frequencies. The effect of this treatment was minimal and did not significantly change the degree or statistical significance of the estimated parameters.
The Hardy-Weinberg exact tests were performed for five loci. No locus was in Hardy-Weinberg equilibrium (HWE) for all the samples assayed. At the population level, 26 out of 70 (37.14%) comparisons did not conform to Hardy-Weinberg expectations after sequential Bonferroni correction, and the deviations were associated with positive inbreeding coefficient (F IS ), reflecting heterozygosity deficits (Table 2). Significant deviation from HWE varied across loci in a population -dependent manner. The YUN, HEN, GUD and GZH populations had the highest number of loci in departure from HWE (4 of 5), while the SHD, SHX, LIA, HAN and SIC populations had the fewest (1 of 5). The FUJ, GXI and JSU populations had no loci in departure from HWE (Table 2). In all samples, some specimen failed to amplify at one locus while succeeded at the remaining loci, suggesting the presence of null alleles. Estimates of the frequency of null alleles are given in Table 2. The locus ANS15, for example, showed both high F IS values and high frequencies of null alleles in samples CHQ, GUD, HEN and HUB.
Fisher's exact tests were conducted for linkage disequilibrium (LD) within each of the 14 collections. Out of 140 comparisons only five pairs (3.57%) were at LD (P,0.05). Two pairs were detected in HAN (ANS5/ANS15, ANS1/ANS5) and the other three pairs were in SHD (ANS6/ANS15), HUB (ANS1/ANS6), GUD (ANS6/ANS11), respectively ( Table 2). No pair of loci appeared in LD in more than one population, suggesting genetic independence between loci. When the test was performed in the pooled populations, no pairs of loci out of 10 possible combinations showed significant P values (P.0.05).

Genetic differentiation among populations
The significant deviations from HWE with heterozygote deficiency and the presence of LD suggest the presence of population subdivision within samples (the Wahlund effect). We therefore examined if there were different gene pools in these samples. The Bayesian cluster analysis divided populations into two main subgroups (posterior probability of Bayesian clustering Ln(D) likelihood score optimal for k = 2 clusters) ( Figure 2). One gene pool (cluster I) was composed of GZH, HUB, SHD, HEN, GUD and YUN, and the other (cluster II) was composed of LIA, CHQ, FUJ, GXI, HAN, JSU, SIC and SHX. Allele composition varied to a limited extent among populations within the clusters but varied considerably between the clusters. For example, 3 alleles at locus ANS1, 4 alleles at ANS5 and 3 alleles at ANS6 were differentially distributed among populations between the clusters ( Figure S2). In different localities, there are different proportions of individuals from different gene pools (Table 3). For example, in the HEN 70% of individuals were assigned to the cluster I and the remaining 30% to the cluster II; an opposite occurred in CHQ, in which 69.8% of individuals belong to the cluster II and 30.2% to the cluster I. In total, six samples (three in each cluster) were mixed with at least 15% of individuals assigned to the other cluster ( Figure 1 and Table 3), indicating the coexistence of two gene pools in these localities.
Overall genetic divergence between two gene pools was assessed. First, specimens were assigned to the cluster I or II at greater than 80% probabilities, which resulted in 126 individuals in the cluster I and 135 individuals in the cluster II. Then the two pools of individuals were analyzed by the Wright's F statistics ( Table 4). The overall value of F IT (0.379) showed significant heterozygote deficits at the total population level, most likely due to the presence of null alleles. Individually, three loci, ASN1, ASN6 and ASN11, presented significant heterozygote reduction at the population level (P,0.05). The average value of F IS (0.264) showed heterozygote reduction at the subpopulation level. These results corroborated the homozygote excess detected with the individual HWE tests ( Table 2). Locus ANS5 had the highest F ST value (0.545) while other loci ranged between 0.020-0.086. The average F ST = 0.156, indicating a moderate genetic heterogeneity between the two gene pools. An AMOVA analysis using the two clusters found that 15.6% of the variance was attributed to between populations and 84.4% to within populations. Then we tested genetic heterogeneity among populations within and between the two gene pools. We chose populations in which more than 85% individuals were assigned to the cluster I or II for the analysis. The resultant populations included three (GZH, HUB and SHD) from the cluster I and five (LIA, FUJ, GXI, HAN and JSU) from the cluster II. Table 5 Table 5).
Tests of isolation by distance were performed for each population cluster and for all of the populations together. No statistically significant correlations were detected between genetic differentiation and geographic distances based on the Mantel test in all cases ( Figure 3). The results suggest that geographic distance does not significantly contribute to the genetic differentiation observed in A. sinensis populations.

Effective population size and demographic stability
Estimates of expected heterozygosity under mutation-drift equilibrium (MDE) were calculated for detecting demographic instability. These statistics are expected to be equal in a neutral locus at MDE. Results of the heterozygosity tests (Table 6) did not reveal any evidence for departure from MDE in two models (SMM and TMP) in any of the 14 populations. However, a consistent trend for lower-than-expected heterozygosity (i.e., He,Heq) was detected in the GXI and SIC populations under SMM model (P,0.05), suggesting a recent demographic expansion.
Estimates of long-term Ne varied considerably depending on the model used. Under the heterozygote excess model all of the Ne estimates were infinity. Under the linkage disequilibrium model, diverse Ne values were detected across populations ( Table 7). The two clusters showed similar variability of Ne, from ' (CHQ) to 18.4 (SHD) in the cluster I and ' (HUB) to 14.2 (LIA) in the cluster II, respectively.

Discussion
Sampling strategy and geographic coverage greatly influence the analysis and interpretation of the data generated from the samples. A. sinensis occur in most parts of China, a range as from 100u E to 120u E, and from 19u N to 54u N [22]. In this study, A. sinensis mosquitoes were collected from most localities across its range in China. The LIA was at the most northern limit, and HAN was at the most southern limit of the distribution.
The five microsatellites in A. sinensis [30] are highly polymorphic in the populations, and thus are useful for exploring A. sinensis population genetic structure. Theses microsatellite loci have not been physically mapped to A. sinensis polygene chromosomes. Therefore, their location with respect to polymorphic chromosome forms is unknown, but linkage disequilibrium between loci was detected only in 3.6% of 140 comparisons suggests that they are at least statistically independent and might distribute across the genome. The high allelic diversity and expected heterozygosity were observed in most of populations, which was similar to the level of diversity in the A. darlingi in Peru and Brazil (H O = 0.742, 0.457) [24], A. albimanus in Latin America (H O = 0.73) [33], African vectors A. gambiae (H O = 0.59) [34] and A. funestus (H O = 0.672, 0.529) [35,36]. In China, A. sinensis occurs in the temperate climate zones, populations undergo marked seasonal variations in abundance, reaching high densities only during the summer months. The high level of genetic diversity suggests that A. sinensis are able to maintain a relatively large effective population size in spite of the seasonality of low temperature in cold winter.
In this study, significant deviations from HWE due to the heterozygote deficits were detected in most samples. These could be attributed to the Wahlund effect, inbreeding, selection or null alleles. Selection was not considered because it usually occurs in one locus, not systemically in multiple loci as detected in this study. Inbreeding has genome-wide effect. If inbreeding were important in our case, we would expect a similar heterozygosity deficiencies in all markers with each population studied, which we did not detect. The reduction of heterozygotes at several loci detected in this study is likely the results of presence of null alleles that are not amplified because of the mutations at the primer annealing sites. In this study, there were specimens that failed to amplify some alleles at certain loci, but efficiently amplified at other loci. The Wahlund effect, referring to subpopulations in a sample, is likely another cause for the heterozygote deficits. Indeed, the Bayesian analysis revealed two gene pools across the locations sampled (Figure 2), and the two gene pools coexist in at least six collections (Table 3). Thus, a part of the heterozygote deficits detected in these samples could be the result of the Wahlund effect.
The two genetic divisions of A. sinensis were represented by two population clusters. There is substantial differentiation between two gene pools, demonstrated by mean F ST = 0.156 between the two clusters (Table 4) and pairwise comparisons (F ST = 0.120-0.201) among populations between the two clusters ( Table 5). The gene flow was remarkably limited between the clusters (Nm = 1-1.7). The level of differentiation is comparable to that detected previously between A. sinensis populations in China using isozymes and RAPD markers (F ST = 0.069-0.111) [13,28], between A. albimanus populations from Central and South America (F ST = 0.114) [33], among A. gambiae populations in west Africa separated by .200 km (F ST = 0.034-0.167) [37], and among A. darlingi populations from Brazil and Peru [24] as well as between A. funestus populations from west, central, and eastern Africa (F ST = 0.110) [7]. The distribution of two clusters appears no noticeable geographic patterns, and no correlation between genetic and geographic distance was detected by the Mantel test (Figure 3). In addition, the sympatric occurrence of two gene pools was found in 6 of 14 collections (Table 3). Therefore, the differentiation between the two population clusters probably was not influenced by geographic distance and barriers (e.g., Yangtze River, Yellow River and Qinling Mountains).
Within the population clusters, pairwise differentiation was little or low (F ST = 0.004-0.048). No isolation by distance was detected ( Figure 3). However, a large amount of variability in Ne among the populations (14.4-' in the cluster I and 18.4-' in the cluster II, Table 7) suggests that ecological and/or historical heterogeneity may contribute to the differentiation observed among these populations. Similarly, Ne heterogeneity has been reported to be associated with the population differentiation for A. darling [24] and A. albimanus [38]. In the cluster II, there is some evidence of a population expansion in SIC and GXI (Table 7). Taken together, the population diversity in the form of two clusters may be caused largely by the differences in Ne among the populations and/or different demographic [8].
This study revealed that A. sinensis populations across China are primarily structured with two major genetic units. The distribution pattern could not be attributed to the isolation by distance. The wide range and sympatric distribution of the two genetic pools suggest that these two gene pools may have been segregating in the A. sinensis populations for a relative long time. Our findings emphasize the need for further investigation with deeper sampling (especially in the areas both gene pools exist) and more genetic loci in order to thoroughly elucidate the forces that shape and maintain the population structure. More studies are required to characterize the two gene pools of A. sinensis regarding ecology and malaria susceptibility. The population structure of A. sinensis implies that the expansion and spread of genes responsible for immunity against malaria or insecticide resistance would be easier within than between the genetic units. Moreover, the estimate of effective population size would help to evaluate the effectiveness of mosquito control measurements [8].

Mosquito collections and species identification
Wild adult A. sinensis were collected from 1996 to 2008, by using indoor light traps at livestock corrals. Human landing catches at human living room were tried in 1996 in Sichuan (SIC) and Hainan (HAN). Only two specimens were caught by human bait in SIC. A. sinensis is zoophilic, therefore, the human bait was no longer used in later collections. The 20 collection sites in 14 provinces in China were located from 109u589 N to 120u259 N, and 23u609 E to 31u449 E (Table 1, Figure 1). Five samples, YUN, HUB, LIA, SHD and SIC, consisted of specimens pooled from two or three sites in proximity to each other, as stated in Table 1. The distances between sites were 25-100 km.
Mosquitoes of A. hyrcanus group were identified by morphology using the identification keys of Lu et al. [22]. Specimens were kept individually in silica gel filled tubes at 4uC, until DNA extraction was performed according to Collins et al. [39]. A. sinensis species identification was done by a PCR assay based on ribosomal DNA ITS2 markers previously described in Ma et al. [32].

Genotyping and data analysis
Five microsatellite loci, ANS1, ANS5, ANS6, ANS11 and ANS15 [30], were used for genotyping. Each locus was amplified by PCR using fluorescently labeled (FAM, NED, or HEX) forward primers. Amplified fragments were separated by capillary electrophoresis in an automatic sequencer (ABI 3770, Applied Biosystems, Foster City, CA) and size scored using GENOTYPER 3.7 software (Applied Biosystems, Foster City, CA).
Genetic diversity within samples and overall was measured at each locus by estimates of allele frequencies, number of alleles A, allele richness Rs, inbreeding coefficient F IS , expected heterozygosity H E , and observed heterozygosity Ho [40], using the software FSTAT 2.9.3.2 [41]. Within each locality the frequency of null alleles was determined using the Brookfield 2 estimate [42], and the allele and genotype frequencies were then adjusted accordingly in MICRO-CHECKER 2.2.3 [43]. The null allele-adjusted dataset was compared to the original dataset to investigate the impact of null alleles on estimations of genetic differentiation. Genotypic frequencies were tested against Hardy-Weinberg equilibrium (HWE) for each locus in the pooled population and in each sample. Statistical significance was assessed by the exact probability test available in GENEPOP 3.2 [44]. Linkage disequilibrium between loci was tested by exact tests on contingency tables, also available in GENEPOP.
Genetic differentiation was estimated by calculating Wright's F statistics (F ST , F IS , F IT ) values per locus between clusters and pairs of populations using ARLEQUIN 2.001 [45] and GENEPOP. The number of migrants per population per generation (Nm) between localities was estimated from pairwise F ST [46]. An analysis of molecular variance (AMOVA) was used to examine the distribution of genetic variation in Arlequin using F ST . We focused on estimates of F ST performed under the infinite alleles model (IAM) because this model is considered more reliable when fewer than 20 microsatellites are used [47]. The significance for all calculations was assessed by 10,000 permutations and the P-values. The isolation by distance model was investigated as a potential explanation for the observed population differentiation. The significance of the regression of genetic differences on geographic distance between sample pairs was tested using a Mantel test [48] with 100,000 permutations using GENEPOP.
A Bayesian approach was used to infer the number of clusters (K) in the data set without prior information of the sampling locations, implemented with STRUCTURE 2.2 [49]. A model where the allele frequencies were correlated within populations was assumed (l was set at 1, the default value). The software was run with the option of admixture, allowing for some mixed ancestry within individuals, and a was allowed to vary. Twenty independent runs were done for each value of K (K = 1 to 9), with a burn-in period of 100,000 iterations and 100,000 replications. The method of Evanno et al. [50] was used to determine the most likely Because demographic instability such as recent population bottleneck and/or expansion might bias genetic differentiation estimates to a significant extent [51,52], heterozygosity tests [53]  were used to detect deviations from mutation-drift equilibrium (MDE). These tests compare two estimates of expected heterozygosity, one based on allele frequencies (He), assuming Hardy-Weinberg proportions, and another based on the number of alleles and sample size (Heq), assuming MDE. At MDE, both estimates should be similar in the majority of loci analyzed (i.e. He = Heq). If a population experiences a bottleneck, rare alleles will be rapidly lost and therefore Heq will decrease faster than He (i.e. He . Heq). This apparent excess of heterozygosity in a significant number of loci is an indicator of a bottleneck, whereas the converse (i.e. He ,Heq) may indicate a population expansion. Estimates of expected heterozygosity under MDE were calculated under the Stepwise Mutation Model (SMM) and Two Phase Models (TPM) with 10%-30% indels larger than the repeat unit. Calculations were done using the software BOTTLENECK 1.2.02. [53]. The longterm effective population size (Ne) was estimated using NEESTI-MATOR 1.3 [54] based on the heterozygote excess and linkage disequilibrium models.