Identifying the Main Mosquito Species in China Based on DNA Barcoding

Mosquitoes are insects of the Diptera, Nematocera, and Culicidae families, some species of which are important disease vectors. Identifying mosquito species based on morphological characteristics is difficult, particularly the identification of specimens collected in the field as part of disease surveillance programs. Because of this difficulty, we constructed DNA barcodes of the cytochrome c oxidase subunit 1, the COI gene, for the more common mosquito species in China, including the major disease vectors. A total of 404 mosquito specimens were collected and assigned to 15 genera and 122 species and subspecies on the basis of morphological characteristics. Individuals of the same species grouped closely together in a Neighborhood-Joining tree based on COI sequence similarity, regardless of collection site. COI gene sequence divergence was approximately 30 times higher for species in the same genus than for members of the same species. Divergence in over 98% of congeneric species ranged from 2.3% to 21.8%, whereas divergence in conspecific individuals ranged from 0% to 1.67%. Cryptic species may be common and a few pseudogenes were detected.


Introduction
Approximately 41 genera and 3500 species and subspecies of mosquito exist worldwide. Although mosquitoes have been studied more extensively than most other insect groups because of their role as vectors of disease, our taxonomic knowledge of these insects is far from complete. Numerous Chinese taxonomists have worked on mosquito classification since 1932, particularly since Edwards provided the modern mosquito classification system [1]. Feng Lan-Zhou reported 100 Chinese mosquito species in 1938 [2]. This number has since then increased to approximately 390 described species and new species are still being identified, particularly within the genera Armigeres, Heizmannia, Topomyia and Uranotaenia.
Some species are vectors of medically important pathogens, such as malaria, Dengue fever and Japanese B encephalitis. Species identification therefore constitutes the first step in the surveillance and control of mosquito-borne diseases. The identification of mosquito species is mainly done on the basis of morphological characteristics. This can be problematic because diagnostic morphological features are often damaged during collection or storage, or are not present in all developmental stages. Moreover, the morphological characteristics used to identify intact adult specimens often vary so little between species that usually only experienced mosquito taxonomists are able to distinguish mosquito species reliably [3].
DNA analysis provides a more accurate way of identifying species and the use of molecular data, in combination to morphological methods, has resolved some long-standing taxonomic questions [4,5]. The increase in the number of available molecular markers has facilitated the accurate identification of mosquito species, particularly within groups of sibling species. For instance, Anopheles anthropophagus and Anopheles sinensis can be identified more simply, rapidly, and accurately using the ITS2 sequence than on the basis of morphology [6,7].
After Tautz proposed using DNA sequences as the main basis of biological classification in 2002 [8,9] Paul Hebert suggested that sequencing the COI gene could allow DNA barcoding that would facilitate such classification [10][11][12]. Many studies have since then demonstrated that the COI gene is a valid molecular tool for identifying mosquito species [13,14] and revealing cryptic species [15][16][17][18].
Although several studies on the distribution of Chinese mosquito species have been conducted using classical morphology identifying sibling and cryptic species remains problematic. Here we provide an updated classification of nearly one-third of China's mosquito species based on a combination of molecular and morphological methods.

Specimen Collection
A total of 122 mosquito species belonging to 15 genera and three subfamilies were collected from sampling sites in eight Chinese provinces ( Figure 1, Table 1). We identified mosquitoes on the basis of diagnostic morphological characteristics of their adult and larval stages and cercopoda [19], and by using molecular methods to distinguish sibling species [6,7].

Sequence Analysis
Individual species were represented by one to eight individuals giving a total of 404 COI sequences, representing 122 species and subspecies. We identified and excluded 3 pseudogenes from further analyses by only selecting sequences without insertions, deletions and stop codons. COI sequences contain a large number of A+T pairs (average of 69% for all codons), particularly at the third codon position (93.4%) (Table S1). There was, however, no G content in Orthopodomyia anopheloides and Topomyia houghtoni at the third codon. As in the case of Drosophila [20,21], this quite strong bias is apparently caused by the relative abundance of isoaccepting tRNA. All sequences contained less T in the first codon compared to the second. However, the A content of the first codon was higher than that of the second. The average R-value (transitions/transversions) was 0.7.

Neighbor-Joining (NJ) Tree
The Neighbor-Joining (NJ) tree method is conceptually related to clustering, but without the assumption of clock-like behavior [22]. COI gene fragments accurately revealed species boundaries and provided a clear phylogenetic signal (Figs. 2 and 3). Most of the major branches on the tree represent distinct taxonomic groups, including all genera and subgenera. Moreover, specimens of the same species always grouped closely together, regardless of collection site, and, except for some specimens from Hainan Island, no obvious geographic differences in sequences within the same species were found.
Combining NJ tree and bootstrap analysis is the most appropriate method for evaluating phylogenetic trees using distance methods [23]. Nodes linking sequences of individuals of the same species had a high bootstrap value (98%-99%) whereas some linking sequences of geographically different individuals had low bootstrap values (6%-99%).
Transition and transversion distances varied consistently with sequence divergence (Fig. 5). Transition distance was significantly greater than transversion distance when sequence divergence was ,2%. However, transversion distances increased slowly with sequence divergence to eventually exceed transition distances at K2P divergence of $6%. Both transition and transversion distances then decreased until K2P divergence reached about 15%. The relationship between the transversion distance, sequence divergence, and morphological characteristics are shown in Tables 2 and 3.

Accuracy of COI
The primary function of DNA barcoding is accurate species identification. We found that COI sequence differences among    congeneric mosquito species were approximately 30 times higher than the average differences within species. Moreover, more than 98% of COI fragments had clear interspecific boundaries, a result consistent with the results of other authors [13]. The average conspecific K2P divergence in this study, 0.39%, is similar to values reported for fish species in Australia [24] and slightly higher than those reported for North American birds (0.27%) [25] and moths (0.25%) [10]. It is slightly less than the K2P divergence value reported for Canadian mosquitoes (0.55%) [13].

Transversion Distance and Speciation
Mitochondrial DNA (mtDNA) functions as a molecular clock in that transversions accumulate in a linear fashion over time [26,27]. Comparison of the molecular and morphological data indicates that the number of transversions may raise to about 7 value without apparent or detectable changes in morphology. (Fig. 5). Transition distance was significantly greater than transversion distance when sequence divergence was below 2% at which level there were almost no morphological differences between specimens. At higher levels of sequence divergence transversion distances slowly increased, eventually exceeding transition distances when sequence divergence reached 6%. Morphological differences were undetectable when sequence divergence was about 2% but were distinct when this reached 6%. Transversion distances increased steadily at sequence divergence levels of 6% to 15% at which level plesiomorphy also first became evident. Plesiomorphy stabilized at sequence divergence of 15%. In addition, the vast majority of intraspecific distances occurred between sequence divergence levels of 6% and 15% whereas most intergeneric distances occurred from 15% to 20% (Fig. 4). Very few intraspecific, and no intergeneric, distances occurred between sequence divergence levels of 2% and 6%.
We found that transversion distances indicated a clear boundary between species. The transversion distance between most species was ,1.1% at sequences divergence values of less than 2%. There were, however, some exceptions; although the transversion distance between two plesiomorphous species was usually ,1.1% (Table 3), some species with anomalous intraspecific COI sequences divergences .2% (Table 2) had intraspecific transversion distances .1.1%. This suggests the presence of cryptic species, which, if confirmed, in turn suggests that transversion distances may be a useful supplement to barcoding information in species identification. Further research on the use of transversion as an additional index of taxonomic similarity is recommended.

Molecular Data Versus Morphology
Sequence divergence values of 14% to 16% were indicative of either interspecific or intergeneric differences. There are two possible reasons for this; temporary substitution saturation of the COI fragment and the limitations of morphological identification.
We found some cases of high intraspecific sequence divergence among Aedes dorsalis, Aedes vexans, Culex modestus, Tripteroides aranoides, and Toxorhynchites splendens ( Table 2). Although the degree of niche separation within these species remains unclear, this result suggests the existence of cryptic species. We also detected intraspecific sequence divergence slightly greater than the 2% threshold within Coquillettidia crassipes and Anopheles sinensis ( Table 2). Although no morphological differences within these species were observed, differences in feeding habits and habitat have been documented within Anopheles sinensis populations [19]. This, togeth-  [28]. Some cases of low interspecific sequence divergence were found among some pairs of species (Table 3), including Aedes craggi and Aedes annandalei, as well as Culex spiculosus and Culex minor. Although there is no evidence of niche separation between these species, slight morphological differences were observed. This suggests that the taxonomic status of these species should be re-confirmed. Although few doubt that mtDNA barcodes are a valuable molecular tool for matching unidentified specimens to described taxa, there has been relatively little use of barcodes to delimit species [29]. More research on rDNA, morphology, biogeography and ethology are required to improve the applicability of barcoding to species-level taxonomy.
Culex neomimulus was previously classified as Culex mimulus in the Culex mirneticus group [30]. Although our COI data supports the previous view, we found that anomalous COI sequence divergence values were relatively common in the Culex mirneticus group with some morphologically distinct specimens having similar barcodes. This could be due to infection with the Wolbachia bacteria. The maternally inherited Wolbachia bacteria causes a loss of haplotype diversity in populations by inducing a selective sweep of the initially infected individual's haplotype through a population. We detected Wolbachia infection in Culex mimulus so it's possible that this may also occur in this species. Although Smith et.al concluded that the presence of Wolbachia DNA in total genomic extracts is unlikely to compromise the accuracy of the DNA barcode library, this is a complex problem that requires further investigation [31].

Pseudogenes
The presence of pseudogenes can affect the accuracy of barcoding identification but, since their incidence was ,1%, their influence on our data was presumably small. The distinctive characteristics of the COI gene (no insertions, deletions and stop codons) allowed pseudogenes to be easily identified and excluded from the sequences we obtained. Although the leakage of paternal mtDNA may influence the results of barcoding this phenomenon is only occasionally (,0.004%) found in higher animals.
A total of three pseudogenes were detected. For instance, one of the samples of Aedes dissimilis collected from the same area exhibited high interspecific sequence (3.74%) and transversion divergence (3.00%). A total of 12 different protein sequence sites were observed, which is very rare in the Culicidae. The substitution rate at nucleotide codons 1, 2, and 3 was 1:2:2, very different to the average of 5:1:18. We also amplified the pseudogenes of Uranotaenia lutescens and Culex halifaxia, which have insertions and deletions, respectively. The sequence divergence between pseudogenes and COI fragments in Culex halifaxia was 10.93% and the substitution rate at nucleotide codons 1, 2, and 3 was 5:4:11. The divergence time formula of mtDNA and pseudogenes [32] suggests that the nuclear transfer event occurred 500 million years ago in Culex halifaxia and 170 million in Aedes dissimilis. We found an insertion site at 54 bp in the sequence of Uranotaenia lutescens, with a substitution rate at nucleotide codons 1, 2, and 3 of 7:1:18. Two different protein sequence sites were also observed. These abnormal phenomena disappeared when the inserted site was deleted manually. Therefore, these anomalous sequences likely caused by the frameshift mutations of PCR.
Overall, DNA-based species identification systems depend on the ability to distinguish intraspecific from interspecific variation. This analysis of 404 COI sequences from 15 mosquito genera and 122 species and subspecies indicates that .98% of specimens formed distinctive clusters and that barcode divergence was relatively large between these groupings. Although it has limitations, DNA barcode technology has several advantages over traditional taxonomic methods as a tool for species identification. For example, it is unaffected by morphological variation between different life cycle stages. Another benefit is that it allows the homogenization, or calibration, of the taxonomic units identified in different areas. DNA barcode technology generally produces accurate results thereby greatly reducing the need for experienced taxonomists.
In summary, this study provides the first COI barcodes for mosquitoes in China and provides further evidence of the effectiveness of DNA barcoding in identifying recognized species. An insufficient number of specimens prevented in-depth investigation of sibling species complexes but we plan to address this area in the future. Care must be taken to exclude pseudogenes from COI databases to ensure the accuracy of molecular identification. COI databases also need to include specimens of the same species collected from different geographical locations in order to determine the extent of intraspecific variation. A complete evaluation of the effectiveness of DNA barcoding for the Culicidae can be achieved through multinational research.

Ethics Statement
No specific permits were required for this study. All experiments were conducted within state-owned land in China. Therefore, the local ethics committee deemed that approval was unnecessary.

Mosquito Collections
Mosquito specimens used for constructing DNA barcodes were collected from different Chinese Provinces in 2009 and 2010. Details on specimens collected are provided on Fig. 1 and Table. 1. Larval and adult mosquitoes were collected in the field. Adults were sampled with CO 2 -baited miniature light traps. Larvae were reared individually and associated larval and pupal skins were mounted. All specimens were identified using standard taxonomic keys [19].

Target Gene Preparation
Total DNA (100 mL to 150 mL) was extracted from each specimen using the Universal Genomic DNA Extration Kit (Invitrogen). PCR was performed to amplify the 59 COI region of mtDNA using the following cycle: An initial denaturation of 1 min (94uC) followed by five cycles of 94uC for 40 s (denaturation), 45uC for 40 s (annealing), and 72uC for 1 min (extension); 30 cycles of 94uC for 40 s (denaturation), 51uC for 40 s (annealing), 72uC for 1 min (extension) and a final extension at 72uC for 5 min. PCR cocktails were made as follows: A 50 mL solution comprised of 0.3 mL Taq DNA polymerase (5 U/mL), 5 mL of 106PCR buffer, 5 mL of 2 mmol/L dNTP, 2 mL of 10 mmol/L each of the forward and reverse primers, 5 mL of template DNA and sufficient ddH 2 O to make up to 50 mL. The Figure 2. NJ phylogenetic tree based on Kimura two-parameter genetic distances of COI gene sequences of mosquitoes prevalent in China. Sequence analysis was conducted using MEGA version 4.0 software with 1000 replications. Most major branches on the tree represent recognized groups, including all genera and subgenera except Anopheles and Culex which comprise separate subtrees and are shown in detail in Fig.3. doi:10.1371/journal.pone.0047051.g002 Figure 3. Two distinct sub-trees comprised of Anopheles and Culex in the NJ phylogenetic tree (Fig. 2). doi:10.1371/journal.pone.0047051.g003   primer pairs LCO1490 and HCO2198 [33] were used to amplify a 650 bp fragment of COI. The amplified fragments were run on a 1% agarose gel to check the integrity of the fragments after which the PCR product was purified with a normal PCR purification kit (Tiangen). Both reads (forward as well as reverse primer) were done.

Data Analysis
DNA sequences were aligned using Clustal X [34]. Sequence analysis and Ts/Tv calculation was conducted using MEGA version 4.0 software [14]. Sequence divergence and Ts, Tv distance among individuals was quantified using the Kimura twoparameter distance model [35]. An NJ tree of K2P distances was created to provide a graphic representation of the clustering pattern among different species [36].

Supporting Information
Table S1 Sequence divergence and nucleotide composition for the mosquito genera. The frequencies of nucleotides in sequence are presented as the total average values for all Condon positions and for each condon position separately with the accuracy to tenths of a percent. (*) Figures in brackets are the number of mosquito species used to estimates of sequence divergence for the genus (PDF)