All 37 Mitochondrial Genes of Aphid Aphis craccivora Obtained from Transcriptome Sequencing: Implications for the Evolution of Aphids

The availability of mitochondrial genome data for Aphididae, one of the economically important insect pest families, in public databases is limited. The advent of next generation sequencing technology provides the potential to generate mitochondrial genome data for many species timely and cost-effectively. In this report, we used transcriptome sequencing technology to determine all the 37 mitochondrial genes of the cowpea aphid, Aphis craccivora. This method avoids the necessity of finding suitable primers for long PCRs or primer-walking amplicons, and is proved to be effective in obtaining the whole set of mitochondrial gene data for insects with difficulty in sequencing mitochondrial genome by PCR-based strategies. Phylogenetic analyses of aphid mitochondrial genome data show clustering based on tribe level, and strongly support the monophyly of the family Aphididae. Within the monophyletic Aphidini, three samples from Aphis grouped together. In another major clade of Aphididae, Pterocomma pilosum was recovered as a potential sister-group of Cavariella salicicola, as part of Macrosiphini.


Introduction
The rapid expansion of genomic resources and explosion of new genome sequencing technologies are revolutionizing the research of phylogenetics. Researchers are now able to obtain phylogenomic data by fast and cost-effective means. RNA-seq (RNA sequencing), also called whole transcriptome shotgun sequencing, is a technology that uses the capabilities of next generation sequencing to reveal a snapshot of RNA presence, which is often used to generate sequence data for structural and functional studies. These data resource are important for many fields of organelle research [1]. Simultaneously, transcriptome sequencing can provide a relatively large proportion of sampled cDNA clones corresponding to mitochondrial transcripts. Thus, it is possible to mine mitochondrial genes as a by-product of transcriptome sequencing. present, only seven complete (Cervaphis quercus, Sitobion avenae, Aphis gossypii, Cavariella salicicola, Diuraphis noxia, Acyrthosiphon pisum and Schizaphis graminum) and three partial (Aphis glycines, Daktulosphaira vitifoliae and Pterocomma pilosum) mitogenomes in the Aphidoidea have been published in GenBank [18][19][20][21][22][23]. Although Wang et al. (2013) presented the first comparative analysis of mitogenomes of aphids and a phylogenetic reconstructions based on 11 mitochondrial PCGs, their taxon sample are very sparse (only six aphids included in their analysis) [19]. More recently, Wang et al. (2015) sequenced the complete mitogenome of aphid Mindarus keteleerifoliae [24] (however, to date, this data can't be available in GenBank, and thus this species is not included in our study). Their phylogenetic analysis based on 13 PCG nucleotide sequences from eight aphids recovered Aphidini (included only two species) and Macrosiphini (included no Pterocomma) in a monophyletic Aphidinae [24]. With conventional PCR and Sanger sequencing method, it is difficult to achieve the aphid mitogenome sequences, owing to the complex secondary structure, higher A + T composition and large tandem repeat regions [19].
In this study, all thirty seven mitochondrial genes of A. craccivora were obtained from transcriptome sequencing. In addition, combined with other hemipteran mitogenome sequences, the phylogenetic position of A. craccivora within Aphididae was investigated. The present study demonstrates the effectiveness of determining the whole mitochondrial gene data from transcriptome sequencing for those species with difficulty in sequencing mitogenome by conventional PCR methodology.

Ethics Statement
No specific permits were required for the insect specimens collected for this study in China. These specimens were collected in the residential garden of the author. The field studies did not involve endangered or protected species. A. craccivora is a common aphid species in China and is not included in the ''List of Protected Animals in China".

Sample collection and RNA sequencing
Samples of A. craccivora (about 300 heads) were collected from locust trees on May 2015 in Zhengzhou of China (the geospatial coordinates: 34.723°N, 113.635°E). Total RNA was extracted from 35 to 50 winged adult individuals by Trizol reagent (Invitrogen, CA, USA) following the manufacturer's procedure. The total RNA quantity and purity were determined using a Bioanalyzer 2100. Sequencing libraries were constructed using IlluminaTruSeq™ RNA Sample Preparation Kit (Illumina, San Diego, CA, USA). The results of RNA quantity and purity checking were the following: the concentration was around 3.15 ug/ul, A260/280 was 2.09, A260/230 was 1.52, total content was 300 ug, 28S/18S was 1.1, and RIN was 8.0. Prior to sequencing, two samples were prepared for repeat checking. The best run was used for further analysis. The RNA transcript was sequenced on an Illumina (Solexa) GAII sequencing machine in Shanghai OE Biotech CO., LTD. The sequencing depth was set to 4 Gb Raw data for each sample.

Transcripts assembly and mitochondrial gene identification
The transcripts were assembled de novo using the Trinity method [25], and the results were inputted into BioEdit version 7.0.5.3 [26] to build a local BLAST to search genes using published aphid mitogenomes (mainly based on the A. glycines and A. gossypii) as bait sequences. Table 1 lists the analyzed aphid species, the taxon status and the GenBank accession numbers [27][28][29][30][31]. The best hit sequences were retrieved from transcript data. Then, the whole retrieved sequences were aligned against all the published aphid mitochondrial sequences to identify the gene boundaries. The clover-leaf secondary structures of tRNA genes predicted by tRNAscan-SE server [32] are presented in S1 Fig. New mitochondrial DNA sequences obtained in this study were deposited in GenBank under accession number of KT889380.

Sequence alignment and characteristics analyses
In total, eighteen mitogenome data included twelve aphids, and comprised the ingroup (10 species from Aphididae, each one from Greenideidae and Phylloxeridae, respecitively), and six other homopteran insects, and selected as outgroup (one taxa from Psylloidea, 2 from Cercopoidea, and 3 from Fulgoroidea). Each of the 37 mitochondrial genes were aligned separately for further analyses. For PCGs, firstly stop codons were excluded. Subsequently, each was aligned based on the invertebrate mitochondrial genetic code with Perl script TransAlign [33]. Both the mitochondrial tRNA and rRNA genes were aligned with reference to the conserved secondary structure. Every tRNA gene was aligned manually. Each of the two rRNAs was aligned by the R-Coffee web server [34]. Finally, all alignments were concatenated in a single matrix using FASconCAT_v1.0 [35].
Nucleotide composition of these sequences was calculated using MEGA 6 [36]. Sequence potential saturation was assessed using the index of substitution saturation (Iss) of Xia et al.  [37] implemented in the DAMBE 5 [38]. To detect nucleotide homogeneity across taxa, the chi-square test was performed for the concatenated datasets using PAUP Ã 4.0b10 [39]. Estimates of nonsynonymous (dN) and synonymous (dS) substitution rates of concatenated protein-coding genes were obtained by Yang and Nielsen (2000) method [40] using the program yn00 as implemented in PAML 4.9 [41]. The software SPSS 16.0 was used to perform one-way ANOVA analyses in order to test for significant differences of substitution rates between aphid lineages. For 12 aphid species, three groups were designed as the priori independent variable. Of which five species from Aphidini made up the group 0, four from Macrosiphini made up the group 1, and the remaining three aphids made up the group 2 ( Table 2). The values of dN, dS and dN/dS were set to be the dependent variables for each test, respectively. The post hoc multiple comparisons were conducted using the method of the Least-significant difference (LSD), and the significance level was set to be 0.05.

Phylogenetic analyses
Tree searches were conducted on the combined dataset using both Maximum likelihood (ML) and Bayesian inference (BI). Before undertaking ML analyses, PartitionFinder was employed to infer the optimal partitioning strategy [42], meanwhile the best-fitting model was selected for each partition using the Bayesian Information Criterion (BIC). The data blocks were defined by gene types (each genes of 13 PCGs as independent blocks, while both tRNA and rRNA as two blocks) and by codon positions, in total 41 blocks were utilized. The partition schemes and best-fitting models selected are presented in S1 Table. ML searches were carried out using the partition schemes and the selected models described above with RAxML as implemented in the CIPRES Portal [43]. Support for nodes was assessed with the fast bootstrap method using 1000 non-parametric bootstrap inferences. The impact of outgroup on the phylogeny were assessed by RAxML analyses based on the combined datasets with reduced taxa (removing partial or entire outgroups). The parameter settings are as those described above.
The BI analyses were conducted using PhyloBayes with a parallel version (pb_mpi1.5a) [44,45] as implemented on a HP server with twenty-four CPU and 64 G memory. The GTR-CAT model was used for nucleotide analyses. Two chains were run, and started from a random topology. The Maximum "maxdiff" value to be accepted was set as 0.1.

Mitochondrial gene sequences from transcriptome sequencing data
In total, the sequenced 13,788 nucleotides of A. craccivora contained all the 37 mitochondrial genes typically present in insect mitogenomes. Of them, eleven complete PCGs (i.e., atp6, atp8, cox1, cox2, cox3, cytb, nad1, nad3, nad4, nad4l and nad6) were identified. There were 341 bp nucleotides missed for 3' end of nad2 on major strand compared to other aphid mitochondrial genomes, and 75 bp for 5' end of nad5 on minor strand. Compared with closely related species (e.g., A. glycines and A. gossypii), the 3' end of rrnS gene determined from transcriptome sequencing of A. craccivora lacked 118 bp sequences on the minor strand. For the rrnL gene, there were 24 bp or 23 bp nucleotides missed in 5' end on minor strand compared with A. glycines or A. gossypii. For the 22tRNA genes, each had a similar gene length to the released aphid mitogenomes.
All the newly determined genes exhibited strong AT nucleotide bias, such that A + T frequencies of PCGs were 82.77%, rRNAs were 84.69% and tRNAs were 84.71%. The results of the substitution saturation tests showed that the value of substitution saturation index for the combined dataset (Iss = 0.4853) was significantly lower than the critical values (Iss. cSym = 0.8494 or Iss.cAsym = 0.6575). This indicated that the combined data suitable for further phylogenetic analysis. The chi-square test of homogeneity of base frequencies across taxa indicated that there was significant heterogeneity among taxa for the combined dataset (p < 0.05).
Of the eleven complete PCGs, five (atp6, atp8, nad1, nad3 and nad6) used ATT as start codons, whereas cox1, cox2, nad4 and nad4l used ATA, and cox3 and cytb used ATG. The partial nad2 gene had ATT to be start codons. All the PCG genes used TAA as stop codons except for cox1 and nad4. The genes cox1 and nad4 ended with incomplete stop codons (T or TA). Because of the missing 3' end of nad2 and missing 5' end of nad5, there were no stop codons or start codons to be found for them.

Transfer RNA genes
The standard 22 tRNA genes were found in the transcriptome sequencing data of A. craccivora which ranged from 62 bp (trnD, trnG, trnS-AGN, trnT and trnV) to 73 bp (trnK) in size. Of them, all tRNAs could be folded into typical cloverleaf structure except for trnS-AGN (S1 Fig). trnS-AGN lacks the dihydrouridine (DHU) arm, as in many other insect species [46].

Phylogenetic analyses
The newly obtained full mitochondrial gene data were included in phylogenetic analyses along with other available aphid mitochondrial data. The two trees resulting from the ML and BI analyses had a similar topology (Fig 1). The only discrepancy was in the basal interfamilial relationships within Aphidoidea. ML analysis recovered the Phylloxeridae as the first branch, and the Greenideidae as the next. However, Bayesian analysis retrieved these two families as sister group with low statistical support (posterior probability 0.69). In both ML and BI trees, one notable aspect was the distinctive branch lengths seen between outgroup and ingroup taxa. In particular, outgroups P. venusta and L. striatella had longer branches in comparison to ingroup aphids. To investigate the potential effect of fast evolving outgroup on the tree topology, we successively removed the long-branched P. venusta, L. striatella, Geisha distinctissima and Lycorma delicatula or all six outgroup taxa to rerun tree searches (S2A and S2B Fig). The resulting ingroup relationships are identical to Fig 1. Moreover, both ML and BI analyses showed strong support for a monophyletic Aphididae (BP = 100, PP < 0.9). However, within the family Aphididae, the subfamily Aphidinae was recovered as a paraphyletic group, with the Pterocommatinae (represented by P. pilosum) nested within. In addition, two tribes were strongly supported: Aphidini, and Macrosiphini including P. pilosum. Within the Aphidini, both trees strongly supported a monophyletic Aphis, which was found to be a sister taxon to the clade of (R. padi + S. graminum). Thus, two subtribes, Aphidina and Rhopalosiphina, were strongly supported (BP = 100, PP > 0.95). In the genus Aphis, the A. glycines had a closer relation to A. gossypii than to A. craccivora. This might be the fact that sequence variation (polymorphic sites) between A. glycines and A. gossypii were relatively low. For the rest of aphids, the relationships of ((C. salicicola + P. pilosum) + (D. noxia + (A. pisum + S. avenae))) were consistently supported by all analyses.

Substitution rate analyses
The substitution rate analyses showed that the newly sequenced A. craccivora had a similar dN value to other aphids from the tribe Aphidini (0.0408~0.0420) ( Table 2). For five aphid species from the group of the Macrosiphini (including P. pilosum), A. pisum, D. noxia and S. avenae had a relatively lower dN values (0.0396~0.0447), whereas C. salicicola and P. pilosum had a higher dN value (namely, 0.0472 and 0.0482). These results corresponded to the two clades, which was respectively constituted by the former three aphids and the latter two ones. The rest two aphids (i.e., C. quercus and D. vitifoliae) had obviously higher dN values than other ones. Similarly, all aphids had the dS values ranging from 2.0899 to 2.8293 except for the C. quercus and D. vitifoliae, and the two latters had higher dS values than 4.6. The statistical analyses revealed no significant differences of dN or dS values among the defined groups (P > 0.05). However, there were significant differences for dN/dS values between the group comprised by the species of Macrosiphini and the group including P. pilosum, C. quercus and D. vitifoliae (comparison between group 0 and group 2, P = 0.030), and between the group of Aphidini and the group including the P. pilosum, C. quercus and D. vitifoliae (comparison between group 1 and group 2, P = 0.036). This result basically confirmed the recovered phylogeny of aphids at the tribe level.

Discussion
Next generation sequencing for complete mitochondrial gene data Traditionally, complete mitogenome data has been generated by PCR-based strategies, which can be readily confounded by degraded DNA templates, PCR amplicons and sequencing conditions, species-specific primers, and complex genome structure and organization, etc. Thus, it may be time consuming and costly. In contrast, next generation sequencing can overcome these difficulties and allows determining full mitochondrial gene data more effectively, in particular when it is becoming relatively cheaper already today. This study demonstrates the usefulness of transcriptome sequencing for obtaining complete mitogenome data. However, the obvious drawback of this approach is the inability to find out gene arrangement and to achieve the mitochondrial control region directly from transcript data. In addition, the number of mitochondrial genes detected by transcriptome sequencing varied depending on the different expression level with different sampling [8]. Sampling includes various complex factors, for example, different insect species, developmental stages, sexes and biotypes. A study of the brown planthopper Nilaparvata lugens revealed that long wing forms have higher expression levels of genes involved in respiration and energy metabolism compared to short wing forms [47]. They contributed this to the fact that long wing forms required more energy than short wing forms for flight. In this study, we chanced to sample the winged forms of aphid A. craccivora for transcriptome sequencing. The mitochondria is an important cell organelle responsible for energy metabolism. Thus, transcripts with abundant mitochondrial genes were obtained, and the entire set of 37 mitochondrial genes could be determined from A. craccivora transcript. This is only a speculation, the fact needs to be verified by further experiment. In general, transcriptome sequencing can provide the full set of mitochondrial PCGs and partial RNA genes, which are in fulfillment of the need of systematic research based on mitogenome sequences.

Phylogeny
Mitogenome data are often employed for phylogenetic analyses and are useful to study the intra-order relationships of insects [48][49][50]. Nevertheless, attempts to reconstruct the higher mitogenomic phylogeny have often failed [51,52]. The rapid rate of mitogenome evolution limits the resolving power of this type of molecule marker for deep phylogeny reconstruction.
In the current study, our phylogenetic analyses suggest that the full mitochondrial gene data may be appropriate for providing fine resolution of evolutionary relationships for the insect family Aphididae. Outgroup selection is important for phylogentic inference [53][54][55][56][57][58]. According to previous studies on Hemiptera phylogeny [30,[59][60][61][62], we used a comprehensive taxon sampling from other homopteran lineages which are closely related to aphids. Although some outgroup taxa exhibited obviously long-branch lengths due to higher sequence evolutionary rate, taxa-excluding analyses demonstrated that the resultant tree topology was not affected by long branches (S2A and S2B Fig). Therefore, the current outgroup choice is appropriate.
Two different inference methods under different evolutionary models resulted in congruent tree topology within the superfamily Aphidoidea (Fig 1). The families Phylloxeridae and Greenideidae were successively recovered as the early diverging lineages in Aphidoidea. This arrangement is also supported by the symbiont Buchnera DNA sequences [16]. The monophyletic family Aphididae and tribe Aphidini were retrieved. However, the monophyly of the subfamily Aphidinae was not supported due to the nested position of Pterocommatinae. Classifications of Aphidinae have been controversial. Until the latter half of the 20th century, most taxonomists concurred that the Aphidinae included three major groupings: "pterocommatines" (Pterocomma, etc.), "aphidines" (Aphis, Rhopalosiphum, Schizaphis, etc.), and "macrosiphines" (Acyrthosiphon, Diuraphis, Sitobion, etc) [14,63]. However, the relationships among these lineages were not fully resolved. Remaudière and Remaudière (1997) grouped aphidines and macrosiphines together, both of which comprised the subfamily Aphidinae [64]. And they classified pterocommatines as the independent subfamily Pterocommatinae. But the affinities of Pterocommatinae to Aphidinae were not clear. In contrast, von Dohlen and Moran (2000) strongly supported a monophyletic group of (pterocommatines + aphidines + macrosiphines) [65]. Shaposhnikov et al. (1998) placed Pterocommatini as a sister group to the clade (Aphidini + Macrosiphini) [66]. This hypothesis was confirmed by the study of Ortiz-Rivas and Martínez-Torres (2010), based on the combined analysis of nuclear and mitochondrial sequences [15]. However, von Dohlen et al. (2006) recovered a sister-group of Pterocomma plus Cavariella [14]. And they thought that there seemed to be no morphological synapomorphies to support the grouping of (Aphidini + Macrosiphini). By contrast, they demonstrated that the distance between stigmal pores on abdominal segments could be considered as a key character to support the relationship of Pterocommatini with Macrosiphini [14]. Our mitogenomic analyses recovered the relationships of ((Pterocommatini + Macrosiphini) + Aphidini). In fact, this result had been obtained using single gene locus EF1α by Ortiz-Rivas and Martínez-Torres (2010) [15] (see Fig 2 in their paper). Another prior mitogenomic phylogeny by Wang et al. (2013) also recovered the sister relationship of C. salicicola to P. pilosum, and they advocated that pterocommatines should be transferred into Macrosiphini [19]. This point has also been confirmed by the phylogenetic analyses in the present study.
Besides C. salicicola, all the other macrosiphines clustered in a clade, and the sister-group of (A. pisum + S. avenae) were strongly supported (BP = 100, PP = 0.94). Close relationship between these two aphid species has been reported in previous molecular studies [14,15]. Within the tribe Aphidini, two subtribes were recognized: Aphidina (Aphis, Toxoptera, etc.) and Rhopalosiphina (Rhopalosiphum, Schizaphis, etc.) [67]. In our analyses, this two subtribes were consistently recovered with strong nodal support (BP = 100, PP = 0.97). In addition, our data strongly support a sister group relationship between R. padi and S. graminu, as suggested by Buchnera DNA sequences [16]. The newly determined mitochondrial gene data of A. craccivora fell within the genus Aphis, and clustered with the published mitogenomes of A. glycines and A. gossypii. Thus, this result validates the approach of obtaining mitogenome sequence from transcriptome sequencing.

Implications for aphid host alternation and pest control
For the lineage Aphidinae, the strong support found for the relationships among its tribes and subtribes, and especially about the position of Pterocomma, makes us to be able to discuss some issues on the evolution of life cycles and some strategies of pest control of this group under the framework of reconstructed phylogeny.
The evolution of seasonal host alternation and complex life cycle led insects from Aphidinae to be explosively diversified in Tertiary [68]. In particular, presence of winged males during the autumn migration makes the Aphidinae distinguished from other aphid lineages, and implies a separate origin in this subfamily [69]. These characteristics also enable Aphidinae to be better adapted to the modern, seasonal, north-temperate climate and vegetation [70]. Thus, investigating origins of host alternation of aphids will benefit research on aphid host range and on aphid pest control. In the present study, three aphid tribes were included in the family Aphididae: Pterocommatini, Macrosiphini, and Aphidini. The reconstructed aphid phylogeny from mitogenomic data strongly supported P. pilosum (Pterocommatini: Pterocomma) embedded within Macrosiphini as the sister to C. salicicola. This is congruent with the result from von Dohlen et al. (2006) [14]. Pterocommatini have simple life cycles of non-host-alternating on hosts in the Salicaceae. Whereas, both Macrosiphini and Aphidini comprised rich species with or without host-alternating life cycles on diversified host plants of Rosaceae, Asteraceae, Poaceae, etc. Based on our phylogenetic analysis, two conclusions can be drawn on the origins of host alternation of Aphididae. First, the inferred basal position of P. pilosum (Pterocommatini) in Macrosiphini from mitogenomic data is in agreement with the view that the common ancestor of Aphidinae had a simple, non-host-alternating life cycle on a woody host [14]. Second, the similar phylogenetic relationships of Aphididae found here to that from von Dohlen et al. (2006) [14] further support the idea of the existence of several independent origins of host alternating life cycles in this group of insects.
A solid phylogeny is critical for inferences not only of the evolution of host-plant associations in Aphididae, but also for pest control. For example, based on comparative genomic data, researchers can predict a drug effect on some pests when molecular targets of a compound are known or suspected [71]. The similar phylogenomic pipeline can be used to study the mechanisms behind the action of insecticides on aphid pest. Although obtaining an aphid phylogeny largely concurrent with some previous studies, this study is preliminary due to still limited taxon sampling. Future studies need to include more mitogenomes from this important agricultural insect pest group. Especially, with the explosion of new genome sequencing technologies, researchers should explore large phylogenomic data from the rapid expansion of genomic resources for mitochondrial sequences to reconstruct a reliable phylogenetic relationship of aphids.