Complex evolution in Aphis gossypii group (Hemiptera: Aphididae), evidence of primary host shift and hybridization between sympatric species

Aphids provide a good model system to understand the ecological speciation concept, since the majority of the species are host-specific, and they spend their entire lifecycle on certain groups of host plants. Aphid species that apparently have wide host plant ranges have often turned out to be complexes of host-specialized biotypes. Here we investigated the various host-associated populations of the two recently diverged species, Aphis gossypii and A. rhamnicola, having multiple primary hosts, to understand the complex evolution with host-associated speciation. Using mitochondrial DNA marker and nine microsatellite loci, we reconstructed the haplotype network, and analyzed the genetic structure and relationships. Approximate Bayesian computation was also used to infer the ancestral primary host and host-associated divergence, which resulted in Rhamnus being the most ancestral host for A. gossypii and A. rhamnicola. As a result, Aphis gossypii and A. rhamnicola do not randomly use their primary and secondary host plants; rather, certain biotypes use only some secondary and specific primary hosts. Some biotypes are possibly in a diverging state through specialization to specific primary hosts. Our results also indicate that a new heteroecious race can commonly be derived from the heteroecious ancestor, showing strong evidence of ecological specialization through a primary host shift in both A. gossypii and A. rhamnicola. Interestingly, A. gossypii and A. rhamnicola shared COI haplotypes with each other, thus there is a possibility of introgression by hybridization between them by cross-sharing same primary hosts. Our results contribute to a new perspective in the study of aphid evolution by identifying complex evolutionary trends in the gossypii sensu lato complex.


Introduction
Phytophagous insects are a group of tremendous diversity that covers a quarter of all known terrestrial biodiversity [1,2]. It has long been a concern to identify the evolutionary force of by several morphological changes as the generation passes [35]. Most heteroecious aphids use a much narrower range of primary host (within the same family or genus), even if they have a wide range of secondary hosts [36]. These patterns are even observed to extend to the closely related aphid species. For example, eight Hyperomyzus species are known to have host alternation with the primary hosts within the genus Ribes, even though these species have wider relationships with various secondary hosts, such as Asteraceae and Scrophulariaceae [22].
There are different views about these contrasting host ranges of primary and secondary hosts. The first view suggested that primary host specialization is in evolutionary terms more favored, because a primary host not only provides nutrients, but also a mating place [37]. For heteroecious aphids, a primary host is a place where sexual reproduction takes place, as well as the overwintering eggs hatch, and the following fundatrix stage [35,38]. However, on the secondary host, only asexual reproduction occurs before the migration for overwintering [35,38]. Thus, having different primary hosts, rather than those with different secondary hosts, may have a greater significance in reproduction. In other words, a constraint of primary host is possibly linked to the mating success of a species [15,37]. The second view hypothesized that host alternation is only a by-product of an evolutionary process that occurs due to the phylogenetic constraints of fundatrix to the primary host [32,33]. This is the so-called fundatrix specialization hypothesis (Fig 1), which is based on the following assumptions: i) monoecy is evolutionarily more favored than heteroecy, ii) primary hosts are more ancestral than secondary hosts, iii) the fundatrix is highly adaptive to the primary host, but maladaptive to the secondary host, and iv) secondary hosts are more labile, and more recently obtained than primary hosts [31,32,39]. Under this hypothesis, the loss of a primary host was described as escape, and specialization to a specific secondary host was believed to be the only evolutionary way [31][32][33]39,40].
Having multiple primary hosts in A. gossypii [22,24] suggests the possibility of host-associated speciation. Indeed, previous population genetic studies have shown that A. gossypii is a complex of several genetically distinct biotypes associated with some plant families (e.g. Cucurbitaceae, Malvaceae, and Solanaceae) [12,41]. However, these studies only targeted anholocyclic lineages of A. gossypii collected from certain crops, and only a few primary hosts (e.g. Hibiscus syriacus and Punica granatum) [12,41]. They live on a much broader range of wild plants, and the primary hosts are also very diverse. Nevertheless, we know surprisingly little about the primary host-associated genetic structure in this species. In particular, there is no study on the genetic structure between primary and secondary HAPs of A. gossypii. Therefore, to better understand the evolutionary trends in A. gossypii, further genetic analyses encompassing wild HAPs are needed.
In addition, confirming the ancestral host plant is crucial to understanding the evolutionary process of aphids. Among several host plants used as the primary host of A. gossypii, the genus Rhamnus (incl. Frangula) in Rhamnaceae is the most strongly presumed to be a ancestral host of the gossypii sensu lato complex group [42,43]. There are several reasons why Rhamnus is regarded as a ancestral host. The first reason is that most species belonging to the gossypii group show congruent use of primary hosts in Rhamnus [44], while the second reason is the possibility that the gossypii complex group and Rhamnus chronologically co-evolved based on molecular dating and fossil record [44,45]. Because of this, Rhamnus has been believed to be at the center of the host-associated evolution in the gossypii complex group. Nevertheless, the relationships between Rhamnus and other secondary HAPs have not yet been investigated.
This study aims to investigate the evolutionary trends of the two closely-related host-alternating species, A. gossypii and A. rhamnicola, based on population genetic analyses of various primary and secondary HAPs. Aphis gossypii shows a typical heteroecious holocyclic lifecycle in Korea, for which various perennials and woody plants are known to be used as primary hosts [22], even though several anholocyclic isolates have been found in the secondary hosts [12]. Aphis rhamnicola is a recently described cryptic species of A. gossypii that shares Rhamnus spp. as primary hosts, but has a somewhat different range of secondary hosts [46]. We conducted population genetic analyses of the two species in a comprehensive set of populations from primary and secondary host plants that were mostly collected from South Korea (except for one population from the UK). We used two molecular approaches in this study. First, reconstructing the haplotype network based on COI barcode, we confirmed their speciation pattern and genetic relationships of the aphid HAPs specialized on various host plants. Second, using nine microsatellite loci, we analyzed the genetic structure to identify the relationships between the HAPs of the two species, and to clarify the host shifting or switching process between the primary and secondary hosts. We also inferred the most likely ancestral host for the primary HAPs, which could be strongly suggested through the results of this study, by using approximate Bayesian computation methods.

Taxon sampling and DNA extraction
As all collections have not been carried out in restricted areas, national parks, etc. where permits are required, it is clearly stated that there is no content regarding collection permits. To examine the genetic structure, diversity and host-associated evolution between primary and secondary HAPs, we used 578 individual aphid samples of 36 HAPs, selectively pooled from 116 different collections, within the two species, A. gossypii and A. rhamnicola, which were collected from 36 different host plants-perennial, annual, and biannual; woody and herbaceous -in 16 plant families (Table 1). For forthcoming analyses, primary hosts and secondary hosts were defined based on the following criteria: i) The obvious primary hosts are plants that have collected sexuparae and fundatrix morphs. In addition to the obvious primary host, we also considered plants meeting the following two conditions as primary hosts. ii) Plants previously recorded as primary hosts of A. gossypii with reference to Inaizumi [23,25] and Blackman and Eastop [28] or iii) the case when the collecting time is early spring (April-May) or late autumn (October-November) based on the lifecycle of A. gossypii on the Korean Peninsula; However, even if these two conditions were met, annual or biennial plants were excluded from the primary host. It was also not considered as the primary host if aphid collected in a greenhouse. As a consequence, all remaining plants not falling under the above conditions were considered as the secondary hosts. In our study, the host-associated population (HAP) means a collective population pooled from several temporally and/or geographically different collections in the same plant species (S1 Table). In A. gossypii with a large spectrum of host utilization, 25 HAPs were collected from its various primary and secondary hosts (S1 Table). As A. rhamnicola was recently recorded found in Rhamnus spp., sharing and co-existing with A. gossypii in Rhanmnus as a primary host [46], nine HAPs of A. rhamnicola were also collected from its various primary and secondary hosts (S1 Table).
These collections were acquired from South Korea, except for those of A. gossypii from Catalpa ovata in the UK (S1 Table). To avoid the chance of sampling individuals from the same parthenogenetic colony, each aphid was collected from a different host plant, or a different isolated colony. All of the fresh aphid specimens used for molecular analyses were collected and preserved in (95 or 99) % ethanol, and stored at -70˚C. Total genomic DNA was extracted from single individuals using a DNeasy 1 Blood & Tissue Kit (QIAGEN, Inc., Dusseldorf). To preserve voucher specimens from the DNA extracted samples, we used a non-destructive DNA extraction protocol [43]. The entire body of the aphid was left in the lysis buffer with protease K solution at 55˚C for 24 h, and the cleared cuticle dehydrated.

Species lineage sorting
In some aphid groups, morphological identification can be ambiguous, due to the lack of conclusive morphological evidence. The Aphis group is one of the most typical groups with the above problem. As a complementary way to avoid misidentification, host plant relationships, morphologies, and molecular tools are widely used to identify aphids [43,[46][47][48]. Because our study aims to demonstrate intra-specific genetic relationships based on host plant associations, species lineage sorting is significant to prevent biases of the results. The two Aphis species, A. gossypii and A. rhamnicola, we study here are not only very similar in morphology, but also share several host plants due to the polyphagy. Although we performed species identification through morphology and host plant relationships as a first step and also tested DNA barcoding for all individuals collected on their shared host plants (e.g. Capsella, Rhamnus, and Rubia), we found that there were a lot of the haplotypes cross-shared between A. gossypii and A. rhamnicola (see Results). Therefore, instead of identifying the species with 36 HAPs, we applied the dominant assignment (white, green, blue, red, dark blue) of the genetic structure (K = 3, 4, 5) by STRUCTURE as well as the PCoA results (see Results) to sort their lineags into five groups as Aphis gossypii Group 1, A. g. Group 2, A. rhmanicola Group 1, A. r. Group 2 and A. r. Group 3 (Table 1). Accordingly, 'Aphis gossypii' and 'A. rhamincola', which are mentioned later, are meant to include all group lineages containing the HAPs assigned by the results. S1 Table shows detailed information for lineage sorted samples used in DNA analyses.

Haplotype analysis
A 658 bp of the partial 5' region of the cytochrome c oxidase subunit I gene (COI), namely COI DNA barcode [49], was amplified using the universal primer sets: LEP-F1 5'-ATTCAACCAAT CATAAAGATAT-3' and LEP-R1, 5'-TAAACTTCTGGATGTCCAAAAA-3'. A polymerase chain reaction (PCR) was performed with AccuPower 1 PCR Premix (Bioneer, Daejeon, Rep. of Korea) in 20 mL reaction mixtures under the following conditions: initial denaturation at 95˚C for 5 min; followed by 35 cycles at 94˚C for 30 s, an annealing temperature of 45.2˚C for 40 s, an extension at 72˚C for 45 s, and the final extension at 72˚C for 5 min. All PCR products were assessed using a 1.5% agarose gel electrophoresis. Successfully amplified samples were purified using a QIAquick PCR purification kit (Qiagen, Inc.), and then immediately sequenced using an automated sequencer (ABI Prism 3730XL DNA Analyzer) at Bionics Inc. (Seoul, Korea). Both morphological identification, based on voucher specimens in the insect museum in Kunsan National University with descriptions of Blackman and Eastop [22], Lee and Kim [24], Heie [26], and molecular identification method using the COI DNA barcode region for comparison with the previous COI DNA barcode database, were used [43,46,50].
All sequences that were obtained for DNA barcoding were initially examined and assembled using CHROMAS 2.4.4 (Technelysium Pty Ltd., Tewantin, Qld, AU) and SEQMAN PRO ver. 7.1.0 (DNA Star, Inc., Madison, Wisconsin, USA). In this step, poor-quality sequences were discarded to avoid biases. The final dataset containing 187 sequences was aligned using MAFFT ver. 7 [51], an online utility. Some ambiguous front and back sequences were removed at this stage, resulting in sequences of 583 bp that were finally used for haplotype analysis. All sequences were deposited in GenBank (accession no. MT461429-MT461602). The COI haplotypes of A. gossypii complex were analyzed using DNASP ver. 6.12.03 [52]. A median-joining network (MJ) was built using NETWORK ver. 5.0.1.1 [53]. The MJ result was annotated with host plants or species, and then visually summarized in Fig 2.

Microsatellite genotyping
In this study, all 578 individuals of four species were successfully genotyped using nine microsatellite loci (AGL1-2, AGL1-10, AGL1-11, AGL1-15, AGL1-16, AGL1-20, AGL1-21, AGL1-22, and AGL2-3b) previously isolated from the soybean aphid [55]. In the preliminary study, we had already checked the cross-species amplification test of these loci on A. fabae, Hyalopterus pruni, Rhopalosiphum padi, and Schizaphis graminum, as well as A. gossypii in the tribe Aphidini. There were the previously developed loci from A. gossypii [56], but we used the nine loci developed from A. glycines [55], because we noticed that the polymorphism of the latter was higher than that of the former, which was advantageous to amplify the loci between different species. In the aphid group, several studies showed that microsatellite loci were available between related species within the aphid family as a utility of cross-species amplification [57,58].
Microsatellite amplifications were performed using GeneAll 1 Taq DNA Polymerase Premix (GeneAll, Seoul, Korea) in 20 μL reaction mixtures containing 0.5 μM forward labeled with a fluorescent dye (6-FAM, HEX, or NED) & reverse primers, and 0.05 μg of DNA template. PCR was performed using a GS482 thermo-cycler (Gene Technologies, Essex), according to the following procedure: initial denaturation at 95˚C for 5 min, followed by 34 cycles of 95˚C for 30 s; annealing at 56˚C for 40 s; extension at 72˚C for 45 s, and a final extension at 72˚C for 5 min. PCR products were visualized by electrophoresis on a 1.5% agarose gel with a low-range DNA ladder to check for positive amplifications. Automated fluorescent fragment analyses were performed on the ABI PRISM 377 Genetic Analyzer (Applied Biosystems), and allele sizes of PCR products were calibrated using the molecular size marker, ROX labeled-size standard (GenScan™ ROX 500, Applied Biosystems). Raw data on each fluorescent DNA product was analyzed using GeneMapper 1 version 4.0 (Applied Biosystems).

Microsatellite data analysis
We used GENALEX 6.503 [59] to identify multilocus genotypes (MLGs) among populations. The program FSTAT 2.9.3.2 [60] was used to estimate the mean number of alleles (N A ), gene diversity (H S ), and allelic richness (R S ). Observed (H O ) and expected heterozygosity (H E ) values among loci were estimated using GENEPOP 4.0.7 [61] among the population data (HAPs) sets. Levels of significance for Hardy-Weinberg equilibrium (HWE) and linkage disequilibrium tests were adjusted using the sequential Bonferroni correction for all tests involving multiple comparisons [62]. Deviations from HWE were tested for heterozygote deficiency or excess. Because the clonal copies of MLGs due to the parthenogenetic life cycle of aphids could affect and distort the estimation of HWE [63], we used a reduced data set containing only one copy of each MLG when estimating HWE. Several assumptions of HWE still can be violated, thereby these estimates are used only for descriptive purposes even although the clonal MLG copies were removed from data analysis [63]. MICRO-CHECKER [64] was used to test for null alleles [65] and identify possible scoring errors, because of the large-allele dropout and stuttering.
We used ARLEQUIN 3.5.1.2 [66] for calculations of pairwise genetic differentiation (F ST ) values [67], in which populations were assigned by 36 HAPs of the two species. The statistical significance of each value was assessed by computing the pairwise comparison of the observed value in 100,000 permutations. Groupings based on three different cases, (1) gossypii vs rhamnicola, (2) perennial vs non-perennial host groups in A. gossypii, (3) perennial vs non-perennial host groups in A. rhamnicola, were tested independently with analysis of molecular variance [AMOVA; 68] in ARLEQUIN, with significance determined using the nonparametric permutation approach described by Excoffier et al. [69].
To examine the genetic relationships between 578 individual samples of four species, principal coordinate analysis (PCoA), also in GENALEX [59], further explored population relationships using the microsatellite loci, making no a priori assumptions about population groupings. Codominant genotypic genetic distance was calculated to make tri-matrix of pairwise populations, and then each population plot was created with coordinates based on the first two axes.
The program STRUCTURE 2.3.3 [70] was used to test for the existence of population structuring among all samples, by estimating the number of distinct populations (K) present in the set of samples, using a Bayesian clustering approach. We assessed likelihoods for models with the number of clusters ranging K = (1 to 15). The length of the initial burn-in period was set to 100,000 iterations, followed by a run of 1,000,000 Markov chain Monte Carlo (MCMC) repetitions, of which the analysis was replicated 10 times, to ensure convergence on parameters and likelihood values. Parameter sets of ancestry, allele frequency, and advanced models remained as defaults. Following the method of Evanno et al. [71], we calculated ΔK based on the secondorder rate of change in the log probability of data with respect to the number of population clusters from the STRUCTURE analysis. To determine the correct value of K, both the likelihood distribution being to plateau or decrease [70] and the peak value of the ΔK statistic of Evanno et al. [71] was estimated. The single run at each K yielding the highest likelihood of the data given the parameter values was used for plotting the distributions of individual membership coefficients (Q) with the program DISTRUCT [72].
We performed assignment tests using GENECLASS 2 [73], in which populations were assigned to 36 HAPs of the two species. For each individual of a population, the program calculates the probability of belonging to any other reference population, or of being a resident of the population where it was sampled. The sample with the highest probability of assignment was considered the most likely source for the assigned genotype. In this study, we checked the mean assignment rate from 391 A. gossypii or 187 A. rhamnicola individuals into each population (source), to confirm the possible origin of each HAP. We used a Bayesian method of estimating population allele frequencies [74]. Monte Carlo re-sampling computation (100,000 simulated individuals) was used to infer the significance of assignments (alpha = 0.01).

Approximate Bayesian computation analysis
To estimate the relative likelihood of the most likely ancestral HAP of A. gossypii, an approximate Bayesian computation (ABC) was performed for the microsatellite dataset as implemented in DIYABC version 1.0.4 [75]. DIYABC allows the comparison of complex scenarios involving bottlenecks, serial or independent introductions, and genetic admixture events in introduced populations [76]. The parameters for modeling scenarios are the times of split or admixture events, the stable effective population size, the effective number of founders in introduced populations, the duration of the bottleneck during colonization, and the rate of admixture [77]. The software generates a simulated data set used to estimate the posterior distribution of parameters, in order to select the most likelihood scenario [77]. DIYABC generates a simulated data set that is then used to select those most similar to the observed data set, and the so-called selected data set (n δ ), which are finally used to estimate the posterior distribution of parameters [75]. Recently, this ABC software package has been widely used, such as for inferring the demographic history of populations and species [78,79], and testing potential bottleneck events [80].
To infer the most likely ancestral primary host of A. gossypii, among the whole microsatellite dataset, we tested three different ABC analyses using the original or partial dataset. We hypothesized the evolutionary scenarios following our results obtained from the COI haplotype network and two Bayesian analyses, STRUCTURE and GENECLASS2 (see "Results"). The previous studies have already revealed that A. rhamnicola was located in the more ancestral position within the phylogeny of the A. gossypii group [43,44], which population was therefore set to the most ancestral position on the genealogy of the two ABC tests.
In the first analysis, based on the result of STRUCTURE (K = 3), we compared eight evolutionary scenarios (A1-A8) using a dataset that included 578 individuals from four population groups, which consisted of 75  We produced 1 000 000 simulated data sets for each scenario. We used a generalized stepwise model (GSM) as the mutational model for microsatellites, which assumes increases or reductions by single repeat units [75]. To identify the posterior probability (PP) of these three scenarios, the n δ = 30 000 (1%) simulated datasets closest to the pseudo-observed dataset were selected for the logistic regression, which is similar to the n δ = 300 (0.01%) ones for the direct approach [77]. The summary of statistics was calculated from the simulated and observed data for each of the tested scenarios, such as the mean number of alleles per locus (A), mean genetic diversity for each group and between group, genetic differentiation between pairwise groups (F ST ), classification index, shared alleles distance (D AS ), and Goldstein distance.

Haplotype analysis
A total of 18 haplotypes were recognized from the 187 COI sequences of 36 host-associated populations of the two Aphis species (Fig 2). The most common haplotype was H9, followed by H2. Aphid samples from the three primary hosts: Hibiscus, Rhamnus, and Rubia were spread across these two major haplotypes (Fig 2A). All the samples from the remaining primary hosts (i.e. Catalpa, Celastrua, Citrus, Euonymus, and Punica) had H9 haplotype (Fig 2A). Unique haplotypes were mostly observed among the Rubia population (Fig 2A). Cucurbita, Hibiscus, Punica, Rhamnus, Sedum, and Youngia associated populations also had unique haplotypes. Among the secondary HAPs, samples from the Capsella were found in both H2 and H9 haplotypes (Fig 2A). Samples from the Youngia was also observed in H1, H2, and H3 haplotypes (Fig 2A). However, the populations associated with the majority of secondary hosts only had one haplotype. H1 consisted of samples from secondary hosts, such as Ixeris, Leonurus, Perilla, Phryma, and Youngia. To compare COI haplotype and microsatellite genotype results, we overlaid the five biotypes that were identified from STRUCTURE (K = 5) on the haplotype network (Fig 2B). The result of haplotype analysis was highly discordant with the STRUCTURE results (see below). Among the five biotypes, red, blue, and green types were observed in both H2 and H9 haplotypes (Fig 2B). The majority of white type aphids belonged to H9, while blue and green types were mostly found in H2 (Fig 2B). Aphids with green type showed the most diverse haplotype diversity (Fig 2B). The haplotype H1 contained blue and dark blue types.

Microsatellite data analysis
We successfully genotyped 578 aphid individuals of 36 HAPs of the two species using 9 microsatellite loci, and then found 463 non-clonal MLGs from all samples (Table 1). Generally, genetic diversity was high throughout the HAPs collected from woody perennials, which seemed to be regarded as the primary (overwintering)  In HWE, there were significant deviations in Ag_CA, Ag_CP, and Ar_PE by heterozygote excess, and in Ag_CJ, Ar_YO, Ar_PH, and Ar_RU by heterozygote deficit. Heterozygote excess in Ag_CA, Ag_CP, and Ar_PE were likely the result of heterosis or over-dominance related to selection preference toward heterozygous combination or fixation of heterozygous genotypes due to parthenogenesis of aphids in secondary host, especially under anholocyclic (permanently asexual) life [81]. Similar to our results, this phenomenon was already reported from several aphid species such as Sitobion avenae, Myzus persicae and Rhopalosiphum padi having permanently or temporary asexual life, which showed the significant heterozygote excess [82][83][84]. Negative FIS values also showed an increase in heterozygosity that was generally due to random mating or outbreeding, whereas positive FIS values explained that the amount of heterozygous offspring in the population decreased, usually due to inbreeding [85]. There was no evidence of significant linkage disequilibrium or frequency of null alleles.
Genetic differentiation between host-associated populations and AMOVA. We estimated pairwise genetic differentiation (F ST ) between 36 different HAPs of the two species (   Two cases to confirm the genetic variance between the preordained groups were analyzed using AMOVA implemented in ARLEQUIN [68]. In the case of the analysis grouped by case 1, percentages of the genetic variance (PV) 'among groups' and 'among populations within groups' were 14.59% and 22.60%, respectively, which shows that there is some grouping effect by host plants, even though the majority of genetic variation was found 'among individuals within populations' as approximately 63% (Table 3). However, the genetic variance of about -1 0% 'among groups' in the both analyses grouped by cases 2 and 3 suggests that there are no grouped structures according to their lives in the perennial or non-perennial hosts on both A. gossypii and A. rhamnicola (Table 3). Interestingly, PV of 'among populations within groups' in A. rhamnicola was about 20% higher than that in A. gossypii, which means that the HAPs of A. rhamnicola is genetically differentiated further than those of A. gossypii (Table 3).

Ag_IL Ag_CU Ag_CM Ag_KA Ag_SO Ag_CA Ag_CP Ag_PU Ag_EL Ag_HI Ag_HR Ag_EU Ag_EJ Ag_CI Ag_FO Ag_CE Ag_ER Ag_SN Ag_CO
Genetic similarity, structure, and assignment. A plot of PCoA between 36 HAPs based on codominant genotypic genetic distances showed that the two species, A. gossypii and A. rhamnicola, were completely separated in each of the left, upper-right, and lower-right sides on the plot (Fig 3). Plots of A. gossypii populations being closely aggregated along the line of factor 1 means that they are genetically close to each other, whereas the plots of A. rhamnicola being relatively largely scattered show genetic isolations between them. Among all A. gossypii populations, Ag_HI and Ag_RH were relatively located near to the HAPs of A. rhamnicola, Ar_ST and Ar_SE, respectively. Plots of Ar_YO and Ar_IX, which had been taxonomically considered to A. gossypii, were closely located to each other, but distant from the majority group of A. gossypii.
The genetic structure of 36 HAPs of the two species (A. gossypii and A. rhamnicola) for 578 individuals was analyzed by STRUCTURE 2.3.3 [70]. In all STRUCTURE analyses from K = (1 to 15), the most likely number of clusters was K = 4, using the ΔK calculation according to the method of Evanno et al. [71]. Here, we show the structure results from K = (2 to 5), in order to observe the change of genetic structure and assignment pattern according to the K value ( Fig  4). When K = 2, the (first) white cluster dominantly appeared to A. gossypii populations, except for Ag_RH with a large green assignment, while the (second) green cluster was largely distributed among populations of A. rhamnicola. When K = 3, the (first) white cluster also was dominant in A. gossypii HAPs, except for Ag_RH with large blue assignment and Ag_Hi with small green and blue ones, the (third) blue cluster as the 'Rhamnus group' prevalent in Ar_SE, Ar_PE, Ar_YO, Ar_IX, Ar_RH, Ar_CO, Ar_LE, Ar_PH and Ar_ST, and the (second) green cluster in the rest as the 'Rubia group'. When K = 4, the genetic structure was basically similar Table 3 to that at K = 3, except that the (fourth) red cluster was dominant in Ag_IL, Ag_CU, Ag_CM, Ag_KA, Ag_SO, Ag_CA, and Ag_CP, and partially appeared in Ag_EL, Ag_HI, Ag_HR, Ag_EU, Ag_EJ, and Ag_CI. When K = 5, the genetic structure was basically similar to that at K = 4, except that both Ar_YO and Ar_IX showed the (fifth) dark-blue cluster. The Bayesian assignment tests using GENECLASS 2 [73] were carried out to identify the HAP (as population) membership of 578 individuals from all the 36 HAPs. The result of the assignment test (S2 Table) indicated the average probability with which individuals were assigned to the corresponding reference HAP (as population). The self-assignment probability values (SA) averaged (0.482 ± 0.106) (mean ± s.d.) in overall HAPs, (0.515 ± 0.103) in A. gossypii, and (0.427 ± 0.08) in A. rhamnicola. In A. gossypii, the mean assignment probability from 391 A. gossypii individuals into Ag_RH had the highest value (0.446, SA = 0.381), which was followed by the assignment value into each reference HAP of Ag_HI (0.219, SA = 0.478) and Ag_PU (0.214, SA = 0.458) (Fig 5). In A. rhamnicola, the mean assignment probability from 145 A. rhamnicola individuals into Ar_RU had the highest value (0.137, SA = 0.489), which was similar to the assignment rate into each reference HAP of Ar_CB (0.131, SA = 0.463) and Ag_LY (0.129, SA = 0.309) (Fig 6).

Among groups
Inferring a ancestral primary host to test hypothetical scenarios by ABC analysis. To propose the most likely 'ancestral host evolution' scenario followed by the hypothesis that most of the A. gossypii populations originated from two possible ancestral HAPs (e.g. Ag_RH, Ag_HI), which had diverged from A. rhamnicola, the ABC test was conducted. We tested four scenarios to determine which HAP is the most ancestral among all the HAPs in A. gossypii (see "M&M"). The generated results are presented as a logistic regression using DIYABC software, estimating the PP of each tested evolutionary scenario of the hypothesis for the selected simulated data (n δ ) (Cornuet et al. 2008), which ranged between (8 000 (or 6 000) and 80 000 (or 60 000)) n δ .
In the result of the first analysis (S4 Fig), scenario A1 obtained the highest PP ranging (0.664 (n δ = 8000) to 0.697 (n δ = 80 000)), with a 95% CI of (0.601-0.727) and (0.677-0.716).  Scenario A2 showed a PP ( Table 4). As a result, scenario A1 appeared as the most robust hypothesis with the highest PP among the four scenarios tested, which suggests that, compared to the other remaining hosts, Rhamnus is the most ancestral host for A. gossypii and A. rhamnicola, respectively.
In the result of the second analysis (S5 Fig), the scenario B2 was estimated more highly than the other four scenarios ( Table 4). As a result, although the direct approach estimated a slightly higher PP for scenario B1 (0.520 and 0.480) than for B2 (0.460 and 0.448) (S5 Fig), the scenario B2 appeared as the the highest PP in the logistic regression. It is well supported that A. rhamnicola is the origin of A. gossypii (B1, B2), but is not conclusive whether the RED group in A. gossypii is diverged from the WHITE group, or vice versa.
In the result of the third analysis (S6 Fig), scenario C4 obtained the highest PP (Table 4). As a result, although the direct approach estimated a slightly higher PP for scenario C6 (0.400 and 0.326) and C5 (0.140 and 0.234) than for C4 (0.180 and 0.198) (S6 Fig), the scenario C4 appeared as the highest PP among the four scenarios tested in the logistic regression. This suggests that, within A. gossypii, the WHITE group is more ancestral than the MRW and RED groups, and then RED is originated from MRW, which hypothesizes that Hibiscus is the secondarily primary host, and can be still a refuge for the RED group.  Table 4. Probabilities (with 95% confidence intervals in brackets) of the logistic regression for the scenarios in three different analyses inferred from DIYABC [77].

Complex evolution in Aphis gossypii
Our results identify the genetic structure between the various primary and secondary HAPs of the two species, A. gossypii and A. rhamnicola, encompassing the most various aphid samples from wild host plants. Our population genetic analyses reveal that A. gossypii and A. rhamnicola are mainly split into three (red, white, blue) and the other three (dark-blue, blue, green) biotypes, respectively, based on the STRUCTURE result (Fig 4, K = 5). The evolutionary trend of these aphids cannot be defined in any particular direction, and they show complex and various speciation tendencies. Here, we highlight major cases in these species. One of the notable results is that some secondary HAPs seem to use a specific primary host (Fig 4, K = 4). In other words, A. gossypii and A. rhamnicola do not promiscuously use their primary and secondary host plants; instead, certain biotypes use only some secondary and specific primary hosts. For example, secondary HAPs having green biotype (e.g. Capsella, Lysimachia, Stellaria, and Veronica) seem to use only Rubia as the primary host in our dataset. On the other hand, Rhamanus serves as the primary host for the secondary HAPs having blue biotypes (e.g. Commellina, Leonurus, Perilla, Phryma, and Sedum). These cases indicate that a group that apparently uses several primary hosts is actually a complex of groups using a specific primary host.
In contrast to the previous cases, the white and red biotypes were found to share some primary and secondary hosts (Fig 4, K = 4). In particular, the white biotype has been extensively found in the most diverse primary hosts, such as Callicarpa, Catalpa, Celastrus, Citrus, Eunonymus, Hibiscus, and Punica. The red biotype occurs in Citrus, Eunonymus japonica, Hibiscus, Ilex, and Punica. However, some primary hosts were exclusively occupied by the white (e.g. Catalpa and Celastrus,) or red type (e.g. Ilex), suggesting that these biotypes are possibly in a state of diverging through specialization to specific primary hosts. Interestingly, similar to the first case (i.e. blue and green types), the white and red types also tended to use specific secondary host groups, respectively. Except for a few secondary hosts (e.g. Cucurbita and Solanum), most of them represented only one biotype. For example, Cucumis sativus and Capsicum annuum were completely occupied by the red biotype. This is similar to the tendency found in most polyphagous aphids that the primary host is shared, but the secondary host is completely different [32,[86][87][88][89].
In the STRUCTURE results, the dark-blue biotype (Fig 4, K = 5) represents the third case. The dark-blue biotype was represented only by two secondary hosts, Ixeris and Youngia, and was not found in any primary host. Thus, we assume that this case seems to be an ecologically isolated host race through the loss of a primary host. Although we did not confirm the lifecycle of this biotype in this study, there is a reference to A. gossypii inhabiting some Asteraceae plants in the previous study, even though those HAPs are identified to A. rhamnicola based on our results (Figs 3 and 4). Blackman and Eastop [22] found that populations producing eggs on the roots of Ixeris, including some Asteraceae plants in China identified as A. gossypii, may be other closely-related species.With large genetic differences from the main group of both A. gossypii and A. rhamnicola (Fig 4), they were possibly isolated to the secondary host directly from the ancestral primary HAP in Rhamnus by the host alternation, supporting the possibility of differentiation from their ancestral host race according to the loss of primary host [32]. Thus, the dark-blue biotype is likely to be an ecologically incipient species of A. rhamnicola, which has recently been derived by secondary host isolation.
Our results show strong evidence of ecological specialization through a primary host shift in both A. gossypii and A. rhamnicola. ABC analyses yielded the biotypes of the two species that were formed by shifting from the shared resource, Rhamnus, to different primary hosts, respectively (S4 and S5 Figs). In particular, the series of primary host transitions identified in A. gossypii seem to have played an important role in the formation of their biotypes. For heteroecious aphids, a distinct choice of the primary host means not only utilizing different resources, but also genetic isolation between populations. This is because at one and the same time, the primary host is a resource, and a mating place. Accordingly, primary host selection in aphids is closely linked to genetic structure. Interestingly, in these species, primary host transitions occur more commonly than expected. As a traditional notion, primary hosts in aphids have been considered to be very fixed, and to not be able to easily escape, due to a highly adapted fundatrix morph [32,33]. In particular, the white biotype that appears in many A. gossypii uses a wide variety of taxonomically unrelated primary hosts, which show a variable relationship between primary and fundatrix (Fig 4). However, the results of our ABC analyses identified that A. gossypii was firstly derived from the biotype associated with Rhamnus to a white biotype, and then Hibiscus associated biotype was derived (S4 Fig). Thus, having multiple primary hosts is possibly a transitional step to shifting to another primary host. In fact, Hibiscus is a plant closely related to Gossypium (i.e. cotton), a representative secondary host of A. gossypii. Unfortunately, although Gossypium associated population was not included in this study, it can be inferred that there is a possibility that the transition of primary host through secondary host may have occurred. However, similar to our results, Carletto et al. [12] also suggested the possibility that Hibiscus was a shared ancestral host from which the agricultural divergence originated. In light of its HAP being genetically shared with the other HAPs in agricultural crops, such as Gossypium, cucurbits, and other secondary hosts.
Since the fundatrix specialization hypothesis [32,33] has been proposed, the complex lifecycle of aphids has long been regarded as a by-product of aphid evolution. However, the identification of several heteroecious HAPs in A. gossypii and A. rhamnicola in our study is largely in conflict with the expectations of this hypothesis. In our results, except for one case (i.e. the dark-blue biotype), the HAPs appear not to be genetically isolated completely but still to be linked together between some group of primary and secondary hosts, in contrast to the assumption that monoecy as a dead-end [32] is evolutionarily favorable over heteroecy. Moran's hypothesis [32] predicts that the dead-end of heteroecy always leads to specialization on the secondary host by loss of the primary host. Nevertheless, our results indicate that a new heteroecy race can commonly be derived from the heteroecy ancestors. In other words, our results show that lifecycle evolution is not a one-way process [32], but can be much more variable than we expected. These results are similar to the recent study on the genus Brachycaudus (Aphidinae: Macrosiphini), which provided strong evidence of the evolutionary lability of a complex lifecycle in Brachycaudus [89]. In addition, the use of several primary hosts found in some races (i.e. red and white biotypes) negates the core assumption of the fundatrix specialization hypothesis [32,33] that the fundatrix is fully adapted to the only primary host, and is inadequate to other hosts. Using multiple primary hosts is possibly a strategy for their migration success. Indeed, a migration failure can lead to high risk. For example, aphids using only a single primary host, such as Rhopalosiphum padi, have only a 0.6% migration success rate [90].

Ancestral host association in A. gossypii complex
Rhamnus appears to be the most ancestral host plant for both A. gossypii and A. rhamnicola. Several species in the gossypii complex group have intimate relationships with Rhamnus (e.g. A. frangulae, A. glycines, A. gossypii, A. nasturtii and A. rhamnicola), which have been considered Rhamnus as an ancestral host for this aphid group [22,43]. Our ABC result is consistent with this assumption (S4 and S5 Figs). As in the previous study, Rhamnus appears to serve as a shared primary host for both A. gossypii and A. rhamnicola. In the GENECLASS2 analyses (Figs 5 and 6) as well, the assignment of most of the gossypii HAPs to Rhamnus was very high, corroborating that it was differentiated from the ancestral host, Rhamnus.
Despite the differentiation of aphids into various species using the different hosts (mainly secondary hosts), host utilization of Rhamnus still remains in several species of the gossypii group. The phylogenetic studies of Aphis showed that heteroecious species using Rhamnus as the primary host were derived non-consecutively from monoecious species [44,91]. In other words, even if a monoecious species has been derived by loss of heteroecy, it seems likely to not be the dead-end of evolution [33], as well as a complete disconnect from the ancestral primary host. For example, A. rhamnicola and A. gossypii are heteroecious species, which use Rhamnus as the primary host, and several monoecious species on various host plants appear to have been derived between them [46]. Our ABC analyses confirmed that Rhamnus was lost once when branching from blue type to green type, and was then regained in white type (S4 Fig). Surprisingly, these ABC results, similarly supported by the GenClass2 results (Figs 5 and 6), almost coincided with our haplotype network results (Fig 2).
These results conflict the fundatrix specialization hypothesis [32,33], which predicts that once aphids leave the ancestral primary host, they cannot regain it again. Recently, the phylogenetic study of Brachycaudus demonstrated that even if they lost their potential ancestral Rosaceae hosts, they can easily regain their hosts to be the primary host for heteroecy, or the sole host for monoecy [89]. The ancestral primary host does not seem to be an absolute being that cannot be changed due to the adaptation of the fundatrix, but seems to be a conserved resource within a specific aphid group. In fact, such a labile of aphid lifecycle related to the use of primary hosts may also occur within a species. Host alternation for some species is often not obligatory but facultative, in which the migration to the secondary host can often be omitted [15,92]. As an example, a facultative alternation lifecycle has been reported in populations of Aphis fabae, even although the vast majoriy of them migrate routinely between primary and secondary hosts [92]. Although there is little known about the facultative use of the primary host in A. gossypii, it may be related to the primary host range expansion and lifecycle lability.

The evidence of hybridization between A. gossypii and A. rhamnicola
Our population genetic analyses based on microsatellite and COI gene show that there is a significant conflict between the two results. Regardless of the primary and secondary hosts, we found individuals that are difficult to identify in some host-associated populations. A. gossypii and A. rhamnicola appeared to share major haplotypes, H9 and H2, respectively, of their counterpart species with each other. Although the PCoA and STRUCTURE results (Figs 3 and 4) based on the microsatellites clearly showed identification of A. gossypii, the individuals corresponding to H2 (major haplotype of A. rhamnicola) were two individuals from Ag_Hi and one from Ag_KA, whereas the individuals corresponding to H9 (major haplotype of A. gossypii) were also identified as A. rhamnicola, but six from Ar_CB, one from Ar_PH, and four from Ar_RU. Surprisingly, the cross-sharing haplotypes (H5, H9) between these two species unexpectedly contained several kinds of both primary and secondary hosts.
Comparing the host races between A. gossypii and A. rhamnicola, most of them were distributed in two haplotypes (H9 and H2), and they were clearly identified as distinct species, based on the microsatellite analysis (Fig 2). However, a number of intermediate haplotypes, H13, H11, H12, H10, and H18, were observed among the species (Fig 2). The H1 haplotype shared by several wild plant populations, such as Ixeris, Leonurus, Perilla, Phryma, and Youngia, is closely related to the H9 (major haplotype of A. gossypii). However, according to our microsatellite data, these populations appear to be closer to A. rhamnicola (Figs 3 and 4). Similarly, collected from Rubia akane, individuals of H4, H6, H7, and H8 haplotypes apparently have alleles of A. rhamnicola in microsatellite data; it is most unusual that they have the haplotypes closely related to and derived from A. gossypii, rather than A. rhamnicola. In the case of H18, it is inferred to be the similar haplotype of true A. gossypi in Curcubitaceae, and H13 of Ar_SE, which was often cryptically recognized from A. gossypii as A. rhamnicola based on morphology and mtDNA [42,93], was nested between the two species, A. gossypii and A. rhamnicola. Since it was reported that populations with a sexual phase on Rubia cordifolia in Japan appeared to be isolated from those on other primary hosts [25], it is suspected that most of the populations collected from Rubia in the past might actually be host races of A. rhamnicola, but misidentified.
Hybridization can have important evolutionary consequences, including speciation in association with novel host plants in insects [94]. In our study, as the two species, A. gossypii and A. rhamnicola, with distinct taxonomic and phylogenetic differences shared COI haplotypes of counterpart species with each other, there is a possibility of introgression by hybridization between them. These two species share overwintering (primary) host, such as Rhamnus, so mating and reproducing contemporarily in the same leaves or branches, because there are no physiological or ecological significant differences [43,46]; there is therefore the possibility that hybridization occurs between them. In fact, interspecific cross mating between sexuparae of A. glycines, A. gossypii, and A. rhamnicola in Rhamnus spp. has often been observed (unpublished data). It is an interesting phenomenon that a hybrid zone mediated not by geography, but by a resource, can exist. Lozier et al. [87] first detected hybridization and introgression between plum and almond associated Hyalopterus spp. on these host plants, which surprisingly were capable of feeding and developing on apricot from each species. For that possibility of hybridization, it was suggested that imperfections in any number of mechanisms associated with host plant choice [95,96] could lead to strong selection against hybrids on parental host plants, but less so on apricot [87]. Although apricot was introduced later than other host species in the studied area, it remains a mystery why only it is able to attract all Hyalopterus groups and permit hybridization, whereas the other Prunus hosts are more restrictive [87]. Based on the phylogenetic results of COI or Buchnera 16S for Hyalopterus spp. [87], peach or apricot could be inferred to their ancestral host like the Rhamnus as a hybridization host utilized by the gossypii group. These results from the gossypii group reaffirm the hypothesis of Lozier et al. [87], which corroborates that such hybridization in the aphid group often occurs by co-existence in the primary host. However, further research is needed to determine whether a primary host (i.e. hosts that can be utilized by various host races, where they co-exist and overwinter together) are ancestral or derivational for those aphids.  Table. Mean assignment rate of individuals into (rows) and from (columns) each population using GeneClass 2 [73]. Values in bold indicate the proportions of individuals assigned to the source population (self-assignment). Values less than 0.001 were excluded from the