The Oriental Fruit Fly, Bactrocera dorsalis, in China: Origin and Gradual Inland Range Expansion Associated with Population Growth

The oriental fruit fly, Bactrocera dorsalis, expanded throughout mainland China in the last century to become one of the most serious pests in the area, yet information on this process are fragmentary. Three mitochondrial genes (nad1, cytb and nad5) were used to infer the genetic diversity, population structure and demographic history of the oriental fruit fly from its entire distribution range in China. High levels of genetic diversity, as well as a significant correspondence between genetic and geographic distances, suggest that the invasion process might have been gradual, with no associated genetic bottlenecks. Three population groups could be identified, nevertheless the overall genetic structure was weak. The effective number of migrants between populations, estimated using the coalescent method, suggested asymmetric gene flow from the costal region of Guangdong to most inland regions. The demographic analysis indicates the oriental fruit fly underwent a recent population expansion in the Central China. We suggest the species originated in the costal region facing the South China Sea and gradually expanded to colonize mainland China, expanding here to high population numbers.


Introduction
The oriental fruit fly, Bactrocera dorsalis (Hendel), is one of the most important pests on fruits and vegetables across South East Asia and the Pacific region [1].Being highly polyphagous, the oriental fruit fly can infest a wide variety of fruit crops, such as citrus, mandarin, peach and mango [2,3], and induce significant economic losses through direct fruit damage, fruit drop and export limitations associated to quarantine restrictions.Furthermore, due to its broad host range, relatively ample climate tolerance, high reproductive potential and dispersal capacity [4], the oriental fruit fly is considered to have a high invasive potential.
The oriental fruit fly has significantly expanded its geographic distribution in the last century.Early records report its presence in 1912 in Taiwan [5].Henceforth, this species colonized different areas of the Asian and Pacific regions, such as India, Pakistan, Nepal, Vietnam, Laos, Burma, Thailand, Sri Lanka, the Northern Mariana Islands (eradicated), Hawaii, Guam, with transient appearances in California and Florida [1,2].The potential distribution analysis showed that the oriental fruit fly is likely to expand in the next future North and South-ward into areas currently cooler [2].
After its initial recognition in Taiwan, the oriental fruit fly was reported in 1934 on Hainan Island, China [6], and sparsely in disjointed areas of Southern China until the 1970s.Since the 1980s the population size of the oriental fruit fly increased quickly and the distribution area expanded rapidly to cover most areas south of the 26N parallel.In the last decade the oriental fruit fly expanded across the Yangtze River to reach the 32N parallel [7][8][9] and is expected to expand further North [10].
Despite the economic and ecological threats associated with the invasion of the oriental fruit fly, data on its phylogeography are scarce.An early study investigated the relationship between two laboratory and three wild populations [11].Afterwards, studies on Bactrocera dorsalis population genetics and phylogeography mostly addressed specific and/or geographically limited issues [12][13][14][15][16][17], until a first integrated attempt to study this species in a substantial part of its range conducted by Aketarawong et al. [18].Aketarawong and colleagues suggested China as the origin of fly populations in South-East Asia and could describe a well supported pattern of expansion from the region of Guangdong to this latter area.Nevertheless, their study included only two Chinese populations, from Guangdong and Taiwan, limiting their possibility to study the dispersal of the oriental fruit fly in mainland China.
Mitochondrial DNA, due to its accelerated rate of evolution, short coalescence time and simple maternal inheritance, has been used as a marker of choice for historical phylogeography and complements well with the information provided by microsatellite markers.Thanks to the possibility to reconstruct evolutionary relationships among haplotypes, the mitochondrial DNA is particularly informative to reconstruct historical processes, such as the identification of the region of origin of a species, the pathways of invasion and historical demography, and has been repeatedly used to study the spread of alien species [19][20][21][22][23][24][25][26].Furthermore, mitochondrial markers have repeatedly been used in fruit flies, such as the olive fly [27][28][29], the pumpkin fruit fly [30], the melon fruit fly [31] and the Mediterranean fruit fly [32][33][34].Limitation to the use of mitochondrial markers is that the entire molecule is non recombining and inherited as a single locus, henceforth limiting the possibility to average across markers to take into account the random nature of the coalescence process.
With this study we specifically focus on Chinese populations, previously identified as a likely source for the species, to tackle the issue of the origin and range expansion of the oriental fruit fly.Specifically at issue are: a) a first exploration of Chinese diversity and definition of local genetic groups; b) the reconstruction of the major routes of expansion in mainland China and the definition of the demographic profile associated with the expansion, and c) the identification of the region of origin of the species and a reevaluation of the different hypotheses proposed in the literature.

Sampling, DNA extraction and sequencing
Samples of Bactrocera dorsalis were collected from 12 locations covering the entire distribution range in China during years 2009-2010 using traps baited with Methyl Eugenal (Tab.1; Fig. 1).Specimens were preserved in 95% ethanol at 220uC until processing.

Data analysis
Sequences were aligned using ClustalX (ver.2.0) [36] and unique haplotypes were identified in ARLEQUIN (ver.3.5) [37].Descriptive statistics (number of variable sites, number of  haplotypes, haloptype diversity, nucleotide diversity, average number of nucleotide difference between haplotypes) were calculated in Dnasp (ver.5.0) [38].Spatial analysis of molecular variance was performed using SAMOVA (ver.1.0) [39] to identify population groups, with concatenated sequences of the three genes for each individual, longitude and latitude information as input data.The most supported number of groups (K) was determined by repeating the analysis with K ranging from 2 to 6 and selecting the subdivision scheme associated with highest F CT .An AMOVA hierarchical analysis of variance was performed using ALEQUIN to partition total variance in its components among groups, among populations and within populations, based on the groups inferred by the SAMOVA analysis.The correlation between genetic (F ST /1-F ST ) and geographic distance matrices (in ln scale) [40] was tested using the IBDWS web service [41] with 10000 randomizations.Medianjoining networks of haplotypes of each of the three genes were constructed using NETWORK (ver.4.6) [42,43] to study the evolutionary relationships among haplotypes.
The extent of gene flow between population pairs was studied using the coalescent-based strategy implemented in MIGRATE (ver.3.2.7)[44].To determine if there was asymmetrical gene flow between populations, the mutation scaled effective immigration rate (M = m/m) entering and leaving each population per generation and the mutation scaled effective population size (H = N e m) were jointly estimated applying the Bayesian search strategy.N e m was calculated by multiplying these latter values.Four independent runs of MIGRATE, each consisting of one long chain of 100,000,000 generations with the initial 10,000 excluded as burn-in of the analysis, were conducted to assess consistency in the results, changing seed number at each run.
The demographic history of all populations pooled together and of each of the three population groups identified by the SAMOVA analysis was studied using mismatch distributions in ARLEQUIN.Tajimas'D ad Fu's F S were calculated to test for neutrality.Population size before expansion (h 0 ), population size after expansion (h 1 ), population expansion time (t), and sum of squared deviation (SSD) between observed and expected mismatch distributions were similarly calculated.All parameters were tested against the expected values under the hypothesis of a recent population expansion based on 1000 bootstrap replicates.

Genetic diversity
Collapsing of individual sequences led to the identification of 132, 143 and 142 unique haplotypes for genes nad1, cytb and nad5, respectively, or 164 after concatenation of the three markers for each individual.
Basic descriptive statistics, calculated for each population based on concatenated sequences as well as each of the three genes independently, are shown in Tab.S1.The number of haplotypes per population (n) ranged from 9 to 20, 6 to 19, 8 to 20, 9 to 20 in concatenated sequences, nad1, cytb and nad5, respectively.Haplotype diversity (H) ranged from 0.9006 to 1, 0.8889 to 1, 0.8947 to 1 and 0.8947 to 1, nucleotide diversity (p) from 0.0093 to 0.0125, 0.0077 to 0.0147, 0.0105 to 0.0194 and 0.0081 to 0.0117, similarly in concatenated sequences, nad1, cytb and nad5.These figures suggest that all populations retain fairly high levels of genetic diversity.

Genetic structure
Monitoring of F CT values in the SAMOVA analysis suggested 3 as the optimal number of population groups (F CT3 = 0.04231).The 12 populations were clustered as follows: Jiangjin, Wulong, Wanzhou, Xiushan, Huaxi, Nanning, Wuhan, Nanchang and Jianshui; Fuzhou and Guangzhou; Wenchang.These three groups correspond to three geographically well defined regions that we refer to as Central, South-East and Southern China in the following presentation (see Fig. 1).
The AMOVA analysis revealed that a substantial portion of genetic differentiation is partitioned among groups (4.22% based on concatenated sequences, from 2.29% to 6.96% based on individual genes) and inside populations (94.95%; 92.63% to 96.89%), while genetic differentiation between populations inside each of the three groups identified is limited (0.84%; 0.4% to 1.29%) (Tab.2).Accordingly, differentiation among groups (F CT ) and within populations (F ST ) are highly significant, while differentiation among populations between groups (F SC ) was not or marginally (nad5 only) significant (Tab.2).The Mantel test for correlation between genetic and geographic distances revealed a significant correlation based on the concatenated dataset (r = 0.3667; P = 0.008) (Fig. S1 A) as well as based on each individual gene (r = 0.3852, P = 0.005; r = 0.3010, P = 0.025; r = 0.3852, P = 0.008 for nad1, cytb and nad5, respectively) (Fig. S1 B, C, D).

MJ networks of haplotypes
Median Joining networks reconstructed from haplotypes of the three genes are shown in Fig. 2. Networks are generally star-like with limited substructure.Some haplotypes positioned in the center of the networks are found at higher frequency in two or all three population groups (such as H77 for nad1, H7 and H8 for cytb, H4 for nad5), with most remaining haplotypes that are found in one single population group, generally at low frequency and connecting to central haplotypes through few mutations.Some missing haplotypes (32, 53 and 50 in nad1, cytb and nad5, respectively) were inferred.

Gene flow
The effective mutation scaled population size was estimated for each population and the amount of mutation scaled immigration rate in both directions was estimated for all 66 population pairs (Tab.S2).High levels of migration rate were detected among populations, ranging from 87.6 (Xiushan to Guangzhou) to 853.0 (Guangzhou to Fuzhou).Migration rate was generally symmetrical in population pairs, i.e. the immigration rate was at par with the emigration rate.Some instances of asymmetric migration rate The mismatch distribution of all 12 B.dorsalis populations pooled together as well as of the Central China group only were distinctly unimodal (Fig. 3, 4), suggesting the further testing of a sudden expansion model.
The mismatch distribution of the 12 populations pooled together was compatible with the sudden expansion model based on concatenated sequences (P SSD = 0.556) as well as each individual gene (see Tab.The ratio between estimated effective population size after expansion (h 1 ) and before expansion (h 0 ), an estimate of the extent of population growth, is 127966 for the entire dataset and 52236 for the Central China group based on concatenated sequences.Individual genes, although with differing numerical estimates, similarly suggest a large population growth, from 336 to 71446 for the entire dataset and from 856 to actually infinite for the Central China group.

High level of genetic diversity
Invasive species are generally associated to a loss in genetic diversity that can take place, during an invasion process, due to a) increased genetic drift associated to the temporarily reduced population size in founding colonies [45][46][47] and b) increased selection pressure encountered during the colonization of new habitats [48].
Chinese populations of the oriental fruit fly, nevertheless, seem to retain fairly high levels of genetic diversity, as exemplified by the high observed values of haplotype diversity.Noteworthy, this is observed even in populations such as Wuhan, Xiushan, Wulong, Wanzhou and Jiangjin that, based on collection records, established not longer than a decade ago and that display values of genetic variability actually higher than some of the oldest populations in the study set, Fuzhou and Guangzhou.
Nevertheless, some characteristics of the oriental fruit fly and the ecology of the area have to be considered that could help explain why no loss in genetic variability is observed concurrent to the colonization process.Due to the high reproductive potential of Table 3. Parameters of demographic history of the collated 12 populations and each of three population groups independently.this species and high dispersal capabilities, together with the relative absence of major geographical or ecological barriers to dispersal in the area, it is possible to envision how the oriental fruit fly might have gradually expanded without any significant or prolonged bottleneck.Furthermore, the relative abundance of suitable host fruits such as mango, carambola and guava, the large orange plantations in South China and Yangtze valley, and the relative uniformity and stability of a suitable tropical and subtropical monsoon climate in this area might have exerted little novel selective pressures during the range expansion in Central China.Furthermore, multiple introductions, or colonization of a given area from multiple sources, can counteract the drop in genetic variability associated with colonization or rescue a species from an actual loss in genetic diversity [48].Such cases of increased genetic variability due to a secondary mixing between previously diversified populations have been repeatedly described in the literature [49][50][51][52][53] and multiple introductions have been previously suggested to explain the relatively high genetic variability of the oriental fruit fly in the Yunnan province of China [13].
The oriental fruit fly may have, therefore, expanded quickly but gradually in mainland China, without the significant bottlenecks generally associated with a range expansion that takes place through invasive propagules of few individuals.This view is further in line with the high levels of gene flow and limited population differentiation observed (see below) and with the situation described by Aketarawong et al. [18] in South-East Asia and Shi [13] in the province of Yunnan.

Weak genetic structure
Although some structure could be identified by the SAMOVA analysis, leading to the partitioning of the 12 populations studied in the three groups Central, South-East and Southern China, the overall level of differentiation was low.According to the analysis of molecular variance, more than 90% of variability was observed inside populations, with no or marginal differentiation between populations and some differences arising only between population groups.This is rather unexpected given the distances involved, with the area occupied by the Central group being roughly equivalent to Central Europe in size, but in line with other examples of highly vagile fruit fly species that have expanded in large areas in relatively recent times [28,54].
The median joining networks for the three genes, accordingly, did not describe any underlying structure, as networks are distinctively star-like and haplotypes sampled in the three regions, not to mention populations, appear randomly distributed.The Mantel test identified a significant, though not strong, correlation between genetic and geographic distance, suggesting a certain degree of isolation by distance with no major discontinuity.
Taken together, these observations suggest a relative genetic uniformity of the oriental fruit fly in China.This is in line with the notion that the species has high dispersal capacities, with eggs and larvae that can disperse efficiently inside host fruits both under natural conditions (i.e.along rivers and ocean currents [55]) and artificially through trades of infested fruits, and adults that can fly as far as 46 km under experimental conditions [56].
It is therefore possible to hypothesize that the natural barriers present in the study area (Daba Mountains, Hengduan Mountains, Wuyi Mountains, Yangtze River, Zhujiang River and Qiongzhou Strait) are not sufficient to interrupt or determine a significant limitation to gene flow.A similar situation has been described in the region of Yunnan, where weak genetic structure was observed among 14 oriental fruit fly populations despite the presence of three big rivers and mountain ranges running North-West to South-East in the area [57].Opposite results, i.e. the insurgence of a clear genetic structure, were in turn obtained for more sedentary species, such as the rice stem borer Chilo suppreddalis [58], where rivers and mountain ranges act as effective barriers to gene flow.

Demographic history
The significant negative Tajiams'D and Fu's Fs values showed that the oriental fruit fly populations in mainland China do not fit a simple model of selective neutrality [59,60] due to an excess in low frequency alleles.This, together with collection data that indicate a substantial increase of oriental fruit fly in terms of both geographic range and population numbers, suggested the possibility of a recent population expansion.In turn, analysis of Tajiams'D and Fu's Fs estimators separately in the three groups indicated that the Central China group may have specifically undergone a regime of population expansion.The unimodal mismatch distribution [61] for all genes in the entire dataset and specifically in the Central China group, as well as non significant SSD values against the null hypothesis of a sudden population expansion further support this hypothesis [62].The notion that the species underwent a significant population expansion is further in line with the observation that Median Joining networks for the three genes have a distinctly star like structure, typical of expansion demographic processes.
These results are concordant with trapping data in the area.Oldest presence of the species is reported from the area facing the island of Taiwan, where the species was first detected, and the island of Hainan.These two areas correspond to population groups South-East and South China, respectively, that based on our data show no sign of population expansion.In turn, the marked range expansion observed in the last few decades mainly interested mainland China up to the 32N parallel, a very large region corresponding to population group Central China, for which distinctive signs of population expansion could be described.
Based on historical records and the genetic data presented here, it is therefore possible to reconstruct a process of fast and recent range expansion from the coastline area facing the South China sea, where the species has been likely established for a longer period of time, to a very large mainland region that has been colonized in the last few decades.Based on the high genetic diversity in the area and limited genetic structure (see above), we hypothesize that this process of colonization may be interpreted as a gradual, though fast, range expansion associated with high population numbers and population growth.The opposite scenario of a stepping stone model of repeated colonization events through numerically limited propagules may on the other the hand be excluded, as no trace of genetic bottlenecks and related drop in genetic variability is observed.
The notion that local natural barriers seem to be rather ineffective in counteracting oriental fruit fly dispersal, as well as the potential distribution analysis, suggests that the process described here of population expansion in mainland China may interest other regions north of the 32N parallel in the next decades.

Region of origin
Different hypotheses have been proposed about the geographic origin of the oriental fruit fly.While the current distributional range of the species is rather ample, from India to Hawaii encompassing all South-East Asia, historical records clearly show that marginal populations represent recent introductions, with the species being most likely East-Asian in origin.One initial hypothesis on the geographic origin of the oriental fruit fly, based on the earliest records of its presence in the island, is Taiwan [6].Subsequent studies, based on the high levels of local genetic variability, hypothesized Yunnan as a possible source area for the species [13], or at least an area of old colonization [57].The most comprehensive study to date on the phylogeography of this species [18], in turn, suggested China as the source area for South-East Asian populations, and possibly for the species as a whole, based on the observation that China is equally divergent from Pacific and South-East Asian populations.Furthermore, a well supported pattern of a West-ward migration from China to colonize South-East Asia was described.Noteworthy, Aketarawong and colleagues, based on asymmetrical gene flow values, could exclude Taiwan as the source of the invasion.
Based on our extended sampling in mainland China, that greatly expands the current knowledge on local oriental fruit fly diversity beyond the region of Yunnan [13,57] and one single population in Guangdong [18], we could further explore the patterns of diversity and range expansion in China to support an origin of the species in the coastal region facing the South China Sea, corresponding in genetic terms to the population group here described as South-East China.Main support for this hypothesis is found in the asymmetric gene flow observed from location Guangzhou (Guangdong) to 6 out of 11 alternative locations scattered in mainland China.Furthermore, this hypothesis is in line with the historical records showing that population groups South and South-East China occupy regions where the species has been present since the last century, while population group Central China, that we associate with the expansion process, occupies the mainland regions where the oriental fruit fly has been reported in the last two decades only.
Further support for the costal region of Southern China being the source area for the species as a whole comes from a joint interpretation of our results and those presented by Aketarawong et al. [18].The two studies, taken together, cover the majority of the oriental fruit fly geographic range, and all regions that are plausible candidates for the early differentiation of the species: China and South-East Asia from Bangladesh to Hawaii.In turn, both studies describe a similar pattern of expansion from a single and the same region (Guangdong) throughout South-East Asia [18] and mainland China (this study), with a distinct signal of asymmetrical gene flow out of this region.

Figure 2 .
Figure 2. Median joining networks of haplotyps.Pie area proportional to haplotype frequency.A: nad1; B: cytb; C: nad5; blue: Central China group; yellow: South-East China group; pink: Southern China group; red: inferred haplotypes.doi:10.1371/journal.pone.0025238.g002 3).Parameters of the expansion model were h 0 = 0.007, h 1 = 89.575and t = 24.453.The mismatch distribution of the Central China group was similarly compatible with the sudden expansion model based on concatenated sequences (P SSD = 0.455) and individual genes (see Tab. 3).Parameters of the expansion model for the Central China group were h 0 = 0.019, h 1 = 99.248 and t = 22.164.The sudden expansion model was in turn rejected for population groups South and South-East China (P,0.05).Individual genes (see Tab. 3) are in accord with concatenated sequences, all being compatible with the sudden expansion model for all 12 populations pooled and the Central China group (P.0.05) and most rejecting the expansion model in the other two population groups.

Figure 3 .
Figure 3. Observed and simulated mismatch distributions of entire sample.A, B, C and D are for concatenated sequences, nad1, cytb and nad5, respectively, the horizontal axis represents the number of pairwise differences, the vertical axis represents the relative frequency.doi:10.1371/journal.pone.0025238.g003

Figure 4 .
Figure 4. Observed and simulated mismatch distributions of Central China group only.A, B, C and D are for concatenated sequences, nad1, cytb and nad5, respectively, the horizontal axis represents the number of pairwise differences, the vertical axis represents the relative frequency.doi:10.1371/journal.pone.0025238.g004

Figure
Figure S1 Scatter plots of genetic distance vs. geographic distance (in ln scale) for pairwise population comparisons.A: concatenated sequences, B: nad1, C: cytb, D: nad5.(TIF)Table S1 Genetic diversity indices.V, number of variable sites; n, number of unique haplotypes; H, haplotype diversity; p, nucleotide diversity; k, average number of nucleotide differences.(DOC) Table S2 Estimates of population size and effective immigration rate between populations pairs.H: mutation scaled effective population size; M: mutation scaled effective immigration rate.In parentheses the 95% HPD intervals.Instances of asymmetrical gene flow are indicated in bold.The source population is indicated in columns, the target population in row.(DOC)

Table 2 .
Partitioning of genetic variation at different hierarchical levels.