Marked Genetic Differentiation between Western Iberian and Italic Populations of the Olive Fly: Southern France as an Intermediate Area

The olive fly, Bactrocera oleae, is the most important pest affecting the olive industry, to which it is estimated to cause average annual losses in excess of one billion dollars. As with other insects with a wide distribution, it is generally accepted that the understanding of B. oleae population structure and dynamics is fundamental for the design and implementation of effective monitoring and control strategies. However, and despite important advances in the past decade, a clear picture of B. oleae's population structure is still lacking. In the Mediterranean basin, where more than 95% of olive production is concentrated, evidence from several studies suggests the existence of three distinct sub-populations, but the geographical limits of their distributions, and the level of interpenetration and gene flow among them remain ill-characterized. Here we use mitochondrial haplotype analysis to show that one of the Mediterranean mitochondrial lineages displays geographically correlated substructure and demonstrate that Italic populations, though markedly distinct from their Iberian and Levantine counterparts are more diverse than previously described. Finally, we show that this distinction does not result from extant hypothetical geographic limits imposed by the Alps or the Pyrenees nor, more generally, does it result from any sharp boundary, as intermixing is observed in a broad area, albeit at variable levels. Instead, Bayesian phylogeographic analysis suggests the interplay between isolation-mediated differentiation during glacial periods and bi-directional dispersal and population intermixing in the interglacials has played a major role in shaping current olive fly population structure.


Introduction
The olive fly, Bactrocera oleae (Rossi, 1790), is a tephritid fly with a broad range of distribution and a significant economic impact. The species is thought to have originated in Africa, where it is still pervasive in Southern and Eastern areas, and spread to the Mediterranean basin, as well as South Central Asia. More recently, and presumably as a result of human intervention, it has also invaded North America [1]. B. oleae is the most important olive tree pest [2], causing production losses that are estimated to average approximately 15% yearly, but vary significantly depending on the region and year, and can be as high as 35% [3,4,5,6]. To the agriculture of the Mediterranean basin, where more than 95% of world olive production is concentrated, damaged caused by B. oleae translates into average annual losses in excess of 1 billion dollars.
In the Mediterranean, olive fly control has relied mainly on chemical insecticides. However, growing concerns over environmental damage, human health and the development of insect resistance are inducing a gradual shift towards more integrated pest control approaches. Under these circumstances, it is now generally accepted that a comprehensive understanding of olive fly's population structure and dynamics is crucial for the design of effective management and control strategies [7,8,9]. Previous genetic studies have established that there are three distinct Old World populations of B. oleae, corresponding to the original regions of dispersal: Sub-Saharan Africa, South Central Asia and the Mediterranean basin [10]. There is also good evidence for the existence of population substructure in the latter region, but we are still far from a comprehensive view. Indeed, earlier studies [11,12] suggested three Mediterranean subpopulations ("Eastern", "Central" and "Western"), but pointed to substantial population admixture; later analyses, on the other hand, suggested deep separation of the mitochondrial DNA (mtDNA) lineages found in the three areas [13,14]; and two very recent studies [7,8], though providing further support for the distinction between populations, again raised the possibility of significant gene flow between them.
In a previous study [14], we addressed the issue of the separation between Iberian (Western Mediterranean) and Italic (Central Mediterranean) olive fly populations using mtDNA analysis, and presented evidence of a deep split between them. This result, in turn, raised the questions of where the "boundary" between the two populations could be and to what extent do they intermix. In order to answer those questions, we have now extended our mtDNA analysis to samples collected in a~1,000 km long swath of territory ranging from Eastern Spain to the Southern end of Liguria (Italy).

Olive fly samples
Olive flies were collected at or near 31 different localities, of which 10 across Portugal, four in Spain, 10 in France, six in Italy's Northern region of Liguria and one in Palestine (Table 1) by a combination of methods. Collections took place in public places for which permissions were not necessary, with three exceptions: in Nossa Senhora de Machede, on a property belonging to the corresponding author (LTC); in Valverde, on University of Évora's property; in Berqin, on a farm associated with Canaan Fair Trade (Kufor Qud Road, Burqin, Jenin, West Bank, Palestine), who graciously provided the olives. In most locations, olives were picked (preferably in groves not subject to chemical treatment) and stored in plastic bags, with emerging larvae, pupae and adults being collected periodically; a few larvae were also obtained by dissection of infested olives. In some locations, McPhail traps baited with a 5% solution of ammonium dihydrogen sulfate (Sigma-Aldrich) were placed in olive groves for periods of 1-2 weeks and inspected every 1-2 days. Trapped adults were collected and washed once in 70% ethanol before storage. All individuals were stored at -20°C in 70% ethanol until DNA extraction.

Selection of polymorphic mitochondrial DNA regions
Sections of mtDNA selected for amplification and sequencing were based on those used in our previous study [14], with adaptations (Table 2). Briefly, analyses centered on two of the five regions used in that study (total length = 1,826 bp; 11.5% of B. oleae's complete mitochondrial genome), as they allowed a good discrimination between the three Mediterranean mtDNA lineages previously identified (with at least seven mutation steps separating them). For haplotypes of the P lineage, two other regions were included (Table 2), to allow a finer resolution of sublineages.

DNA extraction, amplification and sequencing
Total DNA was extracted using a standard SDS/Proteinase K method. Amplifications of the various mtDNA segments (  [16]. Median-joining phylogenetic networks were constructed using the Network software v4.612 available at www. fluxus-engineering.com, using default parameters in all calculations [17]. Six cases of suspected heteroplasmy were identified, which did not interfere with lineage and sublineage assignement, but led us to exclude the corresponding samples from at least some of the computations. Publicly available sequences with accessions GU108459, GU108470, GU108464, GU108471, GU108465, GU108461, GU108460, GU108472, GU108473, GU108479, GU108475 and AY210702 were also used in the analysis, the latter as B. oleae's mtDNA reference sequence. A detrended canonical correspondence analysis (DCA) was initially performed to determine the type of response associated to the explanatory variables (sampling regions, longitude and latitude) in the olive fly haplotype distribution. A gradient length <3 was obtained, indicating a linear response to these variables. Redundancy analysis (RDA) was therefore performed to analyze the relationships between explanatory variables and response variables (olive fly clades). Samples belonging to the O clade were excluded from the analysis due to their small number, and other data were transformed using the formula Y = log(y+1), to eliminate zero values. The significance of the fraction of variability attributable to explanatory variables was determined by Monte-Carlo permutation analysis. DCA and RDA analysis were performed using Canoco for Windows version v.5 (Microcomputer Power, USA) [18].
Phylogenetic analyses were implemented in the BEAST1.8 software package [19,20] using the GTR+I model of sequence evolution (selected using the Akaike Information Criterion with correction) with mutation rates and p Inv as estimated by jModeltest [21] with a lognormal relaxed clock [22]. For phylogeographic analyses, a discrete trait (location) symmetric substitution model with a strict clock was also used. Given the results of Nardi et al. [13], a coalescent tree prior with exponential population growth, with priors on the age of the most recent common ancestor of Mediterranean samples, population size and population growth rate based on the results of the same study were used. To ensure convergence, two independent runs of 40 million generations (with a burn-in of 1 million and sampling every 1,600) were performed for each analysis. Trees were summarized and annotated with TreeAnnotator v1.8 and visualized using FigTree 1.4 (http://tree.bio.ed.ac.uk/software/figtree/) and GIMP2.

Results
Olive fly mtDNA diversity in the Italic peninsula is higher than previously described The main objective of this study was to investigate the geographic structure of the previously identified split between Italic and Iberian olive fly populations [14]. Considering the limited number of Italic sequences publicly available for analysis, we started by obtaining and analysing mtDNA sequences from the Italic peninsula. Though mitochondrial haplotype analysis is one of the most powerful and popular approaches to determine phylogenetic relationships between populations, its effectiveness naturally depends on the variability of the mtDNA fragments analysed, which, in the case of B. oleae, requires the use of relatively long stretches of mtDNA [23]. For this reason, we chose to analyse a 1.8 kb fraction of mtDNA harboring at least 7 differences between the Iberian and Italic samples previously studied [13,14]. Furthermore, based on the working hypothesis that the two large mountain ranges (the Pyrenees and the Alps) separating the Iberian and Italic peninsulas were the best candidates for a "natural border" between populations, we focused on Northwestern Italy. We therefore analysed 24 samples collected from six locations spread throughout Liguria, as well as 10 samples from Palestine, and compared their sequences to those from Western Iberian flies [13,14]. The results obtained confirm a clear distinction between Iberian and Italic B. oleae populations (F ST = 0.47), as illustrated by the haplotype network (Fig 1). The high F ST value found also demonstrates the validity of our choice of mtDNA sequences for studying olive fly population structure in Central and Western Mediterranean basin. Additionally, these results reveal olive fly mtDNA diversity in the Italic peninsula is higher than previously suspected, as all three Mediterranean mtDNA lineages were detected in the Ligurian samples (Fig 1). This is in sharp contrast to the results obtained in Western Iberia, where all 18 samples analysed belong to the same lineage, or around the Levantine region, where the 4 published sequences [13] and the 10 sequences obtained in this study also fall into a single (distinct) lineage. The three Mediterranean lineages were earlier referred to as Western, Central and Eastern Mediterranean or, alternatively, as Mauro-Iberian, Italo-Aegean and Levantine [14]. However, given our observations from Liguria, it seems more advised to use a less geographically-dependent nomenclature. In the remainder of this report they will therefore be referred to as the P, M and O lineages, designations that solely reflect the locations where they were first observed (Paradela, Monteccuco and Osmanyie, respectively).
Phylogeographically-correlated substructure in the P mtDNA lineage The results described in the previous section prompted us to increase the numbers of Iberian and Ligurian samples analysed to obtain better estimates of the contributions of each lineage to the populations. As shown in Table 3 and Fig 2A, the large majority of the Iberian haplotypes analysed (43/48; 89.6%) belonged to the P lineage. Among Italic samples, on the other hand, though there was a clear predominance of M lineage haplotypes (34/ 48; 70.8%), the P lineage was also frequent (22.9%). This seemed to indicate that the split between Iberian and Italic populations is less pronounced than previously thought. Intriguingly, however, we observed that the Ligurian and Iberian samples seemed to have distinct distributions within the P lineage, with 81.8% (9/11) of the former carrying either haplotype IT0626 (previously found in Sicily) or a derivative thereof (D132). This led us to hypothesize they might belong to a third P sublineage, previously unrecognized. To further investigate this issue, we analysed two additional mtDNA segments (Table 2), harboring variants previously found in Sicily but not in the Western Mediterranean [13], to determine if they could help to dissect the substructure of the P lineage. The results obtained (Fig 2B and 2C) indeed confirmed that lineage P can be subdivided, with Bayesian analysis indicating very high support (>98.8%) for clades P1, P2 and P3, and showed that all nine Ligurian samples with haplotypes IT0626 or D132 fall into P3, whereas the 43 Iberian P haplotypes belong to either P1 or P2. This demonstrates the P lineage has geographically-correlated substructure and that the split between Iberian and Italic populations is very pronounced, as the predominant clades found in each of the two peninsulas have only a limited presence in the other (Table 3, Fig 2).

No sharp population boundary between Northwestern Italy and Western Iberia
To determine the geographic structure of the split between Iberian and Italic populations we proceeded to compare them to a set of 56 samples collected throughout Southern France and the 14 available Levantine sequences ( [13] and this study). A total of 77 haplotypes, defined by 88 sequence variants were identified (S1 Table). As shown in Fig 3A, Tables 4 and 5, the    haplotype distribution within the French sample set was found to be intermediate between those of the Iberian and Italic peninsulas, suggesting that both the Alps and the Pyrenees act as barriers to olive fly's dispersal. However, upon closer inspection, no marked differences were observed between the haplotype distributions in Southeastern France and Liguria, on the one hand, and Southwestern France and Northeastern Iberia on the other (Fig 3B), as confirmed by redundancy analysis (Fig 4) and pairwise F ST s ( Table 5), demonstrating that, in fact, neither of the two mountain ranges acts as a barrier. Instead, significant genetic differentiation was detected between Western France and both Eastern France and Western Iberia (Fig 4 and Table 5). Together with the observed lineage distributions (Table 4), this raised the possibility that a geographically-correlated gradual transition was responsible for the sharp differences in population structure between Iberia and Northwestern Italy. However, redundancy analysis (RDA) revealed that, despite a high correlation with longitude (Fig 5), geographical position cannot be considered the only major determinant of population structure.

Bayesian analyses suggests ancient olive fly dispersal in both westwards and eastwards directions
To start dissecting the historical processes underlying the observed genetic structure, we used a phylogenetic diffusion model implemented in a Bayesian framework to obtain information regarding the ancestral history of our samples. When applied to the full sample set, this analysis yielded some interesting results (Fig 6). First, common ancestors of the P and M lineages that predominate in the Central and Western Mediterranean basin nowadays were already present in the area hundreds of thousands of years ago. This lends further support to the notion [13] that expansion of the olive fly to the Western Mediterranean is much older than olive tree cultivation. Second, although ancestors of Levantine samples in the 100,000 years BP (yBP) range were apparently from the Levant, the common ancestor of all clade O members-dated 250,000 yBP-was found to be~12 times more likely to have been from Italy. Third, there is strong evidence that clades P1 and P2 originated west of the Alps, whereas Mn derives from an Italian ancestor. Since haplotypes belonging to both P1 and Mn are now present from Iberia to Italy, this implies the occurrence of both westwards and eastwards ancient olive fly population movements in the Central and Western Mediterranean basin. Fourth, several clades were found to have mid to high probabilities of having originated in France. However, the calculated dates of all the relevant ancestors are anterior to the last glacial maximum, when paleoclimatic evidence indicates no olive trees would be present in Southern France [24]. This implies that extant populations in the region derive from relatively recent (<12,000-15,000 yBP) migrations from Iberia and/or Italy (where olive trees persisted throughout the glacial periods),  suggesting that using only Iberian and Italian samples would provide a more realistic assessment of the ancestors' locations. The results obtained with this analysis (Fig 7) strongly support the notion that clades P1 and P2 originated in Iberia, whereas P3 and M derive from Italian ancestors-and therefore the occurrence of ancient bi-directional migrations between the two peninsulas.

Discussion
Despite important advances over the past decade, many questions regarding the structure of B. oleae's populations in the Mediterranean basin remain open. Several studies point to the existence of at least three different populations, but the geographic limits of their ranges are poorly defined, and the level of differentiation among them is somewhat controversial [7,8,11,12,13,14]. Earlier microsatellite-based studies found either no [10] or limited [11,12] differentiation, whereas whole mitochondrial genome comparisons later pointed to clear separations [13]. Barring different dispersal patterns for males and females, which are unlikely [25], this difference was attributed to the particular mode of inheritance of mtDNA, and the use of complete mtDNA sequences [13]. Indeed, a previous study using only 0.6 kb of mtDNA had failed to detect intra-Mediterranean differentiation [10], in agreement with a previous suggestion that, given the observed level of sequence diversity in B.oleae, analysis of 2 kb or more of mtDNA would be warranted [23]. We therefore used 3.8 kb (~25% of the mitochondrial genome) in our previous study, which suggested a deep split between Western Iberian and Italic populations [14]. However, the results obtained also suggested that a smaller fraction could still provide good discrimination, prompting our option for the present work (1.8 kb). This option was fully vindicated, as Western Iberian vs Italic pairwise F ST values observed were at least eight times those obtained with nuclear microsatellites, demonstrating the superiority of mtDNA in discriminating between these populations.
Application of this strategy to our full sample set and comparison with previously reported sequences resulted in the identification of 77 haplotypes, defined by 88 sequence variants, of which 59 and 56, respectively, are reported for the first time (S1 Table). Fifty four novel haplotypes were identified in the Central and Western Mediterranean samples alone, more than doubling the number of described haplotypes from the region, which by itself represents a significant contribution to the genetic characterization of its B. oleae populations. The present study also provides an important extension of the region's coverage, particularly at the mtDNA level. Indeed, given previous data suggesting differentiation between Iberian and Italic populations [7,11,12,13,14], the limited information previously available from Southern France (five samples from a single location) represented a major gap [8] that has now been largely filled.
More importantly, the results of our mtDNA haplotype analyses shed a new light onto several aspects of B. oleae's population structure in the Mediterranean basin. First, they confirm our previous contention of a deep split between Iberian and Italic populations [14], as clearly indicated by a pairwise F ST value of 0.38 and mitochondrial lineage distributions-dominated (>89%) by P1+P2 in Iberia and M+P3 in Italy. However, they also reveal that this differentiation has a more complex structure than previously recognized. Indeed, both haplotype networks and Bayesian analysis show that lineage P is subdivided, not just in two sublineages, as previously suspected, but in three, with P3 (~19% of Italic sequences) and P2 respectively absent from Iberia and Italy, while P1, which predominates (~73%) in Iberia had only a residual (~4%) presence in Northwestern Italy. The P lineage substructure was therefore found to be geographically correlated and to provide an important contribution to the differentiation between Iberian and Italic populations. This result suggests that the presence of a P (P3) haplotype in Sicily [13] can be regarded as typical of Italy, unlike we had speculated [14], and highlights the importance of using a large enough fraction of mtDNA in phylogeographic studies of B. oleae [23]. Interestingly, the results also raised the possibility that P lineage substructure contributes to intra-Iberian population differentiation, as a large difference was observed between the frequencies of P2 in Western (25.0%) and Eastern Iberia (8.3%). As for the main questions this study meant to address, the results clearly show that the marked split between Iberian and Italic B. oleae populations does not result from extant hypothetical geographic limits imposed by the Alps or the Pyrenees, as no differentiation was detected across either of these mountain ranges. More generally, the results suggest there is no clearly defined boundary: different lineage distributions were indeed observed in Southwestern and Southeastern France (as reflected by the SW-SE pairwise F ST value) but, given the evidence for extensive intermixing (e.g . Fig 3b), this cannot be interpreted as evidence for a sharp boundary. In fact, intermixing was observed to extend at variable levels throughout virtually the whole area under analysis, from Northwestern Italy to Portugal. Together with the apparent patterns of regional lineage distributions (Table 4) and pairwise F ST values (Table 5) along the East-West axis, this raised the possibility of a stepwise variation in population structure. However, a gradual, geographically-correlated, transition was not supported by either RDA (Fig 5) or a Mantel test (not shown). Bayesian phylogeographic analysis of the data, combined with paleoclimatic evidence provides important clues to how this population structure came about. Indeed, the most recent common ancestors (MRCAs) to the P1, P2 and P3 lineages have all been dated to 100,000-200,000 yBP (prior to the Würm glacial period), the first two in Iberia and the latter in Italy. Similarly, MRCAs to the P and M lineages were dated to the Mindel glacial period-the latter likely in Italy and the former having similar probabilities of being from either peninsula. This suggests that the current population structures in Western Europe-including the intermediate one found in Southern France-were likely generated by cycles of glaciation-mediated isolation and bi-directional dispersal and consequent intermixing after the end of the glacial periods.
Importantly, we also found evidence suggestive of intermixing with populations from the Eastern Mediterranean. Indeed, O lineage sequences, typical of the Levantine regions (e.g. all sequences from the Palestinian samples analysed belong to this lineage) were found in Italy at levels similar to those of lineages typical of Iberia, and even in Eastern Spain. Such intermixing would be consistent with the recent observations of some level of differentiation (but not a clear split) between regions of Turkey around the Levant and close to the Aegean [8] and implies that the source of the olive fly's invasion of California could be located within an area significantly larger than previously thought. Interestingly, Bayesian analysis points to Italy, rather than the Levant, as the more likely source of the O lineage. However, given the lack of comparable information on the genetic make-up of the wide olive-growing area between Italy and the Levant (and also the limited number of O lineage sequences from Italy analysed), this result should be viewed as raising an intriguing hypothesis that needs to be tested in future work. More generally, it is important to emphasize that, given the large uncertainties in dating common ancestors, the notion that glacial cycles played a central role in shaping current olive fly population structure in Western Europe might still be challenged in the future. The overall picture suggested by the present work for the olive fly's population structure in the Mediterranean basin is one of higher levels of lineage homogeneity at the limits of its Mediterranean range (Western Iberia and the Levant-as shown by the Palestinian samples), with variable (possibly distance-related) intermixing throughout most of the range. This picture is similar to that derived from microsatellite analyses [11,12], with at least two important differences: first, it implies the existence of three different dispersion sources, one in the West (in Iberia or, more generally, around the Gibraltar Straits), one in the East (the Levant [26] or, more generally, the fertile crescent [27]) and one in the Italo-Aegean area (or even modern-day Tunisia [7]); second, dispersion was largely human-independent and in both West-to-East and East-to-West directions. Given that Iberia and the Levant have been identified as major long-term olive tree refugia, our results also raise the possibility of co-evolution of B. oleae and its host tree in the Mediterranean.

Conclusions
Olive fly populations in the Italic peninsula are markedly different from their counterparts in Iberia and the Levant, even though representatives from the mtDNA lineages that predominate in the latter two regions can also be found in Italy. One of the three Mediterranean mtDNA lineages displays geographically-correlated substructure, with the sublineage predominating in Iberia being rare in Italy and vice-versa. The marked differentiation between Iberian and Italic olive fly populations is not due to extant constrains on gene flow imposed by the Alps or the Pyrenees nor, more generally, to any extant sharp boundaries, but likely to the interplay between isolation-mediated differentiation during glacial periods and bi-directional dispersal and consequent widespread population intermixing in the interglacial periods.
Supporting Information S1 Table. MtDNA haplotypes found in a comparison of olive fly populations from Iberia, France, the Italic peninsula and the Levant. (XLS)