Bridgehead Effect in the Worldwide Invasion of the Biocontrol Harlequin Ladybird

Recent studies of the routes of worldwide introductions of alien organisms suggest that many widespread invasions could have stemmed not from the native range, but from a particularly successful invasive population, which serves as the source of colonists for remote new territories. We call here this phenomenon the invasive bridgehead effect. Evaluating the likelihood of such a scenario is heuristically challenging. We solved this problem by using approximate Bayesian computation methods to quantitatively compare complex invasion scenarios based on the analysis of population genetics (microsatellite variation) and historical (first observation dates) data. We applied this approach to the Harlequin ladybird Harmonia axyridis (HA), a coccinellid native to Asia that was repeatedly introduced as a biocontrol agent without becoming established for decades. We show that the recent burst of worldwide invasions of HA followed a bridgehead scenario, in which an invasive population in eastern North America acted as the source of the colonists that invaded the European, South American and African continents, with some admixture with a biocontrol strain in Europe. This demonstration of a mechanism of invasion via a bridgehead has important implications both for invasion theory (i.e., a single evolutionary shift in the bridgehead population versus multiple changes in case of introduced populations becoming invasive independently) and for ongoing efforts to manage invasions by alien organisms (i.e., heightened vigilance against invasive bridgeheads).


Introduction
Elucidating the routes and modalities of introductions of undesirable organisms is crucial for managers who wish to prevent new invasions. [1,2]. It is also a prerequisite to generating and testing useful hypotheses regarding the environmental and evolutionary factors responsible for biological invasions [3]. In the large body of literature on biological invasions, a number of studies suggest that successful invasions involve a particular invasive population, which serves as the source of colonists for remote new territories [4,5,6,7,8]. We call here this phenomenon the invasive bridgehead effect, likening the successful population of a biological invader to a military force that establishes a foothold, historically at the far side of a bridge, prior to further incursions into hostile territories. Convincing demonstrations of such an invasive bridgehead effect are however scarce because of the lack of appropriate methods to confidently reconstruct the routes of invasion and hence formally test this scenario against alternative ones. Moreover, too few invasive populations have been studied to capture the global picture of the worldwide invasion process for most species.
The harlequin ladybird Harmonia axyridis (HA), or multicolored Asian lady beetle, is an appropriate biological model to test for the existence of an invasive bridgehead effect at a worldwide scale [9]. Native to Asia, HA has been introduced repeatedly as a biocontrol agent against aphids since 1916 in North America [10,11], since 1982 in Europe [12] and since 1986 in South America [13]. Despite recurrent intentional releases of ladybirds originating from various source populations in its native range for acclimation attempts, the species did not establish for decades. However, for unknown reasons it recently and suddenly became invasive on four different continents. Invasive populations were first recorded in eastern (Louisiana, USA) and western (Oregon, USA) North America in 1988 and 1991, respectively [14,15]. They were then recorded in Europe (Belgium) [16] and South America (Argentina) [17] in 2001 and in Africa (South Africa) [18] in 2004. The species has spread widely in these areas where it has become a harmful predator of non-target arthropods, a household invader, and a pest of fruit production [19].
Most of our knowledge about introduction pathways of invasive species relies on historical and observational data, which are often sparse, incomplete and sometimes misleading. Population genetics has proven to be a useful approach to reconstruct routes of introduction, highlighting how complex and counter-intuitive the real story can be [4,5]. This is especially true since the recent development of a new approach termed approximate Bayesian computation (ABC) [20,21,22,23,24]. Using molecular and historical data, this method allows performing model-based inference in a Bayesian setting for complex demographic or evolutionary scenarios such as those related to the introduction histories of invasive species, where bottleneck, multiple introductions and/or genetic admixture events are often suspected. Here, we retraced the routes of all five worldwide invasive HA populations using a rigorous quantitative analysis of microsatellite genetic variation relying on ABC methods. We compared large sets of HA invasion scenarios covering all invaded areas, taking into account historical data (e.g. dates of first observation of the outbreaks and dates of initial collection of biocontrol strains) and all potential sources (native, older outbreaks and biocontrol), accounting for the possibility of genetic admixture among them.

Results and Discussion
Our results unambiguously point to a surprising scenario (Fig. 1). The outbreaks in eastern and western North America originated from two independent introductions from the native range (either from biocontrol or accidental introductions). The South American and African outbreaks both originated independently from eastern North America. The European outbreak also originated from eastern North America, but with substantial genetic admixture with individuals of the European biocontrol strain (estimated at 43%, 95%CI: [18%-83%]; Fig. 2a). The establishment of new invasive populations is characterized by low (South America) to minute (all other outbreaks) bottleneck severities suggesting a substantial number of founding individuals and/or a quick demographic recovery (Fig. 2b). Each choice of scenario is supported by very high posterior probabilities using two different sets of prior distributions of demographic, historic and mutation parameters (Fig. 1, Table S2 and Table S3). In all analyses the 95% CI of the most likely scenario never overlapped with those of competing scenarios. We also computed the type I and type II errors from control simulated data sets for each of the five nested analyses and found that our method selected the true scenario with high confidence and markedly low type II errors (Table 1).
For the scenarios not complicated by substantial admixture, the raw classical statistics measuring genetic variation between populations such as pairwise F ST and assignment likelihood [23] agree with our conclusions based on ABC methods (Table S4). On the other hand, such classical statistics are unable to detect North America and the European biocontrol strain as the sources of the admixed European invasive population. Rather, these simpler methods suggest that the latter outbreak originated from the native area only. We show with simulated data that, in case of admixture between two populations deriving from the same source, pairwise F ST and assignment likelihood values tend to select the ancestral population (here the native area) as the source of the admixed population (Fig. S2). An admixed origin of the European invasive population involving eastern North America and the European biocontrol strain as inferred from ABC is also supported by the raw allelic distributions observed in these samples (see Fig. S3 for an illustration). The ABC methods used here have hence three advantages over more standard methods based on raw measures of genetic distance: they use all the data simultaneously in inference, allow an assessment of uncertainty in all inferences and therefore provide confidence in the choice of invasion routes [23], and they avoid misleading biases such as those due to genetic admixture (if included in the set of compared scenarios), an increasingly acknowledged common feature of species invasions [3,5].
The complete panorama of the invasion history of HA strikingly points to the worldwide dissemination of a single successful invasive population (eastern North America) into remote new territories. Although ladybirds were repeatedly introduced in North America, Europe and South America for biocontrol from native Asian populations and laboratory strains, eastern North America is the proximate origin of the worldwide invasion of HA. This pattern illustrates what we call the invasive bridgehead effect, whereby a particular invasive population serves as the source of colonists for other areas. Our results have important implications for the understanding and the management of biological invasions. The history of HA reveals that prevention of new invasions should include minimizing accidental dissemination from invasive bridgehead populations (here eastern North America) rather than focusing on the native range of invasive species. For instance, we found that live HA individuals intercepted in Europe (Norway) in 2007 on imported timber most likely originated from eastern North America based on the sample being significantly differentiated genetically from all tested populations (p,1610 26 ) except that of eastern North America (p = 0.060). The implication of  eastern North America being an invasive bridgehead, together with the absence of remote establishment and slower local spread of the population in western North America and the long history of unsuccessful biocontrol introductions of HA from its native range, jointly suggest that an evolutionary shift triggering invasion occurred in eastern North America. It is worth pointing out that the invasive bridgehead effect is evolutionarily parsimonious: a single evolutionary shift in a single introduced population (the bridgehead) is required whereas multiple changes are required if introduced populations had become invasive independently. In Europe, the potential role of admixture with the European biocontrol strain is unknown, but the single eastern North American origin of the South American and South African outbreaks suggests that the genetic admixture observed in Europe is not required for an eastern North American propagule to establish and start an invasive population in diverse ecological contexts.
In conclusion, our results provide a convincing demonstration of an invasive bridgehead scenario in a worldwide emblematic invasive species. That invasion can proceed via a bridgehead has important implications both for invasion theory (i.e. the number and nature of evolutionary shift(s) involved in large scale invasions) and for ongoing efforts to manage invasions by alien organisms (i.e. heightened vigilance against invasive bridgehead populations). Our study highlights the interest of new model-based methods such as approximate Bayesian computation to rigorously compare complex invasion scenarios, including those with genetic admixture, a feature of species invasions that is increasingly acknowledged to be common [2,3,5]. It would be useful to revisit previously published genetic data sets on other worldwide invasive species using similar ABC approaches to examine whether bridgeheads and/or admixtures are a common mechanism driving invasion. Our ability to confidently reconstruct the routes of introduction of worldwide invaders is crucial to generating sensible hypotheses regarding the environmental and evolutionary factors responsible for biological invasions [3]. In HA, our results suggest that an evolutionary shift triggering invasion likely occurred in the bridgehead population in eastern North America. Forthcoming studies based on quantitative genetics approaches [3,25] will shade light on the evolutionary basis of HA invasiveness in eastern North America and into the potential role of genetic admixture in Europe.

Population samples and genotyping
Population samples were genotyped at 18 microsatellite markers [26]. The samples of populations from the five invaded areas ( Fig. 1 and Table S1) were collected near the sites where the outbreaks were first observed. The three samples from the native range were collected in eastern Asia, where individuals historically used for biocontrol purposes had been collected [10,11,12]. The three native population samples were genetically homogeneous (mean pairwise F ST = 0.0057), and were therefore pooled into a single sample. Using native population samples individually did not change our conclusions. We genotyped numerous additional population samples collected from all invaded areas (30 additional samples) and from the native range (5 additional samples); analysis of these samples indicated that the subset of samples used in the study provides an adequate description of the main invasive and native populations (i.e. none significant to low level of genetic differentiation within the native range and within each invaded area). The European biocontrol sample (Ebc) was from the INRA rearing stock of 1987 initially collected from the native area in 1982 and used for biocontrol in Europe and South America. Genotyping of the biocontrol strains produced in European biofactories from the late 90's to the present (4 additional samples) confirmed that they are indeed derived from the original INRA strain we genotyped.

Inferring invasion scenarios using approximate Bayesian computation
Genetic variation within and between populations was summarized using a set of statistics traditionally employed in approximate Bayesian computation analyses (ABC) [23,24]. For each population and each population pair we used the mean number of alleles per locus, the mean expected heterozygosity and the mean allelic size variance. The other statistics used were the mean ratio of the number of alleles over the range of allele sizes, pairwise F ST values, mean individual assignment likelihoods of population i assigned to population j and the maximum likelihood estimate of admixture proportion.
We performed five serial nested ABC analyses of invasion scenarios involving successive HA outbreaks (Table S1). In analysis 1, we dealt with the introduction pathway for the first recorded outbreak in eastern North America in 1988, with native populations (either from biocontrol or accidental introduction) and/or the European biocontrol population as potential sources, thereby defining three competing scenarios (see Fig. S1). The second outbreak in western North America in 1991 was examined in analysis 2, taking into account the scenario selected in analysis 1. For this analysis, there were three possible sources: the first outbreak (eastern North America), the native range, and the European biological control strain (Table S1). With the potential for admixture between them, this gave 6 competing scenarios. The European and South American outbreaks in 2001 were addressed in analyses 3 and 4, respectively (10 scenarios for each outbreak), taking into account the scenario selected in analysis 1 and 2. The year of first observation in Europe and South America was the same, thus we assumed that one could not be a potential source of the other. Nevertheless, we performed a specific ABC analysis to test this assumption which confirmed the independence of these two outbreaks (P = 0.773 with CI = 0.555-0.991). Finally, the African outbreak in 2004 was considered in analysis 5 (21 scenarios), taking into account the scenarios selected in analyses 1, 2, 3 and 4.
The ABC analyses were performed using parameter values drawn from the prior distributions described in Table S2 (prior set  1), and by simulating 10 6 microsatellite data sets for each competing scenario in the first four analyses and 5610 5 data sets in analysis 5 because of the high number of scenarios (21) and summary statistics (170) which made a larger analysis computationally unfeasible. For each of the five analyses, we estimated the posterior probabilities of the competing scenarios using a polychotomous logistic regression [24] on the 0.1% of simulated data sets closest to the observed data set. The selected scenario is that with the highest significant probability value with a nonoverlapping 95% confidence interval. We estimated the posterior distributions of demographic parameters under the final HA invasion scenario presented in Fig. 1 using a local linear regression on the 1% closest of 5610 6 simulated data sets [22,24].

Confidence in scenario selection
We evaluated the robustness of our inferences by running a second analysis with an alternative set of priors (prior set 2 in Table S2) and by estimating the posterior probabilities of scenarios using the 0.1% or 1% closest simulated data sets for both sets of priors (Table S2). For each of our five ABC analyses, we evaluated the ability of the methodology to correctly select the true scenario by analyzing test data sets simulated from the various competing scenarios with the same number of loci and individuals as in the real data set. For each scenario, one hundred such data sets were simulated using parameter values drawn from the same probability distributions as the priors (prior set 1, Table S2). Posterior probabilities of each competing scenario were estimated for each simulated test data set using the 0.1% closest data sets. These probabilities were used to compute type I and II errors in the selection of scenarios.
Most ABC computations were processed using a modified version of the Windows package DIYABC [24]. Some of the computations for assessing confidence in scenario selection (i.e. those for analysis 5) necessitated the development of specific Linux programs and scripts running on a computer cluster (available under request from A.E.).

Additional simulation and statistical treatments
We ran computer simulations to tackle the question of a high frequency of erroneous selection of the source population in a situation of genetic admixture when considering only the raw values of genetic differentiation statistics (i.e. pairwise F st and assignment likelihood values). We used the program DIYABC to simulate genetic data sets under a scenario of invasion with admixture similar to the one considered in the case of the invasion of Europe by HA. We simulated 6610 4 data sets drawing the parameter values in the prior set 1 (Table S2), except the admixture rate (ar) which was drawn in a discrete {0, 0.1, …,1.0} distribution instead of a uniform [0.1;0.9] distribution. The number of sampled diploid individuals was fixed to 30 in all populations. The deduced origin of the admixed introduced population is the population with which the F ST -value is the smallest or the assignment likelihood value is the highest. We then calculated the proportion of simulated data sets, as a function of the admixture rate, for which the deduced origin was the ancestral population of the two actual source populations (i.e. the native area in the case of the invasion of Europe by HA), considering either F ST or assignment likelihood values (see Fig. S2).
Finally, we also genotyped 200 live HA individuals intercepted in Europe (Å ndalsnes, Norway) in 2007 on timber imported from North America. We tested for the genetic differentiation, using Fisher's exact tests [27], between this interception HA sample and each population sample used in the ABC analysis to elucidate its most likely origin. Scenario 2 corresponds to a European Biocontrol origin of the ENA outbreak. In scenario 3, the ENA outbreak is the result of an admixture between individuals from the native area at a rate ar and from the European biocontrol strain at a rate 1-ar. All parameters with associated prior distributions are described in Table S2. Table S1 Native, invasive and biocontrol populations of Harmonia axyridis (HA), with the possible sources of each population considered for each nested ABC analysis. Notes: The definition of potential sources is based on the year of first observation of invasive populations. The EBC population was considered a potential source in all analyses even though historical records indicate that it was used for biocontrol purposes only in Europe and South America [13]. Admixtures between all pairs of potential sources also were considered in specific scenarios. In analyses 1 and 2, when the native population was involved we considered that there may have been a period of laboratory rearing in preparation for release for biocontrol purposes resulting in a lower effective population size, instead of a direct introduction from the native area (see Table S2 and Figure S1). For each ABC analysis, the number of competing scenarios is given in parentheses. Population code names as in Figure 1.  populations k are biocontrol strains (i.e., laboratory reared populations). Times were translated into numbers of generations running back in time from 2007 by assuming 2.5 generations per year in prior set 1, and 3 generations per year in prior set 2 [19]. NS = stable effective population size (number of diploid individuals); NF = effective number of founders during an introduction step lasting BD generation(s); ar = admixture rate (only for scenarios with admixture); t i = introduction date of invasive populations i with bounds x i or y i fixed from dates of first observation, assuming 2.5 or 3 generations per year, respectively; tbc = creation date of unsampled biocontrol strain for ENA and WNA populations (with condition t i , or = tbc i ) bounded between the dates of first observation of invasive population (which would correspond to a direct introduction into the wild) and the number of generations from 1970, the start date of a period of intense HA biocontrol activity in the USA. For microsatellite marker parameters, the loci were assumed to follow a generalized stepwise mutation model [S1] with two parameters: the mean mutation rate (mean m) and the mean parameter of the geometric distribution (mean P) of the length in number of repeats of mutation events. Each locus has a possible range of 40 contiguous allelic states and is characterized by individual m loc and P loc values, with m loc drawn from a Gamma (mean = mean m and shape = 2) distribution and P loc drawn from a Gamma (mean = mean P and shape = 2) distribution [S2]. Uneven insertion/deletion events that were suspected for several of our microsatellite loci based on observed allele sizes (i.e., allele lengths were sometimes not multiple of the motif length implying that there has been insertion-deletion mutations [28]) were also simulated with a mean mutation rate mSNI (for single nucleotide instability) and mSNI loc drawn for each locus from a Gamma (mean = mean mSNI and shape = 2). Boundaries of distributions are in brackets. Parameters of Normal and Gamma distributions are in parentheses. In prior set 2, Normal, Lognormal and Gamma distributions are truncated between the same boundaries as in prior set 1. All prior quantities presented were computed from 100,000 values. NA = not applicable; DV = can take different values. Supporting references: S1. Estoup A, Jarne P, Cornuet JM  Posterior probabilities (P) of the selected (most likely) scenarios in each ABC analysis at two different thresholds of smallest Euclidian distances (0.1% and 1%) and two different sets of priors. Notes: Prior sets are detailed in Table S2. 95% confidence intervals (CI) are in brackets. The 95% CI of the selected scenarios never overlapped those of competing scenarios. The values presented in Figure 1 of the main text are those obtained using the 0.1% threshold and prior set 1.