Population genetics of the Mediterranean corn borer (Sesamia nonagrioides) differs between wild and cultivated plants

The population genetic structure of crop pests provides information about their spatial ecology, which helps in designing management strategies. In this paper, we investigated the genetic structure of the Mediterranean Corn Borer (MCB), Sesamia nonagrioides Lefèbvre (Lepidoptera: Noctuidae), one of the most important maize pests in Mediterranean countries, using microsatellite markers for the first time in this species. Insects were collected at twenty-five locations in southwestern and southeastern France from cultivated and wild host plants (Zea mays, Sorghum halepense and Typha domingensis). Contrary to what has been reported so far in France, we found that MCB populations could be locally abundant on wild Poales. Analysis was carried out at 11 polymorphic microsatellite markers. In permutational analyses, molecular variance was significantly explained first by geography and then by host plant: 17% and 4%, respectively, when each factor was considered as a main effect, and 14% and 1%, respectively, when considered as a marginal effect. Multidimensional scaling (MDS) and GENELAND Bayesian clustering suggested that populations infesting wild plants (T. domingensis and S. halepense) were more structured locally than those infesting cultivated maize. In S. halepense, significant isolation by distance (IBD) indicated that distance could explain the genetic differentiation of moth populations. In T. domingensis, local population differentiation was strong but did not depend on distance. The implications of this absence of population structure in maize and of the heterogeneity of population genetic patterns in wild plants are discussed in the context of population dynamics hypotheses and management strategies.

Introduction
… mating, i.e. the MCB host races. Additional studies are therefore needed to better assess the host-plant and geographic determinants of the fine-scale genetic structure of the MCB.
In this paper, we used microsatellite markers for the first time in this species and sampled cultivated and non-cultivated host plants in southern France to assess the genetic structure of the MCB. Microsatellite markers are co-dominant, highly polymorphic and more likely to be neutral [19] than previously used markers (allozymes, RFLP, AFLP, RAPD), each of which lacks at least one of these characteristics. The main objective of this study was to assess the relative roles of host plant and geography in the local genetic structure of the MCB in southern France, and to discuss the population dynamics of this species in the light of available data. We sampled MCB populations on maize, Sorghum halepense and Typha domingensis across southern France, and jointly analysed the geographic and ecological factors of genetic differentiation using multidimensional descriptive statistics and permutation tests for main and marginal effects.

Insect sampling
Larvae were collected from both wild and cultivated host plants at twenty-five locations in two regions of southern France (Fig 1). In the southwest region, sampling was done in one maize field in the Lot county (Lavergne) in December 2008 and in seven maize fields in the Haute-Garonne county in March 2012 and 2014 (Table 1). In this area of intensive maize growing (e.g. in Longages, Haute-Garonne, 33% of the utilized agricultural land is devoted to maize; Arvalis, 2010, unpublished data), wild host plants of the MCB are scarce, and we failed to find the moth on them. In the Rhone Valley region, larvae were sampled in July 2012 from two wild host plants, Typha domingensis (the major wild host plant of the MCB in its native African range) and Sorghum halepense, at 14 collection points in the Bouches-du-Rhône county (Camargue, south of Arles) and at one collection point in the Ardèche county (Aubenas) (S1 Table). In this area maize is scarce (in Arles, 0.15% of the utilized agricultural land is devoted to maize; Arvalis, 2010, unpublished data), which made insect collection on this crop difficult. Hereafter, the term "population" refers to MCB individuals collected from the same host plant at the same location (field or collection point) on the same date. Although we could not find sites where the MCB occurred on both wild and cultivated plants, samples from the same plant at different localities allowed us to assess the effect of distance, and samples from different wild plants at the same locality allowed us to assess the effect of host plant. A total of 451 caterpillars were sampled, and each individual was preserved in 95% ethanol before DNA extraction.

Molecular analysis
The DNA extractions were performed on third-instar larvae (whole body) using the Qiagen® DNeasy Blood and Tissue Kit. The microsatellite loci used were developed by Capdevielle et al. in 2012 (unpublished data) using next-generation sequencing and confirmed by amplification, which allowed us to identify 17 suitable markers (SN01; SN16; SN20; SN44; SN15; SN22; […]). Amplifications were performed in a 96-well thermocycler using the following program: 5 min at 95 °C to activate the Taq polymerase, followed by 25 cycles of 30 s denaturation at 95 °C, 30 s primer annealing at 55 °C and 30 s extension at 72 °C, and a final extension of 5 min at 72 °C. The amplified fragments were detected on an ABI 3130xl capillary sequencer (Applied Biosystems®). Microsatellite profiles and allele scoring were obtained using GeneMapper software (version 4.0), and the genotype scoring was checked manually for every individual. Two loci with too much missing data (SN32 and SN61) and two others that were monomorphic in our samples (SN25 and SN68) were removed. The remaining 13 microsatellite loci were tested separately for conformity with Hardy-Weinberg (HW) equilibrium expectations using GENEPOP v4.0.1 [20].
Lepidoptera are known to have a high frequency of null alleles [21,22]. These alleles result from nucleotide variation in the primer-binding flanking regions; they make a heterozygous locus appear homozygous, or result in no amplification at all if both alleles at the locus are null. Because null alleles may lead to overestimated F-statistics [23], we checked for their presence and estimated their frequencies using INEst software [24]. Six loci were suspected to carry null alleles (SN01; SN16; SN21; SN22; SN37 and SN44). The bias introduced by null alleles into F-statistics is considered significant when their frequencies exceed 0.2 [23,25,26]. We therefore removed the suspect loci with null-allele frequencies higher than 0.2 (SN16 and SN59), which led to the 11 microsatellite markers used in this study.
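INEst estimates null-allele frequencies jointly with inbreeding, and the sketch below does not reproduce that method. As a simpler illustration of the screening logic only, it uses Brookfield's (1996) first estimator, r = (He − Ho)/(1 + He), computed per locus, and then drops loci whose estimate exceeds the 0.2 threshold used above. The genotype coding (pairs of allele sizes, with (0, 0) for a missing call) and the locus names are hypothetical.

```python
from typing import Dict, List, Tuple

# One diploid genotype at one locus: two allele sizes; (0, 0) marks a missing call.
Genotype = Tuple[int, int]
MISSING: Genotype = (0, 0)


def ho_he(genotypes: List[Genotype]) -> Tuple[float, float]:
    """Observed and unbiased expected heterozygosity for one locus, ignoring missing calls."""
    called = [g for g in genotypes if g != MISSING]
    n = len(called)
    if n < 2:
        raise ValueError("need at least two genotyped individuals")
    ho = sum(1 for a, b in called if a != b) / n
    counts: Dict[int, int] = {}
    for a, b in called:
        counts[a] = counts.get(a, 0) + 1
        counts[b] = counts.get(b, 0) + 1
    sum_p2 = sum((c / (2 * n)) ** 2 for c in counts.values())
    he = 2 * n / (2 * n - 1) * (1 - sum_p2)  # Nei's small-sample correction
    return ho, he


def brookfield_null_freq(genotypes: List[Genotype]) -> float:
    """Brookfield (1996) estimator 1: heterozygote deficit scaled by (1 + He), floored at 0."""
    ho, he = ho_he(genotypes)
    return max(0.0, (he - ho) / (1 + he))


def retain_loci(data: Dict[str, List[Genotype]], threshold: float = 0.2) -> List[str]:
    """Names of loci whose estimated null-allele frequency does not exceed the threshold."""
    return [locus for locus, genos in data.items() if brookfield_null_freq(genos) <= threshold]


# Hypothetical toy data: "LOC_A" is all heterozygotes, "LOC_B" shows a strong deficit.
toy = {
    "LOC_A": [(1, 2), (1, 2), (2, 3), (1, 3)],
    "LOC_B": [(1, 1), (1, 1), (2, 2), (2, 2), (1, 2)],
}
print(retain_loci(toy))
```

With these toy genotypes the estimate for LOC_B is about 0.23, so only LOC_A is retained.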

Statistical analysis
Genetic diversity. Population genetic diversity was assessed for populations represented by at least 5 individuals, by estimating allelic richness (Ar) with FSTAT v2.9.3.2 [27], and the number of alleles per locus, the observed heterozygosity (Ho) and the unbiased expected heterozygosity (He) for each locus in each population with GENETIX v4.05 [28]. We used GENETIX and FreeNA to estimate FIS values for each population, pairwise FST values between all pairs of populations and between regions, counties, localities or plants, and overall FST values. The 95% confidence interval (95% CI) of FIS and the significance of FST were computed from 1,000 bootstraps over loci.
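The per-locus quantities above follow standard definitions: Ho is the proportion of heterozygous individuals, He is Nei's (1978) unbiased expected heterozygosity, He = 2n/(2n − 1) × (1 − Σ p_i²), and FIS measures the heterozygote deficit relative to He. The sketch below computes a simple multilocus ratio estimator, FIS = 1 − ΣHo/ΣHe, and a percentile 95% CI from 1,000 bootstraps over loci. It is an illustration only: the programs cited above use Weir and Cockerham-type estimators, which weight loci differently, so values will not match exactly; the data layout and locus names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Genotype = Tuple[int, int]  # two allele sizes of one diploid individual at one locus


def locus_ho_he(genotypes: List[Genotype]) -> Tuple[float, float]:
    """Observed heterozygosity and Nei's unbiased expected heterozygosity at one locus."""
    n = len(genotypes)
    if n < 2:
        raise ValueError("need at least two individuals")
    ho = sum(1 for a, b in genotypes if a != b) / n
    counts: Dict[int, int] = {}
    for a, b in genotypes:
        for allele in (a, b):
            counts[allele] = counts.get(allele, 0) + 1
    sum_p2 = sum((c / (2 * n)) ** 2 for c in counts.values())
    return ho, 2 * n / (2 * n - 1) * (1 - sum_p2)


def ratio_fis(per_locus: List[Tuple[float, float]]) -> float:
    """Multilocus FIS as 1 - sum(Ho) / sum(He) (fails if every locus is monomorphic)."""
    return 1 - sum(ho for ho, _ in per_locus) / sum(he for _, he in per_locus)


def fis_with_bootstrap_ci(population: Dict[str, List[Genotype]], n_boot: int = 1000,
                          seed: int = 1) -> Tuple[float, Tuple[float, float]]:
    """Point estimate and percentile 95% CI from resampling loci with replacement."""
    per_locus = [locus_ho_he(g) for g in population.values()]
    rng = random.Random(seed)
    reps = []
    for _ in range(n_boot):
        resampled = [rng.choice(per_locus) for _ in per_locus]
        if sum(he for _, he in resampled) > 0:  # skip all-monomorphic resamples
            reps.append(ratio_fis(resampled))
    reps.sort()
    lower = reps[int(0.025 * len(reps))]
    upper = reps[int(0.975 * len(reps)) - 1]
    return ratio_fis(per_locus), (lower, upper)


# Hypothetical two-locus population (locus names are placeholders).
pop = {
    "LOC_A": [(1, 2), (1, 2), (2, 3), (1, 3)],
    "LOC_B": [(1, 1), (1, 1), (2, 2), (2, 2), (1, 2)],
}
fis, (lo, hi) = fis_with_bootstrap_ci(pop)
print(f"FIS = {fis:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

With only two loci the bootstrap distribution is very coarse; the 11 loci used in this study give a smoother resampling distribution.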
MCB genetic structure. A neighbor-joining tree was built from Cavalli-Sforza and Edwards (1967) chord distances using POPULATIONS v1.2.32 [29]. The reliability of each node was estimated by 1,000 resamplings of the data over loci. We also carried out non-metric multidimensional scaling (MDS), a non-linear equivalent of principal coordinate analysis, to optimize the graphical representation of genetic distance data. The population genetic structure was further characterized using the Bayesian assignment approaches implemented in STRUCTURE 2.3.4 [30], TESS 2.3.1 [31] and GENELAND 4.06 [32].
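For reference, the Cavalli-Sforza and Edwards chord distance between two populations X and Y can be written, in the multilocus form given by Takezaki and Nei (1996), as D_C = (2/(πr)) Σ_j √(2(1 − Σ_i √(x_ij y_ij))), where x_ij and y_ij are the frequencies of allele i at locus j and r is the number of loci. The sketch below implements that formula from allele frequencies; whether POPULATIONS applies exactly this normalisation is an assumption, and the input format is hypothetical.

```python
import math
from typing import Dict, List

# Allele frequencies at one locus: allele size -> frequency (summing to 1).
LocusFreqs = Dict[int, float]


def chord_distance(pop_x: List[LocusFreqs], pop_y: List[LocusFreqs]) -> float:
    """Cavalli-Sforza & Edwards chord distance, averaged over loci (Takezaki & Nei form)."""
    if len(pop_x) != len(pop_y) or not pop_x:
        raise ValueError("both populations need the same, non-zero number of loci")
    total = 0.0
    for fx, fy in zip(pop_x, pop_y):
        # Only alleles present in both populations contribute to the overlap term.
        overlap = sum(math.sqrt(fx[a] * fy[a]) for a in set(fx) & set(fy))
        total += math.sqrt(2 * max(0.0, 1 - overlap))  # clamp tiny negative rounding errors
    return 2 / (math.pi * len(pop_x)) * total


# Hypothetical one-locus examples.
same = [{1: 0.5, 2: 0.5}]
disjoint = [{3: 1.0}]
print(chord_distance(same, same), chord_distance(same, disjoint))
```

Identical populations give 0, and populations sharing no allele at a locus reach the per-locus maximum of 2√2/π ≈ 0.90.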
Although TESS and GENELAND can account for geographic information in the definition of clusters, this option was not used, because it would have favored spatial factors to the detriment of the ecological factors that we intended to compare. For STRUCTURE and TESS, the admixture model was used, with K ranging from 2 to 10 (5 replicates for each value of K, each with 3 × 10⁴ burn-in iterations followed by 7 × 10⁶ iterations for STRUCTURE and 7 × 10⁵ iterations for TESS). For STRUCTURE, we used a model with correlated allele frequencies [33], and the best solution was identified using the ΔK statistic [27]. For TESS, we used the conditional autoregressive (CAR) model with the spatial trend (ψ) set to zero, which is very close to the algorithm implemented in STRUCTURE [30,33]. Results obtained with the different K values were compared using the deviance information criterion (DIC). To identify distinct solutions across TESS and STRUCTURE replicates, we used CLUMPP v1.1.2 [34] to compute a symmetric similarity coefficient between replicates (greedy algorithm, 100 random input sequences, G' statistic). The GENELAND analysis used a non-spatial model and correlated allele frequencies among clusters [33], with 10⁶ iterations, a thinning of 100, and a number of clusters varying from 1 to 20. The burn-in was set at 2,000 (20% of the recorded iterations) after visual inspection of the posterior trace. Post-processing of the chain used a 100 × 100 grid for mapping.
Finally, we carried out non-Bayesian DAPC clustering, based on principal component analysis and discriminant analysis [35], using the find.clusters function of the adegenet R package.
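The ΔK statistic of Evanno et al. (2005) picks the K at which the mean log-likelihood curve L(K) bends most sharply, scaled by the spread among replicates: ΔK = |L(K+1) − 2L(K) + L(K−1)| / sd[L(K)]. The sketch below computes the second-order difference from replicate means and uses the sample standard deviation across replicates; the log-likelihood values are invented for illustration and are not results from this study.

```python
import statistics
from typing import Dict, List


def evanno_delta_k(ln_p: Dict[int, List[float]]) -> Dict[int, float]:
    """Delta K for every K that has replicate runs at K-1, K and K+1."""
    means = {k: statistics.mean(runs) for k, runs in ln_p.items()}
    delta: Dict[int, float] = {}
    for k in sorted(ln_p):
        if k - 1 not in ln_p or k + 1 not in ln_p:
            continue  # Delta K is undefined at the ends of the explored range
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        sd = statistics.stdev(ln_p[k])  # needs at least two replicates per K
        delta[k] = second_diff / sd if sd > 0 else float("inf")
    return delta


# Invented ln P(D) values, three replicates per K (not data from this study).
runs = {
    1: [-5990.0, -6000.0, -6010.0],
    2: [-5290.0, -5300.0, -5310.0],
    3: [-4995.0, -5000.0, -5005.0],
    4: [-4980.0, -4990.0, -5000.0],
    5: [-4975.0, -4985.0, -4995.0],
}
delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, best_k)
```

Here the curve flattens after K = 3 and the replicates at K = 3 are tight, so ΔK peaks at K = 3. Because ΔK cannot be computed at the smallest or largest K explored, it can never select K = 1.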
Correlation between genetic and geographic distances. Isolation by distance (IBD) was tested using individuals as the replication unit, by linear regression of pairwise genetic differentiation, estimated with the â statistic (the individual-level analogue of FST/(1 − FST)), on the logarithm of geographic distance, as proposed by Rousset [37], using GENEPOP v4.0.1 [20]. The significance of the relationship was tested by permutation (Mantel test) using the Spearman rank correlation coefficient.
Plant and locality effects. Genetic diversity indices were used to investigate differences between insects collected from different host plants or different locations. For this, we used Kruskal-Wallis non-parametric analysis of variance with populations as the replication unit, followed, when significant effects were detected, by post-hoc multiple comparisons (Wilcoxon rank sum tests with sequential Bonferroni adjustment), in R [38]. To distinguish the effects of geography and host plant on genetic differentiation among MCB populations, we carried out a permutational multivariate analysis of variance (PAMOVA), an extension of the analysis of molecular variance (AMOVA) [39] that estimates effects in a non-hierarchical way, either as a main or as a residual effect, and thereby accounts for correlation among factors [40]. The significance of region, locality and plant was assessed sequentially by permutational analyses using the adonis function of the vegan R package [41]. Marginal plant effects were estimated after accounting for the locality effect, to evaluate the structure due to the plant alone. To evaluate the effect of geography on genetic differentiation, we performed Mantel tests within the different host-plant samples in the southwest and in the Rhone Valley separately, and on the whole dataset.
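The IBD procedure above combines two steps: a least-squares regression of pairwise genetic differentiation on the logarithm of geographic distance, whose slope and intercept are interpreted, and a Mantel permutation test of the rank correlation between the two distance matrices. The sketch below illustrates both with population-level values; it does not compute Rousset's individual-level â statistic, the one-sided test for a positive association is our choice, and all matrices are hypothetical. Distances must be strictly positive because of the logarithm.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def linearized_fst(fst: float) -> float:
    """Population-level transformation FST / (1 - FST) used in IBD regressions."""
    return fst / (1 - fst)


def upper_triangle(m: Matrix) -> List[float]:
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x: List[float], y: List[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))


def ibd_regression(gen: Matrix, geo_km: Matrix) -> Tuple[float, float]:
    """Least-squares slope and intercept of genetic differentiation on ln(distance)."""
    y = upper_triangle(gen)
    x = [math.log(d) for d in upper_triangle(geo_km)]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def mantel_spearman(gen: Matrix, geo_km: Matrix, n_perm: int = 999,
                    seed: int = 1) -> Tuple[float, float]:
    """Spearman Mantel statistic and one-sided permutation P-value (positive association)."""
    n = len(gen)
    geo_vec = upper_triangle(geo_km)
    observed = spearman(upper_triangle(gen), geo_vec)
    rng = random.Random(seed)
    labels = list(range(n))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        rng.shuffle(labels)  # permute rows and columns of the genetic matrix together
        permuted = [[gen[labels[i]][labels[j]] for j in range(n)] for i in range(n)]
        if spearman(upper_triangle(permuted), geo_vec) >= observed:
            hits += 1
    return observed, hits / (n_perm + 1)


# Hypothetical sites along a transect (km) with differentiation rising with ln(distance).
positions = [0.0, 1.0, 3.0, 7.0]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[0.0 if i == j else 0.02 + 0.01 * math.log(geo[i][j]) for j in range(4)]
       for i in range(4)]
slope, intercept = ibd_regression(gen, geo)
rho, p_value = mantel_spearman(gen, geo)
print(slope, intercept, rho, p_value)
```

The intercept and slope returned by the regression correspond to the quantities contrasted between host plants in the Results: a large intercept signals strong differentiation even at short distance, and a steep slope signals differentiation that builds up with distance.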
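The permutational analysis above rests on the pseudo-F statistic of Anderson (2001), which partitions sums of squared pairwise distances into among-group and within-group parts and assesses significance by permuting group labels. The sketch below is a one-way version only: it does not reproduce the sequential, multi-factor partitioning or the marginal tests performed with vegan's adonis, and the distance matrix and group labels are hypothetical.

```python
import random
from typing import Hashable, List, Sequence, Tuple

Matrix = List[List[float]]


def _ss(dist: Matrix, members: List[int]) -> float:
    """Sum of squared pairwise distances among members, divided by their number."""
    n = len(members)
    total = sum(dist[members[i]][members[j]] ** 2 for i in range(n) for j in range(i + 1, n))
    return total / n


def pseudo_f(dist: Matrix, groups: Sequence[Hashable]) -> float:
    """One-way PERMANOVA pseudo-F: (SS_among / (a - 1)) / (SS_within / (N - a))."""
    n = len(groups)
    labels = set(groups)
    a = len(labels)
    if a < 2 or n <= a:
        raise ValueError("need at least two groups and more observations than groups")
    ss_total = _ss(dist, list(range(n)))
    ss_within = sum(_ss(dist, [i for i, g in enumerate(groups) if g == lab]) for lab in labels)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(dist: Matrix, groups: Sequence[Hashable], n_perm: int = 999,
              seed: int = 1) -> Tuple[float, float]:
    """Observed pseudo-F and permutation P-value obtained by shuffling group labels."""
    observed = pseudo_f(dist, groups)
    rng = random.Random(seed)
    shuffled = list(groups)
    hits = 1  # the observed labelling counts as one permutation
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, hits / (n_perm + 1)


# Hypothetical individuals: two well-separated groups along one genetic axis.
positions = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3]
dist = [[abs(x - y) for y in positions] for x in positions]
groups = ["maize"] * 4 + ["wild"] * 4
f_stat, p_value = permanova(dist, groups)
print(f_stat, p_value)
```

With only 8 individuals there are 70 distinct ways to split them into two groups of 4, so the smallest attainable P-value is limited by the design, not by the number of permutations.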

Within-population genetic diversity
Allelic richness (Ar) and expected heterozygosity (He) ranged from 1.32 to 1.56 (mean 1.46) and from 0.32 to 0.56 (mean 0.46), respectively (Table 1). The number of alleles per locus ranged from 2 (locus 11) to 13 (locus 2), with a mean of 5.8 (S2 Table). Deviation from HW expectations varied among populations, with FIS ranging from −0.19 to 0.15 (mean 0.032). These within-population genetic diversity indices did not differ significantly between insects collected in the two regions (southwest and Rhone Valley; P = 0.6 for He, P = 0.55 for Ho, P = 0.14 for FIS, P = 0.6 for Ar), among populations within regions (P = 0.46 for He, Ho, FIS and Ar), among the three host plants (P = 0.24 for He, P = 0.84 for Ho, P = 0.16 for FIS, P = 0.24 for Ar), or between maize and wild host plants (P = 0.6 for He, P = 0.55 for Ho, P = 0.14 for FIS, P = 0.597 for Ar) (Table 2).

Genetic clustering
The admixture models (TESS and STRUCTURE) gave comparable results, which differed from those of GENELAND (Fig 2). The optimal number of clusters was K = 3 for STRUCTURE, whereas with TESS the DIC curve did not strictly reach a plateau (S1 Fig). However, for K greater than 4, the solutions proposed by TESS included empty clusters, so that the inferred genetic structure was very similar to the K = 4 solution. For these reasons, we focus on K = 4 as the best TESS solution; one of the four TESS clusters made only a small and admixed contribution. The five replicates carried out with the optimal number of clusters gave very similar results for both STRUCTURE and TESS (Fig 2, S2 and S3 Figs). South-western populations (Haute-Garonne and Lavergne) in TESS, and the Arles 4 population in both TESS and STRUCTURE, were very little admixed and mostly composed of a single cluster. All other populations consisted of admixed individuals, supporting high levels of ongoing gene flow among all analysed populations (Fig 2). Both analyses revealed that the Arles 4 population was clearly different from all other populations, even from MCB specimens collected at very close localities (Fig 2). The results of GENELAND (non-admixture, non-spatial model) were different: strong local clustering was observed, with 10 clusters corresponding to local populations (Fig 2). Individuals from different localities, even very close ones, were generally assigned to different clusters, and individuals from the same locality were generally assigned to a single cluster (except for Longages 4). The FIS in the 10 clusters ranged from −0.18 […]

Host plant and geographically associated genetic structure
PAMOVA was performed considering the different factors (region, county, locality or plant) alone, as marginal effects, or within sub-samples for each host plant (i.e. individuals collected on the same host plant regardless of collection locality) (Table 3). All factors were significant when considered either alone (P < 10⁻⁵; FST = 0.057 among host plants, Fig 3) or as marginal effects nested within the level above them (county within region, P < 10⁻⁵; locality within county, P < 10⁻⁴; plant within locality, P = 0.0003; and, reciprocally, locality within plant, P < 0.0001). When considering genetic differentiation between localities within the sub-sample of each host plant, geographic differentiation was significant for T. domingensis and S. halepense (P < 0.0001) but not for maize, even though the mean geographic distance between T. domingensis localities (16.2 km) was smaller than between S. halepense or maize localities (49.1 and 44.2 km, respectively). This difference in geographic structure between plant species can be visualized in the MDS representations (Fig 3), where samples cluster by locality for T. domingensis and S. halepense but not for maize. Individuals collected on wild host plants (T. domingensis and S. halepense) therefore show more geographic structure than individuals collected on maize, suggesting that the host plant species may influence the moth's propensity to disperse (Fig 3). IBD patterns support this result. To account for differences in sample size among populations, we performed Mantel tests at the individual level (Rousset 2000); results are shown in Table 4. Mantel tests on the whole dataset showed that geographic distance explained only a small part of the observed genetic differentiation (slope = 0.005; maximum distance between populations ≈ 280 km).
Genetic differentiation showed no significant IBD pattern either in the southwest (maize) or in the Rhone Valley (southwest: P = 0.170; Rhone Valley: P = 0.220), likely reflecting very high levels of effective gene flow. However, when host plants were considered separately, we observed a significant IBD pattern in the Rhone Valley for samples from S. halepense (P = 0.0006) but not for samples from T. domingensis (P = 0.23). Yet in T. domingensis, genetic differentiation was strong even at short distances (FST = 0.181 between Arles 3 and Arles 4, 0.137 km apart) (S3 Table), as reflected by the large intercept of the IBD regression for this host plant (Table 4). By contrast, FST values among S. halepense samples in the Rhone Valley were small at short distances (FST = 0.02-0.03 within Arles, at 4.5-13 km) and increased to 0.147-0.30 between Arles and Aubenas, at 99-113 km, as reflected by the smaller intercept and the larger slope for this host species (Table 4). In maize, differentiation was low at short distances and did not increase with distance.

Existence of host races
The existence of host races, i.e. genetically differentiated populations specialized on different hosts, has been demonstrated in several Lepidoptera species [42], in particular in Ostrinia nubilalis [12,14,18,43]. In our study, genetic differentiation due to host plant was limited but significant (FST = 0.01, 4% of variance explained) […] [12] for allozyme markers and FST between 0.032 and 0.053 for AFLP markers [13]. The lower differentiation between MCB plant variants than between ECB plant variants in Europe is illustrated by STRUCTURE Bayesian clustering analyses, which separated Ostrinia species [13] but did not separate MCB variants in our study. Ultimately, ECB host populations have been considered as different species on the basis of differences in the timing of moth emergence and in sex pheromone composition [44]. In Africa, there are MCB sister species with different levels of specialization toward host plant or habitat [11]. The general picture is therefore of a generalist species attacking maize and more specialized wild relatives. In Europe, repeated sampling of the MCB across years would be necessary to evaluate the stability of the plant-associated differentiation. Note that S. nonagrioides on maize lays eggs beneath the wrappings of young leaf sheets (and, in our rearing units, on rolls of paper serving as structural surrogates), structures that are absent in dicotyledons [45,46]. The MCB sampled by Leniaud [18] on dicotyledons in Europe may therefore express a different, probably genetically based, oviposition behavior from the monocotyledon-associated populations of the present study. Alternatively, S. nonagrioides oviposition and larval feeding preferences may be transmitted between stages and generations not genetically, but through learning of plant chemical cues carried over across stages and generations [47].
This phenomenon has received much attention in phytophagous insects because of its possible involvement in ecological speciation and its implications for the management of crop pests [48,49]; it has been documented in Lepidoptera [50,51]. In their meta-analysis of Lepidoptera, Petit et al. (2015) reported that transmission of preference was most efficient in studies where the trait was adult oviposition preference and exposure to the chemical cue occurred at the larval stage [52]. This transmission of preference from larval feeding to adult oviposition was also observed in the MCB [52]. Such traits can lead to the host-related genetic structure observed in this study without the need to invoke genetically based preference, at least among monocotyledonous host species. Therefore, our study does not demonstrate the presence of host races in the MCB, but rather of more or less sedentary populations on stands of wild plants, possibly explained by the known transmission of preferences across developmental stages in S. nonagrioides.

Contrasting and complementary results between Bayesian clustering methods
Our study showed contrasting results between the GENELAND non-admixture model and the TESS/STRUCTURE admixture models. The GENELAND clusters in southeastern France corresponded exactly to local populations, whereas the TESS and STRUCTURE clusters were more geographically homogeneous: only the Arles 4 population formed a separate cluster in the admixture analyses, whereas most of the locations on wild host plants formed distinct clusters in the GENELAND non-admixture analysis. Safner et al. showed by simulation that GENELAND is better at detecting permeable barriers than TESS and other clustering methods [53]. Kalinowski also suggested that STRUCTURE does not always detect genetic structure within a species [54]. Our results are consistent with those of Safner et al. (2011) and Kalinowski (2011): GENELAND captured a local structure that was not captured by TESS and STRUCTURE, and this local structure is confirmed by the FST values observed between populations at short distances. The two approaches therefore provide complementary information on the population structure of the MCB and seem to work at different scales. We next discuss how the other statistics computed in this paper shed light on the reasons for these differences and, thereafter, on the population biology of the MCB.

Existence and nature of genetic structure depend on the host plant
Many authors [55,56] have considered S. nonagrioides a sedentary species. In our study, significant genetic structure was observed among sampling sites (FST = 0.04, 17% of variance explained in PAMOVA). The Mantel test of genetic versus geographic distance was significant overall, but with an extremely low slope, indicating that distance plays a weak role in genetic differentiation (Table 4). Yet a neighbor-joining tree (S6 Fig) and […] [58], so long-distance dispersal would be achieved over several generations. By considering wild and cultivated plants together for the first time, our analysis sheds light on the debate about the dispersal behavior of the MCB: the GENELAND non-admixture model and the MDS representation suggest that the pattern of IBD depends on the host plant. GENELAND Bayesian clusters were highly localized on S. halepense and T. domingensis and regional on maize (one cluster over a maximum distance of 160 km). The MDS representation also shows stronger geographic structure on T. domingensis and S. halepense than on maize. The lack of a geographic distance effect on MCB differentiation in maize, and its possible effect in wild plants, is congruent with Leniaud (2006), who sampled the MCB on different host plants in southern France and found significant IBD within non-maize plant groups but not within the maize group [18]. Our results and other direct experiments therefore strongly support high dispersal abilities of the MCB on maize, contrary to what has often been assumed so far.

Implication for MCB management
Understanding the relationships between crop pest populations living in cultivated and uncultivated areas is essential for effective control at the landscape scale. To our knowledge, this is the first MCB population genetic study including non-cultivated plants. This moth is a serious maize pest in southern France, and one question linked to its control is its ability to use wild plants as alternative hosts. In Africa, the MCB develops populations on many wild plants, including a Typha species [59]. In southwestern France, a major maize-growing area, the MCB has rarely been found on wild plants in successive searches since 2011 (Kaiser, unpublished). In Camargue (southern Rhone Valley), where maize fields are scarce, we observed it on the wild host plants S. halepense and T. domingensis (Naino Jika and Le Ru, unpublished data). This suggests that populations may specialize locally on the most abundant resource. Overall, the results suggest relatively varied and separate population dynamics on the three host plants studied. We hypothesize that, for maize populations, population sizes are large but crop rotation forces insects to disperse, thus increasing the total effective number of migrants (Ne×m), the critical parameter in reducing population structure. In Camargue, S. halepense and T. domingensis form more perennial stands than maize, since they develop from rhizomes in humid, non-cropped areas and both resist flooding and drought. This situation may have allowed the establishment of the IBD pattern observed for moth populations on S. halepense. On T. domingensis, on the other hand, the moth population genetic structure is strong but does not depend on distance. More precise data on the dynamics of both plant and moth populations throughout the year are needed to understand the factors behind the observed geographic structure.
The hypothesis that the presence of wild relatives does not increase, and could even decrease, MCB population dynamics on maize through maladaptation remains to be tested. Sampling of the MCB in uncultivated areas should therefore be carried out across Europe, especially because biotopes favorable to its development (river banks, ponds, wetlands and estuaries, as in the Rhone Valley, in Africa [59] or in Iran [56,60]) exist throughout Europe.
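The role of the effective number of migrants invoked above can be illustrated with Wright's island model, under which equilibrium FST ≈ 1/(4Nm + 1), so that Nm ≈ (1/FST − 1)/4. The sketch below, an illustration and not an analysis performed in this study, inverts that relation for two pairwise FST values quoted in the Results. The island model assumes equal, constant population sizes, symmetric migration, drift-migration equilibrium and a global (not pairwise) FST, assumptions that MCB populations are unlikely to meet, so the resulting numbers are only indicative of orders of magnitude.

```python
def island_model_nm(fst: float) -> float:
    """Effective migrants per generation implied by FST under Wright's island model.

    Assumes equilibrium FST = 1 / (4 Nm + 1), hence Nm = (1 / FST - 1) / 4.
    """
    if not 0 < fst < 1:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1 / fst - 1) / 4


# Pairwise FST values quoted in the Results, used here only as illustrative inputs.
examples = {
    "S. halepense, within Arles (short distance)": 0.02,
    "T. domingensis, Arles 3 vs Arles 4": 0.181,
}
for label, fst in examples.items():
    print(f"{label}: FST = {fst:.3f} -> Nm ~ {island_model_nm(fst):.2f}")
```

Under these idealized assumptions, the low FST on S. halepense at short distance corresponds to roughly a dozen effective migrants per generation, whereas the Arles 3 / Arles 4 contrast on T. domingensis corresponds to about one.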