Population genomics of fall armyworm by genotyping-by-sequencing: Implications for pest management

The fall armyworm (FAW), Spodoptera frugiperda, is a significant pest of many crops in the world and it is native to the Americas, where the species has shown the ability to rapidly evolve resistance to insecticides and transgenic plants. Despite the importance of this species, there is a gap in the knowledge regarding the genetic structure of FAW in South America. Here, we examined the genetic diversity of FAW populations across a wide agricultural area of Brazil and Argentina using a Genotyping-by-Sequencing (GBS) approach. We also characterized samples by their host strain based on mitochondrial and Z-linked genetic markers. The GBS methodology enabled us to discover 3309 SNPs, including neutral and outlier markers. Data showed significant genetic structure between Brazil and Argentina populations, and also among the Argentinian ecoregions. Populations inside Brazil showed little genetic differentiation indicating high gene flow among locations and confirming that structure is related to the presence of corn and rice strains. Outlier analysis indicated 456 loci putatively under selection, including genes possibly related to resistance evolution. This study provides clarification of the population genetic structure of FAW in South America and highlights the importance of genomic research to understand the risks of spread of resistance genes.


Introduction
The fall armyworm (FAW), Spodoptera frugiperda (J.E. Smith) (Lepidoptera: Noctuidae), is a major agricultural pest that can feed on several different hosts [1].FAW have the ability to evolve rapid resistance to insecticides and transgenic crops, which can impact the effectiveness of control strategies [2].The spread of such resistance traits is dependent on the migratory behavior of FAW and it is therefore important for effective pest management to delineate the pattern and magnitude of population movement across national boundaries.
In North America, FAW populations from Texas and Florida make annual migrations northwards to recolonize areas in the north of USA and Canada [3].This behavior reflects the inability of FAW to survive freezing temperatures, which limits winter populations to the most southern locations in the states of Texas and Florida [4].Less is known about migration patterns in South America, which experiences a significantly different climate than North America.Climate suitability modeling shows that South America features suitable conditions for persistent FAW populations in this continent, except for most of Argentina [5].This suggests an annual migration towards Argentina from more suitable locations, with the neighboring countries being the prime candidates for the migratory sources [6].
A complicating factor for FAW is that the species can be subdivided into two groups called host strains that differ in their distribution on plant hosts in the field, with the C-strain preferentially found in corn and the R-strain in pasture grasses and, to a lesser extent, rice [7,8].The host strains are morphologically indistinguishable and so can only be identified through the use of molecular markers, with the most commonly used being polymorphisms in the mitochondrial cytochrome oxidase subunit I (COI) and the Z-chromosome linked triosephosphate isomerase (Tpi) genes [9,10].
At this time, the most consistent evidence of population structure in South America is that indicative of the two host strains, although some level of genetic structure has been observed for populations within Paraguay and within Brazil [11][12][13].Overall, it remains unclear whether and to what extent geographically separated FAW in South America exhibit genetic differentiation.To better address this issue, we used Genotyping-by-Sequencing (GBS), a variation of the commonly used restriction-site-associated DNA sequencing methodology to identify single nucleotide polymorphisms (SNPs).GBS data allows us to resolve patterns of genetic diversity and spatial structure at very fine scale [14], and it was utilized in this study to evaluate patterns of gene flow and genetic structure of FAW populations across Brazil and Argentina.We additionally examined the information of host strains to discover informative loci putatively under selection.According to the results, we discuss some practical implications of our findings to the FAW management in South America.

Sampling and DNA extraction
Fall armyworm caterpillars were manually collected in fields from 12 locations in Brazil under the SISBIO license number 58435, and from 3 locations of Argentina, between June of 2018 and January of 2021 (Fig  1).Larvae were transferred to trays containing artificial diet and reared in laboratory conditions until become moths.Moths were placed in 1.5 mL polypropylene microcentrifuge tubes with 98% ethanol and stored at -20ºC.Species identification was confirmed by morphology and sequencing of the COI gene (see below).We extracted DNA from the legs of random adults using CTAB-based method [15].DNA quality and quantification were assayed by agarose electrophoresis gel (1% w/v) stained with SYBR Safe DNA Gel Stain (Invitrogen, Carlsbad, CA, USA).DNA concentrations were adjusted to approximately 30 ng/μL.Distribution map of the populations was drawn using the software QGIS v3.

Host strains identification
For the 228 specimens whose reads were successfully retained after GBS processing, we did polymerase chain reaction (PCR) amplifications of the COI and TpiEI4 regions to characterize host strains based on diagnostic polymorphisms at mCOI1164, mCOI1287 and gTpi183 positions using the same primer sequences and procedures as described elsewhere [4].Samples featuring both C and T nucleotides at the gTpi183 position were identified as hybrids (TpiH).The COI sequences were used to confirm species identification as well.Sequences were aligned with Clustal W [16] using MEGA 7 [17].The genetic data presented in this study are publicly available on GenBank (BioProject PRJNA847933, and accession numbers ON704174-ON704629).

GBS library sequencing and data processing
The steps of the GBS library preparation were done according to methodology described elsewhere [18] with the following modifications.Genomic DNA was digested with restriction enzymes NsiI-MseI (New England Biolabs-NEB, Ipswich, MA, USA).The barcoded NsiI adapters and a common MseI adapter were ligated to the digested DNA of each sample.Barcoded DNA fragments from all samples were pooled in a single tube and amplified by PCR.The libraries were single-end sequenced to 150 nucleotides on a single lane using the Illumina NextSeq500/550 sequencing kit v2 (Illumina, Inc.San Diego, CA, USA) at the Genome Investigation and Analysis Laboratory of University of São Paulo.
Sequencing quality of GBS libraries was evaluated using FastQC [19].The 3'end of raw reads were trimmed to 90 bp and were inspected for adaptor sequences removal.We performed demultiplex using process-radtags in STACKS v.1.42[20].Reads could be assigned to each individual based on the sequence of the barcodes.Samples with less than 200,000 sequences and/or unexpected GC contents were removed for further analysis.Reads were aligned to the Spodoptera frugiperda genome ZJU_Sfru_1.0, under Bioproject PRJNA590312 [21], using the algorithm BWA-MEM of the software BWA 0.7.17 [22].Alignment files were processed with SAMtools [23] and Picard (http://broadinstitute.github.io/picard).We identified SNPs using freebayes 1.3.4[24] with --standard_filters option.Filtering was performed using VCFtools v0.1.12a[25] and BCFtools 0.1.12[22].We retained bi-allelic SNPs that passed the following criteria: minor allele frequency � 0.01, read depth � 5X, mapping quality � 20, maximal amount of missing data per locus = 10%.Variants were separated by a minimum distance of 150 bp and r 2 threshold of 0.6.Results were stored in variant call format (VCF) after an additional filter to remove six samples with more than 25% of missing data, and SNPs identified at sexual chromosome W present only in females, mitochondrial genome or in unanchored contigs of the reference genome.

Genetic differentiation analysis
Genetic diversity was estimated by calculating the observed heterozygosity H O , expected heterozygosity H E , nucleotide diversity π, the inbreeding coefficients (F IS ), and the pairwise Fixation Index (F ST ) using hierfstat package [26,27].We tested for significant differences in heterozygosities and F IS using confidence intervals calculated based on 1000 bootstrap resamples.F ST relations were illustrated by heatmaps generated by the RColorBrewer R package.Genetic structure was additionally explored through the principal component analysis (PCA) and the discriminant analysis of principal components (DAPC) using ade4 [28] and adegenet [29,30] for R software [31].Specimens were grouped by the ADMIXTURE v1.3.0 software, and the best value of inferred genetic groups (K) was implemented by the cross-validation method [32].

Outlier SNPs detection and annotation
We searched for loci putatively under selection in the set of 15 populations and in the set of strains, where samples were identified as R-or C-strains using the Tpi marker as previously described [4].TpiH individuals were not included in this analysis.Outlier SNPs were identified using three methods.The fsthet package [33] generates smoothed quantiles for the F ST -heterozygosity relationship from the empirical distribution, and outliers were selected with a 95% confidence level threshold.The pcadapt package [34] considers population structure determined by PCA, and outliers were identified using a false discovery rate (FDR) of 0.05.The program FLK [35] assumes that SNPs were polymorphic in an ancestral population, and uses a population tree to build a null model of the behavior of F ST .We retained candidate markers identified by at least two methods and the remaining loci were considered as neutral.We compared the sequences containing outliers with the annotation files of the reference genome to identify loci present in encoding genes.We generated a fasta file containing the protein sequences to run Blast2GO [36] software using blastp with Insecta Taxa, InterPro Scan, GO mapping and functional annotation.The GO terms for each gene were submitted to Web Gene Ontology Annotation Plot (WEGO) for summarization [37].

Fall armyworm host strains
Utilizing diagnostic polymorphisms in the COI and Tpi regions, we were able to determine the strain composition of the fall armyworm populations.A higher percentage of the samples collected from corn fields were identified as C-strain using the Tpi marker (95%) than COI (86%), consistent with other studies suggesting that Tpi is a better strain marker than COI [10].The AR03 population in Argentina (Santa Fe) had the lowest agreement between markers (7%).Tpi-hybrids (TpiH) were found only in Brazilian locations, mainly in MA02 population, which also had many R-strain samples (Fig 2 ).

SNPs discovery
The GBS library generated a total of 187,842,557 reads that were retained after demultiplexing and quality checking

Outlier detection
When considering the 15 FAW populations, we identified 209 outlier SNPs in common by at least two of the three tests performed, whereas by comparing R-and C-strains we found 293 outliers in common by at least two of the three tests performed (S4 Fig) .Altogether, we identified a total of 456 outlier loci putatively under selection (13.8%), and the remaining 2853 SNPs were considered to be neutral.By analyzing only neutral markers, the PCA plot indicated

Annotation of outlier loci
For outlier analyses, 456 SNP loci putatively under selection were compared to the annotations of the genome of a specimen collected in China (ZJU_Sfru_1.0)and 306 of these were within predicted protein coding genes of FAW.After Blast2GO analysis, 220 proteins were successfully mapped, and 94 were annotated (S2 Table ).We found many outlier loci within genes possibly involved in binding functions and associated with the cell membrane (Fig 6A).Among the 94 loci with GO IDs, SNP_1055 was annotated as an ABC transporter C subfamily member 13, and the mutation was putatively under selection when comparing different locations, rather than related to the presence of two host strains.Regarding detoxifying activity, the locus SNP_3077 was in a gene similar to cytochrome P450 CYP314A1 (this locus was detected as outlier when comparing host strains), and the locus SNP_1559 was in a gene similar to an annotated esterase FE4-like.Another mechanism of insect defense involves chitin processing, and here we had five loci related to cuticle proteins or chitin binding (SNP_574, SNP_1209, SNP_1853, SNP_2799, SNP_3139).Analysis of outliers comparing host strains resulted in 68 outlier SNPs (23% of 293) in the sex chromosome Z, which concentrated over twice the number of outliers compared to the autosomal chromosomes (Fig 6B).The locus SNP_421 located at chromosome 3 was within a gene similar to an odorant receptor and could be related to sensorial stimulus to either host perception or male mating preferences.The locus SNP_3298 was in a gene that had 100% identity to a GPI-anchored glycoprotein.Besides this locus, another 188 loci were successfully blasted with more than 90% similarity, but had no GO ID associated with them (S3 Table ), including one locus identified as a cytochrome P450 (SNP_423), one cadherin (SNP_1010), one zinc carboxypeptidase (SNP_1352), one GABA receptor (SNP_719), and one UDP-glucosyltransferase (SNP_2719).The frequencies of each polymorphism for some candidate genes under selection were also calculated for the Brazil locations (S5 Fig).

Discussion
GBS has been used successfully in insect pests to reveal insights about gene flow and coancestry [38], spatial and temporal genetic structure [39,40], and incursions of invasive pests [41,42].
Here we associated the informative data set provided by GBS with molecular markers used for host strain identification to better explain the patterns of FAW population structure in Brazil and Argentina and to identify candidate genes putatively under selection.
By assessing a large number of samples in Brazil, we confirmed that FAW collected in corn fields were predominantly C-strain, with less than 4% of samples featuring both the COI-RS and TpiR diagnostic markers, and the few specimens featuring discordant genotypes likely represent vestiges of interbreeding events that occurred in the past.Based solely on diagnostic polymorphisms in COI and Tpi regions, MA02 and AR03 populations showed increased levels of strain hybridization, and we were able to describe the level of gene flow of these locations based on GBS data set.
GBS data revealed high levels of gene flow and low genetic differentiation between MA02 and RS population, which was composed by pure R-strain samples.Since these two populations were the most geographically distant locations sampled in Brazil, apart over 3200 km, we presumed that MA02 was composed by R-strain specimens with recent events of hybridization.Altogether, the genetic analysis based on pairwise F ST distances and PCA plots confirmed that the Brazilian populations are structured by host strains, rather than by geographical ecoregions.
Long distance migration enables FAW populations to travel from Southern Texas and Florida up to Canada, a distance of nearly 2500 km, in less than three months [3,43].Therefore given its strong flight performance, we can hypothesize that FAW is also performing long distance migration within Brazil sufficient to keep populations homogeneous within each host strain.
Several studies comparing populations from Brazil and Argentina in the past showed strong similarities between the two countries [4,[44][45][46], which would be expected if the great majority of Argentina FAW are derived from seasonal migrations from Brazil.To some extent, Brazilian and Argentinian populations featured common ancestry, and the population from Buenos Aires (AR02) had the lowest genetic differentiation with Brazilian populations.
Nevertheless, F ST analysis indicated significant genetic structure between countries and among provinces inside Argentina.This result is consistent with observations of mating incompatibility between populations collected in northern Argentina compared to those from the Pampas region, which indicated pre-reproductive isolating barriers between geographically separated populations [6].GBS data for another important Noctuidae pest in the Americas, the sugarcane borer (SCB), showed a similar pattern of genetic structure between Argentina and Brazil populations, and among populations within Argentina [47,48].We hypothesize that Argentina likely contains one or more endemic FAW populations that exhibit significant geographical isolation and that additional studies are required to better investigate the fine scale genetic structure of FAW as well as identify locations capable of supporting permanent FAW populations in this country.
In order to explain how selection pressures might be affecting FAW populations in South America, we examined the putative annotations of genes containing outlier SNPs.Host strain outliers were mostly concentrated on the sex chromosome Z, suggesting that the selection pressures are acting upon specific regions of the FAW genome.This result corroborates with previous study where the preponderance of strain specific SNPs were Z-linked [49], and is consistent with the proposal that strain divergence is being driven primarily by Z-chromosome functions [50].We believe that by reducing complexity of the genome, the GBS method was able to capture a fairly large number of polymorphisms in the Z-chromosome, and thereby discriminate between the R-and C-strains.It is possible that previous research based on nuclear SSR markers [12] that did not differentiate the host strains lacked sufficient coverage of the sex chromosome.
Other functional annotations revealed proteins that were likely involved in binding activities and that were present in or related to the cell membrane.Mutations in Cry receptor genes have been reported in numerous lepidopteran species to be the most common mechanism of resistance against Bt toxins [51], and here we found outlier SNPs in genes likely coding for Cry receptors such as GPI-anchored glycoprotein, cadherin and zinc carboxypeptidase.We also found outlier SNPs in genes possibly coding many important enzymes such as cytochrome P450 CYP314A1 [52], esterase FE4-like [53], JHAMT [54], and also proteins related to cuticle and chitin, which may be an indication of response to management with insecticides [55].Two noteworthy outlier SNPs associated with host strains were in genes possibly encoding an odorant receptor and a UDP-glucuronosyltransferase.Odorant receptors function in insects olfaction process, which is indispensable for host selection for feeding and oviposition [54].UDP-glucuronosyltransferase, in turn, appears to be associated with C-strain ability to detoxify DIMBOA [56], a toxic compound produced by corn plants but not rice.In conclusion, our work strongly suggest that positive selection is affecting allele frequencies at the level of populations and host strains.
From the Insect Resistance Management (IRM) perspective, resistance evolution is one of the most challenging problems in the sustainable control of FAW [57].Therefore understanding patterns of gene flow and consequent risks for spread of field-evolved resistance alleles are crucial for effective management.Our GBS data set poses a challenging scenario in Brazil, where locations presented high levels of gene flow across all ecoregions and low genetic structure within host strains.Moreover, pairwise F ST distances showed genetic structure between FAW populations of Brazil and Argentina, which has important IRM implications if resistant populations are reported in either country.
In conclusion, by combining classic molecular markers for FAW host strain identification, and genome-wide SNPs identified with GBS, we obtained more resolution of population structure than previously reported.The genetic structure and pattern of FAW in Argentina and Brazil reinforces the importance of phytosanitary barriers between countries for effective FAW management in each location.In agreement with this issue, outlier analysis suggested that positive selection is associated with field management and host strain divergences.Taking all this into consideration, current GBS data proved to be useful for population genomics research in South America and it may be applied to other geographies where the species has been introduced.

Fig 1 .
Fig 1. Fall armyworm sampling locations in Brazil and Argentina according to ecoregions.Shapefile of limits of countries in South America: IBGE-Mapas (IBGE-Brazilian Institute of Geography and Statistics; Available at: <https://geoftp.ibge.gov.br/cartas_e_mapas/bases_cartograficas_continuas/bc250/versao2021/shapefile/bc250_shapefile_2021_11_18.zip>;Acessed on February 26, 2023) QGIS v3.28.3 (Available at: <https://qgis.org/en/site/>;Acessed on: February 26, 2023).DATUM: SIRGAS 2000.https://doi.org/10.1371/journal.pone.0284587.g001 Gene flow and population genetic structurePairwise F ST calculations (S1 Fig, Fig 3A) and PCA analysis (Fig 3B) revealed high levels of genetic differentiation among the three populations from Argentina, ranging from 0.3742 to 0.4305, which is a strong indicative of reduced gene flow among these locations.On the other hand, Brazilian populations belonging to C-strain had very low F ST values (ranging from 0.000 to 0.006).Likewise, Brazilian R-strain populations MA02 and RS had pairwise F ST = 0. Lower pairwise F ST distances (S3 Fig) can be a result of lower divergence time in addition to the presence of gene flow across Brazil.Aditionally, the two first components of PCA grouped Brazilian populations by their host strains, and showed that Argentinean population AR02 was more related to the Brazilian C-strain populations than to the Argentinean AR01 and AR03.