Abstract
Genome size variation is one of the main topics in evolutionary genomics. Recent studies in Spodoptera frugiperda (Insecta; Lepidoptera) reported striking results: genome size differences of up to twofold within populations and 1.37 Gb of non-reference genome sequences, 2.5 times the size of the 544 Mb reference genome. These reports raise the question of whether such extreme within-population genome size variation is biologically realistic. To evaluate these results, we analyzed reference genome assemblies and resequencing datasets from multiple independent studies, including those from the original research. We observed that high-quality reference genomes consistently range from 380 Mb to 390 Mb, closely matching flow cytometry measurements. Genome size estimates obtained with k-mer-based approaches from field-collected samples across four independent studies suggest that extensive genome size variation within S. frugiperda is unlikely. Additionally, genome size appears to have remained stable for at least 15.89 million years in the ancestral lineage of S. frugiperda. Taken together, these results do not support the existence of extreme genome size variation in S. frugiperda and emphasize the need for careful validation.
Citation: Durand K, Nam K (2025) Is the extreme within-population genome size variation real in Spodoptera frugiperda? PLoS One 20(9): e0332711. https://doi.org/10.1371/journal.pone.0332711
Editor: Vivekanandhan Perumal, Chiang Mai University Faculty of Agriculture, THAILAND
Received: May 9, 2025; Accepted: September 2, 2025; Published: September 30, 2025
Copyright: © 2025 Durand, Nam. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: This study was conducted using publicly available whole genome sequences from NCBI Genome (GCA_023101765.3, GCA_019297735.2, GCA_015832365.1, GCA_026413635.1, GCF_011064685.2, and GCA_900240015.1), NCBI SRA (ERR6937806, ERR6942234, SRR5132437, SRR27871596, PRJNA640063, PRJNA639295, PRJNA577869, PRJNA494340, and PRJNA639296), China National GeneBank Database (CNP0001020, CNA0003276), and Agricultural Genomics Institute at Shenzhen server (ftp://ftp.agis.org.cn/Spodoptera_Frugiperda/). Project data are available at: https://figshare.com/s/63497c4e2a06d6e8c891.
Funding: The study is supported by the department of Santé des Plantes et Environnement at Institut national de recherche pour l’agriculture, l’alimentation et l’environnement (Resistome). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. We are grateful to Nicolas Nègre and Emmanuelle d’Alençon for their support in this study through intensive discussions.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Genome size variation has long been a key topic in evolutionary genomics [1]; the C-value paradox, the lack of correlation between genome size and organismal complexity, was coined as early as 1971 [2]. Genome size can vary substantially between species [3] or even between populations of a single species [4], primarily due to the differential accumulation of transposable elements [5]. Most copy number variation (or presence-absence variation) exists as rare alleles within populations, likely because of its deleterious effects [6–14]. Consequently, genome size variation within populations is generally minor. For example, in humans, nearly 30% of the 3.1 Gb genome can be subject to copy number variation [15], but a pangenomics study found that only 81 Mb of copy number variation sequences are shared among multiple individuals [16].
However, recent studies suggest that Spodoptera frugiperda (fall armyworm; Insecta; Lepidoptera; Noctuidae) defies this trend. S. frugiperda is an extremely polyphagous lepidopteran species that feeds on diverse crops, including maize, rice, and sorghum. Native to North and South America, S. frugiperda is now found on all continents except Antarctica [17] following an invasion first reported in Africa in 2016 [18], which involved a substantial lag phase in the non-native range [19]. Gui et al. [20] estimated genome sizes using a k-mer-based method from 32 individuals collected from China, Ethiopia, and the mainland USA. They reported the striking result that genome size ranged from 510 Mb to 977 Mb, with no detectable differences among sampling locations, implying extensive within-population genome size variation. They also generated a 544 Mb reference genome assembly using 100 bp MGI-Seq reads.
Huang et al. [21] performed a follow-up pangenomics study using the same dataset and further investigated the cause of this surprising result. Their method was simple. First, 100 bp BGI-Seq reads from the field-collected samples of Gui et al. were mapped to the reference genome generated by Gui et al., and unmapped reads were collected. Second, these reads were assembled, and the resulting assemblies were considered non-reference genome sequences. Surprisingly, the non-reference sequences totaled 1.37 Gb, nearly 2.5 times the size of the reference genome itself. This far exceeds the known extreme case of the Mediterranean mussel, where non-reference genomes comprise nearly 48% of the reference genome (1.28 Gb and 580 Mb for reference and non-reference genomes, respectively) [22]. Thus, in S. frugiperda, genome sizes between 510 Mb and 977 Mb would correspond to subsets of 26.6% to 51.0% of the total available sequence (1,914 Mb), comprising 544 Mb from the reference genome and 1,370 Mb from the non-reference assembly. If validated, this result would represent a milestone in evolutionary genomics, suggesting unprecedented within-population genome plasticity.
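The subsetting percentages above follow from simple arithmetic on the sizes quoted in the text. The short Python check below uses only those quoted values (no new data) to reproduce them.

```python
# Sizes quoted in the text, in Mb
REFERENCE_MB = 544        # Gui et al. reference assembly
NON_REFERENCE_MB = 1370   # Huang et al. non-reference sequences
TOTAL_MB = REFERENCE_MB + NON_REFERENCE_MB  # combined sequence available


def percent_of_total(genome_mb: float, total_mb: float = TOTAL_MB) -> float:
    """Share of the combined sequence that a genome of this size would use."""
    return 100.0 * genome_mb / total_mb


low = percent_of_total(510)
high = percent_of_total(977)
ratio = NON_REFERENCE_MB / REFERENCE_MB

print(f"total = {TOTAL_MB} Mb")                          # total = 1914 Mb
print(f"510 Mb -> {low:.1f}%, 977 Mb -> {high:.1f}%")    # 510 Mb -> 26.6%, 977 Mb -> 51.0%
print(f"non-reference / reference = {ratio:.2f}")        # non-reference / reference = 2.52
```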
We identified three reasons why their findings require validation. First, the assumption that all unmapped reads represent non-reference genomes needs further justification. If a genomic region is highly heterozygous, reads carrying alternative alleles can be difficult to map to the reference. A large proportion of unmapped reads might therefore represent highly heterozygous loci rather than non-reference genomes. Second, the size of the reference genome assembly used should be robustly evaluated. Gui et al.'s reference genome assembly is 544 Mb in size, as mentioned earlier, which is substantially larger than the genome size measured by flow cytometry (396 Mb ± 3 Mb) in a laboratory corn strain colony originally seeded from Guadeloupe [23] using the standardized protocol of Johnston et al. [24]. Additionally, the reported genome sizes substantially exceed those of the NCBI reference genome generated using Oxford Nanopore long reads (384 Mb, NCBI accession number: GCA_023101765) and other published assemblies generated using PacBio long reads (380–390 Mb [25–28]), calling into question the accuracy of genome size estimates exceeding 500 Mb. Third, it is difficult to imagine how homologous recombination could readily occur between a pair of haplotype genomes differing twofold in size. A population with highly heterogeneous genome sizes is likely to experience fitness depression, ultimately constraining genome sizes within a population to a range that allows efficient recombination.
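To illustrate the first point, the sketch below uses a deliberately simplified model. It is an assumption for illustration, not how any particular aligner or the original studies handle reads. Each base of a 100 bp read is assumed to differ from the reference independently, with probability equal to the local divergence between haplotypes, and a read is treated as unmapped once it carries more than a fixed number of mismatches. The cap of 4 mismatches and the divergence values are hypothetical.

```python
from math import comb


def prob_unmapped(read_len: int, divergence: float, max_mismatches: int) -> float:
    """P(read has more than max_mismatches differences) under a binomial model.

    Hypothetical model: independent per-base differences with probability
    `divergence`; real aligners use alignment scores, not a fixed cap.
    """
    p_mappable = sum(
        comb(read_len, k) * divergence**k * (1 - divergence) ** (read_len - k)
        for k in range(max_mismatches + 1)
    )
    return 1.0 - p_mappable


for d in (0.01, 0.03, 0.05, 0.08):
    frac = prob_unmapped(100, d, max_mismatches=4)
    print(f"divergence {d:.2f}: {frac:.3f} of 100 bp reads would be unmapped")
```

Even under this crude model, the fraction of unmappable reads rises steeply with local divergence, which is why highly heterozygous regions could contribute unmapped reads that are not truly non-reference sequence.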
Motivated by the principle that striking scientific observations require rigorous validation, we took a multi-pronged approach to assess the genome size variation reported by Gui et al. For this purpose, we analyzed the reference genome sequences of Gui et al. [20], other public reference genome assemblies, and the non-reference genome sequences of Huang et al. [21] to test whether the claimed genome size variation (510–977 Mb) is realistic. We also applied k-mer-based methods to a resequencing dataset of 163 samples, which includes the 32 samples reported by Gui et al. to have extensive genome size variation, and to three other independent, publicly available datasets comprising 302 samples from Argentina, Benin, Brazil, China, French Guiana, Guadeloupe, India, Kenya, Mexico, Puerto Rico, and the mainland USA, to evaluate whether similar levels of genome size variation are observed in other datasets. Finally, to place these results in an evolutionary context, we compared genome sizes across other Spodoptera species, providing insights into genome size dynamics within the genus.
Results and discussion
Reference and non-reference genome sequences
First, we assessed the accuracy of S. frugiperda reference genome assembly sizes using BUSCO genes [29], which are expected to exist predominantly as single copies in a genome. We analyzed seven assemblies deposited in NCBI, along with the assembly generated by Gui et al. [20] (Table 1). The reference genome assembly from Gui et al. was the largest among these assemblies. When the number of Complete and Single-copy BUSCO genes was below 5,020 (approximately 95% of the 5,286 Lepidoptera BUSCO genes), assembly sizes varied between 329 Mb and 544 Mb (Fig 1A). However, when the number of Complete and Single-copy BUSCO genes exceeded 5,020, assembly sizes converged to a range of 380–390 Mb, which closely aligns with the flow cytometry measurement (396 Mb ± 3 Mb). These genome assemblies were generated from samples collected in diverse locations, including Australia, Guadeloupe, the mainland USA, and Zambia, making sampling bias an unlikely explanation.
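The size-versus-completeness comparison can be expressed as a small Python sketch. The cutoff of 5,020 Complete and Single-copy BUSCO genes and the total of 5,286 genes are taken from the text; the `Assembly` records below are hypothetical toy values chosen for illustration, not the entries of Table 1.

```python
from typing import NamedTuple

LEP_BUSCO_TOTAL = 5286      # lepidoptera_odb10 genes, as quoted in the text
SINGLE_COPY_CUTOFF = 5020   # cutoff used in the text (~95% of 5,286)


class Assembly(NamedTuple):
    name: str
    size_mb: float
    single_copy: int  # number of Complete and Single-copy BUSCO genes


def size_range(assemblies: list[Assembly], above_cutoff: bool) -> tuple[float, float]:
    """Smallest and largest assembly size on one side of the single-copy cutoff."""
    sizes = [
        a.size_mb
        for a in assemblies
        if (a.single_copy > SINGLE_COPY_CUTOFF) == above_cutoff
    ]
    return min(sizes), max(sizes)


# Hypothetical toy values for illustration only (not Table 1)
toy = [
    Assembly("asm_A", 520.0, 4700),
    Assembly("asm_B", 340.0, 4900),
    Assembly("asm_C", 383.0, 5150),
    Assembly("asm_D", 388.0, 5180),
]
print(f"cutoff = {100 * SINGLE_COPY_CUTOFF / LEP_BUSCO_TOTAL:.1f}% of BUSCO genes")
print("below cutoff:", size_range(toy, above_cutoff=False))  # (340.0, 520.0)
print("above cutoff:", size_range(toy, above_cutoff=True))   # (383.0, 388.0)
```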
Relationships (A) between assembly size and the number of Complete and Single-Copy BUSCO genes and (B) between assembly size and the number of Complete and Duplicated BUSCO genes. The reference genome assembly generated by Gui et al. [20] is marked with red circles. Assemblies generated from a laboratory colony originating from Guadeloupe, sequenced using different technologies, are shown with blue circles and arrows.
Assemblies larger than 400 Mb contained significantly more Complete and Duplicated BUSCO genes than the other assemblies (Fig 1B). We tested the possibility that the large size of Gui et al.'s genome assembly was due to natural segmental duplications. A total of 446 of 462 BUSCO genes were specifically duplicated in this assembly compared with the NCBI reference genome assembly (GCA_023101765.3; S1 Table). These duplicated genes were distributed across all 31 chromosomes of the NCBI reference genome. If these duplications were genuine, such a genome-wide distribution would be consistent only with whole-genome doubling in the sample used for Gui et al.'s assembly. However, this explanation is unlikely, because whole-genome doubling would duplicate most BUSCO genes, whereas only 8.44% (446/5,286) were duplicated in this assembly. These results indicate that the genome size of S. frugiperda is close to 380–390 Mb. Consequently, the reference genome assembly by Gui et al. does not appear to reflect the true genome size. This assembly may have been inflated when reads were assembled into contigs, if heterozygous alleles were treated as non-allelic sites, or, more likely, by errors in the Hi-C scaffolding of the contigs.
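Identifying BUSCO genes that are duplicated in one assembly but single-copy in another amounts to joining two BUSCO result tables. The sketch below assumes the tab-separated `full_table.tsv` layout written by BUSCO v5 (comment lines starting with `#`, then columns for the BUSCO id, status, and sequence name, with statuses such as `Complete` and `Duplicated`). The function names are hypothetical, the inline tables are toy examples, and this is not the authors' script.

```python
import csv
import io


def read_busco_table(text: str) -> dict[str, dict]:
    """Parse a BUSCO full_table.tsv into {busco_id: {"status", "sequences"}}.

    Duplicated genes occupy one row per copy, so sequences are collected in a set.
    """
    lines = (ln for ln in io.StringIO(text) if ln.strip() and not ln.startswith("#"))
    table: dict[str, dict] = {}
    for row in csv.reader(lines, delimiter="\t"):
        busco_id, status = row[0], row[1]
        entry = table.setdefault(busco_id, {"status": status, "sequences": set()})
        if len(row) > 2 and row[2]:  # Missing genes carry no sequence column
            entry["sequences"].add(row[2])
    return table


def specifically_duplicated(query: dict, reference: dict) -> dict[str, set]:
    """Genes Duplicated in `query` but single-copy (Complete) in `reference`,
    mapped to the reference sequence(s), e.g. chromosomes, that carry them."""
    return {
        gene: reference[gene]["sequences"]
        for gene, entry in query.items()
        if entry["status"] == "Duplicated"
        and reference.get(gene, {}).get("status") == "Complete"
    }


# Toy tables with hypothetical gene and sequence names
query_tsv = (
    "# toy query assembly\n"
    "g1\tDuplicated\tscaf_1\ng1\tDuplicated\tscaf_9\n"
    "g2\tComplete\tscaf_2\n"
    "g3\tDuplicated\tscaf_3\ng3\tDuplicated\tscaf_4\n"
)
reference_tsv = (
    "# toy reference assembly\n"
    "g1\tComplete\tchr5\ng2\tComplete\tchr7\n"
    "g3\tDuplicated\tchr1\ng3\tDuplicated\tchr2\n"
)
dups = specifically_duplicated(read_busco_table(query_tsv), read_busco_table(reference_tsv))
print(dups)  # {'g1': {'chr5'}}
```

Collecting the reference sequence names for the returned genes gives the kind of chromosome-level summary reported in S1 Table.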
It is worth noting that this analysis included two genome assemblies derived from the same laboratory colony, seeded from a population in Guadeloupe, but generated using different sequencing technologies. The assembly based on 150 bp Illumina short reads [23] was larger (436 Mb) and contained more duplicated BUSCO genes (607) than the assembly generated using PacBio long reads [26] (385 Mb and 52 duplicated BUSCO genes). This result again suggests that the observed variation in assembly size does not reflect true genome size differences but is primarily driven by the sequencing technologies used.
We tested the possibility that the non-reference genome sequences reported by Huang et al. actually contain the reference genome sequences of Gui et al. In total, 347 lepidopteran BUSCO genes were identified within the non-reference genome sequences. The vast majority of these genes (323/347 = 93.08%) were also present in the reference genome (S2 Table), indicating that these genes occur in both datasets. Among these, 84 genes showed BLASTp hits with at least 50% coverage against metazoan BUSCO genes, which are expected to exist as single-copy genes across animal species. These include, for example, genes encoding Wntless, DNA-directed RNA polymerase II subunit RPB11, and the NEDD8-activating enzyme E1 regulatory subunit.
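A coverage filter like the 50% threshold above can be applied directly to BLAST+ tabular output. The sketch below assumes BLASTp was run with `-outfmt "6 qseqid sseqid pident qstart qend qlen"` and defines coverage as the fraction of the query spanned by a single HSP. Both the column layout and the coverage definition are assumptions, since the text does not specify them, and the toy hits are hypothetical.

```python
def covered_queries(blast_tab: str, min_coverage: float = 0.5) -> set[str]:
    """Query IDs with at least one HSP covering >= min_coverage of the query.

    Assumed layout: -outfmt "6 qseqid sseqid pident qstart qend qlen".
    """
    hits: set[str] = set()
    for line in blast_tab.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        qseqid, _sseqid, _pident, qstart, qend, qlen = line.split("\t")[:6]
        aligned = abs(int(qend) - int(qstart)) + 1
        if aligned / int(qlen) >= min_coverage:
            hits.add(qseqid)
    return hits


# Hypothetical hits against metazoan BUSCO proteins
toy_hits = "\n".join([
    "geneA\tmetazoa_1\t88.0\t1\t300\t400",   # 75% of the query covered
    "geneB\tmetazoa_2\t95.0\t10\t100\t400",  # about 23% covered
    "geneC\tmetazoa_3\t70.0\t1\t200\t400",   # exactly 50% covered
])
print(sorted(covered_queries(toy_hits)))  # ['geneA', 'geneC']
```

Coverage summed over several non-overlapping HSPs would require merging intervals per query first; the single-HSP version is kept here for simplicity.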
Among the 8,603 protein-coding genes identified in the non-reference genome sequences, 918 had protein sequences 100% identical to those found in the reference genome (S1 Fig; see S3 Table for the gene list). These results demonstrate that the non-reference genome sequences reported by Huang et al. include sequences already present in the reference genome, suggesting that the 1.37 Gb non-reference genome size is overestimated.
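The text does not detail how 100% protein identity was assessed. One straightforward way, shown here as an assumption rather than the authors' procedure, is exact sequence matching through a dictionary keyed on the normalized protein sequence. The parser and function names are hypothetical and the FASTA records are toy examples.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Minimal FASTA parser: {first word of header: sequence}."""
    records: dict[str, str] = {}
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def identical_proteins(query: dict[str, str], reference: dict[str, str]) -> dict[str, list[str]]:
    """Map each query protein to reference proteins with an identical sequence."""
    def norm(seq: str) -> str:
        return seq.upper().rstrip("*")  # ignore case and a trailing stop symbol

    by_sequence: dict[str, list[str]] = {}
    for ref_id, seq in reference.items():
        by_sequence.setdefault(norm(seq), []).append(ref_id)
    return {q: by_sequence[norm(s)] for q, s in query.items() if norm(s) in by_sequence}


reference = read_fasta(">r1\nMKT\nLLV\n>r2\nMAAA*\n")
query = read_fasta(">q1 hypothetical\nMKTLLV\n>q2\nmaaa\n>q3\nMKTLLA\n")
print(identical_proteins(query, reference))  # {'q1': ['r1'], 'q2': ['r2']}
```

Counting how many query proteins map to each reference protein in the result gives a per-gene tally of the kind summarized in S1 Fig.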
Genome size estimates from population data
We then re-evaluated the genome size variation (510 Mb – 977 Mb) reported by Gui et al. using GCE software [30], as they did. The resequencing dataset they generated includes 163 samples from the Americas, China, and Africa, of which 32 were used in their genome size estimation. The estimated genome sizes varied widely, from 27.6 Mb to 1060.8 Mb (Fig 2A, left), implying a 38.4-fold variation in genome size (1060.8 Mb/27.6 Mb), which is biologically unrealistic. Fig 2A also shows that the genome size estimates did not appear to be influenced by sequencing throughput.
(A) Left: Estimated genome sizes from the resequencing dataset of Gui et al. [20] using GCE (blue dots) and Genomescope (red dots). Right: Distribution of ratios between two randomly selected genome size estimates from either Genomescope or Gui et al., with 10,000 replications. (B) Estimated genome sizes from the dataset of Zhang et al. [28] using Genomescope, plotted against sequencing throughput for each sample.
When we used Genomescope [31], another reference-free k-mer-based tool, the estimated range narrowed to 277.8 Mb – 446.7 Mb, with 80% of the samples falling within 323.3 Mb – 368.13 Mb (Fig 2A, left). Since k-mer approaches are expected to slightly underestimate genome size [32], we believe this range does not deviate substantially from the flow cytometry measurement (396 Mb ± 3 Mb) or the size of the high-quality reference genome assemblies (380 Mb – 390 Mb). Thus, we conclude that Genomescope provides a more realistic estimate of the true genome size than GCE and that a twofold genome size difference within populations is unlikely to be accurate or, at the very least, should be viewed with caution. According to Table S4 in Gui et al., the median proportion of unmapped reads across samples is only 3.09%. Such a small proportion is also unlikely to account for the twofold difference in genome size argued by Gui et al. [20].
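The idea shared by k-mer-based estimators is that genome size is roughly the total number of k-mers from genuine sequence divided by the depth of the main k-mer coverage peak. Genomescope and GCE fit considerably richer models (heterozygosity, repeats, sequencing errors); the sketch below shows only the basic peak-based calculation, assuming a histogram in the two-column `depth count` text form written by `jellyfish histo`. It ignores heterozygosity, so on a highly heterozygous sample it could pick the half-depth heterozygous peak and roughly double the estimate, which illustrates why explicit modelling matters. The toy histogram is hypothetical.

```python
def parse_histo(text: str) -> dict[int, int]:
    """Parse two-column 'depth count' lines, as written by `jellyfish histo`."""
    hist: dict[int, int] = {}
    for line in text.splitlines():
        if line.strip():
            depth, count = line.split()
            hist[int(depth)] = int(count)
    return hist


def peak_genome_size(hist: dict[int, int]) -> float:
    """Genome size ~ (k-mers from real sequence) / (depth of the main peak).

    No smoothing or error model: if counts never rise again, all depths are kept.
    """
    depths = sorted(hist)
    # 1. Error trough: the first local minimum, where counts start rising again.
    trough = depths[0]
    for lower, higher in zip(depths, depths[1:]):
        if hist[higher] > hist[lower]:
            trough = lower
            break
    # 2. Main coverage peak at or beyond the trough.
    kept = [d for d in depths if d >= trough]
    peak = max(kept, key=lambda d: hist[d])
    # 3. Total k-mer instances kept, divided by the per-copy depth.
    total = sum(d * hist[d] for d in kept)
    return total / peak


# Hypothetical histogram: error k-mers at low depth, ~1,000 genomic k-mers near depth 20
toy = parse_histo("1 5000\n2 600\n3 50\n18 100\n19 200\n20 400\n21 200\n22 100\n")
print(peak_genome_size(toy))  # 1007.5
```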
Intriguingly, the estimated genome sizes are positively correlated with the lengths of both unique and repeat sequences within the genomes (S2 Fig). To compare the variation in genome size estimates between Genomescope and Gui et al., we calculated the ratio of genome sizes between two randomly chosen samples, with 10,000 replications. The average pairwise difference in genome size based on Genomescope was 7.23% (95% confidence interval: 0.16%–28.0%), much lower than that based on the estimates of Gui et al. (25.7%, 95% confidence interval: 0.19%–70.5%). Notably, the distribution of differences from Gui et al.'s estimates was bimodal, a pattern not observed in the Genomescope results (Fig 2A, right). This result suggests that genome size estimation is highly sensitive to the k-mer-based method used, with GCE potentially overestimating genome sizes in certain samples.
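The resampling comparison can be sketched as follows. The text does not define the pairwise difference exactly; here it is assumed to be (larger estimate / smaller estimate) - 1, and the 95% interval is taken from the 2.5th and 97.5th percentiles of the resampled differences. Both choices are assumptions, the function names are hypothetical, and the genome size lists are toy values rather than the study's data.

```python
import random
import statistics


def pairwise_differences(sizes: list[float], n_rep: int = 10_000, seed: int = 1) -> list[float]:
    """Relative difference (larger/smaller - 1) between two distinct random samples."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_rep):
        a, b = rng.sample(sizes, 2)  # two different samples, drawn without replacement
        diffs.append(max(a, b) / min(a, b) - 1.0)
    return diffs


def summarise(diffs: list[float]) -> tuple[float, float, float]:
    """Mean and empirical 2.5th / 97.5th percentiles of the differences."""
    ordered = sorted(diffs)
    last = len(ordered) - 1
    return (
        statistics.mean(ordered),
        ordered[int(0.025 * last)],
        ordered[int(0.975 * last)],
    )


# Hypothetical genome size estimates (Mb), not the study's data
tight = [350.0, 360.0, 370.0, 380.0]
wide = [300.0, 450.0, 600.0, 900.0]
for label, sizes in (("tight", tight), ("wide", wide)):
    mean, lower, upper = summarise(pairwise_differences(sizes))
    print(f"{label}: mean {mean:.1%} (95% interval {lower:.1%} to {upper:.1%})")
```

A bimodal set of estimates, such as a mix of plausible and inflated values, would show up as a bimodal distribution of these resampled differences.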
The resequencing dataset from Gui et al. primarily consists of S. frugiperda samples from China (20/32 = 62.5%). To further test for extensive genome size variation in Chinese S. frugiperda, we used Genomescope to analyze an independently generated dataset from Zhang et al. [28], which includes 103 samples from China. The estimated genome sizes ranged from 295.4 Mb to 498.2 Mb, with 80% of the samples falling between 349.4 Mb and 387.4 Mb, regardless of sequencing throughput (Fig 2B). This result is inconsistent with the extensive genome size variation (510 Mb to 977 Mb) reported by Gui et al.
Additionally, we analyzed two other independently generated datasets. The first dataset [33] included 55 samples collected from diverse geographic locations, including Argentina, Brazil, Kenya, Puerto Rico, and the mainland USA. Genomescope returned genome size estimates for 35 of the 55 samples; all but one fell within the range of 349.9 Mb – 406.6 Mb, the exception having an estimated genome size of 623.2 Mb (Fig 3A, left). This exceptional sample had the lowest model fit (Fig 3A, right), suggesting that its genome size estimate may not be biologically accurate.
The second dataset [34] comprised 144 samples, also collected from a wide geographic range including Benin, China, French Guiana, Guadeloupe, India, Mexico, Puerto Rico, and the mainland USA. Genomescope successfully estimated genome sizes for 133 samples. All but three of these had genome sizes between 325.6 Mb and 421.6 Mb; the three outliers had the lowest model fit (Fig 3B). These results, again, do not support the argument by Gui et al. [20] of extensive genome size variation exceeding 500 Mb.
Genome size evolution in Spodoptera
To place the genome size variation of S. frugiperda in an evolutionary context, we inferred genome sizes for other Spodoptera species using publicly available sequencing data analyzed with Genomescope. These species included S. exigua, S. picta, S. littoralis, and S. litura, which diverged from the ancestral lineage leading to the extant S. frugiperda between 10.97 million years ago (95% confidence interval: 10.36–12.67 Mya) and 15.89 million years ago (95% confidence interval: 15.08–16.34 Mya), based on molecular clock estimates calculated from mitochondrial divergence calibrated with fossil data [35]. All of these species had genome sizes close to 400 Mb (Fig 4), suggesting that genome size has remained relatively stable in the ancestral lineage of S. frugiperda, at least over the past 15.89 million years.
Conclusion
Taken together, these results do not support the claim that S. frugiperda exhibits extensive genome size variation between 510 Mb and 977 Mb [20] or the existence of 1.37 Gb of non-reference genome sequences in addition to a 544 Mb reference genome [21]. Instead, we show that the genome sizes of field-collected S. frugiperda samples are largely close to the flow cytometry estimate (396 Mb ± 3 Mb). Furthermore, no dramatic changes in genome size have occurred for at least 15.89 million years in the ancestral lineage of extant S. frugiperda. Therefore, we suggest that the results of Gui et al. and Huang et al. be interpreted with caution and that further validation is necessary before accepting the existence of extensive within-population genome size variation, such as a non-reference genome 2.5 times larger than the reference genome.
Since the first S. frugiperda genome project in 2017 [23], at least four high-quality reference genomes have been published [25–28], all reporting genome sizes substantially smaller than those estimated by Gui et al., as noted earlier. However, Huang et al. based their conclusions solely on the results of Gui et al. without addressing the discrepancies with prior research, effectively leaving the task of evaluating the pangenomic results to external researchers. We appreciate the value of open scientific discourse and view careful consideration of existing literature as a vital part of the research process.
Methods
Reference genome assemblies listed in Table 1 were downloaded from NCBI Genomes or the China National GeneBank Database. BUSCO v5.2.2 analysis [29] was conducted with the lepidoptera_odb10 dataset. Non-reference genome sequences were compared to reference genome sequences using BLASTp from BLAST+ v2.12.0 [36]. Assembly statistics were calculated using the FastA.N50.pl script of the enveomics collection [37]. The resequencing reads from Gui et al. [20] were downloaded from the China National GeneBank Database (ID: CNP0001020). The resequencing reads from Zhang et al. [28] were downloaded from the Agricultural Genomics Institute at Shenzhen server (ftp://ftp.agis.org.cn/Spodoptera_Frugiperda/); these were generated from samples all collected from maize, except for two collected from sugarcane. The resequencing reads from Schlum et al. [33] were obtained from NCBI SRA (ID: PRJNA640063). The dataset from Yainna et al. [34] deposited in NCBI SRA (IDs: PRJNA639295, PRJNA577869, PRJNA494340, and PRJNA639296) originally included 177 samples. However, we used only 144 samples, as the remaining data were not publicly released in the original study [38]. The resequencing reads from S. exigua, S. littoralis, S. litura, and S. picta were downloaded from NCBI SRA (IDs: ERR6937806, ERR6942234, SRR5132437, and SRR27871596, respectively). Adapter sequences were removed from the reads using AdapterRemoval [39]. The k-mer frequency distribution (k = 17) was calculated from the filtered reads using Jellyfish v2.3.1 [40], and the genome size was estimated using Genomescope v1.0 [31]. The lengths of unique and repeat sequences for each sample were also obtained from the Genomescope results. Alternatively, the k-mer frequency distribution (k = 17) was calculated using kmerfreq v4.0 [30], and the genome size was estimated using Genomic Character Estimator (GCE) v1.0 [30].
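For readers reproducing the Genomescope branch of this workflow, the per-sample steps can be sketched as a Python helper that builds command lines. The tools and k = 17 come from the Methods above; every other parameter (canonical k-mer counting with `-C`, the `1G` hash size, thread count, file names, the location of `genomescope.R`, and the read length) is an assumption, not a setting reported by the authors. GenomeScope v1.0 is invoked as `Rscript genomescope.R <histogram> <k> <read length> <output dir>`.

```python
import shlex

K = 17  # k-mer size stated in the Methods


def kmer_pipeline(r1: str, r2: str, prefix: str, read_len: int, threads: int = 8) -> list[list[str]]:
    """Argument lists for trimming, k-mer counting and GenomeScope v1 on one sample.

    Assumptions: paired-end FASTQ input, AdapterRemoval v2 default output names,
    canonical k-mers (-C), a 1G hash size, and genomescope.R in the working directory.
    """
    trimmed = [f"{prefix}.pair1.truncated", f"{prefix}.pair2.truncated"]
    return [
        ["AdapterRemoval", "--file1", r1, "--file2", r2, "--basename", prefix],
        ["jellyfish", "count", "-C", "-m", str(K), "-s", "1G", "-t", str(threads),
         "-o", f"{prefix}.jf", *trimmed],
        ["jellyfish", "histo", "-t", str(threads), "-o", f"{prefix}.histo", f"{prefix}.jf"],
        ["Rscript", "genomescope.R", f"{prefix}.histo", str(K), str(read_len),
         f"{prefix}_genomescope"],
    ]


for cmd in kmer_pipeline("sample_1.fq", "sample_2.fq", "sample", read_len=150):
    print(shlex.join(cmd))  # or execute with subprocess.run(cmd, check=True)
```

Building argument lists rather than shell strings keeps file names safely quoted and makes each step easy to inspect before running it.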
Supporting information
S1 Table. Chromosomal locations on the NCBI reference genome assembly of BUSCO genes specifically duplicated in the genome assembly of Gui et al. [20].
https://doi.org/10.1371/journal.pone.0332711.s001
(DOCX)
S2 Table. The BUSCO genes found in the reference genome assembly generated by Gui et al. or in the non-reference sequences generated by Huang et al.
https://doi.org/10.1371/journal.pone.0332711.s002
(DOCX)
S3 Table. The genes showing 100% protein sequence identity between reference genome assemblies and non-reference sequences.
https://doi.org/10.1371/journal.pone.0332711.s003
(DOCX)
S1 Fig. Histogram showing the number of genes from the non-reference genome sequences generated by Huang et al. [21] that were mapped to each gene with 100% identical protein sequences in the reference genome assembly generated by Gui et al. [20].
https://doi.org/10.1371/journal.pone.0332711.s004
(TIF)
S2 Fig. The relationship between genome size and the lengths of repeat and unique sequences, as estimated by GenomeScope.
https://doi.org/10.1371/journal.pone.0332711.s005
(TIF)
Acknowledgments
We are grateful to Nicolas Nègre and Emmanuelle d’Alençon for their support in this study through intensive discussions.
References
- 1. Lynch M, Conery JS. The origins of genome complexity. Science. 2003;302(5649):1401–4. pmid:14631042
- 2. Thomas CA Jr. The genetic organization of chromosomes. Annu Rev Genet. 1971;5:237–56. pmid:16097657
- 3. Sclavi B, Herrick J. Genome size variation and species diversity in salamanders. J Evol Biol. 2019;32(3):278–86. pmid:30588701
- 4. Chen X-G, Jiang X, Gu J, Xu M, Wu Y, Deng Y, et al. Genome sequence of the Asian Tiger mosquito, Aedes albopictus, reveals insights into its biology, genetics, and evolution. Proc Natl Acad Sci U S A. 2015;112(44):E5907-15. pmid:26483478
- 5. Canapa A, Barucca M, Biscotti MA, Forconi M, Olmo E. Transposons, Genome Size, and Evolutionary Insights in Animals. Cytogenet Genome Res. 2015;147(4):217–39. pmid:26967166
- 6. Fang B, Edwards SV. Fitness consequences of structural variation inferred from a House Finch pangenome. Proc Natl Acad Sci U S A. 2024;121(47):e2409943121. pmid:39531493
- 7. Schrider DR, Stevens K, Cardeño CM, Langley CH, Hahn MW. Genome-wide analysis of retrogene polymorphisms in Drosophila melanogaster. Genome Res. 2011;21(12):2087–95. pmid:22135405
- 8. Schrider DR, Navarro FCP, Galante PAF, Parmigiani RB, Camargo AA, Hahn MW, et al. Gene copy-number polymorphism caused by retrotransposition in humans. PLoS Genet. 2013;9(1):e1003242. pmid:23359205
- 9. Zhang W, Xie C, Ullrich K, Zhang YE, Tautz D. The mutational load in natural populations is significantly affected by high primary rates of retroposition. Proc Natl Acad Sci U S A. 2021;118(6):e2013043118. pmid:33526666
- 10. Zarrei M, MacDonald JR, Merico D, Scherer SW. A copy number variation map of the human genome. Nat Rev Genet. 2015;16(3):172–83. pmid:25645873
- 11. Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, et al. Global variation in copy number in the human genome. Nature. 2006;444(7118):444–54. pmid:17122850
- 12. Sudmant PH, Kitzman JO, Antonacci F, Alkan C, Malig M, Tsalenko A, et al. Diversity of human copy number variation and multicopy genes. Science. 2010;330(6004):641–6. pmid:21030649
- 13. Zhang W, Tautz D. Tracing the Origin and Evolutionary Fate of Recent Gene Retrocopies in Natural Populations of the House Mouse. Mol Biol Evol. 2022;39(2):msab360. pmid:34940842
- 14. Emerson JJ, Cardoso-Moreira M, Borevitz JO, Long M. Natural selection shapes genome-wide patterns of copy-number polymorphism in Drosophila melanogaster. Science. 2008;320(5883):1629–31. pmid:18535209
- 15. Zhang F, Gu W, Hurles ME, Lupski JR. Copy number variation in human health, disease, and evolution. Annu Rev Genomics Hum Genet. 2009;10:451–81. pmid:19715442
- 16. Sherman RM, Forman J, Antonescu V, Puiu D, Daya M, Rafaels N, et al. Assembly of a pan-genome from deep sequencing of 910 humans of African descent. Nat Genet. 2019;51(1):30–5. pmid:30455414
- 17. FAW map | Global Action for Fall Armyworm Control | Food and Agriculture Organization of the United Nations. [cited 12 Jun 2023]. Available from: https://www.fao.org/fall-armyworm/monitoring-tools/faw-map/en/
- 18. Goergen G, Kumar PL, Sankung SB, Togola A, Tamò M. First Report of Outbreaks of the Fall Armyworm Spodoptera frugiperda (J E Smith) (Lepidoptera, Noctuidae), a New Alien Invasive Pest in West and Central Africa. PLoS One. 2016;11(10):e0165632. pmid:27788251
- 19. Durand K, Yainna S, Nam K. Population genomics unravels a lag phase during the global fall armyworm invasion. Commun Biol. 2024;7(1):957. pmid:39117774
- 20. Gui F, Lan T, Zhao Y, Guo W, Dong Y, Fang D, et al. Genomic and transcriptomic analysis unveils population evolution and development of pesticide resistance in fall armyworm Spodoptera frugiperda. Protein Cell. 2022;13(7):513–31. pmid:33108584
- 21. Huang Y-X, Rao H-Y, Su B-S, Lv J-M, Lin J-J, Wang X, et al. The pan-genome of Spodoptera frugiperda provides new insights into genome evolution and horizontal gene transfer. Commun Biol. 2025;8(1):407. pmid:40069391
- 22. Gerdol M, Moreira R, Cruz F, Gómez-Garrido J, Vlasova A, Rosani U, et al. Massive gene presence-absence variation shapes an open pan-genome in the Mediterranean mussel. Genome Biol. 2020;21(1):275. pmid:33168033
- 23. Gouin A, Bretaudeau A, Nam K, Gimenez S, Aury J-M, Duvic B, et al. Two genomes of highly polyphagous lepidopteran pests (Spodoptera frugiperda, Noctuidae) with different host-plant ranges. Sci Rep. 2017;7(1):11816. pmid:28947760
- 24. Johnston JS, Bernardini A, Hjelmen CE. Genome size estimation and quantitative cytogenetics in insects. In: Brown SJ, Pfrender ME, editors. Insect Genomics. New York, NY: Springer New York; 2019. p. 15–26.
- 25. Nam K, Nhim S, Robin S, Bretaudeau A, Nègre N, d’Alençon E. Positive selection alone is sufficient for whole genome differentiation at the early stage of speciation process in the fall armyworm. BMC Evol Biol. 2020;20(1):152. pmid:33187468
- 26. Gimenez S, Abdelgaffar H, Goff GL, Hilliou F, Blanco CA, Hänniger S, et al. Adaptation by copy number variation increases insecticide resistance in the fall armyworm. Commun Biol. 2020;3(1):664. pmid:33184418
- 27. Fiteni E, Durand K, Gimenez S, Meagher RL Jr, Legeai F, Kergoat GJ, et al. Host-plant adaptation as a driver of incipient speciation in the fall armyworm (Spodoptera frugiperda). BMC Ecol Evol. 2022;22(1):133. pmid:36368917
- 28. Zhang L, Liu B, Zheng W, Liu C, Zhang D, Zhao S, et al. Genetic structure and insecticide resistance characteristics of fall armyworm populations invading China. Mol Ecol Resour. 2020;20(6):1682–96. pmid:32619331
- 29. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31(19):3210–2. pmid:26059717
- 30. Liu B, Shi Y, Yuan J, Hu X, Zhang H, Li N, et al. Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects. arXiv; 2020.
- 31. Vurture GW, Sedlazeck FJ, Nattestad M, Underwood CJ, Fang H, Gurtowski J, et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics. 2017;33(14):2202–4. pmid:28369201
- 32. Pflug JM, Holmes VR, Burrus C, Johnston JS, Maddison DR. Measuring Genome Sizes Using Read-Depth, k-mers, and Flow Cytometry: Methodological Comparisons in Beetles (Coleoptera). G3 (Bethesda). 2020;10(9):3047–60. pmid:32601059
- 33. Schlum KA, Lamour K, de Bortoli CP, Banerjee R, Meagher R, Pereira E, et al. Whole genome comparisons reveal panmixia among fall armyworm (Spodoptera frugiperda) from diverse locations. BMC Genomics. 2021;22(1):179. pmid:33711916
- 34. Yainna S, Tay WT, Durand K, Fiteni E, Hilliou F, Legeai F, et al. The evolutionary process of invasion in the fall armyworm (Spodoptera frugiperda). Sci Rep. 2022;12(1):21063. pmid:36473923
- 35. Kergoat GJ, Goldstein PZ, Le Ru B, Meagher RL Jr, Zilli A, Mitchell A, et al. A novel reference dated phylogeny for the genus Spodoptera Guenée (Lepidoptera: Noctuidae: Noctuinae): new insights into the evolution of a pest-rich genus. Mol Phylogenet Evol. 2021;161:107161. pmid:33794395
- 36. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421. pmid:20003500
- 37. Rodriguez RLM, Konstantinidis KT. The enveomics collection: a toolbox for specialized analyses of microbial genomes and metagenomes. PeerJ Inc.; 2016 Mar. Report No.: e1900v1.
- 38. Tay WT, Rane RV, Padovan A, Walsh TK, Elfekih S, Downes S, et al. Global population genomic signature of Spodoptera frugiperda (fall armyworm) supports complex introduction events across the Old World. Commun Biol. 2022;5(1):297. pmid:35393491
- 39. Schubert M, Lindgreen S, Orlando L. AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Res Notes. 2016;9:88. pmid:26868221
- 40. Marçais G, Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 2011;27(6):764–70. pmid:21217122