Abstract
Entomological sampling and storage conditions often prioritise efficiency, practicality and conservation of morphological characteristics, and may therefore be suboptimal for DNA preservation. This practice can impact downstream molecular applications, such as the generation of high-throughput genomic libraries, which often requires substantial DNA input amounts. Here, we use a practical Tn5 transposase tagmentation-based library preparation method optimised for 96-well plates and low yield DNA extracts from insect legs that were stored under sub-optimal conditions for DNA preservation. The samples were kept in field vehicles for extended periods of time, before long-term storage in ethanol in the freezer, or dry at room temperature. By reducing DNA input to 6ng, more samples with sub-optimal DNA yields could be processed. We matched this low DNA input with a 6-fold dilution of a commercially available tagmentation enzyme, significantly reducing library preparation costs. Costs and workload were further suppressed by direct post-amplification pooling of individual libraries. We generated medium coverage (>3-fold) genomes for 88 out of 90 specimens, with an average of approximately 10-fold coverage. While samples stored in ethanol yielded significantly less DNA compared to those which were stored dry, these samples had superior sequencing statistics, with longer sequencing reads and higher rates of endogenous DNA. Furthermore, we find that the efficiency of tagmentation-based library preparation can be improved by a thorough post-amplification bead clean-up which selects against both short and large DNA fragments. By opening opportunities for the use of sub-optimally preserved, low yield DNA extracts, we broaden the scope of whole genome studies of insect specimens. We therefore expect these results and this protocol to be valuable for a range of applications in the field of entomology.
Citation: Cobb L, de Muinck E, Kollias S, Skage M, Gilfillan GD, Sydenham MAK, et al. (2024) High-throughput sequencing of insect specimens with sub-optimal DNA preservation using a practical, plate-based Illumina-compatible Tn5 transposase library preparation method. PLoS ONE 19(3): e0300865. https://doi.org/10.1371/journal.pone.0300865
Editor: Nafiu Bala Sanda, Bayero University Kano, NIGERIA
Received: September 11, 2023; Accepted: March 6, 2024; Published: March 22, 2024
Copyright: © 2024 Cobb et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All raw sequence read data are available under PRJEB65580 at the European Nucleotide Archive (ENA, https://www.ebi.ac.uk). All raw sequence statistics are in the Supplementary Table.
Funding: This work was funded by a grant from the University of Oslo awarded to B. Star and M.A.K. Sydenham. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
In response to widespread declines in insect diversity (reviewed in [1]), and the subsequent severe implications for ecosystem functioning (reviewed in [2]), large-scale international insect survey and sampling schemes have been put in place (e.g. [3]). While the primary interest of such schemes is to monitor abundance and distribution changes, the application of genomic approaches to insect specimens can also be used to investigate spatial drivers of population trends, habitat connectivity and local adaptation (reviewed in [4]). These approaches rely on high-throughput sequencing (HTS) methods, which generally have considerably greater DNA quality requirements compared to more traditional PCR-based methods. However, entomological samples are often collected and stored using a variety of techniques which form a trade-off between practicality, morphological requirements and molecular demands. While samples should ideally be stored immediately following tissue death under optimal conditions [5], this is not always feasible during extended field periods. For instance, entomological specimens are often temporarily stored in warm fieldwork vehicles until sampling has concluded, and those captured by high-efficiency trapping methods, such as pitfall and flight interception traps, can remain in traps on site for several days, or even weeks, prior to collection and processing [6]. In addition, as entomological specimens are important biological resources with myriad applications in the fields of taxonomy and ecology, standard collection and storage methods are designed to prioritise morphological preservation. Given these considerations, alcohols such as ethanol are commonly used as both killing agents and preserving fluids (e.g. [3]). While this approach helps to prevent physical deterioration, it does not fully inhibit DNA degradation due to relatively high water content [7, 8]. 
The use of highly concentrated ethanol (>96%) can improve DNA preservation, although ethanol concentration may decrease due to evaporative loss. In addition, entomological specimens preserved in high concentration ethanol can become overly brittle, impacting morphological analyses [9]. Ethanol-based sampling and storage nevertheless remains the cheapest and most practical method compared to other options such as flash freezing with liquid nitrogen, and is therefore widespread (e.g. [10–12]). Library preparation methods which are compatible with material collected and stored using suboptimal preservation protocols are therefore required when working with entomological specimens.
An additional obstacle to the use of entomological specimens for HTS is the destructive nature of DNA extraction: certain protocols require the sample be physically ground or beaten with beads, while most call for tissue digestion through the use of corrosive chemicals and enzymes (e.g. [13]). While it is possible to extract from the entire insect in order to prioritise DNA quantity and quality (see [14]), this destructive approach requires the sacrifice of biologically valuable entomological specimens. Some ‘non-destructive’ insect DNA extraction methods have been proposed, whereby the entire specimen is incubated during the digestion step and subsequently retained, allowing for the preservation of the whole insect [15–18]. However, this practice may result in some morphological damage such as pigment loss [16, 19], and its impacts on long-term preservation (i.e. >10 years) remain unclear. Importantly, while re-extractions of the same biological material may produce viable genomic DNA extracts, the first extraction is often the most effective in terms of DNA yield and endogenous DNA content [20]. Using the entire specimen therefore limits any future DNA applications, reduces the long-term biological value of entomological collections, and fails to incorporate ethical guidelines for the destructive sampling of biological specimens [21]. To most effectively preserve insect specimens, for both future entomological applications and future DNA extractions, a more conservative solution would be to sacrifice a small piece of tissue such as a leg [20]. This approach leaves the majority of the insect untouched, although the low sample volume and consequent low DNA yield may constrain specific downstream molecular applications.
While the cost of whole genome sequencing has decreased by several orders of magnitude since its inception [22], library preparation often remains expensive and now represents a substantial proportion of the overall cost of whole genome data generation. For instance, methods whereby DNA is physically fragmented are widely adopted, yet can be expensive, time consuming, and often require high DNA input [23]. Alternatively, tagmentation-based library preparation has streamlined the process by combining DNA fragmentation and adapter ligation into a single reaction facilitated by Tn5 transposase, and typically calls for lower DNA input [24, 25]. Commercial Tn5 transposase kits, however, remain relatively expensive, and their dependence on the use of undisclosed reagents inhibits methodological optimisation. Several adaptations to the tagmentation method have been developed, such as the in-house production of Tn5 transposase [26–28]. Despite significantly reducing costs, in-house enzyme production can be both time and resource intensive, may not always be possible, and therefore lacks broad-scale practicality. Further modifications include reduction of reaction volumes, replacement of reagents with cheaper alternatives and the incorporation of magnetic bead-linked transposomes, highlighting the potential of matching low DNA input amounts with diluted transposase concentrations to increase both throughput and affordability [25, 29, 30]. Here, we explore the use of a commercially available hyperactive Tn5 transposase (Diagenode) to create high-throughput sequencing libraries from low-yield DNA extracts obtained from entomological specimens.
Specifically, we investigate the effectiveness of a practical and in-house developed Tn5 tagmentation-based library preparation method, using insect DNA extracted by applying minimally destructive methods to samples initially stored in ethanol for several weeks while kept in fieldwork vehicles at ambient summer temperatures. After fieldwork was completed, single legs were removed from the specimens and subsequently stored in 96% ethanol at -18°C, while the remaining specimens were dried, pinned and stored at room temperature. Our study utilises samples stored using both methods. We assessed the robustness of the protocol in response to varying tagmentation reaction time and Tn5 transposase enzyme concentration. We then rescaled our protocol to fit 96-well plates and a standardised DNA input of 6ng. Finally, we developed a double-sided post-amplification clean-up protocol in order to optimise library fragment size distribution and thus improve sequencing efficiency. We find that this economical, easily scalable library preparation protocol opens multiple opportunities for genomic studies, particularly in the cases of non-model organisms with limited funding, small organisms with low per-sample DNA yield or biological samples stored under non-optimal conditions for DNA preservation.
Materials and methods
Specimens, sampling and storage conditions
A total of 122 samples of two different bumblebee species (Bombus lapidarius and Bombus pascuorum) were stored following one of two protocols (see Table 1). All specimens were sampled in 2017 and 2022 by immediate submersion in 96% ethanol. They were then stored and transported in fieldwork vehicles for several weeks during the summer, before being dried, pinned and stored at room temperature with no preservative (Dry Protocol, n = 82). Samples stored using the Ethanol Protocol (n = 40) comprised single legs which were removed from the specimens collected in 2017 prior to drying, and subsequently stored in 96% ethanol at -18°C.
Table 1. Details of two bumblebee sample storage protocols, including year sampled, killing agent, storage preservative and storage temperature, and the number of legs used for DNA extraction. Also presented are the sample sizes from each group used in two different sets of analyses: DNA concentration comparison of DNA extracts, and library preparation and sequencing statistics.
DNA extraction
DNA was extracted from either 1 or 2 mid or hind legs, including the coxa, femur, tibia, and tarsomeres, using the Qiagen DNeasy Blood & Tissue Kit. Samples from the Ethanol Protocol were washed with molecular-grade water in order to remove any ethanol residue. All samples were incubated in 380μL digestion buffer (300μL ATL buffer, 50μL DTT, 30μL proteinase K) at 56° C and 350 RPM overnight (minimum 17 hours), as implemented by the Campos & Gilbert method (in [31]) for DNA extraction from chitin. Following overnight digestion, 7μL RNase A and 300μL AL buffer were added to the lysate, followed by a room temperature incubation (15 minutes) and the addition of 360 μL 96% ethanol. DNA was subsequently bound and purified following the manufacturer’s recommendations, then eluted in AE buffer (105μL, 90μL or 80μL) and stored at -18°C. DNA concentration was then estimated using the Qubit dsDNA Assays (Thermofisher) and standardised to 1ng/μL.
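The standardisation step above follows the usual C1·V1 = C2·V2 dilution relation. The following is a minimal sketch of that arithmetic, not part of the published protocol: the function names and the 20 μL working volume are hypothetical, while the 1 ng/μL target and the 6 μL tagmentation input come from the text.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul=1.0, final_volume_ul=20.0):
    """Return (extract_ul, diluent_ul) that give the target concentration.

    Based on C1 * V1 = C2 * V2. The default final volume (20 uL) is a
    hypothetical working volume, not a value taken from the protocol.
    """
    if stock_ng_per_ul < target_ng_per_ul:
        # An extract below the target cannot be reached by dilution.
        raise ValueError("extract is already below the target concentration")
    extract_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return extract_ul, final_volume_ul - extract_ul


def tagmentation_input_ng(volume_ul=6.0, conc_ng_per_ul=1.0):
    """DNA mass entering one tagmentation reaction (volume x concentration)."""
    return volume_ul * conc_ng_per_ul


if __name__ == "__main__":
    # Example: a Qubit reading of 4 ng/uL diluted to 1 ng/uL in 20 uL.
    print(dilution_volumes(4.0))
    print(tagmentation_input_ng())
```

Extracts that read below the target concentration raise an error here, mirroring the fact that only extracts of at least 1 ng/μL were carried forward to library preparation.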
Initially, DNA extractions of dry-stored specimens were performed on single legs (n = 21). In order to consistently avoid very low DNA yields, two legs were used for the remaining 61 extractions, along with the addition of 5μL 1M CaCl2 solution to the digestion buffer to aid tissue digestion.
Library preparation
Two 8-sample Illumina-compatible trial libraries were generated to evaluate the impact of two variables on DNA fragment size distribution: 1) tagmentation reaction time (Trial 1) and 2) Tn5 transposase enzyme concentration (Trial 2). For Trial 1, two B. pascuorum DNA extracts were incubated for five, six, seven or eight minutes with 1μL of undiluted Loaded Tn5 Tagmentase (Diagenode, C01070012). For Trial 2, the same DNA extracts were incubated with diluted Loaded Tn5 Tagmentase (either a 2X, 4X, 6X or 8X dilution of the original Tn5 Tagmentase) for 7 minutes.
Following these trials, the protocol was adapted for a 96-well plate setup, and 96 libraries were generated using a 7 minute incubation and 6X Loaded Tn5 Tagmentase dilution. The libraries were built using extracts with DNA concentrations of at least 1ng/μL, consisting of 82 dry-stored samples, 8 ethanol-stored samples, and 6 additional samples which were either found to be misidentified bumblebee species, or stored using a different protocol. Following PCR amplification, the libraries were pooled and cleaned using two alternative clean-up protocols (Pool SP and Pool S4) in order to optimise fragment size distribution for sequencing (see also S1 Protocol for the full Tn5 tagmentation library preparation, PCR and post-amplification clean-up protocol).
Tn5 tagmentation library preparation and PCR
All DNA extracts were standardised to 1ng/μL. 6μL of DNA extract was incubated (55°C) with 5μL of 2X Tagmentation Buffer (Diagenode, C01019043) and 1μL of Loaded Tn5 Tagmentase (Diagenode, C01070012, using various dilutions, see above). After 5, 6, 7 or 8 minutes (see above), 3μL of 0.2% sodium dodecyl sulfate (SDS) solution (Molecular Biology/Electrophoresis, Fisher BioReagents™ MFCD00036175) was added to stop the tagmentation reaction, followed by a 5 minute room temperature incubation. Post-tagmentation PCR (12 cycles) was done in 40μL reactions using KAPA HiFi DNA Polymerase (HotStart and Ready Mix formulation, Roche, KK2601). The manufacturer’s recommendations were followed by adding 24μL Mastermix, 1μL N7 Primer (1μM) and 1μL N5 Primer (1μM) (IDT for Illumina UD Indexes Plate A/Set 1, Illumina) to each 15μL reaction. After amplification, 6μL of each of the 96 wells was pooled (without quantification) before proceeding to clean-up (AMPure XP, Beckman Coulter).
Post-amplification clean-up and sequencing
All bead clean-ups were performed using AMPure XP beads (Beckman Coulter). The Trial 1, Trial 2 and Pool SP libraries were cleaned once, removing small DNA fragments by using a 0.6 DNA/bead ratio, following the manufacturer’s recommendations. Pool S4 was built from the same PCR reaction as Pool SP, but cleaned twice, using different bead ratios to remove both large and small DNA fragments (as demonstrated in [32]), with a target median fragment size of ~600 bp. Firstly, a 0.5 DNA/bead ratio was used to remove large DNA fragments. Secondly, a 0.65 DNA/bead ratio was used to remove smaller DNA fragments and shift the fragment size distribution to peak between 500 and 1000 bp. This “double sided” clean-up process was performed twice, and the final library was eluted in 40μL molecular grade water. DNA fragment lengths for all amplified libraries (individual libraries for Trial 1 and Trial 2, Pool SP, and Pool S4) were analysed using a Fragment Analyzer™ (Advanced Analytical) with the DNF-474 High Sensitivity Fragment Analysis Kit after 10-fold dilution of library aliquots. Pool SP was sequenced using an Illumina Novaseq 6000 SP flowcell, generating 923,253,536 read pairs passing filters, and Pool S4 was sequenced using an Illumina Novaseq 6000 S4 flowcell, generating 2,799,713,996 read pairs passing filters. Base calling and demultiplexing were performed with RTA v3.4.4 and bcl2fastq v2.20.0.422 in both cases.
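The bead volumes for a double-sided clean-up can be worked out from the two ratios given above. The sketch below is illustrative only and rests on two assumptions that the text does not confirm (S1 Protocol remains the authoritative source): that each ratio expresses bead volume relative to the starting sample volume, and that the second ratio is cumulative, so only the difference between the two ratios is added to the retained supernatant. The function name and the 100 μL example volume are hypothetical.

```python
def double_sided_bead_volumes(sample_ul, upper_cut_ratio=0.5, lower_cut_ratio=0.65):
    """Return (first_addition_ul, second_addition_ul) of bead suspension.

    Assumption: ratios are bead volume per starting sample volume, and the
    second ratio is cumulative (conventional double-sided size selection).
    Step 1 binds large fragments, which are discarded with the beads.
    Step 2 tops the supernatant up to the second ratio so that the target
    fragments bind while short fragments stay in solution.
    """
    if not 0 < upper_cut_ratio < lower_cut_ratio:
        raise ValueError("expected 0 < upper_cut_ratio < lower_cut_ratio")
    first_ul = upper_cut_ratio * sample_ul
    second_ul = (lower_cut_ratio - upper_cut_ratio) * sample_ul
    return first_ul, second_ul


if __name__ == "__main__":
    # Hypothetical 100 uL aliquot of the pooled libraries.
    first, second = double_sided_bead_volumes(100.0)
    print(f"step 1: {first:.1f} uL beads, step 2: {second:.1f} uL beads")
```

If the protocol instead applies each ratio independently to a freshly eluted sample, the second addition would simply be the second ratio multiplied by the eluate volume.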
Read data were processed using PALEOMIX v1.2.14 [33]: adapters were trimmed with AdapterRemoval v2.3.1 [35], and reads were aligned to BomLapEIv1 (https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_936014575.1/) and BomPasc1.1 [34] using the mem algorithm as implemented in BWA v0.7.17 [36]. Only reads with a minimum MapQ value of 15 were retained and considered endogenous.
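The mapping-quality filter can be illustrated on SAM-formatted alignment records, in which the second column holds the bitwise FLAG (bit 0x4 marks an unmapped read) and the fifth column holds MAPQ. This is a minimal sketch of the threshold only, not the PALEOMIX implementation; the function names are hypothetical, and defining the endogenous proportion as retained records over all records is an assumption.

```python
MIN_MAPQ = 15  # threshold stated in the text


def is_endogenous(sam_line, min_mapq=MIN_MAPQ):
    """True if a SAM alignment record is mapped with MAPQ >= min_mapq."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: bitwise FLAG
    mapq = int(fields[4])   # column 5: mapping quality
    unmapped = bool(flag & 0x4)
    return (not unmapped) and mapq >= min_mapq


def endogenous_fraction(sam_lines, min_mapq=MIN_MAPQ):
    """Fraction of alignment records passing the filter (headers skipped)."""
    total = kept = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header lines and blank lines
        total += 1
        if is_endogenous(line, min_mapq):
            kept += 1
    return kept / total if total else 0.0


if __name__ == "__main__":
    # Toy records, not real data.
    toy = [
        "@HD\tVN:1.6",
        "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tFFFF",
        "r2\t0\tchr1\t200\t10\t4M\t*\t0\t0\tACGT\tFFFF",
        "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tFFFF",
    ]
    print(endogenous_fraction(toy))
```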
Statistics
The DNA yield comparison between dry-stored and ethanol-stored specimens was carried out using a t-test. Mann-Whitney-Wilcoxon tests were used to compare sequencing statistics between a) dry-stored and ethanol-stored specimens and b) Pool SP and Pool S4, excluding the endogenous DNA comparison between Pool SP and Pool S4, for which a t-test was used. All statistical analyses were performed using base R 4.3.1.
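The text specifies only "a t-test". The fractional degrees of freedom reported for the DNA yield comparison (df = 20.57) are consistent with the unequal-variance (Welch) form, which is also the default of `t.test` in R. As a sketch under that assumption, the statistic and the Welch–Satterthwaite degrees of freedom can be computed as follows; this Python version is for illustration only (the analyses themselves were run in R), and the function name and example values are hypothetical.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    # Squared standard error of each sample mean (sample variance / n).
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / (se2_a + se2_b) ** 0.5
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


if __name__ == "__main__":
    # Made-up yields for two storage groups, not data from the study.
    dry = [310.0, 120.0, 505.0, 260.0, 410.0]
    ethanol = [60.0, 95.0, 40.0, 130.0, 75.0]
    print(welch_t(dry, ethanol))
```

The P value would then be read from a t distribution with `df` degrees of freedom, a step left out here to keep the sketch free of external dependencies.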
Results
Sample storage impact on DNA yield
One leg was used to extract DNA from samples stored in ethanol at -18°C (n = 40) and either one leg (n = 20) or two legs (n = 62) were used from samples stored dried at ambient temperatures (Table 1). When comparing DNA yields of extracts generated from comparable tissue volumes (a single leg), we found that significantly more DNA is obtained from the dry-stored specimens than the ethanol-stored specimens (Fig 1). Given that DNA yields from ethanol-stored specimens were low, we chose to prioritise dry-stored specimens for library preparation.
Fig 1. Specimens were sampled in 2017 from various locations in Norway, Germany and Denmark and stored in ethanol in fieldwork vehicles at ambient summer temperatures for several weeks. Samples were then either stored dry (at room temperature) or in ethanol (at -18°C). See Materials and Methods for details regarding sample collection and DNA extraction. Following extraction, DNA was eluted in 105μL. DNA yields from dry-stored specimens (mean = 298.21±181.03) were significantly greater than from those stored in ethanol (mean = 72.98±51.79; t = 5.45, df = 20.57, P < 0.001).
Tn5 transposase library preparation optimisation
Changing tagmentation reaction time had minimal impact on fragment size distribution, with all libraries showing similar distribution profiles and fragment size peaking slightly above 300 bp (see Fig 2A). Fragment size distribution profiles of libraries built using 2X, 4X and 6X transposase enzyme dilutions were broader, and showed a modest increase in the proportion of long DNA fragments with each ascending dilution. This increase becomes more pronounced with the use of an 8X dilution (see Fig 2B). We therefore generated libraries in a 96-well plate using a 7 minute incubation time and 6X Loaded Tn5 Tagmentase dilution. These libraries were then amplified (12 cycles), after which 5μL from each library was pooled to perform bead clean-up on all libraries combined using two different approaches.
Fig 2. (a) Libraries were built using four different tagmentation reaction times (5 minutes; 6 minutes; 7 minutes; 8 minutes) with a 1X transposase dilution. (b) Libraries were built using four different transposase dilutions (2X; 4X; 6X; 8X) with a 7 minute tagmentation reaction. RFU (Relative Fluorescence Units) signifies relative amount of DNA. A vertical line indicating a fragment size of 300 bp is included to assist visual comparison of the different treatments. See Materials and Methods for details regarding library preparation protocol.
Post-amplification clean-up optimisation
Following initial bead clean-up, we obtained a fragment size distribution situated between 400 bp and 1500 bp (Pool SP, Fig 3), a much broader range than the tight distribution around 600 bp recommended for Illumina DNA sequencing [37]. Pool SP was sequenced on an Illumina Novaseq SP flowcell with paired-end 150 bp reads, with a large number of reads collapsing into a single read (see further below). We therefore performed a more stringent clean-up on the original pooled libraries to tighten fragment size distribution, removing both very short and very long DNA fragments, resulting in a narrow distribution peaking around 600 bp (Pool S4, Fig 3). Pool S4 was subsequently sequenced on one quarter of an Illumina Novaseq 6000 S4 flowcell, also with paired-end 150 bp reads.
Fig 3. RFU (Relative Fluorescence Units) signifies relative amount of DNA.
Sequencing results
We obtained 795,694,049 reads ranging between 261,576 and 23,087,451 per specimen for Pool SP, while Pool S4 produced significantly more reads: 2,409,119,984, ranging between 835,001 and 96,527,020 per specimen (W = 429, P < 0.001; Table 2 and Fig 4A). Furthermore, Pool SP contained a significantly higher number of collapsed reads (W = 8100, P < 0.001; Fig 4B), slightly lower endogenous DNA proportion (t = -25.89, df = 89, P < 0.001; Fig 4C), and slightly higher overall read length (W = 8012, P < 0.001; Fig 4D) compared to Pool S4. Overall, Pool S4 resulted in higher coverage than Pool SP (W = 365, P < 0.001; Fig 4E), which was predominantly driven by the higher sequencing yield (Fig 4A). Nonetheless, when calculating the ratio of coverage against sequencing yield (calculated as coverage divided by number of paired reads), the coverage achieved by Pool S4 was on average greater than that of Pool SP by a factor of 1.45±0.12. This increase in efficiency for Pool S4 can be explained by both its reduced proportion of collapsed reads, leading to the mapping of more nucleotides per pair, and its higher proportion of endogenous DNA, leading to more unique mapped sequences per pair.
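The efficiency comparison above divides coverage by the number of paired reads for each pool and then compares the two pools. A minimal sketch of that calculation follows, assuming the comparison is made per specimen and that the reported spread is a standard deviation (the text specifies neither); the function names, the dictionary layout and the example values are hypothetical.

```python
from statistics import mean, stdev


def coverage_efficiency(coverage_fold, read_pairs):
    """Coverage obtained per sequenced read pair."""
    return coverage_fold / read_pairs


def efficiency_gain(pool_sp, pool_s4):
    """Mean and standard deviation of per-specimen S4/SP efficiency ratios.

    Each argument maps specimen ID -> (coverage_fold, read_pairs).
    Only specimens present in both pools are compared.
    """
    ratios = [
        coverage_efficiency(*pool_s4[sid]) / coverage_efficiency(*pool_sp[sid])
        for sid in pool_sp
        if sid in pool_s4
    ]
    return mean(ratios), stdev(ratios)


if __name__ == "__main__":
    # Made-up values for two specimens, not data from the study.
    sp = {"A": (1.0, 1_000_000), "B": (2.0, 2_000_000)}
    s4 = {"A": (3.0, 2_000_000), "B": (5.0, 4_000_000)}
    print(efficiency_gain(sp, s4))
```

A ratio above 1 for a specimen means that each read pair in Pool S4 contributed more coverage than a read pair in Pool SP.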
Fig 4. Bumblebee specimens were sampled between 2017 and 2022 from various locations in Norway, Germany, Denmark and Sweden. Both pools were generated from the same parallel library session. Pool SP was cleaned up with a 0.6 DNA/bead ratio and was sequenced on an Illumina Novaseq SP flowcell. Pool S4 was cleaned up using a double-sided size selection protocol, with a 0.5 DNA/bead ratio clean-up followed by a 0.65 DNA/bead ratio clean-up (twice) and was sequenced on ¼ Illumina Novaseq S4 flowcell. The boxplots compare (a) number of reads; (b) proportion of collapsed reads; (c) endogenous DNA proportion (unique reads only); (d) read length and (e) coverage (fold).
Table 2. Bumblebee specimens were sampled between 2017 and 2022 from various locations in Norway, Germany, Denmark and Sweden. Libraries for both pools were generated simultaneously. The SP pool was cleaned up with a 0.6 DNA/bead ratio and was sequenced on an Illumina Novaseq SP flowcell. The S4 pool was cleaned up using a double-sided size selection protocol, with a 0.5 DNA/bead ratio clean-up followed by a 0.65 DNA/bead ratio clean-up (twice) and was sequenced on ¼ Illumina Novaseq S4 flowcell. Statistics presented for each sequencing pool are specimen ID, number of reads, collapsed reads (proportion), clonality (proportion), endogenous DNA (proportion), nuclear coverage (fold), and read length (bp). Also presented is combined per-specimen coverage of both sequencing runs.
Ethanol-stored samples consistently produced preferable sequencing statistics, with significantly more paired reads (W = 553, P < 0.001; Fig 5A and see S1 Table), fewer collapsed reads (W = 1850, P = 0.007; Fig 5B), higher endogenous DNA proportions (W = 402, P < 0.001; Fig 5C), and similar read length (Fig 5D), leading to significantly higher overall coverage (W = 598, P < 0.001; Fig 5E).
Fig 5. Bumblebee specimens were sampled between 2017 and 2022 from various locations in Norway, Germany, Denmark and Sweden and were either stored dry at room temperature (n = 82) or in ethanol at -18°C (n = 8). The box plots compare (a) number of reads; (b) proportion of collapsed reads; (c) endogenous DNA proportion (unique reads only); (d) read length and (e) coverage (fold).
Finally, when combining the sequencing data of Pool SP and Pool S4, we obtain at least 3-fold overall coverage for 88 out of 90 samples, and at least 10-fold overall coverage for 58 (Table 2). Sequences exhibited minimal sequencing bias as a result of Tn5 library preparation (see S1 Fig).
Discussion
We here use an affordable and practical tagmentation-based genomic library preparation protocol to generate successful libraries from 6ng DNA extracted from entomological samples. We reduce costs by transposase enzyme dilution (similar to [25]), and pooling post-amplification libraries without quantification prior to clean-up. We also improve sequencing efficiency by applying a double-sided post-amplification clean-up protocol, selecting a narrow range of library fragments from the broad distribution of fragment sizes produced by the tagmentation reaction. We thus generated medium coverage (>3-fold) genomes for 88 out of 90 specimens, with an average of approximately 10-fold coverage. We estimate the overall economic cost for library preparation and PCR amplification of 96 reactions to be below 450 Euros. Below, we discuss a number of practical considerations.
First, initial extractions of bumblebee samples stored either dry or in ethanol revealed that dry-stored samples produced significantly higher overall DNA yields (up to 3-fold higher, see Fig 1); in fact, few extracts of ethanol-stored samples exhibited DNA concentrations beyond 1ng/μL. Assuming the dry-stored samples to be of higher quality, we chose to prioritise these samples for subsequent DNA extraction, library preparation and sequencing, resulting in the unbalanced sample sizes of the two groups (82 dry-stored samples, 8 ethanol-stored samples; see Table 1). Nonetheless, the eight extracts from ethanol-stored samples performed significantly better during sequencing, despite normalising the concentrations of all samples. Ethanol-stored samples obtained significantly more read pairs, which were also less often fully collapsed, indicating larger DNA fragments. They also yielded significantly greater proportions of endogenous DNA than the dry-stored sample sequences. We speculate that the combination of submerging samples in 96% ethanol with a storage temperature of -18°C reduces the presence of microbial or fungal contaminants, improves the proportion of endogenous DNA and decreases fragmentation of the small amount of DNA remaining. Given that our DNA library preparation protocol is of an adequate sensitivity to produce viable libraries from such low DNA amounts, we conclude that 96% ethanol-based storage of entomological specimens is preferable for high-throughput sequencing compared to dry storage, and that DNA yield is not a determining factor of library sequencing quality.
In order to increase affordability of library preparation, a 6X Tn5 transposase dilution was used, along with direct pooling of all amplified libraries prior to clean-up. As a result of reduced enzyme concentration, the tagmentation reaction produced a broad distribution of DNA library fragment sizes, with a large proportion of long fragments (see Figs 2B and 3, Pool SP). The ideal library fragment size distribution for Illumina DNA sequencing peaks tightly around approximately 600 bp [37]; in addition, long fragments have been found to produce significantly higher error rates and reduced base qualities when sequenced on Illumina platforms [38]. In order to narrow library fragment size distribution around the recommended optimal length of 600 bp and thus increase sequencing efficiency, a more stringent double-sided post-amplification clean-up was performed on Pool S4, which removed both very short (<~400 bp) and very large (>~1000 bp) fragments. Consequently, Pool S4 produced significantly fewer collapsed reads, more endogenous reads and thus yielded approximately 1.5-fold more nucleotides per paired read compared to Pool SP. We therefore conclude that this stringent double-sided clean-up leads to a considerable improvement in sequencing efficiency.
To further reduce financial costs and workload, we pooled all libraries after PCR amplification and omitted individual quantification, which likely increased the variation and spread of coverage obtained by our sequences. Nonetheless, when combining the sequencing data of Pool SP and Pool S4, we find that 88 out of 90 samples have coverage of at least 3X, and 58 out of 90 samples have coverage of at least 10X. The vast majority of our samples are therefore suitable for a range of downstream analytical approaches, in spite of this pooling approach. While this 96-sample library preparation protocol (including PCR) runs in just a few hours, there is potential for increased efficiency through automation. The direct post-amplification pooling and lack of intermediate clean-up steps increase the ease with which this adaptation could be made.
Finally, we reiterate that 90 successful libraries were generated from a standardised DNA input of 6ng extracted from just one or two bumblebee legs. This low tissue volume minimises morphological damage to biologically valuable entomological specimens, helps to maintain their potential for future taxonomic, ecological and DNA-based research, and as a result reduces the requirement for additional entomological sampling efforts. In addition, library preparation protocols with low DNA input volumes are essential for the genomic sequencing of much smaller organisms than bumblebees, which may yield very little DNA even when entire specimens are used for DNA extraction. We therefore hope that this protocol can be of considerable value for the future of entomological genomics and monitoring.
Supporting information
S1 Protocol. Laboratory protocol describing Tn5 transposase tagmentation-based library preparation in 96-well plates, PCR and post-amplification clean-up.
https://doi.org/10.1371/journal.pone.0300865.s001
(DOCX)
S1 Table.
Sequencing statistics for two sequencing pools (SP and S4) of 90 specimens of two bumblebee species: (a) Bombus lapidarius (n = 46) and (b) Bombus pascuorum (n = 44). Bumblebee specimens were sampled between 2017 and 2022 from various locations in Norway, Germany, Denmark and Sweden, and samples were stored either dry at room temperature, or submerged in ethanol at -18°C. Libraries for both pools were generated simultaneously. The SP pool was cleaned up with a 0.6 DNA/bead ratio and was sequenced on an Illumina Novaseq SP flowcell. The S4 pool was cleaned up using a double-sided size selection protocol, with a 0.5 DNA/bead ratio clean-up followed by a 0.65 DNA/bead ratio clean-up (twice) and was sequenced on ¼ Illumina Novaseq S4 flowcell. Metadata presented for each sequencing pool are specimen ID, storage protocol, year sampled, number of reads (bp), collapsed reads (proportion), clonality (proportion), endogenous DNA (proportion), nuclear coverage (fold), mitochondrial clonality (proportion), mitochondrial coverage (fold) and read length (bp). Also presented is combined per-specimen coverage of both sequencing runs.
https://doi.org/10.1371/journal.pone.0300865.s002
(XLSX)
S1 Fig. Read fragmentation and nucleotide misincorporation patterns of sequencing read data from two representative libraries aligned to either the Bombus pascuorum (BEE008) or Bombus lapidarius (BEE009) assemblies.
While Tn5 transposase can cause sequence bias, this bias is associated with the first 10 bases of the reads only. Preliminary data (not shown) indicate that, with simple post-analysis filtering, these biases do not impact our ability to perform detailed population genetic analyses. Patterns were obtained using MapDamage v. 2.0.6 after down-sampling BAM files to 1,000,000 reads (Jónsson, H., Ginolhac, A., Schubert, M., Johnson, P. L. F., & Orlando, L. MapDamage2.0: Fast approximate Bayesian estimates of ancient DNA damage parameters. Bioinformatics. 2013;29(13): 1682–1684).
https://doi.org/10.1371/journal.pone.0300865.s003
(TIF)
Acknowledgments
Sequencing was performed by the Norwegian Sequencing Centre (www.sequencing.uio.no), a national technology platform hosted by the University of Oslo and Oslo University Hospital, supported by the Research Council of Norway and South-Eastern Regional Health Authority.
References
- 1. Wagner DL, Grames EM, Forister ML, Berenbaum MR, Stopak D. Insect decline in the Anthropocene: Death by a thousand cuts. Proc Natl Acad Sci U S A. 2021;118(2). pmid:33431573
- 2. van der Sluijs JP. Insect decline, an emerging global environmental risk. Current Opinion in Environmental Sustainability. 2020;46:39–42.
- 3. Potts S, Dauber J, Hochkirch A, Oteman B, Roy D, Ahnre K, et al. Proposal for an EU Pollinator Monitoring Scheme. Luxembourg; 2020. Report No.: ISBN 978-92-76-23859-1 Contract No.: JRC122225.
- 4. Saunders ME, Janes JK, O’Hanlon JC. Moving On from the Insect Apocalypse Narrative: Engaging with Evidence-Based Insect Conservation. Bioscience. 2020;70(1):80–9.
- 5. Graham CF, Glenn TC, McArthur AG, Boreham DR, Kieran T, Lance S, et al. Impacts of degraded DNA on restriction enzyme associated DNA sequencing (RADSeq). Mol Ecol Resour. 2015;15(6):1304–15. pmid:25783180
- 6. Knuff AK, Winiger N, Klein AM, Segelbacher G, Staab M, Kotze DJ. Optimizing sampling of flying insects using a modified window trap. Methods in Ecology and Evolution. 2019;10(10):1820–5.
- 7. Lindahl T. Instability and decay of the primary structure of DNA. Nature. 1993;362(6422):709–15. pmid:8469282
- 8. Soniat TJ, Sihaloho HF, Stevens RD, Little TD, Phillips CD, Bradley RD. Temporal-dependent effects of DNA degradation on frozen tissues archived at –80°C. Journal of Mammalogy. 2021;102(2):375–83.
- 9. King JR, Porter SD. Recommendations on the use of alcohols for preservation of ant specimens (Hymenoptera, Formicidae). Insectes Sociaux. 2004;51(2):197–202.
- 10. Moreau CS, Wray BD, Czekanski-Moir JE, Rubin BER. DNA preservation: a test of commonly used preservatives for insects. Invertebrate Systematics. 2013;27(1):81–6.
- 11. de Sousa P, Henriques A, Silva SE, Carvalheiro LG, Smagghe G, Michez D, et al. Genomic Patterns of Iberian Wild Bees Reveal Levels of Diversity, Differentiation and Population Structure, Supporting the “Refugia within Refugia” Hypothesis. Diversity (Basel). 2023;15(6):746.
- 12. Frampton M, Droege S, Conrad T, Prager S, Richards MH. Evaluation of Specimen Preservatives for DNA Analyses of Bees. Journal of Hymenoptera Research. 2008;17(2):195–200.
- 13. Qiagen. DNeasy® Blood & Tissue Handbook. 2020.
- 14. Beadle K, Singh KS, Troczka BJ, Randall E, Zaworra M, Zimmer CT, et al. Genomic insights into neonicotinoid sensitivity in the solitary bee Osmia bicornis. PLoS Genet. 2019;15(2):e1007903. pmid:30716069
- 15. Santos D, Ribeiro GC, Cabral AD, Sperança MA. A non-destructive enzymatic method to extract DNA from arthropod specimens: Implications for morphological and molecular studies. PLoS One. 2018;13(2):e0192200. pmid:29390036
- 16. Tin MM-Y, Economo EP, Mikheyev AS. Sequencing degraded DNA from non-destructively sampled museum specimens for RAD-tagging and low-coverage shotgun phylogenetics. PLoS One. 2014;9(5):e96793. pmid:24828244
- 17. Castalanelli MA, Severtson DL, Brumley CJ, Szito A, Foottit RG, Grimm M, et al. A rapid non-destructive DNA extraction method for insects and other arthropods. Journal of Asia-Pacific Entomology. 2010;13(3):243–8.
- 18. Thomsen PF, Elias S, Gilbert MTP, Haile J, Munch K, Kuzmina S, et al. Non-Destructive Sampling of Ancient Insect DNA. PLoS One. 2009;4(4):e5048. pmid:19337382
- 19. Korlević P, McAlister E, Mayho M, Makunin A, Flicek P, Lawniczak MKN. A Minimally Morphologically Destructive Approach for DNA Retrieval and Whole-Genome Shotgun Sequencing of Pinned Historic Dipteran Vector Species. Genome Biol Evol. 2021;13(10). pmid:34599327
- 20. Cavill EL, Liu S, Zhou X, Gilbert MTP. To bee, or not to bee? One leg is the question. Mol Ecol Resour. 2022;22(5):1868–74. pmid:34957693
- 21. Pálsdóttir AH, Bläuer A, Rannamäe E, Boessenkool S, Hallsson JH. Not a limitless resource: ethics and guidelines for destructive sampling of archaeofaunal remains. R Soc Open Sci. 2019;6(10):191059.
- 22. Shendure J, Balasubramanian S, Church GM, Gilbert W, Rogers J, Schloss JA, et al. DNA sequencing at 40: past, present and future. Nature. 2017;550(7676):345–53. pmid:29019985
- 23. Tvedte ES, Michalski J, Cheng S, Patkus RS, Tallon LJ, Sadzewicz L, et al. Evaluation of a high-throughput, cost-effective Illumina library preparation kit. Scientific Reports. 2021;11(1):15925. pmid:34354114
- 24. Adey A, Morrison HG, Asan, Xun X, Kitzman JO, Turner EH, et al. Rapid, low-input, low-bias construction of shotgun fragment libraries by high-density in vitro transposition. Genome Biol. 2010;11(12):R119. pmid:21143862
- 25. Jones A, Stanley D, Ferguson S, Schwessinger B, Borevitz J, Warthmann N. Cost-conscious generation of multiplexed short-read DNA libraries for whole-genome sequencing. PLoS One. 2023;18(1):e0280004. pmid:36706059
- 26. Picelli S, Björklund AK, Reinius B, Sagasser S, Winberg G, Sandberg R. Tn5 transposase and tagmentation procedures for massively scaled sequencing projects. Genome Res. 2014;24(12):2033–40. pmid:25079858
- 27. Hennig BP, Velten L, Racke I, Tu CS, Thoms M, Rybin V, et al. Large-Scale Low-Cost NGS Library Preparation Using a Robust Tn5 Purification and Tagmentation Protocol. G3 (Bethesda). 2018;8(1):79–89. pmid:29118030
- 28. Vonesch SC, Li S, Tu CS, Hennig BP, Dobrev N, Steinmetz LM. Fast and inexpensive whole-genome sequencing library preparation from intact yeast cells. G3 (Bethesda). 2021;11(1). pmid:33561223
- 29. Baym M, Kryazhimskiy S, Lieberman TD, Chung H, Desai MM, Kishony R. Inexpensive multiplexed library preparation for megabase-sized genomes. PLoS One. 2015;10(5):e0128036. pmid:26000737
- 30. Bruinsma S, Burgess J, Schlingman D, Czyz A, Morrell N, Ballenger C, et al. Bead-linked transposomes enable a normalization-free workflow for NGS library preparation. BMC Genomics. 2018;19(1):722. pmid:30285621
- 31. Shapiro B, Barlow A, Heintzman PD, Hofreiter M, Paijmans JLA, Soares AER. Ancient DNA: Methods and Protocols. New York, NY: Springer New York: Imprint: Humana; 2019.
- 32. Bronner IF, Quail MA. Best Practices for Illumina Library Preparation. Current Protocols in Human Genetics. 2019;102(1):n/a. pmid:31216112
- 33. Schubert M, Ermini L, Der Sarkissian C, Jónsson H, Ginolhac A, Schaefer R, et al. Characterization of ancient and modern genomes by SNP detection and phylogenomic and metagenomic analysis using PALEOMIX. Nat Protoc. 2014;9(5):1056–82. pmid:24722405
- 34. Crowley LM, Sivell O, Sivell D. The genome sequence of the Common Carder Bee, Bombus pascuorum (Scopoli, 1763). Wellcome Open Research. 2023;8:142. pmid:37621574
- 35. Schubert M, Lindgreen S, Orlando L. AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Res Notes. 2016;9(1):88. pmid:26868221
- 36. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Ithaca: Cornell University Library, arXiv.org; 2013.
- 37. Illumina DNA Prep Reference Guide. 2020.
- 38. Tan G, Opitz L, Schlapbach R, Rehrauer H. Long fragments achieve lower base quality in Illumina paired-end sequencing. Sci Rep. 2019;9(1):2856. pmid:30814542