
Haplotype-level metabarcoding of freshwater macroinvertebrate species: A prospective tool for population genetic analysis

  • Joeselle M. Serrana,

    Roles Conceptualization, Data curation, Formal analysis, Methodology, Validation, Visualization, Writing – original draft

    Current address: Ottawa Institute of Systems Biology, and the Department of Biochemistry, Microbiology, and Immunology, Faculty of Medicine, University of Ottawa, Ontario, Canada

    Affiliations Center for Marine Environmental Studies, Ehime University, Matsuyama, Ehime, Japan, Faculty of Engineering, Graduate School of Science and Engineering, Ehime University, Matsuyama, Ehime, Japan

  • Kozo Watanabe

    Roles Conceptualization, Funding acquisition, Methodology, Project administration, Supervision, Validation, Writing – review & editing

    watanabe.kozo.mj@ehime-u.ac.jp

    Affiliation Center for Marine Environmental Studies, Ehime University, Matsuyama, Ehime, Japan

Abstract

Metabarcoding is a molecular tool capable of large-scale, high-throughput species identification from bulk samples, offering a faster and more cost-effective alternative to conventional DNA-sequencing approaches. Still, laboratory and bioinformatics strategies require further exploration and assessment to unlock the potential of metabarcoding-based inference of haplotype information. In this study, we assessed the inference of freshwater macroinvertebrate haplotypes from metabarcoding data of a mock sample. We also examined the influence of DNA template concentration and the number of PCR cycles on the detection of true and spurious haplotypes. We tested this strategy on a mock sample containing twenty individuals from four species with known haplotypes based on the 658-bp Folmer region of the mitochondrial cytochrome c oxidase subunit I gene. We recovered fourteen 421-bp zero-radius operational taxonomic units (zOTUs), twelve of which had a 100% match with the Sanger haplotype sequences. The number of high-quality reads generally increased with the number of PCR cycles, while the relative abundance of each zOTU remained consistent across cycle numbers. This suggests that increasing the number of PCR cycles from 24 to 64 did not affect the relative abundance of each zOTU. As metabarcoding becomes more established and laboratory protocols and bioinformatic pipelines continue to be developed, our study demonstrates the method's ability to infer intraspecific variability while highlighting the challenges that must be addressed before it can be applied to population genetic studies.

Introduction

DNA-based techniques for species identification, e.g., Sanger sequencing, have been developed for cases where morphology-based identification proves problematic [1]. DNA barcodes, i.e., selected short DNA fragments containing genetic information specific to each organism, are used to identify species or higher taxa, depending on their level of variability. For example, the standard marker for identifying animal species is the mitochondrial cytochrome c oxidase subunit I (mtCOI) gene [1]. However, conventional DNA barcoding faces practical limitations that restrict the method to single-specimen analysis, making DNA extraction and sequencing of large-scale samples expensive [2]. Hence, conventional approaches do not always meet the needs of ecologists and are not ideal for large-scale, high-throughput species identification [3].

To overcome the limitations of processing large numbers of specimens, and building on recent advances in molecular technologies, researchers turned to next-generation sequencing to conduct DNA barcode-based identification in a massively parallel manner [4]. In particular, high-throughput sequencing of amplified DNA markers, also known as metabarcoding, enables the simultaneous identification of multiple species in community samples containing DNA from different origins [3]. Advances in sequencing platforms have made it possible to generate millions of sequences from community (i.e., mixed or mass-collected) samples and potentially identify most taxa, including rare and inconspicuous ones [5–8]. The development and application of metabarcoding for biodiversity surveys have been considered a game-changer for ecological research, specifically for eukaryotic organisms [6]. Beyond the taxonomic characterization of community samples, metabarcoding data can be used to infer biodiversity indices, and the method has been established as a robust tool for environmental impact assessments of freshwater ecosystems [e.g., 9–13]. For these applications, biodiversity is generally quantified as taxonomic, functional, or phylogenetic diversity at the community level (i.e., interspecific diversity) [14]. However, intraspecific diversity, i.e., the genetic variation within species in community samples, is typically not explored.

Including intraspecific diversity assessment in ecological monitoring and management planning would be beneficial, given that haplotype data are direct proxies for the spatiotemporal dynamics of populations, which can differ substantially from community-level patterns [3]. In particular, evaluating changes in population size in response to environmental stressors [e.g., 15, 16] is a key area of basic and applied ecological research [17]. Quantifying gene flow between populations can reveal the magnitude and mechanisms of population connectivity. Metabarcoding is a faster and more cost-effective alternative to the conventional approach of repeated genetic analyses of multiple individuals from different populations. It may also allow the comparison of population genetic structure across multiple species with different life stages and dispersal modes or abilities [18].

Previous literature has proposed the applicability of metabarcoding for population genetic analysis [e.g., 18–21]. A handful of studies in freshwater environments have also shown the possibility of inferring haplotype information from metabarcoding of individually tagged specimens [e.g., 22], macroinvertebrate community samples [e.g., 23–26], and environmental DNA collected from water samples [e.g., 27–31]. These studies highlighted the possibility of extracting haplotype or sequence variant information from high-throughput marker-gene sequencing data. However, the large number of reads containing errors that may arise throughout the metabarcoding procedure, from polymerase chain reaction (PCR) amplification bias and errors to bioinformatics pipelines, may affect the reliability of metabarcoding for population genetic analysis [32, 33]. DNA template concentration and the number of PCR cycles introduce bias and errors, particularly when applied to community samples [34, 35], making it difficult to distinguish true haplotypes from erroneous sequencing noise [33]. Thus, further exploration and assessment of laboratory and bioinformatics strategies are required to unlock the potential of DNA metabarcoding-based inference of haplotype information.

Using a mock sample with known Sanger-sequenced haplotypes (hereafter referred to as true haplotypes) of the 658-bp barcode region of the mtCOI gene, we assessed whether these haplotypes could be identified in the metabarcoding data. We also examined the influence of varying DNA template concentrations and PCR cycle numbers on detecting true haplotypes and reducing spurious haplotypes obtained from metabarcoding with the unoise3 denoising algorithm [36]. As metabarcoding becomes more established and laboratory protocols and bioinformatics pipelines continue to be developed, we demonstrate that the method can infer intraspecific genetic variability, showing promising applications for population genetic analysis.

Materials and methods

Study species and Sanger-sequence haplotypes

We inferred haplotypes from a mock sample prepared by pooling the extracted DNA of 20 individuals from four species with known haplotypes, obtained either from a published population genetics study (Amphinemura decemseta) [37] or from our DNA barcoding projects (Kamimuria tibialis, Eucapnopsis bulba, and Epeorus latifolium) (Table 1). Each individual was identified morphologically, and the haplotypes selected for each species varied in nucleotide diversity, making them suitable for this assessment. Additional information on the generation of the Sanger sequences for these samples is provided as supplementary text in S1 File. The number of haplotypes (h), haplotype diversity (Hd), and the total number of mutations (Eta) were calculated in DnaSP v.6.10.04 [38]. The genealogical relationships among the haplotypes of each species were represented as median-joining haplotype networks (Fig 1A) drawn using the PopART software [39]. To examine whether DNA template concentration influences the detection of true haplotypes in the mock sample, the DNA of each haplotype was adjusted to a different concentration before all samples were pooled in one tube for amplicon library preparation. DNA concentration was quantified with the QuantiFluor dsDNA System (Promega, Madison, WI, USA) on the Quantus Fluorometer (Promega, Madison, WI, USA), and samples were prepared at concentrations of 0.01, 0.05, 0.1, 0.5, and 1.0 ng/μL (Table 1). The mock sample also included two K. tibialis samples with higher concentrations of 5.0 and 10.0 ng/μL (Table 1).
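The summary statistics named above can be illustrated with a small sketch. The function below is a simplified, hypothetical stand-in for DnaSP, not its implementation: it assumes aligned, equal-length sequences without gaps or ambiguity codes, computes Nei's haplotype diversity as Hd = n/(n - 1) × (1 - Σ xᵢ²), where xᵢ is the frequency of haplotype i among n sequences, and counts Eta as the number of distinct bases minus one, summed over all sites. The toy sequences are invented for illustration.

```python
from collections import Counter


def haplotype_stats(seqs):
    """Return (h, Hd, eta) for aligned, equal-length sequences.

    Simplified sketch: assumes no gaps or ambiguity codes.
    """
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    counts = Counter(seqs)
    h = len(counts)  # number of distinct haplotypes
    # Nei's haplotype diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)
    hd = n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))
    # Eta: at each site, (number of distinct bases - 1), summed over all sites
    eta = sum(len(set(column)) - 1 for column in zip(*seqs))
    return h, hd, eta


# Toy example: four sequences, three distinct haplotypes, two variable sites
h, hd, eta = haplotype_stats(["ACGT", "ACGT", "ACGA", "TCGA"])
```

Here the haplotype frequencies are 0.5, 0.25, and 0.25, so Hd = 4/3 × (1 - 0.375) ≈ 0.833.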

Fig 1. Haplotype networks and barcode information.

Median-joining haplotype networks showing the level of intraspecific variation within each of the four species based on the 658-bp mtCOI Sanger sequences (a). Circle size represents the concentration of the extracted DNA of each haplotype in ng/μL, i.e., 0.01 (1), 0.05 (2), 0.1 (3), 0.5 (4), 1 (5), 5 (6), and 10 (7), and the number of mutations is shown as hatch marks or numbers. Two additional samples had higher concentrations of 5.0 and 10.0 ng/μL. Position of the sequenced amplicon (BF2 and BR2) along the mtCOI barcode region (b).

https://doi.org/10.1371/journal.pone.0289056.g001

Table 1. Sample and haplotype information of the individuals used for the mock sample.

https://doi.org/10.1371/journal.pone.0289056.t001

Library preparation and next-generation sequencing

The extracted DNA of the 20 individuals, each adjusted to its respective concentration, was pooled by adding 15 μl of each to one tube. Amplicon libraries were constructed following a one-step PCR protocol using Elbrecht and Leese's [40] fusion primers for freshwater arthropods (BF2 and BR2; Fig 1B). Each primer carries a unique inline shift and an Illumina adapter at the 5′ end for sample tagging and multiplexing. The PCR master mix consisted of 0.25 μl Phusion polymerase with 5 μl HF Buffer (New England Biolabs, NEB), 0.75 μl DMSO, 1 μl dNTPs, 1.25 μl each of the forward and reverse fusion primers (10 μM), and 15.5 μl PCR-grade water. PCR cycling conditions were an initial denaturation at 98°C for 30 s, followed by 20 cycles of denaturation at 98°C for 10 s, annealing at 55°C for 30 s, and extension at 72°C for 30 s, with a final extension at 72°C for 5 min. The same PCR protocol was repeated for eleven additional cycle numbers, from 24 to 64 in increments of 4, with triplicate PCR amplifications per cycle number and no-template (blank) controls for each cycle number (also in triplicate).
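As a quick consistency check of the design described above, the sketch below enumerates the cycle numbers (20, then 24 to 64 in steps of 4), crosses them with mock-sample and blank reactions in triplicate, and sums the listed master-mix volumes. The variable names and data structures are illustrative only; the template volume per reaction is not stated in the text, so it is deliberately left out.

```python
# Cycle numbers described in the text: 20, plus eleven more from 24 to 64 in steps of 4
cycle_numbers = [20] + list(range(24, 65, 4))
replicates = 3
sample_types = ("mock", "blank")  # DNA template vs no-template control

# One tuple per PCR reaction: (cycle number, sample type, replicate index)
design = [
    (cycle, sample_type, rep)
    for cycle in cycle_numbers
    for sample_type in sample_types
    for rep in range(1, replicates + 1)
]

# Master-mix component volumes (μl) as listed in the text; template volume not included
master_mix = {
    "Phusion polymerase": 0.25,
    "HF Buffer": 5.0,
    "DMSO": 0.75,
    "dNTPs": 1.0,
    "forward primer": 1.25,
    "reverse primer": 1.25,
    "PCR-grade water": 15.5,
}

print(len(cycle_numbers), len(design), sum(master_mix.values()))
```

This yields 12 cycle numbers × 2 sample types × 3 replicates = 72 reactions, consistent with the 72 PCR samples pooled in the next step, and the listed components sum to 25 μl.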

The 72 PCR products were pooled by replicate and size-selected using solid-phase reversible immobilization (SPRI) beads. Each replicate pool was quantified by qPCR with the KAPA Library Quantification Kit (Kapa Biosystems, Wilmington, MA, USA), and its quality was assessed before sequencing with the DNA 1000 assay on the Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA). The replicate amplicon libraries were normalized to 2 nM before pooling to ensure an even distribution of read output among replicates. Paired-end sequencing of the pooled library, spiked with 20% PhiX, was performed on the Illumina MiSeq platform using the MiSeq Reagent Kit v3 (600-cycle; 2 × 300 cycles).
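Normalizing libraries to 2 nM requires molar rather than mass concentrations. qPCR-based kits such as the one used here quantify libraries against DNA standards of known molarity; when only a fluorometric mass concentration is available, a commonly used conversion assumes an average mass of about 660 g/mol per base pair of double-stranded DNA. The sketch below applies that conversion and the C1V1 = C2V2 dilution rule. The function names, the 500-bp fragment length, and the input values are hypothetical placeholders, not the study's library sizes or measurements.

```python
def ng_per_ul_to_nM(conc_ng_per_ul: float, length_bp: float) -> float:
    """Convert a dsDNA mass concentration to molarity, assuming ~660 g/mol per base pair."""
    return conc_ng_per_ul / (660.0 * length_bp) * 1e6


def dilution_volume(stock_nM: float, target_nM: float, final_volume_ul: float) -> float:
    """Volume of stock (μl) to dilute to final_volume_ul at target_nM, from C1 * V1 = C2 * V2."""
    if target_nM > stock_nM:
        raise ValueError("target concentration exceeds stock concentration")
    return target_nM * final_volume_ul / stock_nM


# Hypothetical example: a 10 ng/μl library with an assumed mean fragment length of 500 bp
stock = ng_per_ul_to_nM(10.0, 500)          # about 30.3 nM
volume = dilution_volume(stock, 2.0, 20.0)  # about 1.32 μl of stock, made up to 20 μl
```

Because the conversion depends directly on the assumed fragment length, a Bioanalyzer size estimate (as in the quality check above) is what would normally feed the `length_bp` argument.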

Metabarcoding data processing and haplotype inference

The raw Illumina MiSeq paired-end reads were demultiplexed according to sample tags via the R package JAMP v.0.67 [32] and quality-checked with FastQC [41]. In total, 14.7 million reads were demultiplexed and assigned to the 72 samples (S1 Table in S1 File). The paired-end reads were merged via the JAMP pipeline, primer-stripped, truncated at 421 bp, quality-filtered with a maximum expected error value of 2, and dereplicated, excluding singletons, using the USEARCH v11.0.667 pipeline [42]. To extract individual haplotypes from the dereplicated sequences, we employed a denoising strategy using unoise3 [36]. The resulting zero-radius operational taxonomic unit (zOTU) sequences were mapped against the 658-bp mtCOI Sanger sequences of the mock samples using the UPARSE-REF algorithm [42], a command designed for validating mock community sequencing experiments in which the set of biological sequences in the sample is known. Note that sub-OTUs, OTUs with 100% sequence similarity, and zOTUs are synonymous [36, 43]. Similarly, amplicon sequence variants (ASVs) and exact sequence variants (ESVs) are terms used for the outputs of other denoising pipelines (e.g., DADA2 [44]). We use the term zOTU in this study because we employed the unoise3 algorithm to denoise the quality-filtered reads [36].
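The maximum expected error criterion can be made concrete: for each read, the expected number of errors is the sum of the per-base error probabilities implied by its Phred scores, P = 10^(-Q/10). The sketch below mirrors that logic together with fixed-length truncation. It is an illustrative re-implementation, not USEARCH's code; it assumes Phred+33 quality encoding, represents reads as hypothetical (sequence, quality) tuples, and discards reads shorter than the target length so that every retained read has the same length.

```python
def expected_errors(quality: str, offset: int = 33) -> float:
    """Expected number of errors in a read: sum of 10 ** (-Q / 10) over its Phred scores."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def truncate_and_filter(records, trunc_len=421, max_ee=2.0):
    """Yield (seq, qual) truncated to trunc_len.

    Reads shorter than trunc_len are dropped, as are truncated reads whose
    expected error count exceeds max_ee.
    """
    for seq, qual in records:
        if len(seq) < trunc_len:
            continue
        seq, qual = seq[:trunc_len], qual[:trunc_len]
        if expected_errors(qual) <= max_ee:
            yield seq, qual


reads = [
    ("A" * 500, "I" * 500),  # Q40 throughout: truncated to 421 bases, kept
    ("A" * 421, "5" * 421),  # Q20 throughout: about 4.2 expected errors, dropped
    ("A" * 400, "I" * 400),  # shorter than 421 bases, dropped
]
kept = list(truncate_and_filter(reads))
```

For a 421-base read, uniform Q40 gives about 0.042 expected errors and passes, whereas uniform Q20 gives about 4.2 and fails a threshold of 2.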

We performed phylogenetic inference on the Sanger and zOTU sequences by neighbor-joining analysis under the Jukes–Cantor substitution model with 1,000 bootstrap replicates, using the online version of the MAFFT multiple sequence alignment software (version 7) [45]. Before the analysis, the 658-bp Sanger sequences of the mock samples were truncated to the 421-bp length of the BF2–BR2 barcode region. Data visualization (i.e., boxplots and bubble plots) and statistical analyses were performed in R v.4.2.3 [46]. Read abundances of the raw, merged, and zOTU-assigned sequences were compared between the DNA template samples and the negative controls for each cycle with the stat_compare_means() function of the ggpubr package, using its default Wilcoxon rank-sum test. To normalize read abundance per sample, the zOTU table was log-transformed using the phyloseq_standardize_otu_abundance() function of the metagMisc v0.5.0 package [47], applied to phyloseq objects generated with the phyloseq v.1.42.0 package [48].
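For readers less familiar with the test, the sketch below computes a two-sided Wilcoxon rank-sum (Mann–Whitney) p-value exactly, by enumerating every way of splitting the pooled observations into two groups of the original sizes. It is a plain-Python illustration, not the R/ggpubr implementation: R's wilcox.test(), which stat_compare_means() uses by default, computes an exact p-value only for small samples without ties and a normal approximation otherwise, so results can differ. The function names and the log-transformed read counts in the example are hypothetical.

```python
from itertools import combinations


def mann_whitney_u(x, y):
    """U statistic for x: number of (x, y) pairs with x > y, ties counted as 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def exact_rank_sum_p(x, y):
    """Two-sided exact p-value from the permutation distribution of U."""
    pooled = list(x) + list(y)
    n, m = len(x), len(y)
    center = n * m / 2  # expected U under the null hypothesis
    observed = abs(mann_whitney_u(x, y) - center)
    extreme = total = 0
    for idx in combinations(range(n + m), n):
        chosen = set(idx)
        group_x = [pooled[i] for i in idx]
        group_y = [pooled[i] for i in range(n + m) if i not in chosen]
        total += 1
        if abs(mann_whitney_u(group_x, group_y) - center) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical log-transformed read counts: template samples vs blank controls
samples = [4.1, 4.3, 4.5, 4.2, 4.4]
blanks = [1.0, 0.5, 1.2, 0.8, 0.9]
p = exact_rank_sum_p(samples, blanks)
```

With five values per group and complete separation, only 2 of the 252 possible splits are as extreme as the observed one, giving p = 2/252 ≈ 0.008.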

Results

Sanger sequence haplotypes

The mock sample consisted of four species, i.e., Amphinemura decemseta (Nemouridae), Kamimuria tibialis (Perlidae), Eucapnopsis bulba (Capniidae), and Epeorus latifolium (Heptageniidae), from two freshwater insect orders (Table 1). After the Sanger sequences were trimmed to the 421-bp BF2–BR2 barcode region to match the region amplified in the metabarcoding data, three K. tibialis Sanger haplotypes (i.e., KT1, KT3, and KT4) collapsed into a single haplotype, as did two A. decemseta haplotypes (i.e., AD1 and AD3). Hence, the 20 haplotypes of the 658-bp fragment of the mtCOI barcoding region [49] were reduced to 17 haplotypes after trimming to 421 bp.
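The collapse described above follows directly from comparing sequences only within the amplified window. The sketch below groups named haplotypes whose sequences become identical after trimming to a given window. The function name, the toy sequences, and the window coordinates are hypothetical placeholders; the real BF2–BR2 offsets within the Folmer region are not given in the text.

```python
from collections import defaultdict


def collapse_after_trim(haplotypes, start, end):
    """Group named haplotypes whose sequences are identical within [start, end).

    haplotypes: {name: aligned full-length sequence}. Returns a sorted list of
    sorted name groups, one group per distinct trimmed sequence.
    """
    groups = defaultdict(list)
    for name, seq in haplotypes.items():
        groups[seq[start:end]].append(name)
    return sorted(sorted(names) for names in groups.values())


# Toy data: four full-length haplotypes that differ mainly outside positions 1-4
toy = {"H1": "AACGTT", "H2": "TACGTT", "H3": "AACCTA", "H4": "GACGTC"}
groups = collapse_after_trim(toy, 1, 5)  # [['H1', 'H2', 'H4'], ['H3']]
```

Four distinct full-length haplotypes reduce to two after trimming, which is the same mechanism that reduced 20 Sanger haplotypes to 17.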

Read abundance, reference sequence match, and phylogenetic inference

Read abundances differed significantly between the samples and the blank controls, except for the raw and merged read counts at 20 cycles (Fig 2A). Of the demultiplexed reads, 90% were merged, but only 3% were retained after denoising (S1 Table in S1 File); most of the removed reads were discarded as erroneous sequences during the denoising step. From the dereplicated reads, 462,665 were assigned to 14 zOTUs of 421-bp length (also referred to as "DNA metabarcoding haplotypes" in this study) (S2 Table in S1 File). All 14 zOTUs were detected at every cycle number from 24 to 64. At 20 cycles, only eight of the 14 zOTUs were represented, all of which matched haplotypes with high DNA concentrations (i.e., KT7, KT6, KT5, EL5, EL4, EB5, EB4, and EB3) (Fig 2B). The number of quality-passed reads increased with the number of cycles, and the relative abundance of each zOTU was relatively consistent across cycle numbers (S2 Table in S1 File). Notably, five zOTUs were detected in negative samples from different cycles: zOTU04 in nine negative samples from cycles 28 to 64, zOTU09 and zOTU11 in two negative samples, and zOTU08 and zOTU10 in one negative sample each (cycles 60 and 64, respectively). However, most of these occurrences in the negative samples were singletons or doubletons in terms of both samples and reads, with the highest count in any negative sample (i.e., 40BR1) being only 22 reads.
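The bubble plot summarizes each zOTU by its mean relative abundance and standard deviation over the three PCR replicates of a cycle number. The sketch below shows one way to compute those values from per-replicate read counts; the function names and the counts in the example are hypothetical, and a zOTU absent from a replicate is treated as having a proportion of zero in that replicate.

```python
from statistics import mean, stdev


def relative_abundance(counts):
    """Convert one PCR replicate's zOTU read counts into proportions of that replicate's total."""
    total = sum(counts.values())
    return {z: c / total for z, c in counts.items()} if total else {}


def summarize_cycle(replicates):
    """Mean and sample standard deviation of each zOTU's relative abundance across replicates."""
    rel = [relative_abundance(r) for r in replicates]
    zotus = sorted(set().union(*rel))  # every zOTU seen in at least one replicate
    summary = {}
    for z in zotus:
        values = [r.get(z, 0.0) for r in rel]  # absent zOTU counts as proportion 0
        summary[z] = (mean(values), stdev(values))
    return summary


# Hypothetical counts for one cycle number: read depth differs, proportions do not
cycle_24 = [
    {"zOTU01": 80, "zOTU02": 20},
    {"zOTU01": 160, "zOTU02": 40},
    {"zOTU01": 40, "zOTU02": 10},
]
summary = summarize_cycle(cycle_24)  # zOTU01: mean 0.8, SD 0.0
```

Because each replicate is scaled by its own total, differences in sequencing depth between replicates or cycle numbers do not by themselves change the relative abundances, which is why read counts can rise with cycle number while proportions stay flat.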

Fig 2. Read processing and zero-radius operational taxonomic units (zOTU) abundance.

Comparison of read abundance (log-transformed) at different read processing steps per cycle, between the DNA template samples and the negative controls (a). Significance codes: '**', p < 0.01; '*', p < 0.05. Bubble plot showing the relative abundance of zOTUs per PCR cycle (b). Circle size represents the mean of three replicates, and the gray shading represents the standard deviation (SD).

https://doi.org/10.1371/journal.pone.0289056.g002

After mapping the zOTU sequences against the 20 Sanger haplotype sequences (658 bp), 12 zOTUs, comprising 450,793 (97%) of the zOTU-assigned reads, had a 100% sequence match to 12 of the Sanger haplotypes of the mock reference sequences (S3 Table in S1 File). The eight Sanger haplotypes not detected in the metabarcoding dataset were the three A. decemseta samples (AD1, AD2, and AD3), two E. bulba (EB1 and EB2), one E. latifolium (EL1), and two K. tibialis (KT1 and KT3). In the neighbor-joining tree (Fig 3), the remaining two zOTUs (i.e., zOTU10 and zOTU14), which lacked a 100% match to the Sanger sequences, clustered with the A. decemseta sequences.
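The core of this comparison, checking whether each zOTU is identical to a trimmed reference haplotype and what share of reads those zOTUs carry, can be sketched as below. This is a deliberately simpler exact-string comparison, not UPARSE-REF, which additionally models sequencing errors and chimeras; the function name, zOTU identifiers, and sequences in the example are hypothetical, and the references are assumed to be already trimmed to the amplicon.

```python
def classify_zotus(zotus, references):
    """Split zOTUs into exact matches to trimmed references and unmatched ones.

    zotus: {zotu_id: (sequence, read_count)}
    references: {haplotype_id: trimmed_sequence}
    Returns (matched, unmatched, fraction of reads in matched zOTUs), where
    matched maps each zOTU to the reference haplotypes it is identical to.
    """
    by_seq = {}
    for hap, seq in references.items():
        by_seq.setdefault(seq, []).append(hap)  # collapsed references share one sequence
    matched, unmatched = {}, []
    for zid, (seq, _count) in zotus.items():
        if seq in by_seq:
            matched[zid] = by_seq[seq]
        else:
            unmatched.append(zid)
    total = sum(count for _seq, count in zotus.values())
    matched_reads = sum(zotus[z][1] for z in matched)
    return matched, unmatched, (matched_reads / total if total else 0.0)


matched, unmatched, frac = classify_zotus(
    {"zOTU01": ("ACGT", 90), "zOTU02": ("ACGA", 10)},
    {"H1": "ACGT", "H2": "TTTT"},
)
```

Applied to the counts reported above, 450,793 of 462,665 zOTU-assigned reads corresponds to about 97.4%.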

Fig 3. Neighbor-joining tree of the mock and the zero-radius operational taxonomic units (zOTU) sequences.

Sequences are highlighted by species. The red bar marks a zOTU without a 100% match to a Sanger haplotype, and red text marks a Sanger haplotype without a matching zOTU.

https://doi.org/10.1371/journal.pone.0289056.g003

Discussion

Using a mock sample with known haplotypes based on Sanger sequencing of the highly variable mtCOI gene region, we demonstrated the feasibility and limitations of extracting intraspecific genetic diversity information from metabarcoding data by denoising the sequences into zOTUs. Most recent studies that inferred intraspecific diversity from macroinvertebrate metabarcoding data also relied on denoising algorithms [e.g., 32, 33, 50, 51]. Before denoising, our read processing steps, i.e., paired-end merging, adapter and primer stripping, barcode-length truncation, quality filtering, and dereplication, mainly followed default and stringent settings. Notably, we truncated merged reads to 421 bp (i.e., the entire length of the BF2–BR2 barcode region) and retained only reads of that length, which prevented the generation of shorter-than-barcode zOTUs and allowed full-length matches against the trimmed Sanger haplotype sequences.

DNA metabarcoding for haplotype-level inference

Shortening the barcode region from 658 bp to 421 bp removed some of the polymorphic sites needed to distinguish haplotypes in certain species. For example, the K. tibialis haplotype with a relatively high input DNA template concentration (i.e., 0.10 ng/μL) was likely not a false negative: its reads were present in the metabarcoding data but merged with another K. tibialis haplotype. This highlights the inability of shorter marker regions to differentiate the haplotypes of certain species, an issue that requires further development and evaluation. However, many high-throughput sequencing platforms used for metabarcoding still have strict read-length limitations and cannot cover the entire length of the mtCOI barcode [4]. Fortunately, recent advances in sequencing technologies that generate longer reads could allow full-length DNA barcodes to be generated from community samples [52], which may resolve the limitations of short-amplicon metabarcoding. Longer reads would allow the generation of high-resolution mitochondrial haplotype data, with potential applications in demographic history and selection analyses [53]. Still, given the raw read error rates of these long-read platforms, e.g., around 6% for reads with quality scores of at least 10 on the Oxford Nanopore MinION™, as reported by Delahaye and Nicolas (2021) [54], further assessment and exploration of library preparation and error-correction methods are recommended [53] before these platforms are used in metabarcoding studies, particularly for haplotype-level inference.
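The loss of resolution from a shorter amplicon can be checked by counting polymorphic (segregating) sites over the full alignment and over the amplified window only. The sketch below assumes aligned, equal-length sequences; the function name, toy sequences, and window coordinates are hypothetical and do not correspond to the actual BF2–BR2 positions.

```python
def segregating_sites(seqs, start=0, end=None):
    """0-based positions in [start, end) where the aligned sequences carry more than one base."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    end = len(seqs[0]) if end is None else end
    return [i for i in range(start, end) if len({s[i] for s in seqs}) > 1]


# Two toy haplotypes that differ only outside the window [1, 5)
haps = ["AACGTT", "GACGTC"]
full = segregating_sites(haps)          # [0, 5]
window = segregating_sites(haps, 1, 5)  # []: indistinguishable within the window
```

When the window contains no segregating sites for a pair of haplotypes, no amount of sequencing depth can separate them, so the reads of both are necessarily assigned to a single zOTU.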

The two metabarcoding haplotypes that lacked a perfect sequence match to the Sanger sequences clustered with the A. decemseta samples. Although we could assign them to A. decemseta using the phylogenetic approach, metabarcoding failed to recover any of the Sanger haplotypes of this species with a 100% match from the mock sample. We could not rule out PCR amplification errors, primer bias, or sequencing errors as the cause of these false positive detections, or spurious haplotypes, even though we applied relatively strict read quality filtering parameters. Identifying and eliminating artificial or false haplotypes has been a major challenge for population-level inference from metabarcoding data. Elbrecht et al. (2018) [32] reported that artificial or false haplotypes could not be entirely excluded even with stringent filtering settings, owing to undetected chimeric sequences or systematic sequencing errors that may persist across replicates. Macé et al. (2022) [55] suggested that a denoising method with an additional bimeric-sequence removal step, combined with a specific polymorphic mitochondrial barcode, might resolve the issue of false haplotype detection. In addition, filtering reads by relative abundance could help remove false positives and chimeras, given that these sequences are usually low in abundance [55]. Some studies have applied additional read filtering based on how consistently a haplotype is present across PCR replicates [28, 29]. However, these approaches might also remove true haplotypes that occur at low abundance in community samples. Hence, further assessment and development of read processing approaches are warranted.
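The two filtering ideas mentioned above, a relative-abundance threshold and a requirement that a haplotype recur across PCR replicates, can be combined in a few lines. The sketch below is a hypothetical illustration, not the procedure of the cited studies: the function name, the 0.1% threshold, and the two-of-three replicate rule are placeholder assumptions chosen only to show the mechanics.

```python
def filter_haplotypes(replicate_counts, min_rel_abund=0.001, min_replicates=2):
    """Keep zOTUs that reach min_rel_abund in at least min_replicates PCR replicates.

    replicate_counts: list of {zotu_id: read_count}, one dict per PCR replicate.
    Threshold defaults are illustrative placeholders, not values from the study.
    """
    passes = {}
    for counts in replicate_counts:
        total = sum(counts.values())
        for zotu, count in counts.items():
            if total and count / total >= min_rel_abund:
                passes[zotu] = passes.get(zotu, 0) + 1
    return sorted(z for z, k in passes.items() if k >= min_replicates)


reps = [
    {"a": 990, "b": 9, "c": 1},
    {"a": 980, "b": 19, "c": 1},
    {"a": 1000},
]
kept_lenient = filter_haplotypes(reps, min_rel_abund=0.005)  # ['a', 'b']
kept_strict = filter_haplotypes(reps, min_rel_abund=0.01)    # ['a']
```

The example also illustrates the trade-off noted above: raising the threshold from 0.5% to 1% removes "b", which might be a genuine low-abundance haplotype rather than an artifact.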

Effects of PCR condition on the inference of haplotypes in a mock sample

Initial DNA template concentration clearly influenced the detection of individuals in the mock sample. This observation is consistent with previous studies reporting that samples or taxa with low DNA template concentrations have lower detection probabilities [56]. Accordingly, abundant taxa or high-biomass samples tend to have higher detection probabilities than rare, smaller, or low-biomass individuals in mixed-community samples [57–59]. Differences in biomass affect haplotype detection because reads from larger specimens are more likely to be retained after read processing. These factors must be addressed when metabarcoding-based haplotyping is used for abundance-based analyses in population genetics applications.

Additionally, we used a PCR annealing temperature of 55°C, chosen after a temperature-gradient PCR of each individual; this was also the optimal condition in a previous bulk metabarcoding study of freshwater macroinvertebrate communities [13]. This temperature is higher than the 50°C typically used with the BF2 and BR2 fusion primers [e.g., 60, 61], which could also have contributed to the non-detection of the low-concentration haplotypes (e.g., AD1, EB1, and EL1 at 0.01 ng/μL) in the mock sample. Although previous studies have reported that annealing temperatures from 40–56°C did not universally affect taxonomic recovery [62], a lower annealing temperature might have recovered the haplotypes with low DNA template concentrations [9]. Hence, we recommend further assessment of PCR conditions expected to influence species recovery from metabarcoding samples, e.g., GC content [63], annealing temperature [64], polymerase [65, 66], and other primer sets that match the region of interest for freshwater macroinvertebrate species.

The number of quality-passed reads generally increased with the number of cycles, while the relative abundance of each zOTU was consistent across cycle numbers. This suggests that increasing the number of PCR cycles from 24 to 64 did not affect the relative abundance of quality-passed reads of each zOTU. Our findings align with previous studies that reported minor or no effects of PCR cycle number on amplification bias [63, 67]. They contrast, however, with reports that increasing the number of PCR cycles reduces the proportion of sequences from templates with low starting DNA or from less efficiently amplified species in mock samples [68]. Moreover, higher PCR cycle numbers have been reported to increase chimera formation and amplification bias [9], which is why some metabarcoding protocols discourage using more than 30 PCR cycles. However, a literature survey of bulk-sample metabarcoding studies showed that 73% of reports used more than 30 PCR cycles to circumvent primer annealing issues or to amplify samples with low amounts of DNA [69]. Our findings did not support this rationale: low-template samples remained undetected even after increasing the cycle number to 64, while the relative abundances of the detected samples were consistent across cycle numbers. Nonetheless, the sequence diversity in our mock sample was relatively low (i.e., four species with 20 haplotypes). A more diverse community might show a different pattern of PCR cycle effects, which warrants further evaluation. In addition, primer choice is a major issue in metabarcoding studies, and our study was limited to a single fusion primer pair.
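One way to reason about why relative abundances can stay flat across cycle numbers is an idealized exponential-amplification model, in which template i grows as Nᵢ(c) = Nᵢ(0) × (1 + Eᵢ)^c for amplification efficiency Eᵢ over c cycles. Under this model, a template's share of the product is independent of c when efficiencies are equal, and drifts with c when they differ. The sketch below is a textbook simplification, not a model fitted in this study: it ignores the plateau phase, reagent depletion, and chimera formation, and the function name and efficiencies are hypothetical.

```python
def relative_share(n0_a, n0_b, eff_a, eff_b, cycles):
    """Share of template A after `cycles` of idealized exponential PCR (no plateau)."""
    a = n0_a * (1 + eff_a) ** cycles
    b = n0_b * (1 + eff_b) ** cycles
    return a / (a + b)


# Equal efficiencies: the 1:3 starting ratio (share 0.25) is preserved at any cycle number
equal_24 = relative_share(1, 3, 0.9, 0.9, 24)
equal_64 = relative_share(1, 3, 0.9, 0.9, 64)

# Slightly unequal efficiencies: A's share grows with every additional cycle
unequal_24 = relative_share(1, 1, 0.95, 0.90, 24)
unequal_64 = relative_share(1, 1, 0.95, 0.90, 64)
```

Under these assumptions, the stable zOTU proportions reported here would be consistent with similar amplification efficiencies among the detected haplotypes, although plateau effects in real reactions can also dampen such drift.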

Properly selecting the DNA marker in a metabarcoding assay is crucial because all downstream analyses, e.g., species detection and identification, rely on the marker's ability to amplify and discriminate the representative taxa of the target organisms [4]. The mtCOI gene has a relatively high mutation rate and can capture intraspecific variation. Hence, it is widely used in population genetics studies [e.g., 70, 71] and has, to date, been widely used for DNA metabarcoding of macroinvertebrates. Intraspecific variation can therefore be extracted from community samples using various algorithms for sequence clustering and phylogenetic rates [62, 72, 73]. Here, we showed that haplotype information can be generated by metabarcoding the mtCOI gene in a mock sample using only a single gene marker, which could be sufficient to test or derive population-level hypotheses, e.g., on taxon dispersal and distribution at unprecedented scales [32]. Still, given the relatively low cost of metabarcoding, multi-marker assessments can be employed for a more comprehensive population genetic assessment of mixed community samples.

Moreover, nine blank samples contained sequences assigned to some of the zOTUs. These reads might be due to tag jumps, i.e., the amplification of sequences carrying false combinations of the nucleotide tags used, which are common in dual-indexed libraries [60, 74–76]. Here, we generated amplicon libraries using a one-step PCR strategy with fusion primers developed for freshwater arthropods [40], a strategy that should produce fewer false tag combinations than a two-step PCR approach. Although we observed tag jumping for some of the zOTUs, most of these occurrences were singleton or doubleton reads, which could be removed from downstream analysis. Still, tag jumping and contamination between libraries require attention to prevent false read-to-sample assignments, which would be problematic once the method is applied to environmental samples.
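A common, conservative way to use blank controls is to subtract, for each zOTU, the highest read count observed in any blank from that zOTU's count in every sample, flooring the result at zero. The sketch below illustrates this approach; it is not the procedure applied in this study, and the function name, sample and blank labels, and counts are hypothetical.

```python
def subtract_blank_max(sample_counts, blank_counts):
    """Subtract each zOTU's maximum blank read count from its count in every sample.

    sample_counts, blank_counts: {sample_id: {zotu_id: read_count}}.
    Counts are floored at zero; zOTUs never seen in a blank are left unchanged.
    """
    blank_max = {}
    for blank in blank_counts.values():
        for zotu, count in blank.items():
            blank_max[zotu] = max(blank_max.get(zotu, 0), count)
    return {
        sample: {z: max(c - blank_max.get(z, 0), 0) for z, c in counts.items()}
        for sample, counts in sample_counts.items()
    }


samples = {"S1": {"zOTU04": 5000, "zOTU09": 2, "zOTU01": 300}}
blanks = {"B1": {"zOTU04": 22}, "B2": {"zOTU09": 2, "zOTU04": 1}}
corrected = subtract_blank_max(samples, blanks)
# corrected["S1"] == {"zOTU04": 4978, "zOTU09": 0, "zOTU01": 300}
```

With low blank counts, such as the singletons, doubletons, and at most 22 reads observed here, this correction removes spurious low-count occurrences while leaving abundant zOTUs nearly unchanged.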

Conclusion

We demonstrate that metabarcoding can infer intraspecific variability and confirm its ability to detect true haplotypes using classical Sanger sequencing as the reference, showing promise for applications in population genetic studies. In particular, we showed that haplotype information could be extracted from mixed community samples of freshwater macroinvertebrates. The number of quality-passed reads increased with the number of PCR cycles. However, the relative abundance of each zOTU was consistent across cycle numbers, suggesting that increasing the number of cycles did not affect the relative abundance of quality-passed reads in this low-diversity mock sample. Although conventional population genetics tools, e.g., Sanger sequencing, are used for targeted sequencing of specific genes or regions, metabarcoding is advantageous for studying complex mixtures because high-throughput sequencing allows massively parallel analysis of many samples. Hence, metabarcoding has a lower per-sample cost than Sanger sequencing [77, 78], although the overall cost depends on the number of samples and the sequencing depth required. As metabarcoding becomes more established and laboratory protocols and bioinformatics pipelines continue to be developed, our study demonstrated that the method could be used to infer intraspecific variability, while highlighting the challenges that must be addressed before haplotype-level metabarcoding can be fully applied in population genetics. These include further assessment of laboratory procedures, e.g., amplicon library construction and PCR conditions, and of sequence read processing approaches, e.g., denoising and read filtering steps, needed to confidently recover intraspecific information from metabarcoding data.

Acknowledgments

We thank Dr. Dávid Murányi, Dr. Maribet Gamboa, and Dr. Sakiko Yaegashi for the DNA samples and their molecular data. We also thank Dr. Naohito Tokunaga of the Division of Analytical Bio-Medicine for his assistance in performing high-throughput sequencing on the Illumina MiSeq platform of the Advanced Research Support Center (ADRES) at Ehime University, Japan.

References

  1. 1. Hebert P. D., Cywinska A., Ball S. L., & Dewaard J. R. (2003). Biological identifications through DNA barcodes. Proceedings of the Royal Society of London. Series B: Biological Sciences, 270(1512), 313–321. pmid:12614582
  2. 2. Stein E. D., Martinez M. C., Stiles S., Miller P. E., & Zakharov E. V. (2014). Is DNA barcoding actually cheaper and faster than traditional morphological methods: results from a survey of freshwater bioassessment efforts in the United States?. PloS One, 9(4), e95525. pmid:24755838
  3. Taberlet P., Coissac E., Pompanon F., Brochmann C., & Willerslev E. (2012). Towards next‐generation biodiversity assessment using DNA metabarcoding. Molecular Ecology, 21(8), 2045–2050. pmid:22486824
  4. Piper A. M., Batovska J., Cogan N. O., Weiss J., Cunningham J. P., Rodoni B. C., et al. (2019). Prospects and challenges of implementing DNA metabarcoding for high-throughput insect surveillance. GigaScience, 8(8), giz092. pmid:31363753
  5. Yu D. W., Ji Y., Emerson B. C., Wang X., Ye C., Yang C., et al. (2012). Biodiversity soup: metabarcoding of arthropods for rapid biodiversity assessment and biomonitoring. Methods in Ecology and Evolution, 3(4), 613–623.
  6. Creer S., Deiner K., Frey S., Porazinska D., Taberlet P., Thomas W. K., et al. (2016). The ecologist’s field guide to sequence‐based identification of biodiversity. Methods in Ecology and Evolution, 7(9), 1008–1018.
  7. Zinger L., Bonin A., Alsos I. G., Bálint M., Bik H., Boyer F., et al. (2019). DNA metabarcoding—Need for robust experimental designs to draw sound ecological conclusions. Molecular Ecology, 28(8), 1857–1862. pmid:31033079
  8. Nichols S. J., Kefford B. J., Campbell C. D., Bylemans J., Chandler E., Bray J. P., et al. (2020). Towards routine DNA metabarcoding of macroinvertebrates using bulk samples for freshwater bioassessment: Effects of debris and storage conditions on the recovery of target taxa. Freshwater Biology, 65(4), 607–620.
  9. Aylagas E., Borja Á., Irigoien X., & Rodríguez-Ezpeleta N. (2016). Benchmarking DNA metabarcoding for biodiversity-based monitoring and assessment. Frontiers in Marine Science, 3, 96.
  10. Pawlowski J., Kelly-Quinn M., Altermatt F., Apothéloz-Perret-Gentil L., Beja P., Boggero A., et al. (2018). The future of biotic indices in the ecogenomic era: Integrating (e) DNA metabarcoding in biological assessment of aquatic ecosystems. Science of the Total Environment, 637, 1295–1310. pmid:29801222
  11. Serrana J. M., Yaegashi S., Kondoh S., Li B., Robinson C. T., & Watanabe K. (2018). Ecological influence of sediment bypass tunnels on macroinvertebrates in dam-fragmented rivers by DNA metabarcoding. Scientific Reports, 8(1), 1–10.
  12. Zieritz A., Lee P. S., Eng W. W. H., Lim S. Y., Sing K. W., Chan W. N., et al. (2022). DNA metabarcoding unravels unknown diversity and distribution patterns of tropical freshwater invertebrates. Freshwater Biology, 67(8), 1411–1427.
  13. Serrana J. M., Li B., Sumi T., Takemon Y., & Watanabe K. (2022). Implications of taxonomic and numerical resolution on DNA metabarcoding-based inference of benthic macroinvertebrate responses to river restoration. Ecological Indicators, 135, 108508.
  14. Gotelli N. J., Shimadzu H., Dornelas M., McGill B., Moyes F., & Magurran A. E. (2017). Community-level regulation of temporal trends in biodiversity. Science Advances, 3(7), e1700315. pmid:28782021
  15. Monaghan M. T., Robinson C. T., Spaak P., & Ward J. V. (2005). Macroinvertebrate diversity in fragmented Alpine streams: implications for freshwater conservation. Aquatic Sciences, 67(4), 454–464.
  16. Weiss M., & Leese F. (2016). Widely distributed and regionally isolated! Drivers of genetic structure in Gammarus fossarum in a human-impacted landscape. BMC Evolutionary Biology, 16(1), 153.
  17. Sutherland W. J., Freckleton R. P., Godfray H. C. J., Beissinger S. R., Benton T., Cameron D. D., et al. (2013). Identification of 100 fundamental ecological questions. Journal of Ecology, 101(1), 58–67.
  18. Shum P., & Palumbi S. R. (2021). Testing small‐scale ecological gradients and intraspecific differentiation for hundreds of kelp forest species using haplotypes from metabarcoding. Molecular Ecology, pmid:33682164
  19. Bohmann K., Evans A., Gilbert M. T. P., Carvalho G. R., Creer S., Knapp M., et al. (2014). Environmental DNA for wildlife biology and biodiversity monitoring. Trends in Ecology & Evolution, 29(6), 358–367. pmid:24821515
  20. Adams C. I., Knapp M., Gemmell N. J., Jeunen G. J., Bunce M., Lamare M. D., et al. (2019). Beyond biodiversity: can environmental DNA (eDNA) cut it as a population genetics tool? Genes, 10(3), 192. pmid:30832286
  21. Arribas P., Andújar C., Salces‐Castellano A., Emerson B. C., & Vogler A. P. (2021). The limited spatial scale of dispersal in soil arthropods revealed with whole‐community haplotype‐level metabarcoding. Molecular Ecology, 30(1), 48–61. pmid:32772446
  22. Shokralla S., Gibson J. F., Nikbakht H., Janzen D. H., Hallwachs W., & Hajibabaei M. (2014). Next‐generation DNA barcoding: using next‐generation sequencing to enhance and accelerate DNA barcode capture from single specimens. Molecular Ecology Resources, 14(5), 892–901. pmid:24641208
  23. Elbrecht V., & Leese F. (2015). Can DNA-based ecosystem assessments quantify species abundance? Testing primer bias and biomass—sequence relationships with an innovative metabarcoding protocol. PLoS One, 10(7), e0130324. pmid:26154168
  24. Pedro P. M., Piper R., Bazilli Neto P., Cullen L. Jr, Dropa M., Lorencao R., et al. (2017). Metabarcoding analyses enable differentiation of both interspecific assemblages and intraspecific divergence in habitats with differing management practices. Environmental Entomology, 46(6), 1381–1389. pmid:29069398
  25. Elbrecht V., & Steinke D. (2019). Scaling up DNA metabarcoding for freshwater macrozoobenthos monitoring. Freshwater Biology, 64(2), 380–387.
  26. Zizka V. M. A., Weiss M., & Leese F. (2020). Can metabarcoding resolve intraspecific genetic diversity changes to environmental stressors? A test case using river macrozoobenthos. Metabarcoding and Metagenomics, 4, e51925.
  27. Nakagawa H., Yamamoto S., Sato Y., Sado T., Minamoto T., & Miya M. (2018). Comparing local‐ and regional‐scale estimations of the diversity of stream fish using eDNA metabarcoding and conventional observation methods. Freshwater Biology, 63(6), 569–580.
  28. Tsuji S., Maruyama A., Miya M., Ushio M., Sato H., Minamoto T., et al. (2020a). Environmental DNA analysis shows high potential as a tool for estimating intraspecific genetic diversity in a wild fish population. Molecular Ecology Resources, 20(5), 1248–1258. pmid:32293104
  29. Tsuji S., Shibata N., Sawada H., & Ushio M. (2020b). Quantitative evaluation of intraspecific genetic diversity in a natural fish population using environmental DNA analysis. Molecular Ecology Resources, 20(5), 1323–1332. pmid:32452621
  30. Doi H., Inui R., Matsuoka S., Akamatsu Y., Goto M., & Kono T. (2021). Estimation of biodiversity metrics by environmental DNA metabarcoding compared with visual and capture surveys of river fish communities. Freshwater Biology, 66(7), 1257–1266.
  31. Dugal L., Thomas L., Reinholdt Jensen M., Sigsgaard E. E., Simpson T., Jarman S., et al. (2021). Individual haplotyping of whale sharks from seawater environmental DNA. Molecular Ecology Resources, pmid:34146448
  32. Elbrecht V., Vamos E. E., Steinke D., & Leese F. (2018). Estimating intraspecific genetic diversity from community DNA metabarcoding data. PeerJ, 6, e4644. pmid:29666773
  33. Turon X., Antich A., Palacín C., Præbel K., & Wangensteen O. S. (2020). From metabarcoding to metaphylogeography: separating the wheat from the chaff. Ecological Applications, 30(2), e02036. pmid:31709684
  34. Pawluczyk M., Weiss J., Links M. G., Aranguren M. E., Wilkinson M. D., & Egea-Cortines M. (2015). Quantitative evaluation of bias in PCR amplification and next-generation sequencing derived from metabarcoding samples. Analytical and Bioanalytical Chemistry, 407(7), 1841–1848. pmid:25577362
  35. Collins R. A., Bakker J., Wangensteen O. S., Soto A. Z., Corrigan L., Sims D. W., et al. (2019). Non‐specific amplification compromises environmental DNA metabarcoding with COI. Methods in Ecology and Evolution, 10(11), 1985–2001.
  36. Edgar R. C. (2016). UNOISE2: improved error-correction for Illumina 16S and ITS amplicon sequencing. bioRxiv, 081257.
  37. Gamboa M., Muranyi D., Kanmori S., & Watanabe K. (2019). Molecular phylogeny and diversification timing of the Nemouridae family (Insecta, Plecoptera) in the Japanese Archipelago. PLoS One, 14(1), e0210269. pmid:30633758
  38. Rozas J., Ferrer-Mata A., Sánchez-DelBarrio J. C., Guirao-Rico S., Librado P., Ramos-Onsins S. E., et al. (2017). DnaSP 6: DNA sequence polymorphism analysis of large data sets. Molecular Biology and Evolution, 34(12), 3299–3302. pmid:29029172
  39. Leigh J. W., & Bryant D. (2015). popart: full‐feature software for haplotype network construction. Methods in Ecology and Evolution, 6(9), 1110–1116.
  40. Elbrecht V., & Leese F. (2017). Validation and development of COI metabarcoding primers for freshwater macroinvertebrate bioassessment. Frontiers in Environmental Science, 5, 11.
  41. Andrews S. (2010). FastQC: a quality control tool for high throughput sequence data.
  42. Edgar R. C. (2013). UPARSE: highly accurate OTU sequences from microbial amplicon reads. Nature Methods, 10(10), 996–998. pmid:23955772
  43. Porter T. M., & Hajibabaei M. (2018). Scaling up: A guide to high‐throughput genomic approaches for biodiversity analysis. Molecular Ecology, 27(2), 313–338. pmid:29292539
  44. Callahan B. J., McMurdie P. J., Rosen M. J., Han A. W., Johnson A. J. A., & Holmes S. P. (2016). DADA2: High-resolution sample inference from Illumina amplicon data. Nature Methods, 13(7), 581–583. pmid:27214047
  45. Katoh K., Rozewicki J., & Yamada K. D. (2019). MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization. Briefings in Bioinformatics, 20(4), 1160–1166. pmid:28968734
  46. R Core Team (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
  47. Mikryukov V. (2019). metagMisc: miscellaneous functions for metagenomic analysis.
  48. McMurdie P. J., & Holmes S. (2013). phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data. PLoS One, 8(4), e61217. pmid:23630581
  49. Folmer O., Black M., Hoeh W., Lutz R., & Vrijenhoek R. (1994). DNA primers for amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates. Molecular Marine Biology and Biotechnology, 3(5), 294–299. pmid:7881515
  50. Antich A., Palacin C., Wangensteen O. S., & Turon X. (2021). To denoise or to cluster, that is not the question: optimizing pipelines for COI metabarcoding and metaphylogeography. BMC Bioinformatics, 22(1), 1–24.
  51. Brandt M. I., Trouche B., Quintric L., Günther B., Wincker P., Poulain J., & Arnaud‐Haond S. (2021). Bioinformatic pipelines combining denoising and clustering tools allow for more comprehensive prokaryotic and eukaryotic metabarcoding. Molecular Ecology Resources, pmid:33835712
  52. Baloğlu B., Chen Z., Elbrecht V., Braukmann T., MacDonald S., & Steinke D. (2021). A workflow for accurate metabarcoding using nanopore MinION sequencing. Methods in Ecology and Evolution, 12(5), 794–804.
  53. Sigsgaard E. E., Jensen M. R., Winkelmann I. E., Møller P. R., Hansen M. M., & Thomsen P. F. (2020). Population‐level inferences from environmental DNA—Current status and future perspectives. Evolutionary Applications, 13(2), 245–262. pmid:31993074
  54. Delahaye C., & Nicolas J. (2021). Sequencing DNA with nanopores: Troubles and biases. PLoS One, 16(10), e0257521. pmid:34597327
  55. Macé B., Hocdé R., Marques V., Guerin P. E., Valentini A., Arnal V., et al. (2022). Evaluating bioinformatics pipelines for population‐level inference using environmental DNA. Environmental DNA, 4(3), 674–686.
  56. Martins F. M., Porto M., Feio M. J., Egeter B., Bonin A., Serra S. R., et al. (2020). Modelling technical and biological biases in macroinvertebrate community assessment from bulk preservative using multiple metabarcoding markers. Molecular Ecology, pmid:32860303
  57. Carew M. E., Coleman R. A., & Hoffmann A. A. (2018). Can non-destructive DNA extraction of bulk invertebrate samples be used for metabarcoding? PeerJ, 6, e4980. pmid:29915700
  58. Erdozain M., Thompson D. G., Porter T. M., Kidd K. A., Kreutzweiser D. P., Sibley P. K., et al. (2019). Metabarcoding of storage ethanol vs. conventional morphometric identification in relation to the use of stream macroinvertebrates as ecological indicators in forest management. Ecological Indicators, 101, 173–184.
  59. Serrana J. M., Miyake Y., Gamboa M., & Watanabe K. (2019). Comparison of DNA metabarcoding and morphological identification for stream macroinvertebrate biodiversity assessment and monitoring. Ecological Indicators, 101, 963–972.
  60. Zizka V. M., Elbrecht V., Macher J. N., & Leese F. (2019). Assessing the influence of sample tagging and library preparation on DNA metabarcoding. Molecular Ecology Resources, 19(4), 893–899. pmid:30963710
  61. Leese F., Sander M., Buchner D., Elbrecht V., Haase P., & Zizka V. M. (2021). Improved freshwater macroinvertebrate detection from environmental DNA through minimized nontarget amplification. Environmental DNA, 3(1), 261–276.
  62. Elbrecht V., Braukmann T. W., Ivanova N. V., Prosser S. W., Hajibabaei M., Wright M., et al. (2019). Validation of COI metabarcoding primers for terrestrial arthropods. PeerJ, 7, e7745. pmid:31608170
  63. Krehenwinkel H., Wolf M., Lim J. Y., Rominger A. J., Simison W. B., & Gillespie R. G. (2017). Estimating and mitigating amplification bias in qualitative and quantitative arthropod metabarcoding. Scientific Reports, 7(1), 1–12.
  64. Krehenwinkel H., Fong M., Kennedy S., Huang E. G., Noriyuki S., Cayetano L., et al. (2018). The effect of DNA degradation bias in passive sampling devices on metabarcoding studies of arthropod communities and their associated microbiota. PLoS One, 13(1), e0189188. pmid:29304124
  65. Nichols R. V., Vollmers C., Newsom L. A., Wang Y., Heintzman P. D., Leighton M., et al. (2018). Minimizing polymerase biases in metabarcoding. Molecular Ecology Resources, 18(5), 927–939. pmid:29797549
  66. Nagai S., Sildever S., Nishi N., Tazawa S., Basti L., Kobayashi T., et al. (2022). Comparing PCR-generated artifacts of different polymerases for improved accuracy of DNA metabarcoding. Metabarcoding and Metagenomics, 6, e77704.
  67. Sipos R., Székely A. J., Palatinszky M., Révész S., Márialigeti K., & Nikolausz M. (2007). Effect of primer mismatch, annealing temperature and PCR cycle number on 16S rRNA gene-targetting bacterial community analysis. FEMS Microbiology Ecology, 60(2), 341–350. pmid:17343679
  68. Piñol J., Mir G., Gomez‐Polo P., & Agustí N. (2015). Universal and blocking primer mismatches limit the use of high‐throughput DNA sequencing for the quantitative metabarcoding of arthropods. Molecular Ecology Resources, 15(4), 819–830. pmid:25454249
  69. van der Loos L. M., & Nijland R. (2020). Biases in bulk: DNA metabarcoding of marine communities and the methodology involved. Molecular Ecology, pmid:32779312
  70. Hajibabaei M., Singer G. A., Hebert P. D., & Hickey D. A. (2007). DNA barcoding: how it complements taxonomy, molecular phylogenetics and population genetics. Trends in Genetics, 23(4), 167–172. pmid:17316886
  71. Curry C. J., Gibson J. F., Shokralla S., Hajibabaei M., & Baird D. J. (2018). Identifying North American freshwater invertebrates using DNA barcodes: are existing COI sequence libraries fit for purpose? Freshwater Science, 37(1), 178–189.
  72. Giusti A., Tinacci L., Sotelo C. G., Marchetti M., Guidi A., Zheng W., et al. (2017). Seafood identification in multi-species products: assessment of 16SrRNA, cytb, and COI Universal Primers’ efficiency as a preliminary analytical step for setting up metabarcoding next-generation sequencing techniques. Journal of Agricultural and Food Chemistry, 65(13), 2902–2912. pmid:28290697
  73. Vamos E., Elbrecht V., & Leese F. (2017). Short COI markers for freshwater macroinvertebrate metabarcoding. Metabarcoding and Metagenomics, 1, e14625.
  74. Esling P., Lejzerowicz F., & Pawlowski J. (2015). Accurate multiplexing and filtering for high-throughput amplicon-sequencing. Nucleic Acids Research, 43(5), 2513–2524. pmid:25690897
  75. Schnell I. B., Bohmann K., & Gilbert M. T. P. (2015). Tag jumps illuminated–reducing sequence‐to‐sample misidentifications in metabarcoding studies. Molecular Ecology Resources, 15(6), 1289–1303. pmid:25740652
  76. Bohmann K., Gopalakrishnan S., Nielsen M., Nielsen L. D. S. B., Jones G., Streicker D. G., et al. (2018). Using DNA metabarcoding for simultaneous inference of common vampire bat diet and population structure. Molecular Ecology Resources, 18(5), 1050–1063. pmid:29673092
  77. Maggia M. E., Vigouroux Y., Renno J. F., Duponchelle F., Desmarais E., Núñez J., et al. (2017). DNA metabarcoding of Amazonian ichthyoplankton swarms. PLoS One, 12(1), e0170009. pmid:28095487
  78. Nobile A. B., Freitas-Souza D., Ruiz-Ruano F. J., Nobile M. L. M., Costa G. O., De Lima F. P., et al. (2019). DNA metabarcoding of Neotropical ichthyoplankton: Enabling high accuracy with lower cost. Metabarcoding and Metagenomics, 3, e35060.