Abstract
Arthropods account for a large proportion of animal biomass and diversity in terrestrial systems, making them crucial organisms in our environments. However, too little is still known about the highly abundant and megadiverse groups that often make up the bulk of collected samples, especially in the tropics. As molecular identification techniques continue to evolve, the analysis of arthropod communities has accelerated. In our study, which was conducted within the framework of the Global Malaise trap Program (GMP), we operated two closely placed Malaise traps in Padang, Sumatra, for three months. We analyzed the samples by DNA barcoding and sequenced a total of more than 70,000 insect specimens. For sequence clustering, we applied three different delimitation techniques, namely RESL, ASAP, and SpeciesIdentifier, which gave similar results. Despite our very limited sampling in time and space, our efforts recovered more than 10,000 BINs, the majority of which are associated with “dark taxa”. Further analysis indicates drastic undersampling at both sites, meaning that the true arthropod diversity at our sampling sites is even higher. Despite the close proximity of the two Malaise traps (< 360 m), we discovered significantly distinct communities.
Citation: Chimeno C, Schmidt S, Cancian de Araujo B, Perez K, von Rintelen T, Schmidt O, et al. (2023) Abundant, diverse, unknown: Extreme species richness and turnover despite drastic undersampling in two closely placed tropical Malaise traps. PLoS ONE 18(8): e0290173. https://doi.org/10.1371/journal.pone.0290173
Editor: Pierfilippo Cerretti, Universita degli Studi di Roma La Sapienza, ITALY
Received: April 27, 2023; Accepted: August 3, 2023; Published: August 16, 2023
Copyright: © 2023 Chimeno et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All original data spreadsheets that were downloaded from BOLD for analysis have been uploaded to Figshare (https://doi.org/10.6084/m9.figshare.21815034). The R script and all input data sets are also deposited on Figshare (R code: https://doi.org/10.6084/m9.figshare.21806370.v2; BIN dataset: https://doi.org/10.6084/m9.figshare.21815142; ASAP dataset: https://doi.org/10.6084/m9.figshare.21815064.v1). The datasets on BOLD can be found under doi.org/10.5883/DS-GMTINDO1 and doi.org/10.5883/DS-GMTINDO2.
Funding: The project was supported by the Deutsche Forschungsgemeinschaft and the Bundesministerium für Bildung und Forschung (BMBF) within the bilateral "Biodiversity and Health" funding program (Project numbers: 16GW0111K, 16GW0112) with additional support from DIPA PUSLIT Biologi LIPI 2015-2016. The sequence analyses for this study were supported, in part, by Genome Canada through the Ontario Genomics Institute, while informatics support was provided through a grant from the Ontario Ministry of Research and Innovation. The funders did not play any role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
In the age of rapid biodiversity decline, taxonomists find themselves in a race against time to discover and describe new species before they become extinct [1–5]. However, identifying species in several megadiverse groups of organisms requires in-depth taxonomic expertise, which is either in decline or very limited, the latter being the case for the so-called dark taxa [6, 7]. This mismatch between the high number of species awaiting discovery and the few researchers available to describe them is known as the “taxonomic impediment”. It is particularly pronounced among arthropods [8] and, because arthropods account for a large proportion of the animal biomass and diversity in terrestrial ecosystems [9–11], it directly constrains global biodiversity research. Ecological surveys often must limit their analyses to a subset of known species (e.g., flagship or indicator species) because there is not enough expertise to analyze the highly abundant, often minute specimens that make up the bulk of the sample [12, 13].
As a potential remedy, molecular identification techniques have evolved greatly in the last decade, providing faster sample processing methodologies in various fields of research [14]. DNA barcoding, for example, uses a short fragment of the mitochondrial cytochrome c oxidase subunit I (COI) gene to identify species and distinguish them from one another [15–17]. Paul Hebert and colleagues introduced the method in 2003, and today it is a standard approach for the molecular identification or presorting of species [18]. DNA barcoding is easy to use (even for non-experts), widely available, and nowadays cost-effective [16, 19–22].
In 2012, the Global Malaise trap Program was initiated by the Centre for Biodiversity Genomics (CBG) at the Biodiversity Institute of Ontario (BIO) with the large-scale, worldwide deployment of Malaise traps (see https://biodiversitygenomics.net/site/projects/gmp/). Malaise traps are very efficient at capturing flying insects and are therefore commonly used in surveys of terrestrial arthropods [23–26]. More than 158 sites in 33 countries were sampled and analyzed via DNA barcoding to provide an overview of global arthropod biodiversity and detailed temporal and spatial information on arthropod communities (see https://biodiversitygenomics.net/site/projects/gmp/). In a joint project with Andalas University, two Malaise traps were deployed in Padang, Sumatra, Indonesia, and operated for three months each. Insect communities in tropical regions are notorious for being extraordinarily diverse [11, 27, 28] yet severely understudied [29, 30], which makes large-scale sequencing of the Malaise trap contents especially interesting. In this study, we present and evaluate the sequencing results recovered for each Malaise trap.
Materials and methods
Collecting
In 2016, we deployed two Malaise traps, installed ca. 360 m apart from each other. We set up the traps at the northern forest edge of the 500-hectare campus of Andalas University in the eastern part of Padang City, West Sumatra Province, Indonesia (Fig 1). The traps were located in a semi-open area dominated by ferns, interspersed with medium-sized and a few large trees. Each trap was placed in a spot with sparse vegetation so that flight paths were open on both sides of the trap. The adjacent tropical forest was dominated by secondary tree vegetation and is connected to the Bukit Barisan mountain range. Both Malaise traps were operated from May 5th to July 30th. The collection bottles were emptied biweekly and topped up with fresh 80% EtOH. Samples were stored in a freezer until further processing. Because both traps were located on the university grounds, no collection permit was needed.
Fig 1. Malaise trap sites near Padang, West Sumatra. Created by the authors using QGIS.
Sample processing
All collection bottles were sent to the Centre for Biodiversity Genomics for sorting and further processing. We attempted to barcode as many specimens as possible, but due to funding constraints, not all specimens were processed, and for some collection bottles, a maximum of fifteen 96-well microplates were filled (Table 1). A selective strategy was used to narrow down the number of specimens, especially for the samples with only 15 plates processed. Individuals were chosen for sequencing so as to capture as much diversity as possible based on size and morphospecies. Two sieve sizes were used to subsample from three different size classes (no sieve, 8 mm sieve, and 2 mm sieve). As most of the diversity was likely hidden among the smaller organisms (particularly in the abundant insect orders Hymenoptera, Coleoptera, and Diptera), more specimens were chosen from the smallest size class. Additionally, more Hymenoptera and Coleoptera than Diptera were selected because Diptera are often so abundant in Malaise trap samples that there is a higher risk of oversampling the same species.
Table 1. Collection dates, sequencing capacity, specimens processed, and sequences obtained per sample.
Tissue lysis was performed overnight at 56°C, and DNA extraction was conducted using an automated, silica membrane-based protocol [31]. To reduce costs and the amount of reagents needed for PCR amplification of the COI gene, the DNA extracts from four 96-well plates were consolidated into 384-well PCR plates [23, 32]. The PCR products were diluted and sequenced unidirectionally; the sequencing products were cleaned up using an automated magnetic bead-based method before being run on an ABI 3730xl DNA Analyzer (Applied Biosystems). For more details on the laboratory protocols, see [23].
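Consolidating four 96-well plates into one 384-well plate works because 4 × 96 = 384. The sketch below maps each source well to a unique 384-well position under an interleaved quadrant layout, a common convention for 96-to-384 reformatting. The source does not say which layout was used at CBG, so the layout and the helper `map_96_to_384` are assumptions for illustration only.

```python
import string

ROWS_96 = string.ascii_uppercase[:8]    # rows A-H of a 96-well plate
ROWS_384 = string.ascii_uppercase[:16]  # rows A-P of a 384-well plate

def map_96_to_384(plate_index: int, well: str) -> str:
    """Map a well (e.g. 'C7') on source plate 0-3 to a 384-well position.

    Hypothetical helper assuming an interleaved quadrant layout (1-based
    numbering): plate 0 -> odd rows/odd columns, plate 1 -> odd rows/even
    columns, plate 2 -> even rows/odd columns, plate 3 -> even rows/even columns.
    """
    if plate_index not in range(4):
        raise ValueError("plate_index must be 0, 1, 2 or 3")
    row = ROWS_96.index(well[0].upper())  # 0-7; raises ValueError if not A-H
    col = int(well[1:]) - 1               # 0-11
    if not 0 <= col < 12:
        raise ValueError("column must be 1-12")
    row_offset, col_offset = divmod(plate_index, 2)
    row_384 = 2 * row + row_offset        # 0-15
    col_384 = 2 * col + col_offset        # 0-23
    return f"{ROWS_384[row_384]}{col_384 + 1}"

# Every one of the 4 x 96 source wells lands on a distinct 384-well position.
targets = {
    map_96_to_384(p, f"{r}{c}")
    for p in range(4)
    for r in ROWS_96
    for c in range(1, 13)
}
print(len(targets))  # 384
```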
All barcoded specimens are currently stored in the natural history archive of the Centre for Biodiversity Genomics (CBG; collection code BIOUG) at the University of Guelph, Canada. However, this collection, as well as the rest of the unprocessed material, will eventually be repatriated to the Museum Zoologicum Bogoriense in Cibinong, Indonesia.
Data analysis
All specimen metadata and sequence data were uploaded to the Barcode of Life Data System (BOLD), an online workbench and database [32]. All data are publicly available on BOLD in two datasets (doi.org/10.5883/DS-GMTINDO1 and doi.org/10.5883/DS-GMTINDO2). We also uploaded the BOLD data spreadsheet, including all specimen metadata, to Figshare.
The BOLD system assigned each sequence a Barcode Index Number (BIN) using the RESL algorithm. BINs are globally unique identifiers for clusters of sequences and serve as a species proxy [32]. Every sequence of ≥ 300 base pairs (bp) is automatically assigned to a BIN that already exists in BOLD if it meets the RESL similarity criterion [32]. Sequences of ≥ 500 bp that do not match an existing BIN serve as founders of new BINs. Family-level identifications were obtained using the BIN taxonomy match tool on BOLD.
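The length rules described above can be summarized as a small decision function. This is a sketch only: `bin_action` is hypothetical, the RESL similarity test is reduced to a boolean supplied by the caller, and any further quality criteria that BOLD may apply are not modeled.

```python
from typing import Optional

def bin_action(length_bp: int, matches_existing_bin: bool) -> Optional[str]:
    """Hypothetical summary of the BIN length rules described in the text.

    `matches_existing_bin` stands in for the RESL similarity test, which is
    not reimplemented here.
    """
    if length_bp >= 300 and matches_existing_bin:
        return "join existing BIN"
    if length_bp >= 500:
        return "found new BIN"
    return None  # too short to found a BIN and no existing match

for length, match in [(350, True), (350, False), (620, False), (620, True)]:
    print(length, match, bin_action(length, match))
```

The 300 to 499 bp band is the case that makes project BIN counts on BOLD slightly higher than counts restricted to sequences of at least 500 bp.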
All analyses were performed in R version 4.2.1 [33], using the packages vegan version 2.5–7 [34], iNEXT version 2.0.20 [35], and SpadeR version 0.1.1 [36]. To assess our sampling effort, we created BIN accumulation curves for each Malaise trap (via iNEXT; iNEXT package) and estimated the species diversity present at each sampling site (via ChaoSpecies; SpadeR package). We created continuous diversity profiles for each trap (via Diversity; SpadeR package) to illustrate the variation in the three standard biodiversity metrics quantified by Hill numbers of order q: species richness (q = 0), Shannon diversity (q = 1), and Simpson diversity (q = 2). Hill numbers are a mathematically unified family of diversity indices that incorporate relative species abundances to quantify biodiversity. To evaluate the faunal similarity between Malaise traps, we performed a permutational multivariate analysis of variance (PERMANOVA) (via adonis2; vegan package; Bray–Curtis dissimilarity; 999 permutations). We differentiated between location and dispersion effects by applying a test of multivariate homogeneity of dispersions analogous to Levene’s test (via betadisper; vegan package) together with a permutation F-test (via permutest; vegan package). For visualization, we created a non-metric multidimensional scaling (NMDS) ordination (via metaMDS; vegan package; Bray–Curtis dissimilarity). Using the universal insect trait tool (ITT; version 1.0) [37], we assigned all arthropod families to ecological guilds to analyze differences in functional diversity, in addition to taxonomic diversity, between the insect communities of the two trap sites.
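For reference, the Hill number of order q for relative abundances p_i is qD = (Σ p_i^q)^(1/(1 − q)), with the q = 1 case defined as the limit exp(−Σ p_i ln p_i). The sketch below computes these plug-in (observed) values from per-BIN specimen counts. It is not SpadeR's `Diversity` function, which additionally estimates values corrected for undetected species, and the example counts are invented for illustration.

```python
import math
from collections import Counter

def hill_number(abundances, q: float) -> float:
    """Observed (plug-in) Hill number of order q from per-BIN specimen counts.

    q = 0 -> BIN richness, q = 1 -> exp(Shannon entropy), q = 2 -> inverse Simpson.
    No correction for undetected BINs is applied.
    """
    counts = [a for a in abundances if a > 0]
    n = sum(counts)
    p = [a / n for a in counts]
    if q == 1:
        # Limit of the general formula as q -> 1.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1 / (1 - q))

# Invented example: one BIN label per sequenced specimen.
specimens = ["BIN_A"] * 6 + ["BIN_B"] * 2 + ["BIN_C", "BIN_D"]
counts = list(Counter(specimens).values())  # [6, 2, 1, 1]
for q in (0, 1, 2):
    print(q, round(hill_number(counts, q), 3))
```

Because higher orders down-weight rare BINs, a community dominated by singletons (as in this study) shows a steep drop from q = 0 to q = 2 in its diversity profile.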
Because the BIN concept has recently been challenged [38], we decided to compare the number of BINs with the number of OTUs recovered by other clustering algorithms. BINs should not be considered synonymous with “species”, but rather a dynamic tool to presort the global DNA barcode database into MOTUs that taxonomists can further evaluate; BIN definitions may change on BOLD as more sequences are added to the database. Since the assignment of BINs in a dataset is affected by sequences in BOLD that are not included in the dataset, we analyzed the sequences of our datasets using the “Cluster Sequences” option in BOLD. This way, the resulting OTUs are directly comparable to the results of other species delimitation algorithms. As a consequence, the number of BINs found for our project on BOLD is slightly higher than in our analyses, because the system assigns BINs to sequences between 300 and 499 bp if the BIN is already present in the database, whereas we limited our analyses to sequences with a minimum length of 500 bp. In addition to RESL, we analyzed our data with the Assemble Species by Automatic Partitioning program (ASAP) [39] via its web interface, and with SpeciesIdentifier version 1.9 [40]. ASAP uses pairwise genetic distances for hierarchical clustering without using information on intraspecific diversity, whereas SpeciesIdentifier clusters sequences based on their pairwise genetic distances (p-distances). To visualize the outputs of the different clustering algorithms (RESL, ASAP, SpeciesIdentifier), we created accumulation curves (via iNEXT; iNEXT package) depicting the number of clusters obtained for each Malaise trap. Detailed specimen and sequence data are accessible in BOLD as two citable datasets (doi.org/10.5883/DS-GMTINDO1 and doi.org/10.5883/DS-GMTINDO2).
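The idea of clustering at a fixed p-distance threshold can be sketched as single-linkage grouping: two sequences share a cluster if a chain of pairwise distances of at most 3% connects them. This is a simplified illustration under that assumption, not SpeciesIdentifier's own implementation; it expects pre-aligned sequences of equal length, and the sequences in the example are synthetic.

```python
from itertools import combinations

def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned, equal-length sequences,
    ignoring positions where either sequence has a gap or ambiguity code."""
    valid = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not valid:
        return 1.0
    return sum(x != y for x, y in valid) / len(valid)

def threshold_clusters(seqs: dict, threshold: float = 0.03) -> list:
    """Single-linkage clusters: sequences are grouped if connected by a chain
    of pairwise p-distances <= threshold (union-find over all pairs)."""
    parent = {k: k for k in seqs}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for k in seqs:
        groups.setdefault(find(k), []).append(k)
    return sorted(sorted(g) for g in groups.values())

def mutate(seq: str, positions: set) -> str:
    """Substitute a different base at each given position (synthetic data)."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    return "".join(swap[c] if i in positions else c for i, c in enumerate(seq))

base = "ACGT" * 25  # 100 bp synthetic "barcode"
seqs = {
    "s1": base,
    "s2": mutate(base, {0, 1}),              # 2% from s1
    "s3": mutate(base, {0, 1, 2, 3}),        # 2% from s2, 4% from s1
    "s4": mutate(base, set(range(50, 60))),  # 10% from s1
}
print(threshold_clusters(seqs))  # [['s1', 's2', 's3'], ['s4']]
```

Note that s1 and s3 end up together even though they differ by 4%, because s2 links them; chaining of this kind is one reason threshold-based counts can differ from RESL or ASAP partitions.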
Results
Alpha-diversity assessments
We obtained 39,374 COI sequences from Malaise trap 1 and 19,394 from Malaise trap 2, which led to the recovery of 6,177 and 5,206 BINs, respectively. Together, we obtained a total of 9,212 BINs, 2,171 of which were shared between traps. About two-thirds (6,125) of all BINs were new to BOLD, meaning that they were added for the first time with the upload of these sequences. Of the 58,769 specimens that were successfully sequenced, only 961 automatically obtained a species-level identification, covering 231 species. The majority of sequences (94%) could be identified only to family level, and most of these were associated with insect families that are renowned for being challenging to study and are therefore highly underrepresented in databases (see Discussion). Eight families of dark taxa were strongly represented in our data, namely Cecidomyiidae (gall midges), Ceratopogonidae (biting midges), Chironomidae (non-biting midges), Phoridae (scuttle flies), Psychodidae (sand flies), Sciaridae (dark-winged fungus gnats), Platygastridae and Braconidae (parasitoid wasps). These eight families account for 70% of all specimens and 58% of all BINs. Fig 2, which presents the frequency of rare and common BINs in the merged dataset, shows that the majority of BINs (66%; 6,078 BINs) were represented by only one or two specimens.
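The totals above can be cross-checked from the per-trap counts by inclusion–exclusion, and the share of rare BINs shown in Fig 2 can be computed from one BIN label per sequenced specimen. A short sketch; `rare_fraction` is a hypothetical helper and the example labels are invented.

```python
from collections import Counter

def rare_fraction(bin_per_specimen, max_count: int = 2) -> float:
    """Share of BINs represented by at most `max_count` specimens."""
    bin_sizes = Counter(bin_per_specimen)
    rare = sum(1 for size in bin_sizes.values() if size <= max_count)
    return rare / len(bin_sizes)

# Invented BIN labels, one per sequenced specimen.
labels = ["A"] * 5 + ["B"] + ["C"] * 2 + ["D"] * 3
print(rare_fraction(labels))  # 0.5 (B and C are rare)

# Cross-check of the reported totals by inclusion-exclusion.
bins_mt1, bins_mt2, shared = 6_177, 5_206, 2_171
total = bins_mt1 + bins_mt2 - shared
print(total)                       # 9212
print(round(100 * 6_078 / total))  # 66 (% of BINs with one or two specimens)
```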
Fig 2. Frequency of rare and common BINs in the merged dataset. The majority of BINs are rare and are represented by only one (BIN frequency = 1) or two (BIN frequency = 2) specimens. The pie charts represent the proportion of dark taxa among the BIN diversity (in black). These include members of Cecidomyiidae, Ceratopogonidae, Chironomidae, Phoridae, Psychodidae, Braconidae, and Platygastridae.
Malaise trap 1.
The BINs recovered from Malaise trap 1 cover 231 families in 21 arthropod orders. The ten most diverse families (in descending order) are Cecidomyiidae (Diptera; 1,858 BINs), Chironomidae (Diptera; 491 BINs), Ceratopogonidae (Diptera; 470 BINs), Phoridae (Diptera; 439 BINs), Platygastridae (Hymenoptera; 284 BINs), Sciaridae (Diptera; 239 BINs), Psychodidae (Diptera; 145 BINs), Formicidae (Hymenoptera; 125 BINs), Cicadellidae (Hemiptera; 111 BINs), and Braconidae (Hymenoptera; 105 BINs). Together, these families represent 70% of all BINs recovered from this Malaise trap. Chao1 analysis estimated that about 11,000 species may occur at this sampling site, and extrapolating to double the number of captured and processed specimens would have increased the number of recovered BINs to 8,531, an increase of 38% (Table 2). In the diversity profile, there is no overlap between the observed species richness and the richness estimated to occur at the trap site (Hill number q = 0, Fig 3B).
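A minimal sketch of the classic abundance-based Chao1 estimator, which adds an estimate of undetected BINs, (n − 1)/n · f1²/(2 f2), to the observed count, where n is the number of specimens and f1 and f2 are the numbers of singleton and doubleton BINs. It falls back to the bias-corrected form when there are no doubletons. SpadeR's `ChaoSpecies` reports several estimators, and the source does not state which one underlies the figure of about 11,000; the counts below are invented.

```python
def chao1(abundances) -> float:
    """Abundance-based Chao1 richness estimate from per-BIN specimen counts."""
    counts = [a for a in abundances if a > 0]
    n = sum(counts)
    s_obs = len(counts)
    f1 = sum(1 for a in counts if a == 1)  # singleton BINs
    f2 = sum(1 for a in counts if a == 2)  # doubleton BINs
    if f2 > 0:
        f0_hat = (n - 1) / n * f1 ** 2 / (2 * f2)
    else:
        # Bias-corrected form, defined even without doubletons.
        f0_hat = (n - 1) / n * f1 * (f1 - 1) / 2
    return s_obs + f0_hat

counts = [10, 5, 3, 2, 2, 1, 1, 1, 1]  # invented per-BIN specimen counts
print(round(chao1(counts), 2))
```

The estimate grows with the square of the singleton count, which is why a dataset in which most BINs are singletons or doubletons yields an estimate far above the observed richness.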
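Fig 3. BIN accumulation curves (A) and diversity profiles for Malaise trap 1 (B) and Malaise trap 2 (C). Dotted lines represent extrapolated values (up to double the sampling effort), bold lines represent interpolated values. Shaded areas represent the 95% confidence intervals.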
Malaise trap 2.
Although we processed substantially fewer specimens from Malaise trap 2, we obtained almost as many BINs (Table 2 and Fig 3A). The BINs from Malaise trap 2 cover 254 families in 24 arthropod orders. The ten most diverse families (in descending order) are Cecidomyiidae (Diptera; 1,003 BINs), Phoridae (Diptera; 484 BINs), Platygastridae (Hymenoptera; 305 BINs), Sciaridae (Diptera; 220 BINs), Ceratopogonidae (Diptera; 189 BINs), Chironomidae (Diptera; 186 BINs), Cicadellidae (Hemiptera; 158 BINs), Braconidae (Hymenoptera; 152 BINs), Erebidae (Lepidoptera; 128 BINs), and Psychodidae (Diptera; 128 BINs). Together, these families represent 57% of all recovered BINs. Chao1 analysis estimated that about 10,000 species might occur at this trap site. Doubling the number of captured specimens would have increased the obtained BIN diversity to 7,481, an increase of 44% (Table 2). As for Malaise trap 1, there is no overlap between the number of BINs observed in our analyses and the species richness estimated to be present at the site (Fig 3C).
Beta-diversity analysis
Analysis revealed that 2,171 BINs are shared between the two traps, and Chao1-shared estimates suggest that approximately 4,281 (± 183) BINs are shared between the communities at the two trap sites. PERMANOVA of the sample contents showed that the arthropod communities of the two Malaise traps are significantly distinct from one another (adonis2 p = 0.001) and that this difference is driven by location effects only (S1 Table). In the NMDS ordination, collection samples cluster clearly by Malaise trap (S1 Fig). A more detailed evaluation shows that, despite high species turnover, both traps have similar compositions at the family level and, consequently, similar guild compositions (Fig 4A and 4B).
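To make the PERMANOVA result concrete, the sketch below computes Bray–Curtis dissimilarities, the one-factor pseudo-F statistic (between-group over within-group sums of squared distances, each divided by its degrees of freedom), and a permutation p-value of the form (hits + 1)/(permutations + 1). It is a from-scratch illustration on invented data, not vegan's `adonis2`. Under this p-value definition, 999 permutations give a minimum attainable p of 1/1000 = 0.001, so the reported p = 0.001 means that no permuted statistic reached the observed one.

```python
import random
from itertools import combinations

def bray_curtis(x, y) -> float:
    """Bray-Curtis dissimilarity between two abundance vectors aligned by BIN."""
    total = sum(x) + sum(y)
    if total == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(x, y)) / total

def pseudo_f(dist, groups) -> float:
    """One-factor PERMANOVA pseudo-F from a square distance matrix."""
    n = len(groups)
    labels = sorted(set(groups))
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    a = len(labels)
    return ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))

def permanova(samples, groups, permutations: int = 999, seed: int = 1):
    """Return (pseudo-F, p) with p = (hits + 1) / (permutations + 1)."""
    dist = [[bray_curtis(x, y) for y in samples] for x in samples]
    f_obs = pseudo_f(dist, groups)
    rng = random.Random(seed)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # permute trap labels across samples
        if pseudo_f(dist, shuffled) >= f_obs:
            hits += 1
    return f_obs, (hits + 1) / (permutations + 1)

# Invented per-sample counts for four BINs, four samples per trap.
samples = [[5, 4, 0, 0], [6, 3, 1, 0], [4, 5, 0, 1], [5, 5, 0, 0],
           [0, 1, 5, 4], [0, 0, 6, 3], [1, 0, 4, 5], [0, 0, 5, 5]]
groups = ["MT1"] * 4 + ["MT2"] * 4
f_obs, p = permanova(samples, groups)
print(round(f_obs, 2), p)
```

With only eight samples there are just 70 distinct ways to split them into two groups of four, so the toy p-value will typically be near 2/70 rather than 0.001; many more collection samples are needed before the permutation floor becomes the limiting factor.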
Fig 4. a. Relative BIN diversity across the most abundant families in our Malaise traps. b. Relative BIN diversity across ecological guilds.
COI clusters across methods
In total, 77,497 insect specimens were processed, 52,362 from Malaise trap 1 and 25,135 from Malaise trap 2. Excluding all flagged sequences (and retaining only those with at least 500 bp) reduced these numbers to 39,374 and 19,394 COI sequences for the two traps, respectively. For a comparative analysis of clustering algorithms (in terms of cluster diversity), we reran the RESL algorithm on these sequences, which led to the recovery of 6,283 (MT1) and 5,253 (MT2) OTUs that are unique to our project (Table 2). SpeciesIdentifier (using a 3% threshold) suggested slightly fewer clusters than the RESL algorithm, while ASAP (1st partition) yielded more conservative values, i.e., a much lower number of putative species (Table 2 and Fig 5).
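The counts above imply that roughly three quarters of the processed specimens yielded a sequence retained for analysis. This rate is derived here from the reported numbers rather than stated in the source; `retention_rate` is a hypothetical helper.

```python
processed = {"MT1": 52_362, "MT2": 25_135}
retained = {"MT1": 39_374, "MT2": 19_394}  # unflagged sequences of >= 500 bp

def retention_rate(kept: int, total: int) -> float:
    """Percentage of processed specimens that yielded a retained sequence."""
    return 100 * kept / total

for trap in processed:
    print(trap, f"{retention_rate(retained[trap], processed[trap]):.1f}%")
print("all", f"{retention_rate(sum(retained.values()), sum(processed.values())):.1f}%")
```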
Fig 5. Accumulation curves of the clusters recovered with each clustering algorithm (R: RESL, A: ASAP, S: SpeciesIdentifier) for each Malaise trap. Dotted lines represent extrapolated values (up to double the sampling effort), bold lines represent interpolated values. Shaded areas represent the 95% confidence intervals.
Discussion
Overwhelming species richness despite drastic undersampling
All accumulation curves (Figs 3A and 5) and diversity profiles (Fig 3B and 3C) indicate that we drastically undersampled both trap sites. This was expected for several reasons. First, our collection effort was limited in space and time, using two Malaise traps for only three months. Unlike in temperate regions, no single season in the tropics is, generally speaking, highly unsuitable for the activity of all arthropod species [41], meaning that arthropods are present and mobile all year round [41, 42]. Therefore, sampling for only three months provides limited coverage of temporal species diversity. Second, while Malaise traps are very effective at collecting arthropods [26], we did not use any additional sampling method to capture the diverse canopy communities present in many tropical forests [29, 43, 44]. Our sampling technique targeted arthropods found in litter and understory habitats, whereas [43] demonstrated that the highest species richness is found in the forest canopy. Third, we did not process all collected individuals due to economic constraints. We had a total of eleven collection events per trap. Seven bulk samples collected with Malaise trap 1 were processed entirely; however, sequencing of all other samples was limited to 15 plates (1,475 specimens) per sample. Had we doubled our sampling effort, we would have recovered an estimated 38% and 44% more putative species for Malaise traps 1 and 2, respectively (Fig 3A). Sampling was slightly more comprehensive for Malaise trap 1, presumably because more individuals were processed from these samples.
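The "doubled sampling effort" figures come from extrapolated accumulation curves. The sketch below implements the standard individual-based rarefaction formula for m ≤ n and a Chao1-based extrapolation for m > n, which is our understanding of what iNEXT computes for abundance data. It is an illustration on invented counts, not iNEXT itself, and it omits iNEXT's bootstrap confidence intervals.

```python
import math

def expected_richness(abundances, m: int) -> float:
    """Expected number of BINs in a sample of m specimens.

    m <= n: individual-based rarefaction (interpolation).
    m > n:  extrapolation toward the Chao1 asymptote.
    """
    counts = [a for a in abundances if a > 0]
    n = sum(counts)
    s_obs = len(counts)
    if m <= n:
        # 1 - P(a BIN with x specimens is absent from a random subsample of m).
        return sum(1 - math.comb(n - x, m) / math.comb(n, m) for x in counts)
    f1, f2 = counts.count(1), counts.count(2)
    # Chao1 estimate of the number of undetected BINs (f0).
    if f2 > 0:
        f0 = (n - 1) / n * f1 ** 2 / (2 * f2)
    else:
        f0 = (n - 1) / n * f1 * (f1 - 1) / 2
    if f0 == 0:
        return float(s_obs)
    m_extra = m - n
    return s_obs + f0 * (1 - (1 - f1 / (n * f0 + f1)) ** m_extra)

counts = [10, 5, 3, 2, 2, 1, 1, 1, 1]  # invented per-BIN specimen counts
n = sum(counts)
observed = expected_richness(counts, n)     # equals the observed 9 BINs
doubled = expected_richness(counts, 2 * n)  # extrapolated to twice the sample
print(observed, round(doubled, 2), f"+{100 * (doubled / observed - 1):.0f}%")
```

The percentage gain printed on the last line is the same kind of quantity as the 38% and 44% reported for the two traps, here computed on a toy community.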
Nevertheless, we clearly recovered only a fraction of the actual diversity present at the sites: Chao1 calculations estimated much higher species numbers for each trap site, and we see no overlap between empirical and estimated BIN numbers for any of the three diversity orders (species richness, q = 0; Shannon diversity, q = 1; Simpson diversity, q = 2).
Patchiness in arthropod diversity
Beta diversity assessments show that the communities at the two trap sites are significantly distinct and that this difference is driven by location effects only (all samples were homogeneously dispersed) (S1 Table and S1 Fig). Even after pooling all collection events, we observed only 24% overlap in putative species between traps despite their close proximity (< 400 m). One could argue that, due to the limited sequencing of sample contents, we are unknowingly comparing two very different subsets of actually similar communities, which [45] also identified as a factor contributing to the overestimation of beta diversity. However, we suggest that this is not the case because we achieved more than 80% sample coverage for each Malaise trap. Instead, we argue that we are observing arthropod diversity patchiness, as described by [46]. Forest floors in the tropics are highly heterogeneous over small spatial scales, resulting in high microhabitat richness in x-dimensions [46–48]. Nutrient availability, habitat heterogeneity, spatial variation of plant communities, degree of exposure to predators, and ecosystem disturbances are just some of the factors that define these microhabitats and their arthropod communities [47, 49, 50].
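Sample coverage estimates the share of a community's individuals that belong to BINs already detected. The source does not name its coverage estimator; the sketch below assumes the singleton/doubleton-based estimator commonly reported by iNEXT, Ĉ = 1 − (f1/n)·[(n − 1)f1 / ((n − 1)f1 + 2f2)], applied to invented counts. It also reproduces the 24% overlap from the reported BIN totals.

```python
def sample_coverage(abundances) -> float:
    """Estimated sample coverage from singleton (f1) and doubleton (f2) counts."""
    counts = [a for a in abundances if a > 0]
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two specimens")
    f1, f2 = counts.count(1), counts.count(2)
    if f1 == 0:
        return 1.0  # no singletons: every detected BIN was seen at least twice
    return 1 - f1 / n * ((n - 1) * f1 / ((n - 1) * f1 + 2 * f2))

counts = [10, 5, 3, 2, 2, 1, 1, 1, 1]  # invented per-BIN specimen counts
print(round(sample_coverage(counts), 3))

# Share of putative species shared between traps, from the reported totals.
print(round(100 * 2_171 / (6_177 + 5_206 - 2_171)))  # 24
```

Coverage measures completeness in terms of individuals, not species: a sample can reach high coverage while many rare species remain undetected, which is consistent with the large gap between observed and Chao1-estimated richness reported above.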
Prior studies in tropical rainforests have demonstrated that, because the majority of insects are herbivorous and host-specific, vegetation strongly influences which arthropod species prevail and can account for up to 60% of the variation in insect communities [45, 51]. In our study, almost half of all recovered species per trap were phytophages (Fig 4B), suggesting that differing vegetation at each trap site could be a driving factor behind the high species turnover [52]. Moreover, because Malaise traps capture insects that happen to fly through a very limited area, various factors such as trap location, orientation, height relative to the vegetation, light exposure, and surrounding structures also directly affect the captured communities [53, 54]. A recent study [54] examined the effects of Malaise trap spacing on species richness and composition and found that community similarity decreased among all major taxa with increasing trap-to-trap distance. The authors also found that 18 m between traps was the cut-off value at which the number of shared species dropped significantly [54]. These results reinforce our assumption that we are in fact sampling and comparing two very different insect communities.
Guild structures
Despite the high species turnover between trap sites, community compositions at the family level were very similar (Fig 4A). Consequently, guild structure was also conserved (Fig 4B). However, we strongly encourage further research on this point because we analyzed guild structures only at the family level. Although it is convenient to place entire families into guilds, this is also a source of error because species of the same family can cover a wide range of feeding behaviors [48]. Yet assigning individual species to guilds is a major challenge, especially in large-scale surveys. Literature on the feeding activities of individual species is scarce, and even where it exists, different life stages of the same species can fall into different guild categories (e.g., parasitoid Hymenoptera); for some taxa, the feeding activities of adults are completely unknown [48]. In addition, only a small proportion of our sequences provided identification at the species level, meaning that we cannot assign feeding traits to species proxies. In this study, we did not conduct morphological identifications. Instead, all family-level identifications were assigned automatically using the identification tool on BOLD. It is therefore important to note that accurate results can only be expected when high-quality reference libraries, which include sequences of accurately morphologically identified vouchers, are used as a backbone. Despite these sources of bias, we believe that we can rely on the assigned identifications because we use them only at the family level, for which extensive information is available on BOLD.
For the family-level guild assignment, we used the Insect Trait Tool that was developed by [37]. Because this tool was developed for the Central European fauna, the extended trait information provided by the tool may not be accurate for tropical fauna. However, because we conducted only a broad guild analysis, we do not think that this is problematic in our study.
Dark taxa: Abundant, diverse, unknown
In our study, the majority of all BINs were rare, being represented by only one or two specimens (Fig 2). Although we did expect to capture a high proportion of singleton species, the frequency of rare species was surprisingly high compared with that typical of large-scale tropical surveys (about 32%) [55, 56]. A closer look at the data revealed that the majority of these singletons are associated with “dark taxa”, highly diverse groups of arthropods (mostly Diptera and Hymenoptera) for which little taxonomic or life-history information is available [6, 8]. In total, 70% (40,807) of all successfully sequenced specimens and 58% (5,340 BINs) of all recovered putative species in this study belong to only eight dark-taxon families, namely Cecidomyiidae (gall midges), Ceratopogonidae (biting midges), Chironomidae (non-biting midges), Phoridae (scuttle flies), Psychodidae (sand flies), Sciaridae (dark-winged fungus gnats), Platygastridae and Braconidae (both parasitoid wasps).
As demonstrated in this study, dark taxa can be highly abundant and often make up the bulk of an insect sample, not only in the tropics but also in temperate regions [6, 57]. Because this is a global phenomenon, the inability to associate these insects with species names or ecological functions is a major constraint on biodiversity research, conservation priority setting, and our understanding of ecosystem functioning. One recent publication [20] highlighted that dark taxa are so abundant that they should be included in any holistic biodiversity assessment, but that tackling them with traditional taxonomic techniques is too slow [20, 58]. Specimens of dark taxa are often small-bodied and cryptically diverse, so (especially in Diptera) specimens often need to be dissected and studied microscopically. Moreover, species identification for these insects is often possible only by using multiple approaches in parallel to ensure accurate results. Integrative approaches that combine various methodologies are therefore becoming ever more important in making these groups tangible to science [4, 20, 59].
Since 2020, the third phase of the nationwide German Barcode of Life project (GBOL III: Dark Taxa; https://bolgermany.de/home/gbol3/de/projekte/) has been dedicated to tackling difficult groups of taxa and training a new generation of taxonomists. In this initiative, integrative methods are being used to speed up the identification of dark taxa and make them more tangible to science. To do this, researchers are using (among other methods) a reverse and integrative taxonomic approach to effectively target and study their groups of interest. This consists of first applying molecular methods (including MinION technologies) to rapidly distinguish sequence clusters among thousands of preselected specimens, and then applying morphological methods to selected specimens of specific clusters for species identification. This drastically reduces the workload because time-consuming specimen processing and morphological analysis are restricted to targeted specimens. However, the approach is still time-consuming because it requires the processing of thousands of individuals, as in our case [60]. One technology that is currently expediting biomonitoring surveys is metabarcoding, which allows entire bulk samples to be analyzed in one sequencing run [14, 17]. However, this method provides information only on community composition and not on abundance, nor is the link between sequence and specimen preserved [60, 61]. This makes it especially difficult to study dark taxa: they consist of many species that are not yet described, and these remain undescribed because individual specimens cannot easily be pinpointed [60].
Recently, new technological developments have emerged that can help accelerate biomonitoring studies by speeding up the greatest bottleneck in ecological research: sample sorting. Bulk samples of arthropods often contain hundreds to thousands of specimens that need to be sorted before species-level analyses can be conducted. In their study, [60] present a compact insect sorting robot that can recognize and sort insect specimens based on overview images of bulk samples. Especially interesting is the fact that this robot, the DiversityScanner, can process very small specimens (< 3 mm) [60]. Specimens are automatically selected by the scanner, imaged, assigned to a class or family, and then moved to a microplate. Another study [62] proposes a workflow that combines HotSHOT DNA extraction with MinION sequencing to conduct fast and accurate species-level sorting of ecological samples. With a modest amount of equipment, manpower, and training, the authors were able to conduct species-level sorting within hours, which amounted to 2.5 minutes per specimen. Of course, species identifications can only be provided if identified sequences are present in databases; however, coupling this approach with the reverse workflow applied in the GBOL III project could drastically expedite the work of taxonomists. Because no taxonomic expertise is necessary for the laboratory procedures, taxonomists need only be brought on board to analyze vouchers after cluster analysis.
Employing DNA-based delimitation methods: Working with species proxies
BOLD not only provides a variety of analytical and visualization tools, but its interface is also very user-friendly, making it easy to use for all researchers (even those with little or no bioinformatic knowledge) [32]. As a result, BOLD is commonly used in DNA barcoding research, and its integrated RESL algorithm and BIN system are consequently also widely used for clustering sequence data. For our analyses, we used BIN counts as a proxy for species diversity, as has been done in various studies [6, 23, 63–66]. However, opinions vary regarding the use of BINs for species delimitation [39, 46], especially when assuming a 1:1 correspondence between BINs and species. Therefore, as recommended by [67], we analyzed our sequence data with several species delimitation methods that apply different algorithms and compared the number of clusters recovered with each method. We used SpeciesIdentifier for objective clustering with a preset threshold (3%) for comparative purposes and to increase confidence regarding the relative extent of diversity in our traps. It recovered slightly fewer clusters than the RESL algorithm from BOLD (Malaise trap 1: 5,967 versus 6,283 OTUs; Malaise trap 2: 5,054 versus 5,253 OTUs). With ASAP, hierarchical clustering was performed using pairwise genetic distances between sequences. The program builds numerous partitions ranked by score, and the best-ranked ones are provided in the output for analysis. ASAP yielded much more conservative cluster counts than the RESL (and SpeciesIdentifier) algorithms, especially for Malaise trap 1 (Table 2 and Fig 5). Analyses across methods displayed similar trends in sample coverage, indicating that Malaise trap 1 was much better sampled than Malaise trap 2 (Table 2).
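To put "slightly fewer" in numbers, the relative shortfall of SpeciesIdentifier against RESL can be computed from the cluster counts above: about 5% for Malaise trap 1 and about 4% for Malaise trap 2. These percentages are derived here, not reported in the source; `shortfall_pct` is a hypothetical helper.

```python
def shortfall_pct(reference: int, other: int) -> float:
    """Percentage by which `other` falls below `reference`."""
    return 100 * (reference - other) / reference

resl = {"MT1": 6_283, "MT2": 5_253}
species_identifier = {"MT1": 5_967, "MT2": 5_054}
for trap in resl:
    print(trap, f"{shortfall_pct(resl[trap], species_identifier[trap]):.1f}%")
```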
Conclusion
Here, processing only a fraction of the bulk samples collected during merely three months of Malaise trap sampling recovered more than 9,000 putative species and revealed high species turnover between two closely spaced sites. Despite the processing of more than 77,000 specimens, community analysis suggests that both collection sites were strongly undersampled. Community composition at the family level was conserved between traps, indicating similar ecological guild functions. The majority of specimens collected and processed belong to the so-called dark taxa, for which little taxonomic and life-history information is available. Comprehensive specimen sampling, AI-powered sample processing, and high-throughput sequencing coupled with trait analysis will be crucial to address this knowledge gap, and the technological basis for this is being created now [38, 68–70].
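The inference of strong undersampling rests on abundance-based estimators of the kind provided by iNEXT [35] and SpadeR [36], which use the numbers of singleton (f1) and doubleton (f2) clusters. As a minimal sketch, assuming per-cluster specimen counts (BINs or OTUs) as input, the code below computes the classic Chao1 richness estimate and the improved Good-Turing sample-coverage estimate of Chao and Jost, on which iNEXT's coverage-based analyses are built. The exact estimator variants and settings used in this study are not stated here, so these formulas are an illustration rather than our analysis pipeline; the counts are invented.

```python
from collections import Counter


def frequency_counts(abundances: list[int]) -> tuple[int, int, int]:
    """Return n (total specimens), f1 (singleton clusters), f2 (doubleton clusters)."""
    freq = Counter(abundances)
    return sum(abundances), freq[1], freq[2]


def undetected_estimate(n: int, f1: int, f2: int) -> float:
    """Chao1 estimate (f0-hat) of the number of clusters not yet detected."""
    if f1 == 0:
        return 0.0
    if f2 > 0:
        return (n - 1) / n * f1 ** 2 / (2 * f2)
    return (n - 1) / n * f1 * (f1 - 1) / 2  # bias-corrected form when f2 == 0


def chao1(abundances: list[int]) -> float:
    """Observed number of clusters plus the estimated number of undetected ones."""
    n, f1, f2 = frequency_counts(abundances)
    observed = sum(1 for x in abundances if x > 0)
    return observed + undetected_estimate(n, f1, f2)


def sample_coverage(abundances: list[int]) -> float:
    """Estimated share of individuals in the community belonging to detected clusters."""
    n, f1, f2 = frequency_counts(abundances)
    if f1 == 0:
        return 1.0
    f0 = undetected_estimate(n, f1, f2)
    return 1 - f1 / n * (n * f0 / (n * f0 + f1))


# Hypothetical specimen counts per cluster for one trap.
counts = [1, 1, 1, 1, 2, 2, 3, 5, 10]
print(round(chao1(counts), 2), round(sample_coverage(counts), 3))  # 12.85 0.852
```

Coverage well below 1, together with a Chao1 estimate well above the observed number of clusters, is the typical signature of undersampling: in this toy example, 4 of the 9 clusters are singletons.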
Supporting information
S1 Fig. Non-metric multidimensional scaling (NMDS) of the community compositions.
NMDS plot of the insect community compositions within each collection sample. Ellipses are 95% confidence intervals of centroids for each Malaise trap.
https://doi.org/10.1371/journal.pone.0290173.s001
(PDF)
S1 Table. Statistical analysis of the community compositions.
https://doi.org/10.1371/journal.pone.0290173.s002
(PDF)
Acknowledgments
We would like to thank the Ontario Genomics Institute for conducting sequence analysis and informatics.
References
- 1. Borkent A, Brown BV. How to inventory tropical flies (Diptera)—One of the megadiverse orders of insects. Zootaxa. 2015 Apr 28;3949(3):301–22. pmid:25947810
- 2. Hallmann CA, Sorg M, Jongejans E, Siepel H, Hofland N, Schwan H, et al. More than 75 percent decline over 27 years in total flying insect biomass in protected areas. PLOS ONE. 2017 Oct 18;12(10):e0185809. pmid:29045418
- 3. Pimm SL, Jenkins CN, Abell R, Brooks TM, Gittleman JL, Joppa LN, et al. The biodiversity of species and their rates of extinction, distribution, and protection. Science. 2014 May 30;344(6187):1246752. pmid:24876501
- 4. Riedel A, Sagata K, Suhardjono YR, Tänzler R, Balke M. Integrative taxonomy on the fast track - towards more sustainability in biodiversity research. Front Zoology. 2013 Mar 27;10(1):15. pmid:23537182
- 5. Seibold S, Gossner MM, Simons NK, Blüthgen N, Müller J, Ambarlı D, et al. Arthropod decline in grasslands and forests is associated with landscape-level drivers. Nature. 2019 Oct; 574(7780):671–4. pmid:31666721
- 6. Chimeno C, Hausmann A, Schmidt S, Raupach MJ, Doczkal D, Baranov V, et al. Peering into the darkness: DNA barcoding reveals surprisingly high diversity of unknown species of Diptera (Insecta) in Germany. Insects. 2022 Jan;13(1):82. pmid:35055925
- 7. Page RDM. DNA barcoding and taxonomy: dark taxa and dark texts. Philosophical Transactions of the Royal Society B: Biological Sciences. 2016 Sep 5;371(1702):20150334. pmid:27481786
- 8. Hausmann A, Krogmann L, Peters R, Rduch V, Schmidt S. GBOL III: DARK TAXA. iBOL Barcode Bulletin. 2020 Jul 10;10.
- 9. Dufour DL. Insects as Food: A Case Study from the Northwest Amazon. American Anthropologist. 1987;89(2):383–97.
- 10. Kremen C, Colwell RK, Erwin TL, Murphy DD, Noss RF, Sanjayan MA. Terrestrial Arthropod Assemblages: Their Use in Conservation Planning. Conservation Biology. 1993;7(4):796–808.
- 11. Longino JT. Tropical Insect Diversity - How to Sample It. In: Claro KD, Oliveira PS, Rico-Gray V, editors. Tropical Biology and Conservation Management—Volume XI: Case Studies. EOLSS Publications; 2009. p. 59–84.
- 12. Farwig N, Bendix J, Beck E. Introduction to the Special Issue “Functional monitoring in megadiverse tropical ecosystems.” Ecological Indicators. 2017 Dec 1;83:524–6.
- 13. Sattler T, Duelli P, Obrist MK, Arlettaz R, Moretti M. Response of arthropod species richness and functional groups to urban habitat structure and management. Landscape Ecol. 2010 Jul 1;25(6):941–54.
- 14. Chimeno C, Hübner J, Seifert L, Morinière J, Bozicevic V, Hausmann A, et al. Depicting environmental gradients from Malaise trap samples: Is ethanol-based DNA metabarcoding enough? Insect Conserv Divers. 2023;16(1):47–64.
- 15. Aylagas E, Borja Á, Irigoien X, Rodríguez-Ezpeleta N. Benchmarking DNA metabarcoding for biodiversity-based monitoring and assessment. Front Mar Sci [Internet]. 2016 [cited 2021 Apr 13]. Available from: https://www.frontiersin.org/articles/10.3389/fmars.2016.00096/full
- 16. Jinbo U, Kato T, Ito M. Current progress in DNA barcoding and future implications for entomology. Entomological Science. 2011;14(2):107–24.
- 17. Morinière J, Araujo BC de, Lam AW, Hausmann A, Balke M, Schmidt S, et al. Species Identification in Malaise Trap Samples by DNA Barcoding Based on NGS Technologies and a Scoring Matrix. PLOS ONE. 2016 May 18;11(5):e0155497. pmid:27191722
- 18. Hebert PDN, Cywinska A, Ball SL, deWaard JR. Biological identifications through DNA barcodes. Proc R Soc Lond B. 2003 Feb 7;270(1512):313–21. pmid:12614582
- 19. Baloğlu B, Clews E, Meier R. NGS barcoding reveals high resistance of a hyperdiverse chironomid (Diptera) swamp fauna against invasion from adjacent freshwater reservoirs. Frontiers in Zoology. 2018 Aug 14;15(1):31. pmid:30127839
- 20. Hartop E, Srivathsan A, Ronquist F, Meier R. Large-scale Integrative Taxonomy (LIT): resolving the data conundrum for dark taxa. bioRxiv. 2021 May 9;2021.04.13.439467.
- 21. Hausmann A, Godfray HCJ, Huemer P, Mutanen M, Rougerie R, Nieukerken EJ van, et al. Genetic Patterns in European Geometrid Moths Revealed by the Barcode Index Number (BIN) System. PLOS ONE. 2013 Dec 17;8(12):e84518. pmid:24358363
- 22. Yeo D, Puniamoorthy J, Ngiam RWJ, Meier R. Towards holomorphology in entomology: rapid and cost-effective adult-larva matching using NGS barcodes: Life-history stage matching with NGS barcodes. Syst Entomol. 2018 Oct;43(4):678–91.
- 23. deWaard JR, Ratnasingham S, Zakharov EV, Borisenko AV, Steinke D, Telfer AC, et al. A reference library for Canadian invertebrates with 1.5 million barcodes, voucher specimens, and DNA samples. Sci Data. 2019 Dec 6;6(1):308.
- 24. Geiger MF, Astrin JJ, Borsch T, Burkhardt U, Grobe P, Hand R, et al. How to tackle the molecular species inventory for an industrialized nation-lessons from the first phase of the German Barcode of Life initiative GBOL (2012–2015). Genome. 2016 Sep;59(9):661–70. pmid:27314158
- 25. Karlsson D, Hartop E, Forshage M, Jaschhof M, Ronquist F. The Swedish Malaise Trap Project: A 15 Year Retrospective on a Countrywide Insect Inventory. Biodiversity Data Journal. 2020 Jan 21;8:e47255. pmid:32015667
- 26. Matthews RW, Matthews JR. The Malaise Trap: Its Utility and Potential for Sampling Insect Populations. 2017;4(4):7.
- 27. Cancian de Araujo B, Schmidt S, Schmidt O, Rintelen T von, Rintelen K von, Floren A, et al. DNA barcoding data release for Coleoptera from the Gunung Halimun canopy fogging workpackage of the Indonesian Biodiversity Information System (IndoBioSys) project. Biodiversity Data Journal. 2019 Jan 15;7:e31432. pmid:30686928
- 28. Cancian de Araujo B, Schmidt S, Schmidt O, Rintelen T von, Ubaidillah R, Balke M. The Mt Halimun-Salak Malaise Trap project—releasing the most species rich DNA Barcode library for Indonesia. Biodiversity Data Journal. 2018 Dec 19;6:e29927.
- 29. Basset Y, Cizek L, Cuénoud P, Didham RK, Guilhaumon F, Missa O, et al. Arthropod Diversity in a Tropical Forest. Science. 2012 Dec 14;338(6113):1481–4. pmid:23239740
- 30. Stork NE. How Many Species of Insects and Other Terrestrial Arthropods Are There on Earth? Annu Rev Entomol. 2018 Jan 7;63(1):31–45. pmid:28938083
- 31. Ivanova NV, Dewaard JR, Hebert PDN. An inexpensive, automation-friendly protocol for recovering high-quality DNA. Molecular Ecology Notes. 2006;6(4):998–1002.
- 32. Ratnasingham S, Hebert PDN. BOLD: The Barcode of Life Data System (http://www.barcodinglife.org). Molecular Ecology Notes. 2007 Jan 24;7(3):355–64.
- 33. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2012. ISBN 3-900051-07-0.
- 34. Oksanen J, Blanchet FG, Friendly M, Kindt R, Legendre P, McGlinn D, et al. vegan: Community Ecology Package [Internet]. 2020 [cited 2021 Oct 6]. Available from: https://CRAN.R-project.org/package=vegan
- 35. Hsieh TC, Ma KH, Chao A. iNEXT: Interpolation and Extrapolation for Species Diversity [Internet]. 2020 [cited 2021 Oct 6]. Available from: https://CRAN.R-project.org/package=iNEXT
- 36. Chao A, Ma KH, Hsieh TC, Chiu CH. SpadeR: Species-Richness Prediction and Diversity Estimation with R [Internet]. 2016 [cited 2021 Oct 8]. Available from: https://CRAN.R-project.org/package=SpadeR
- 37. Hörren T, Sorg M, Hallmann CA, Zizka VMA, Ssymank A, Noll NW, et al. A universal insect trait tool (ITT, v1.0) for statistical analysis and evaluation of biodiversity research data [Internet]. Ecology; 2022 Jan [cited 2022 Sep 7]. Available from: http://biorxiv.org/lookup/doi/10.1101/2022.01.25.477751
- 38. Meier R, Blaimer BB, Buenaventura E, Hartop E, Rintelen T, Srivathsan A, et al. A re‐analysis of the data in Sharkey et al.’s (2021) minimalist revision reveals that BINs do not deserve names, but BOLD Systems needs a stronger commitment to open science. Cladistics. 2022 Apr;38(2):264–75.
- 39. Puillandre N, Brouillet S, Achaz G. ASAP: assemble species by automatic partitioning. Mol Ecol Resour. 2021 Feb;21(2):609–20. pmid:33058550
- 40. Meier R, Shiyang K, Vaidya G, Ng PKL. DNA barcoding and taxonomy in Diptera: a tale of high intraspecific variability and low identification success. Syst Biol. 2006 Oct;55(5):715–28. pmid:17060194
- 41. Denlinger DL. Dormancy in Tropical Insects. Annual Review of Entomology. 1986;31(1):239–64. pmid:3510585
- 42. Kishimoto-Yamada K, Itioka T. How much have we learned about seasonality in tropical insect abundance since Wolda (1988)? Entomological Science. 2015;18(4):407–19.
- 43. Basset Y, Cizek L, Cuénoud P, Didham RK, Novotny V, Ødegaard F, et al. Arthropod Distribution in a Tropical Rainforest: Tackling a Four Dimensional Puzzle. PLOS ONE. 2015 Dec 3;10(12):e0144110.
- 44. Ozanne CMP, Anhuf D, Boulter SL, Keller M, Kitching RL, Körner C, et al. Biodiversity Meets the Atmosphere: A Global View of Forest Canopies. Science. 2003 Jul 11;301(5630):183–6. pmid:12855799
- 45. Novotny V, Drozd P, Miller SE, Kulfan M, Janda M, Basset Y, et al. Why Are There So Many Species of Herbivorous Insects in Tropical Rainforests? Science. 2006 Aug 25;313(5790):1115–8. pmid:16840659
- 46. Milton Y, Kaspari M. Bottom-up and top-down regulation of decomposition in a tropical forest. Oecologia. 2007 Aug 1;153(1):163–72. pmid:17375326
- 47. Sayer EJ, Sutcliffe LME, Ross RIC, Tanner EVJ. Arthropod Abundance and Diversity in a Lowland Tropical Forest Floor in Panama: The Role of Habitat Space vs. Nutrient Concentrations. Biotropica. 2010;42(2):194–200.
- 48. Stork NE. Guild structure of arthropods from Bornean rain forest trees. Ecol Entomol. 1987 Feb;12(1):69–80.
- 49. Farfán-Beltrán ME, Chávez-Pesqueira M, Hernández-Cumplido J, Cano-Santana Z. A quick evaluation of ecological restoration based on arthropod communities and trophic guilds in an urban ecological preserve in Mexico City. Rev Chil de Hist Nat. 2022 Dec;95(1):4.
- 50. Krell FT. Parataxonomy vs. taxonomy in biodiversity studies—Pitfalls and applicability of “morphospecies” sorting. Biodiversity and Conservation. 2004 Feb 4;13(4):795–812.
- 51. Lewinsohn TM, Roslin T. Four ways towards tropical herbivore megadiversity. Ecol Letters. 2008 Apr;11(4):398–416. pmid:18248447
- 52. Stuntz S, Ziegler C, Simon U, Zotz G. Diversity and structure of the arthropod fauna within three canopy epiphyte species in central Panama. J Trop Ecol. 2002 Mar;18(2):161–76.
- 53. Chan-Canché R, Ballina-Gómez H, Leirana-Alcocer J, Bordera S, González-Moreno A. Sampling of parasitoid Hymenoptera: influence of the height on the ground. Journal of Hymenoptera Research. 2020 Aug 31;78:19–31.
- 54. Steinke D, Braukmann TW, Manerus L, Woodhouse A, Elbrecht V. Effects of Malaise trap spacing on species richness and composition of terrestrial arthropod bulk samples. Metabarcoding and Metagenomics. 2021 Apr 9;5:e59201.
- 55. Coddington JA, Agnarsson I, Miller JA, Kuntner M, Hormiga G. Undersampling Bias: The Null Hypothesis for Singleton Species in Tropical Arthropod Surveys. Journal of Animal Ecology. 2009;78(3):573–84. pmid:19245379
- 56. Lim GS, Balke M, Meier R. Determining Species Boundaries in a World Full of Rarity: Singletons, Species Delimitation Methods. Systematic Biology. 2012 Jan 1;61(1):165–9. pmid:21482553
- 57. Srivathsan A, Ang Y, Heraty JM, Hwang WS, Jusoh WFA, Narayanan S, et al. Convergence of dominance and neglect in flying insect diversity. Nature Ecology & Evolution. 2023; 7: 1012–1021. pmid:37202502
- 58. Puillandre N, Modica MV, Zhang Y, Sirovich L, Boisselier MC, Cruaud C, et al. Large-scale species delimitation method for hyperdiverse groups. Molecular Ecology. 2012;21(11):2671–91. pmid:22494453
- 59. Chimeno C, Rulik B, Manfrin A, Kalinkat G, Hölker F, Baranov V. Facing the infinity: tackling large samples of challenging Chironomidae (Diptera) with an integrative approach. PeerJ. 2023 May 22;11:e15336. pmid:37250705
- 60. Wührl L, Pylatiuk C, Giersch M, Lapp F, von Rintelen T, Balke M, et al. DiversityScanner: Robotic handling of small invertebrates with machine learning methods. Molecular Ecology Resources. 2022;22(4):1626–38. pmid:34863029
- 61. Creedy TJ, Ng WS, Vogler AP. Toward accurate species-level metabarcoding of arthropod communities from the tropical forest canopy. Ecology and Evolution. 2019;9(6):3105–16. pmid:30962884
- 62. Vasilita C, Feng V, Hansen AK, Hartop E, Srivathsan A, Struijk R, et al. Express barcoding with NextGenPCR and MinION for species-level sorting of ecological samples [Internet]. bioRxiv; 2023 [cited 2023 Jun 13]. p. 2023.04.27.538648. Available from: https://www.biorxiv.org/content/10.1101/2023.04.27.538648v1
- 63. Hebert PDN, Ratnasingham S, Zakharov EV, Telfer AC, Levesque-Beaudin V, Milton MA, et al. Counting animal species with DNA barcodes: Canadian insects. Philosophical Transactions of the Royal Society B: Biological Sciences. 2016 Sep 5;371(1702):20150333. pmid:27481785
- 64. Morinière J, Balke M, Doczkal D, Geiger MF, Hardulak LA, Haszprunar G, et al. A DNA barcode library for 5,200 German flies and midges (Insecta: Diptera) and its implications for metabarcoding-based biomonitoring. Molecular Ecology Resources. 2019;19(4):900–28. pmid:30977972
- 65. Pentinsaari M, Blagoev G, Hogg I, Levesque-Beaudin V, Perez K, Sobel C, et al. A DNA Barcoding Survey of an Arctic Arthropod Community: Implications for Future Monitoring. Insects. 2020 Jan 9;11. pmid:31936447
- 66. Seymour M, Roslin T, deWaard J, Perez K, D’Souza M, Ratnasingham S, et al. Arthropod beta-diversity is spatially and temporally structured by latitude [Internet]. In Review; 2022 Oct [cited 2023 Jan 25]. Available from: https://www.researchsquare.com/article/rs-2180975/v1
- 67. Carstens BC, Pelletier TA, Reid NM, Satler JD. How to fail at species delimitation. Molecular Ecology. 2013;22(17):4369–83. pmid:23855767
- 68. Fujisawa T, Noguerales V, Meramveliotakis E, Papadopoulou A, Vogler AP. Image-based taxonomic classification of bulk insect biodiversity samples using deep learning and domain adaptation. Systematic Entomology. 2023;
- 69. Klink R van, August T, Bas Y, Bodesheim P, Bonn A, Fossøy F, et al. Emerging technologies revolutionise insect ecology and monitoring. Trends in Ecology & Evolution. 2022 Oct 1;37(10):872–85. pmid:35811172
- 70. Srivathsan A, Lee L, Katoh K, Hartop E, Kutty SN, Wong J, et al. ONTbarcoder and MinION barcodes aid biodiversity discovery and identification by everyone, for everyone. BMC Biol. 2021 Sep 29;19(1):217. pmid:34587965