Insect diversity in the Saharo-Arabian region: Revealing a little-studied fauna by DNA barcoding

Although insects dominate the terrestrial fauna, sampling constraints and the poor taxonomic knowledge of many groups have limited assessments of their diversity. Passive sampling techniques and DNA-based species assignments now make it possible to overcome these barriers. For example, Malaise traps collect specimens with minimal intervention while the Barcode Index Number (BIN) system automates taxonomic assignments. The present study employs Malaise traps and DNA barcoding to extend understanding of insect diversity in one of the least known zoogeographic regions, the Saharo-Arabian. Insects were collected at four sites in three countries (Egypt, Pakistan, Saudi Arabia) by deploying Malaise traps. The collected specimens were analyzed by sequencing 658 bp of cytochrome oxidase I (DNA barcode) and assigning BINs on the Barcode of Life Data Systems. The year-long deployment of a Malaise trap in Pakistan and briefer placements at two Egyptian sites and at one in Saudi Arabia collected 53,092 specimens. They belonged to 17 insect orders with Diptera and Hymenoptera dominating the catch. Barcode sequences were recovered from 44,432 (84%) of the specimens, revealing the occurrence of 3,682 BINs belonging to 254 families. Many of these taxa were uncommon as 25% of the families and 50% of the BINs from Pakistan were only present in one sample. Family and BIN counts varied significantly through the year, but diversity indices did not. Although more than 10,000 specimens were analyzed from each nation, just 2% of BINs were shared by Pakistan and Saudi Arabia, 4% by Egypt and Pakistan, and 7% by Egypt and Saudi Arabia. The present study demonstrates how the BIN system can circumvent the barriers imposed by limited access to taxonomic specialists and by the fact that many insect species in the Saharo-Arabian region are undescribed.

Introduction
Although Malaise traps are not an all-purpose collection method [36], they are generally accepted as the most cost- and time-effective [37], leading to their frequent use in biodiversity assessments [38]. Reflecting this fact, the Global Malaise Trap Program (http://globalmalaise.org/) has adopted this sampling method and coupled it with DNA barcoding to advance understanding of insect diversity. As one element of this program, the present study deployed Malaise traps at four locations in three countries. One trap was installed at the Pakistan Museum of Natural History, Islamabad (33.686°N, 73.076°E) in the Shakarparian forest. Insects were collected from 7 February to 13 December 2012, except for 10 days in August/September and one week in July when samples were lost due to storm damage, producing 39 collection events (weeks).

Data analysis
Sequences meeting quality criteria (>507 bp, <1% Ns, no stop codon or contamination flag) were assigned to a BIN by the Refined Single Linkage (RESL) algorithm on BOLD [16], which runs monthly on all eligible sequences. Shorter sequences (<507 bp) that met all the other criteria were assigned to matching pre-existing BINs containing longer sequences. Results of this analysis are accessible through individual BIN pages. With few exceptions, each specimen was assigned to a family either by sequence matches to existing records on BOLD (http://www.boldsystems.org/index.php/IDS_OpenIdEngine) or by morphological analysis. Family-level assignment by sequence required a match of 90% or higher between the unknown sequence and a known sequence. Sequences and their associated taxonomic data were subsequently downloaded from BOLD for analysis. Diversity indices, BIN/family incidences over time, and weather-diversity relationships were only examined for Pakistan since this was the sole location with collections for >75% of the year. Monthly mean temperature and relative humidity (RH) values for this site were obtained from the Pakistan Meteorological Department, Islamabad. Two diversity indices, Simpson's D [44] and Shannon's H [45], were calculated for each collection event. The significance of diversity variation among the collection events was determined with a χ² test. BIN overlap among sites was calculated in Excel while BIN accumulation curves were generated on BOLD.
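The quality criteria quoted above can be expressed as a simple filter. The following Python sketch is illustrative only and is not the BOLD implementation: the function name, the treatment of alignment gaps, and the example sequences are assumptions, and the stop-codon and contamination checks are taken as precomputed flags rather than derived from the sequence.

```python
def passes_bin_criteria(seq: str, has_stop_codon: bool, contamination_flag: bool,
                        min_len: int = 507, max_n_frac: float = 0.01) -> bool:
    """Return True if a barcode sequence meets the criteria quoted in the text."""
    # Assumption: alignment gaps ("-") do not count toward sequence length.
    bases = seq.upper().replace("-", "")
    if len(bases) <= min_len:          # text requires >507 bp
        return False
    n_fraction = bases.count("N") / len(bases)
    return (n_fraction < max_n_frac    # text requires <1% Ns
            and not has_stop_codon
            and not contamination_flag)

# Hypothetical sequences for illustration only.
good = "ACGT" * 150                   # 600 bp, no Ns
short = "ACGT" * 100                  # 400 bp
ambiguous = "ACGT" * 150 + "N" * 10   # 610 bp, about 1.6% Ns

results = [
    passes_bin_criteria(good, False, False),
    passes_bin_criteria(short, False, False),
    passes_bin_criteria(ambiguous, False, False),
    passes_bin_criteria(good, True, False),   # flagged for a stop codon
]
print(results)
```

Only the first sequence passes. A sequence rejected here solely for its length could still be placed in a pre-existing BIN, as described above; that matching step is not sketched.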
The BIN/specimen ratio was lower for the most frequent order, Diptera (n = 22,355; ratio = 0.06), than for the least frequent, Coleoptera (n = 2,262; ratio = 0.23) (Table 1). This pattern was also evident at a family level as the three commonest families had lower BIN/specimen ratios than many of the infrequently collected families. For example, the Formicidae (n = 3,997, ratio = 0.02), Chironomidae (n = 2,593, ratio = 0.03) and Cicadellidae (n = 2,488, ratio = 0.04)

Diversity analysis at the Pakistan site
The 19,068 specimens from Pakistan with barcodes included representatives of 15 orders, 214 families, and 2,248 BINs. Specimens of Diptera (11,126), Hymenoptera (4,728), Hemiptera (1,026), Lepidoptera (970), and Coleoptera (877) dominated the collections, with only two other orders contributing more than 100 records (Orthoptera and Psocodea, each represented by six families and by 29 and 16 BINs, respectively). More specimens (16,031 versus 6,593) and BINs (1,592 versus 1,155) were collected from February-June than from July-December, a difference associated with the higher temperatures and lower humidity in the first half of the year (Fig 3).
Many families (54) and BINs (1,090) were encountered only once in the 39 collection events, but two families (Agromyzidae, Formicidae) and one BIN (ACF1938, Formicidae) were detected in most weeks (38 and 28, respectively) (Fig 4A and 4B). Just 27 BINs were represented by >100 records and only 26 occurred in more than a third of the collection events (S1A and S1B Fig). The number of families and BINs varied significantly (P<0.01) among collection events (Table 2). Diversity indices for family and BIN richness in each collection event were determined by Simpson's D and Shannon's H and compared among collection events (Table 2).
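For readers unfamiliar with the two indices, the sketch below computes them for a single collection event. It is a minimal illustration, not the authors' procedure: the BIN labels are hypothetical, Shannon's H uses the natural logarithm, and Simpson's D is given in its 1 - Σp² form, which is only one of several forms in common use (the text does not state which was applied).

```python
import math
from collections import Counter

def shannon_h(counts):
    """Shannon's H = -sum(p_i * ln p_i) over categories with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def simpson_d(counts):
    """Simpson's index as 1 - sum(p_i^2); an assumed form, others exist."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# Hypothetical collection event: one BIN label per barcoded specimen.
event = ["BIN_A", "BIN_A", "BIN_B", "BIN_C"]
counts = list(Counter(event).values())   # specimens per BIN

h = shannon_h(counts)
d = simpson_d(counts)
print(f"H = {h:.4f}, D = {d:.4f}")
```

Applying the same two functions to family labels instead of BIN labels would give the family-level indices for that event.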

BIN overlap among collection sites
BIN assignments allowed quantification of the overlap in species assemblages among the four sites. Among the 2,248 BINs collected in Pakistan and the 728 in Saudi Arabia, just 2% were shared (Fig 5). The overlap between Egypt (991 BINs from the two trap sites) and Saudi Arabia was 7% while that between Egypt and Pakistan was only 4%. The BIN overlap between the two Egyptian sites was higher (19%), but the sites were just 45 km apart. The BIN accumulation curves showed no sign of reaching an asymptote at any of the sites (Fig 6).
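The overlap values above were calculated in Excel, and the exact formula is not given. One plausible definition, assumed here, is the share of BINs found at both sites relative to all BINs found at either site (the Jaccard index). The Python sketch below applies that definition to hypothetical site names and BIN labels.

```python
from itertools import combinations

def bin_overlap(a: set, b: set) -> float:
    """Shared BINs as a fraction of all BINs at either site (Jaccard index)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical BIN sets per site, for illustration only.
sites = {
    "site_1": {"BIN_A", "BIN_B", "BIN_C"},
    "site_2": {"BIN_C", "BIN_D"},
    "site_3": {"BIN_E"},
}

# Pairwise overlap for every combination of two sites.
overlaps = {
    (x, y): bin_overlap(sites[x], sites[y])
    for x, y in combinations(sites, 2)
}

for pair, value in overlaps.items():
    print(pair, f"{value:.0%}")
```

Other definitions, such as dividing by the BIN count of the smaller site, would give larger percentages, so reported overlap values are only comparable when the denominator is stated.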

Discussion
Comprehensive assessments of the insect species present at even a single locale have, until now, been impossible. For example, the analysis of 129,494 insects from a single 0.5 hectare site in Panama required contributions from 102 taxonomists over an 8-year interval to place them to 6,144 putative species [47]. Despite this massive effort, just 23% of the specimens gained a formal species assignment, and nearly half of all specimens had to be excluded from the identification effort because they belonged to groups lacking taxonomic specialists. Viewed from this perspective, the capacity of a single Malaise trap to gather 30,000-50,000 specimens per year represents an overwhelming challenge for any identification effort employing morphological approaches. The present study overcame this barrier by using DNA barcoding and the BIN system to assign each specimen to a putative species. Although prior studies have demonstrated the power of this approach for work on well-known faunas [19,48,49], the present study breaks new ground by employing this method to explore insect diversity in a region with limited biodiversity knowledge. This study examined 53,000 specimens derived from single traps deployed for 17-39 weeks at sites in Egypt, Pakistan, and Saudi Arabia. As these deployments totaled 93 weeks, trap catches averaged 570 specimens a week. As reported in prior studies [19,48,50], Diptera and Hymenoptera dominated the catch. BIN compliant sequences were recovered from 84% of the specimens, a value similar to those reported for specimens from Malaise traps deployed in Canada [19] and Europe [48]. Recovery did vary among insect orders, being higher for Diptera and Lepidoptera than for Coleoptera, Hemiptera and Hymenoptera, a pattern documented in prior studies that likely reflects primer mismatches [19]. Targeting shorter sequence fragments or employing HTS approaches [51] may alleviate the issues related to primer mismatches.
Table 2 notes: D = Simpson's diversity index (Simpson 1949); H = Shannon's diversity index (Shannon and Weaver 1948). https://doi.org/10.1371/journal.pone.0199965.t002
The abundance and diversity of collected insects varied over the collection period. Catches were larger and more diverse in the earlier months of the year than in the later ones, a pattern that coincided with a rise in temperature from March to June and a fall from September to December. Weather is known to influence both spatial and temporal patterns of insect communities [52]. Insect emergence is also known to be driven by temperature, which additionally affects development, survival and abundance [53].
Analysis indicated that the 44,432 specimens which generated sequences belonged to 3,682 BINs with representatives of 254 families and 17 insect orders. These samples could not have been processed morphologically because such a large fraction of the Saharo-Arabian insect fauna is undescribed. For example, 20% of the species encountered in a recent survey of the United Arab Emirates were new, even though the groups analyzed were among those with the best taxonomy [27]. Because of this taxonomic barrier, the species present in Malaise trap samples from the Saharo-Arabian region have never been comprehensively assessed. Despite this lack of directly comparable data, the present study has shown that the barcode analysis of specimens from brief sampling of a few sites recovered half as many insect species as have been reported from all prior studies in these nations. The present results further indicated that more sampling is required to ascertain the number of species in each nation and the extent of overlap in their faunas. Certainly, the BIN overlap values (2-7%) reported in this study are underestimates of actual faunal overlap because of under-sampling. Because measures of endemism [54] play such an important role in conservation planning [55,56], further surveys of insect diversity in the Saharo-Arabian region are needed. These surveys can potentially employ several other methods of passive collection such as emergence traps, light traps, pitfall traps and coloured pan traps. The lower level of sequence recovery observed in samples from one of the study locations could be related to poor preservation or storage conditions of specimens [46]. The issue of deteriorating DNA quality in preserved samples may be resolved by using better preservatives, such as 95% ethanol [57].
While the sampling effort required to properly estimate overall diversity and overlap values cannot be determined without further sampling, the analysis of a Malaise trap catch from each ecoregion within these nations would represent an important first step.
The present study has demonstrated how the BIN system can circumvent the barriers imposed by limited access to taxonomic specialists and by the fact that many insect species in the Saharo-Arabian region are undescribed. As such, it demonstrates how biodiversity analysis can be accelerated in regions that have seen little exploration [58]. Because it remains critical to extend the DNA barcode reference library through specimen-based analysis [59], it is important that new sequencing platforms are leading to substantial reductions in analytical costs [51]. Moreover, once libraries are well parameterized, metabarcoding studies [50,60] will permit detailed tracking of biodiversity trajectories at scales that would otherwise be impossible.