DNA in a bottle—Rapid metabarcoding survey for early alerts of invasive species in ports

Biota monitoring in ports is increasingly needed for biosecurity reasons and safeguarding marine biodiversity from biological invasion. Present and future international biosecurity directives can be accomplished only if the biota acquired by maritime traffic in ports is controlled. Methodologies for biota inventory are diverse and now rely principally on extensive and labor-intensive sampling along with taxonomic identification by experts. In this study, we employed an extremely simplified environmental DNA (eDNA) sampling methodology from only three 1-L bottles of water per port, followed by metabarcoding (high-throughput sequencing and DNA-based species identification) using 18S rDNA and Cytochrome oxidase I as genetic barcodes. Eight Bay of Biscay ports with available inventory of fouling invertebrates were employed as a case study. Despite minimal sampling efforts, three invasive invertebrates were detected: the barnacle Austrominius modestus, the tubeworm Ficopomatus enigmaticus and the polychaete Polydora triglanda. The same species have been previously found from visual and DNA barcoding (genetic identification of individuals) surveys in the same ports. The current costs of visual surveys, conventional DNA barcoding and this simplified metabarcoding protocol were compared. The results encourage the use of metabarcoding for early biosecurity alerts.


Introduction
Biosecurity issues derived from introduced biota are increasing concerns in the marine realm because precious marine biodiversity is at risk [1,2,3,4]. In addition, the introduction of nonindigenous species has economic consequences because it may affect seafood production. For example, the invasive species Crepidula fornicata almost destroyed the oyster farms in Brittany [5]. Ports and marinas are perhaps the keystones in maritime biosecurity [6,7,8,9,10]. They are hubs of maritime traffic where vessels of all the continents stop for days or months. Therefore, the accompanying biota may leave the vessel, settle down in the port and eventually depart for other areas on other ships. Ballast water [11], hull fouling [12,13] and even bilge water [14] are the main ship compartments where accompanying biota can survive. In a1111111111 a1111111111 a1111111111 a1111111111 a1111111111

Sampling and sampling locations
In a previous study, a DNA barcoding survey for intertidal fouling invertebrates was conducted in eight ports of different sizes and uses [9]. The considered ports were selected from West to East (Figueras, Luarca, Cudillero, Aviles, Gijon, Villaviciosa, Ribadesella and Llanes (Fig 1)) and are in the Asturias region (43˚20 0 N 6˚00 0 W) of the Cantabrian Sea coast (Bay of Biscay) in the northern Iberian Peninsula. Aviles and Gijon are commercial ports under national Spanish authority that receive large international cargo vessels and have adjacent fishing ports and marinas. The other six locations are fishing ports and the associated marinas are under Asturias regional authority, which oversees local maritime traffic, arrival of fishing catches (from national and international waters) and recreational boating [9].
In that study, artificial structures in three points within each port were sampled in a twoweek sampling period. The detailed description of the sampling methodology can be found in [9]. For eDNA analysis, water samples were taken close to the surfaces previously sampled for classic taxonomy and DNA barcoding studies. All water samples were collected from public use portions of these ports. Therefore, no specific permissions were required for collection. Only the upper layer of the water mass (30 cm deep in water as in other studies, e.g., [31]) was sampled to avoid diving or complicating the sampling procedure. Sterile plastic bottles and gloves for preventing contamination with researcher's DNA were used. In total, 24 1L water samples were taken from 3 sampling points in 8 ports, and the points were separated by approximately 200 m with one point near the port mouth, one in the inner section, and onehalf way between those two points.
Water samples were cooled while they were transported to the lab and since DNA extractions could not be done in the field, the samples were immediately frozen at -20˚C until DNA extractions was conducted as recommended [33]. The eDNA extractions were carried out 15 days later due to logistic and organization issues. Hinlo et al. [33] found no significant differences in eDNA yield recovery after freezing the water samples for four days and posterior DNA extractions using the PowerWater 1 DNA Isolation Kit (MOBIO Laboratories, USA) with the DNeasy-Freeze combination, which recovered the highest eDNA yield out of five different methods tested to preserve and extract eDNA. At the same time, after 10 days of refrigerated or frozen storage of the samples, there were no significant differences among them in terms of eDNA yield although both methods showed decreases in the eDNA yield that was recovered [33].

The eDNA extraction
After unfreezing the water samples at room temperature, they were filtered through a 0.2-μm Nuclepore™ membrane and DNA was extracted from the filters (1 filter by 3 L water sample). DNA extraction replicates could improve diversity estimates as well as the ability to separate samples with different characteristics [37]. In our study, we replicated samples (3) within the ports to increase the chances of detecting NIS. Total genomic DNA was extracted using the PowerWater 1 DNA Isolation Kit (MOBIO Laboratories, USA), which yields high quality DNA for DNA barcoding or meta-barcoding applications. The manufacturer's instructions were followed. DNA extractions were made in an exclusive sterile room in a different building. Moreover, DNA extractions were conducted using negative controls and on different days for samples using sterile technique inside a laminar air flow chamber continuously disinfected by UV light, absolute ethanol and 10% bleach solution cleaning to prevent contamination. The DNA samples from ports were quantified using the Picogreen method and Victor-3 fluorometry. DNA samples were finally analyzed using two different metabarcodes.

Polymerase chain reaction (PCR), massive sequencing and bioinformatics analyses
The PCR reactions were performed by Macrogen Korea using negative controls to monitor possible contamination as well as Roche FastStart™ High Fidelity Taq DNA Polymerase and the protocols described in the Amplicon Library Preparation Manual (Roche 2010; GS FLX Titanium Series). Geller et al. [38] primer pairs for the Cytochrome Oxidase I (COI) gene and Machida and Knowlton's [39] for the 18S rRNA gene (18S, designated primers #1 and #2_RC) were used. Thermocycling conditions were 1x: 94˚C for 3 min; 35x: 94˚C for 15 sec, 55˚C for 45 sec and 72˚C for 1 min; and finally, 1x: 72˚C for 8 min and 4˚C on hold. Library construction included quality controls for size (Agilent Technologies 2100 Bioanalyzer using a DNA 1000 chip) and quantity (Roche's Rapid library-standard quantification solution and calculator). The bands of expected sizes (800 bp in COI and 500 bp in 18S) were sequenced in the 1/8 plate GS-FLX run (Roche/454 Life Sciences, Branford, USA).
The multiplexed reads were assigned to samples while accounting for their nucleotide barcodes (demultiplexing). Zero base errors were allowed in this sorting by a tag step. CD-HIT-OTU [40] was used to filter out erroneous and chimeric reads by combining sequence clustering and statistical simulations. Quality filters based on the characteristics of each sequence were applied to remove short (<100 bp) and low-quality reads (<20 Phred values) as well as extra-long tails. Primer pairs were trimmed. Filtered reads were aligned and clustered at 100% identity using CD-HIT-DUP, and the chimeric reads were identified and eliminated from the duplicate clusters (CD-HIT-OTU User's Guide (http://weizhong-lab. ucsd.edu/cd-hit-otu)). Secondary clusters were then recruited into the primary clusters, and the remaining representative reads from the non-chimeric clusters were grouped into OTUs using a greedy algorithm with a 97% cut-off for 18S sequences (e.g., at a species level following the method of Stackebrandt and Goebel [41] for ribosomal sequences) and 98% for COI sequences (see Ratnasingham and Hebert [42,43] for useful discussions about this). This result was used to avoid false OTUs because of PCR errors, sequencing errors and other technical errors. The OTUs were then BLASTed against the NCBI database for the case of the COI reads with e-value threshold of 0.01, !97% sequence homology and >90% sequence coverage for accepting hits. The remaining 18S reads were aligned using UCLUST [44] in QIIME [45] and the SILVA database [46] was used for obtaining the OTU list.
The sequences of OTUs taxonomically assigned to genera of interest due to the occurrence of invasive species within a genus, and/or occurrence of species of a genus within the conventional sampling biota study [9], were extracted from the raw FASTA files and checked again against GenBank using conventional BLAST software. The identity, coverage and Evalue with the best match reference sequence were retrieved. We followed WORMS [47] for taxonomic names and classification. The references and retrieved OTUs that used alternative nomenclature were named after WORMS in this study.

Statistical analysis
The number of sequences (reads) of a multicellular species obtained from metabarcoding procedures is not proportional to the number of individuals of such species [30]. For this reason, metabarcoding data were scored as presence / absence for each OTU and measured as 1 / 0, respectively. PAST version 3 software [48] was used for obtaining Alpha-diversity measures used here: taxa-S (species richness or number of species, in this case OTUs), Margalef's richness index, and finally, Shannon-Weaver (H) (see Harper [49] for indices details). Global betadiversities were also estimated through Whittaker and Routledge indices (details in Koleff et al. [50]) using the same software. Several studies have found that metabarcoding accurately recovers alpha-diversity (species richness) and beta-diversity (species turnover) information, in addition to generating the same management recommendations as morphological biodiversity datasets (e.g., [37,51,52]).
For a comparison of OTU results (presence/absence from the total list of OTUs obtained for the two genes) among the genes and the ports, a Non-metric Multidimensional Scaling analysis (MDS) and ANOSIM analyses were conducted, using Euclidean distances and 9 999 bootstraps using PAST version 3 [48]. At the same time, Scatter and Shepard plots were constructed in PAST to visualize the relationships among port OTUs found from the two DNA barcodes. Stress and squared r for the two axes were also calculated.
Comparisons between diversity indices of fouling invertebrates in the region (considering all ports together) obtained from the conventional sampling + DNA barcoding method used in the Miralles et al. [9] study and this work (simplified sampling (water) + metabarcoding) was done using only presence-absence data, and the metabarcoding subset of fouling organisms for comparable results. Alpha diversity indices were compared using Diversity Permutation Tests. This module computes several diversity indices for two samples and then compares the diversities using random permutations. A total of 9999 random matrices with two columns (samples) are generated, each with the same row and column totals as in the original data matrix (see manual of the PAST software). The analyses mentioned above were completed using the free software PAST version 3 [48].

Cost estimates
Costs of barcoding and visual analysis of animal specimens have been previously estimated according to Spanish standards (i.e., [9,53]), and the present calculations are based on them. The costs of labor (proportional part of the salary for the time dedicated to different tasks) were estimated from Spanish official technician wages for the salaries (Resolution 2000 Boletín Oficial del Estado 49 of 26 of February of 2015) since the study was carried out in Spain. Sampling water from each port took no longer than 30 minutes (10 minutes per sampling point within the port), while sampling invertebrates from each port from conventional methodology required approximately 6 hours (2 hours per sampling point).
For consumables and external services, the real costs of barcodes in Miralles et al. [9] were 5€/individual sample (extraction kit + PCR products + external sequencing services). For the metabarcodes obtained in this work the cost was 194€/sample (extraction kit + external services of library preparation, sequencing and bioinformatics). The travels for sampling between the ports and the laboratory were logically the same whatever analytical methodology was employed, and the results were therefore excluded from the comparative estimations.

Results
The quantity of total DNA obtained from the 3-L water bottles obtained from each port ranged between 0.457 ng/μL and 5.552 ng/μL (Table 1). Amplicon Libraries after PCRs and posterior NGS analysis were conducted and data by samples are now accessible in Genebank BioProject IDs: SAMN07345428, SAMN07345429, SAMN07345430, SAMN07345431, SAMN07345432, SAMN07345433, SAMN07345434, SAMN07345435. NGS with COI primers provided a total of 164,563 reads and average length reads of 664.6 bps. The quality-check process removed sequences due to presence of short (34,997) and ambiguous sequences (6,726) as well as possible chimera and homopolymer appearances (2,732) and that filtering left 120,111 reads that passed the quality filters (mean length = 615.8 bp) from which a total of 24,945 OTUs were assigned down to a family, genus or species level (S1 Table). For 18S primers, the samples from the Cudillero and Villaviciosa ports failed to provide a reliable 18S amplicons library (Table 1). NGS analyses provided a total of 144,294 reads with a mean length of 405.1 bps from the eDNA of six ports. The quality-check process eliminated the following types of sequences: too short (48,685); ambiguous (2,034); chimeras/homopolymers (2,850). The quality-check process left 90,725 reads (mean length = 374.5 bps). A total of 8,490 OTU counts were assigned down to a family, genus or species level (S1 Table). Rarefaction curves for the two metabarcodes and samples are provided in S1 Fig.
The two DNA metabarcodes that were employed provided different taxonomic resolutions in this dataset ( Table 2); COI yielded more taxa on average (mean OTUs per port of 26.30, SD 9.45) than 18S rDNA (mean 18.17, SD 15.95). The non-metric Multidimensional Scaling had a stress of 0.057 and r 2 of axis 1 was 0.824 with the same value for axis 2 being 0.656 in the Shepard plot (Fig 2a). The COI metabarcodes of the analyzed ports were more similar to each other than 18S rDNA samples, and the samples grouped together in the MDS except for Ribadesella (Fig 2b) while the 18S rDNA results were more scattered, with some clear differences between Gijon and Llanes. In congruency with MDS analysis, alpha-diversities obtained in the eight ports for the two metabarcodes were quite different (ANOSIM p-value = 0.0006 after 9999 permutations) ( Table 2). Gijon and Llanes had the most diverse Metabarcode for COI and 18S rDNA respectively, while the least diverse Metabarcode corresponded to Llanes and Ribadesella respectively. Whittaker and Routledge's beta-diversities were 3.8081 and 0.4702, respectively. The species list obtained from each of the two metabarcodes as merged into a global taxa list. The differences between ports for the number of OTUs were marked because in Cudillero and Villaviciosa, only COI data were available. Taking into account only marine taxa, they ranged between 19 in Villaviciosa to 61 in Llanes (Table 3). Most taxa were plankton microalgae, such as diatoms and protozoans such as ciliates (Table 3). A few OTUs corresponded to vertebrates (fishes Albula, Clinostomus, Cyprinella, Dypturus, Oregonychthys; the Anatidae Chloephaga). Invertebrates, which are the main focus of this study about invasive species, were a minority in all the ports ranging between 4 in Luarca and Aviles to 12 in Eo (Table 3).
Compared to two different methods (conventional sampling + DNA barcoding and simplified sampling + metabarcoding), we found that in the 18S rDNA metabarcodes three genera with potential invasive species were shared with the dataset obtained from conventional sampling of fouling invertebrates in the same sampling locations [9], including the barnacle Austrominius (old genus name was Elminius), the tubeworm Ficopomatus and the annelid worm Polydora. They were found in Aviles (Austrominius) and Llanes (Ficopomatus and Polydora) ports ( Table 3). The sequences of these OTUs were retrieved from the metabarcoding FASTA files and compared online with the GenBank database using BLAST nucleotides. The sequences retrieved with the best match corresponded to the invasive species recorded from conventional sampling from the same ports by Miralles et al. [9], including Austrominius   (Table 4). These three species represented, on average, approximately 20.8% of the invertebrate species found in the two ports from metabarcoding (38 OTUs in total). The average percentage of individuals of these species in the same ports found through conventional barcoding in the Miralles et al. [9] study was similar (19.2%). The diversity of fouling invertebrates found in this study for all the ports together was indeed higher when measured from specific sampling of fouling biota + DNA barcoding (S2 Table; data of fouling biota sampling were taken from Miralles et al. [9]) compared to the simplified protocol (water) + metabarcoding employed here ( Table 5). The difference, however, was not statistically significant from the diversity permutation test p-values ( Table 5).
The costs of the two methods were estimated from Spanish official technician wages for the salaries (the study was carried out in Spain), in 8-h working days and the real costs from 2016 for barcodes and metabarcodes (Tables 6 and 7). The travels between the ports and the laboratory are the same and were excluded from the calculations. Sampling water from each port took no longer than 30 minutes (10 minutes per sampling point within the port), while sampling each port from conventional methodology needed approximately 6 hours (2 hour per sampling point). The total cost estimated for 671 barcodes was approximately 6,701 EU (~10 EU by sample) in the work by Miralles et al. [9]. Metabarcoding costs were split here into molecular analysis and bioinformatics, and they were 2,722.0 EU in total (for Table 3 Table 5. Alpha-diversities obtained at regional level (the eight ports together) for simplified sampling + metabarcoding (= metabarcoding; two metabarcodes combined) and for conventional sampling + barcoding analysis (= barcoding; calculated from Miralles et al. [9]). Permutation P values for the comparison of the regional diversity estimates using Diversity permutation test available in PAST version 3 (Perm p, 9 999 permutations). detecting 102 different taxonomical identities and 33,435 OTUS in 14 samples (8 using COI +6 samples using 18S rDNA as barcodes)). The total sum was higher for conventional sampling + barcoding. A rough approximation was conducted to estimate the cost-benefit efficiency of each method for finding exotic species. We have estimated the cost for the identification of 38 exotics specimens of the three exotic species found in the present study using three different methods. The methods included Visual (morphology-based identification by a specialized taxonomist), conventional sampling + DNA barcoding (as in Miralles et al. [9]) and simplified sampling (water) and metabarcoding (this study) used as references for previous cost estimations (i.e., [9,53]) (Tables 6 and 7). Visual identification required more time for sampling + analysis (620 min) than barcoding (494 min), and metabarcoding required less sampling effort and laboratory processing with a total of 75 min (Table 6). In contrast, consumables and sequencing analyses were more expensive for metabarcoding than for the two other methods (Table 7). Considering the time required for the analysis, consumables, and external sequencing (metabarcoding) the total cost estimates were 451.5 EU, 519.5 EU and 438 EU for visual, DNA barcoding and simplified sampling + metabarcoding (this work), respectively (Table 7).

Discussion
This study provides evidence of the utility of a very simple metabarcoding-based methodology for detecting marine exotic species, even if they are at very low densities, as was the case for Austrominius modestus in Aviles and Polydora triglanda in Llanes where only one individual of DNA in a bottle for early alerts of invasive species each species was found in the visual sampling [9]. In this study, we found three NIS (38 OTUs) from 3-L water samples collected only once at each port without resampling. The same species were found in the visual and barcoding surveys in the same ports [9]. The cost of metabarcoding did not significantly surpass the other method (classic sampling and DNA barcoding) ( Table 7), and the technical expertise required for the laboratory analysis carried out by the researchers was minimal. These facts support recommending the use of a metabarcoding approach for routine surveys in ports, but currently it would probably work best as an exploratory method for an early alert system. It is worth mentioning that the number of individuals cannot be fully determined using metabarcoding, which only counts DNA molecules. Despite this limitation, it seems this issue will not persist for long. Quantitative metabarcoding will likely become feasible in the near future [30,54,55]. Currently, the biodiversity based on individual counts cannot be properly estimated and only the presence of a species can be confirmed. For this reason, this technique is increasingly being employed for biodiversity inventories and should not be used as a way to compensate for the current decrease in the number of taxonomic experts [27,53,56]. The discipline of taxonomy is needed now more than ever now, especially for marine biota. DNA databases of references, such as GenBank and BOLD (Barcoding of Life Diversity) [42] rely on good complete taxonomic information for the voucher specimens. The absence of such information is a drawback of current barcoding projects and hampers the use of DNA-based methodologies (e.g., [21,57,58]). Although metabarcoding has been recommended for port surveys [59,60,61] some improvements are necessary. One improvement should be the use of different types of environmental samples, not only water. In our study, the water samples that were analyzed primarily contained plankton species, which is logical because the water was sampled from the sea surface. Surveys of potential biological invasions should also consider fouling biota [10]. Sediments should be sampled from port walls as well (both artificial surfaces and natural rocks) for detecting early adherence of fouling individuals [62]. Moreover, it seems that eDNA is better preserved in sediments than in water [63]. Another improvement would be to conduct more extensive sampling, including targeting more points within each port vase. Sampling at different depths would complete the port landscape and likely provide a representative view of the present biota [61,64].
Another important issue is the marker choice. In this particular study, the COI gene provided more OTU counts than 18S rDNA. however, most taxa detected from COI were planktonic microalgae and protozoans, while the 18S rDNA revealed the three invasive species. This result, however, cannot be extrapolated. Different studies have shown a greater utility of some Barcodes depending on the particular case study [10,29,65,66,67]. The complexity of marine communities would make it necessary to use two genes for a more complete view of diversity. Multiplexing allows for sequencing two genes simultaneously in the same run [68] and could be conduct in routine surveys at a cheaper cost and for faster results.

Conclusion
An extremely simplified eDNA sampling methodology based on only three 1-L bottles of water per port, followed by NGS metabarcoding using 18S rDNA and COI as genetic barcodes, in eight Bay of Biscay ports could detect three invasive invertebrates: the barnacle Austrominius modestus, the tubeworm Ficopomatus enigmaticus and the polychaete Polydora triglanda. The latter species occurred at very low density in visual inventory, despite minimal sampling efforts. The same species had been previously found in visual and DNA barcoding surveys in the same ports. Comparisons among the current costs of visual surveys, conventional barcoding and this simplified metabarcoding protocol indicate the use of metabarcoding for early biosecurity alerts would be beneficial.
Supporting information S1 Table. Taxa found in each port with 18S rDNA and Cytochrome Oxidase I metabarcodes. In red exotics species. (DOCX) S2 Table. Taxa