Characterization of ecto- and endoparasite communities of wild Mediterranean teleosts by a metabarcoding approach

Next‐generation sequencing methods are increasingly used to identify eukaryotic, unicellular and multicellular symbiont communities within hosts. In this study, we analyzed the non-specific reads obtained during a metabarcoding survey of the bacterial communities associated to three different tissues collected from 13 wild Mediterranean teleost fish species. In total, 30 eukaryotic genera were identified as putative parasites of teleosts, associated to skin mucus, gills mucus and intestine: 2 ascomycetes, 4 arthropods, 2 cnidarians, 7 nematodes, 10 platyhelminthes, 4 apicomplexans, 1 ciliate as well as one order in dinoflagellates (Syndiniales). These results highlighted that (1) the metabarcoding approach was able to uncover a large spectrum of symbiotic organisms associated to the fish species studied, (2) symbionts not yet identified in several teleost species were putatively present, (3) the parasitic diversity differed markedly across host species and (4) in most cases, the distribution of known parasitic genera within tissues is in accordance with the literature. The current work illustrates the large insights that can be gained by making maximum use of data from a metabarcoding approach.


Introduction
Parasites are extremely diverse and omnipresent in all environments, making this lifestyle one of the most successful on Earth [1]. Parasites are usually classified as ectoparasites or endoparasites, with or without direct contact with the external environment respectively [2]. Parasites also differ in their life cycle: some parasites require only one host to complete their life cycle (direct life cycle) while others need intermediate hosts (indirect life cycle), in which they change their morphology and biology (mobility, reproduction, . . .) [2]. Fish are well known to be parasitized by many eukaryotic organisms, unicellular or multicellular [3][4][5][6]. Fish gills and a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 be a cheap way to gain at least some knowledge about the hidden diversity of eukaryotic parasites.

Ethics statement
The Observatoire Océanologique de Banyuls-sur-Mer holds the authorization for fishing and housing wild Mediterranean teleosts (Decision n˚100/2019, Direction Interrégionale de la Mer Méditerranée). Wild fish were caught (see below for details) by a competent person and in accordance with the European Union Regulations concerning the protection and welfare of experimental animals (European directive 91/492/CCE).

Fish sampling
Thirteen teleost fish species were collected in June 2017 (42˚29'4.618"N, 3˚8',35.39"E) and in October 2017 (42˚29'15.073"N, 3˚7',49.688"E) in the Bay of Banyuls-sur-Mer (northwest Mediterranean, France) ( Table 1). Between two and five teleost specimens were collected for each species (Table 1). A gill fishing net was placed overnight in Banyuls bay between 0 and 6 m deep. About 6 hours later, fish were collected dead on the net, handled with gloves and put into individual plastic bags right after collection from the net. They were immediately brought from the vessel to the laboratory for dissection. Three different tissues were collected per fish individual and put into sterile tubes: skin mucus, gill mucus (with two gill arches) were collected with a sterile spatula and scissors, and for species that did not have mucus on their skin, we collected a portion of skin close to the lateral line of the fish (around 3 cm 2 ) by using ethanol-rinsed scissors and tweezers; a fragment (between 3 and 5 cm long) of the posterior region of the intestine was also sampled (depending on the size of fish specimens). Samples were frozen at -80˚C until DNA extraction. As mentioned before, aquatic organisms are constantly in contact with the surrounding water. In skin mucus, gill mucus and intestine, not only symbionts and DNA from hosts were recovered but also what is present in the surrounding environment (food and free-living organisms) and most likely traces of DNA leaved by all organisms, reflecting their current or past presence (i.e. environmental DNA) [34]. In order to verify that our sampling techniques were appropriate for the identification of parasitic taxa specifically associated to fish species (i.e. not present in the surrounding water), seawater was collected using a sterile container at each sampling date next to the gill net fishing to act as negative controls. Two liters of seawater were filtered onto a 0.2 μm nitrocellulose filter (Pall Corporation, U.S.A). Filters of each sampling date were frozen at -80˚C until DNA extraction.

DNA extraction and amplification
DNA was extracted by using the Quick-DNA Fecal/Soil Microbe MiniPrep Kit (Zymo Research, Orange, California) following manufacturer's instructions. Samples were frozen at -80˚C. PCR amplification was carried out in triplicate and performed using primers targeting the hypervariable V4-V5 region of the 16S rRNA gene: 515F-Y (5'-GTGYCAGCMGCCGCGGT AA) and 926R (5'-CCGYCAATTYMTTTRAGTTT) [43]. The PCR mix contained 1X KAPA 2G Fast Ready Mix (Sigma-Aldrich, France), 0.2 μl of each primer (concentration of 0.2 μM), 3.6 μl of ultrapure water and 1 μl of DNA in a final volume of 10 μl. After 3 min of initial denaturation at 95˚C, the following conditions were applied: 22 cycles of 95˚C for 45s (denaturation), 50˚C for 45s (annealing) and 68˚C for 90s (extension), with a final extension at 68˚C for 5 min. For each sample, three PCRs were performed in the same conditions, to increase the DNA quantity, but also to avoid bias due to each reaction. Then, the product of each PCR was run on a 1% agarose gel at 100V for 20 minutes in an electrophoresis chamber (BIO-RAD) to visualize the presence of high molecular weight DNA. The visualization was carried out in a GelMaxTM photodocumenter (UVP 1 ). When the DNA was visible in the gel, amplifications from the same sample were pooled. Individual barcode sequences were added to each mix during a second PCR. The second PCR mix contained 1X KAPA 2G Fast Ready Mix (Sigma-Aldrich, France), 0.5 μl of each barcode (Nextera Index Sequences in http://seq.liai.org/204-2/ ), 10.5 μl ultrapure water and 1 μl of DNA for a final volume of 25 μl. PCR conditions were as follows: initial denaturation at 98˚C for 30s, 8 cycles of 98˚C for 10s, 60˚C for 20s, 72˚C for 30s and a final extension at 72˚C for 2 min. Incubation (37˚C for 30min, 85˚C for 15 min) with USB ExoSAP-IT PCR Product Cleanup (Thermofisher, France) was then performed to degrade unincorporated primers. All PCR products were normalized with a 96 well SequalPrep Normalization Plate (Thermofisher, France) and the normalized amplicons were concentrated by using the Wizard SV Gel and PCR Clean up Kit (Promega, France) (final concentration around 8 ng/μl). Finally, the concentration of each DNA sample was measured using the Quant-iT TM PicoGreen (Thermofisher, France) and amplicons were sequenced using Illumina 2×250 MiSeq sequencing (FASTERIS SA, Switzerland). Samples were run in two sessions and divided randomly. All the categories (fish species and body parts) were represented in both runs.

Sequence analyses
Sequence analysis was realized using an in-house pipeline based on Needham et al., 2018 [44] and scripts in both Usearch9 [45] and Qiime V. 1.9.1 [46]. A shell script 18Sclean.sh (see S1 Text) was used to pre-treat demultiplexed forward and reverse reads. Basically, as Needham and collaborators, we used the strategy of using non-overlapping reads to identify eukaryotic sequences among sequences amplified with the Parada and collaborator's primers [43]. Unassembled reads, identified using PEAR v.0.9.6 [47], were quality trimmed using the fastq_filter option of usearch9. Sequences in fasta format with both reads for each sample were merged with an "N" between the forward and the reverse-complemented reads. Sequences names were modified to include a sample label. Samples from all reads were merged in a single fasta file. At this stage, primer sequences were removed and the reads were dereplicated. The number of singletons was determined and they were denoised using usearch9 -unoise with minsize = 1. Since denoising of concatenated sequences appear to yield a large proportion of singletons, denoised sequences were clustered in Operational Taxonomic Units (OTUs) using usearch9 -cluster_otus using a stringent radius of 0.75 and minsize = 1. OTU tables were generated by usearch9 -usearch_global using -id = 0.97.
All OTUs were identified using QIIME's assign_taxonomy.py -m rdp and the "SILVA_132_ QIIME_release/taxonomy/18S_only/99/majority_taxonomy_7_levels.txt" file distributed by the SILVA project [48][49]. Finally, this OTU table was filtered to remove fish ("D_12__Teleostei") OTUs. Some sequences were assigned to taxonomic groups but not to a specific genus due to the poor representation of eukaryotic symbionts within the SILVA database [50-51].
To alleviate this problem, we used the program Basic Local Alignment Search Tool (BLAST, [52]) to assign them to a described genus whenever possible. Sequences with more than 98% similarity to a genus were kept and assigned. If a sequence matched with two or more genera, we used the OTU classification to choose the appropriate genus (with at least 98% similarity). In the absence of support from the OTU classification, or in the case of less than 98% similarity, the sequences were not assigned more precisely and not used in the analysis. Finally, all sequences unassigned to any taxonomic group (only 0.4% of all reads) were excluded. The remaining OTUs were used to assess the total eukaryotic diversity.

Results
A total of 146 samples were collected from 59 fish specimens in the Bay of Banyuls-sur-Mer ( Table 1). The species Scorpaena notata did not have any mucus on its skin, so we collected a portion of the skin. The skin was treated as skin mucus for further analyses. A total of 3,064,621 sequences of minimum 400 bp length were obtained for the 146 samples. Around 76.2% of these reads corresponded to Bacteria and Archaea, whereas the rest of the sequences were assigned as eukaryotes. We investigated 44,258 sequences in total (representing 6.1% of eukaryotic reads) as potentially fish-associated symbionts (mutualistic, parasitic and commensal eukaryotic organisms; 93.8% of eukaryotic reads are fish sequences). The mean number of reads of potential symbionts per sample is 303.2 (standard deviation = 591.9).
For each species and for each tissue, we computed the mean value of the proportions in the samples. All sequences assigned to fish were removed. Among Eukaryota, representatives of clades Opisthokonta and SAR (Stramenopiles, Alveolates and Rhizaria) were detected in all tissues (skin mucus, gills mucus and intestine) of all the fish species. They were the most abundant groups, representing 88.16 to 100% of eukaryote-affiliated reads (Fig 1). The opisthokonts are a broad group of eukaryotes, composed of fungi, Ichthyosporea and metazoans. All of these three groups are represented in our analyses, but metazoans are ubiquitous and the most abundant in most samples (Fig 1). Within the SAR group, alveolates were detected in all tissues. The supergroups Stramenopiles, Rhizaria, Fungi and Ichthyosporea were mainly present on the external surfaces of fish (13.95% and 5.06% on average for skin mucus and gill mucus respectively compared to 1.53% for intestine) (Fig 1). The four other supergroups, Haptophyta, Excavata, Archeaplastidia and Amoebozoa together accounted for < 10% of the sequences in skin mucus and < 9% of the sequences in gill mucus. They were mostly absent from the intestinal samples, apart from supergroup Archaeplastida retrieved from Salpa salpa, which represented 6.1% of eukaryotic sequences from this fish species ( Fig 1C).
As this study focuses on parasitic communities of teleost fish species, sequences belonging to non-parasitic organisms were removed from the analysis. Therefore, we have eliminated most of photosynthetic organisms (i.e. green algae, red algae and green plants (Archaeplastida), some diatoms and dinoflagellates) [53], common dietary items and free-living  Table 1 organisms. Parasites likely coming from organisms eaten by fish were also discarded (such as from, very likely, plant or arthropod hosts). In order to determine parasitic genera associated to fish, bibliographic searches were carried out using PubMed, Google Scholar and google searches. For each parasitic genus, we used keywords: "parasite", "fish" / "fish species", "name of the parasitic genus". Finally, only parasitic genera that were present at least in two samples (among the 146 collected) and with a minimum of 10 reads were retained for further examination (S1 Table).
Despite the fact that no genus could be determined within the order Syndiniales in our study, this order was also retained because some species are known to parasitize fish [106].
The structure of the parasitic communities was further investigated for each fish species (Figs 2 and 3). For greater clarity, only parasitic taxa representing at least 0.5% of eukaryote reads (obtained for each species) were represented (Figs 2 and 3, Table 2). All fish species were associated to several parasitic genera but their distribution greatly differed among host species (Figs 2 and 3). The genera Aspergillus (Ascomycota), Eimeria (Apicomplexa) and Goussia (Apicomplexa), as well as the order Syndiniales (Dinoflagellata) were ubiquitous in all the analyzed teleost families (Figs 2 and 3, Table 2). However, some taxa seemed to be more group specific to some hosts. For example, sequences from Chondracanthus (Copepoda) were detected only on Serranus scriba (51.1% of putative parasitic sequences identified on Serranus scriba, Fig 3E). Similarly, sequences from Lepeophtheirus (Copepoda) and Polylabris (Platyhelminthes, Monogenea) were only found in Symphodus tinca (6.4%, Fig 3F) and Diplodus annularis (5.1%, Fig 2A) respectively. Moreover, 1625 reads (6.7% of potential sequences from parasites) belonging to the genera Taeniacanthus (Copepoda), Lamellodiscus, Microcotyle, Polylabris (Monogenea) and Opisthorchis (Digenea) were only detected within the fish family Sparidae (Fig 2). No parasitic genus was found specifically associated to the Gobiidae family.

Discussion
In this study, we performed a detailed analysis of eukaryotic sequences that were recovered from a metabarcoding study that was initially targeting bacteria associated to different tissues of teleost fish. The approach indeed yielded around 44,258 eukaryotic sequences (around 1.4% of total reads) that potentially open a window to the diversity of parasites associated with fish species.

Benefits of a metabarcoding approach
It is not surprising that we obtained many eukaryotic sequences with this primer set, from a metabarcoding approach initially targeting bacterial DNA. Indeed, the primer set was optimized to be universal for Bacteria and Archaea, but it has been also used to study the eukaryotic diversity, since in silico tests by Parada et al. 2016 [43] reportedly showed 0 mismatches to 86% of all eukaryotic sequences in the SILVA Database SSU r123 database. We repeated the analysis with 75,000 eukaryotic sequences present in the SILVA_132_SSURef_NR99_13_12_ 17_opt.arb tree of the Silva project [48-49] using the probe_match and the match function of arb_edit. The results show that only about 2200 sequences (c.a. 3%) had mismatches to the  3 Gobius bucchichi, 4 Gobius cruentatus, 5 Gobius niger, 6 Oblada melanura, 7 Pagellus bogaraveo, 8 Pagellus erythrinus, 9 Sarpa salpa, 10 Scorpaena notata, 11 Serranus scriba, 12 Spicara maena, 13  forward, and 4000 (c.a. 5%) to the reverse primer, with some groups having mismatches with both. Those mismatches were not against very broad groups; many mismatches were of a weak character (G-T mismatches) and in cases where a mismatch was to a broader group, these groups were not known to be marine, or associated with fish and thus we do not have any evidence that those primers would be particularly biased against a specific fish parasite group and invalidate our results. Moreover, the same primers have been also recently used to study symbiotic relationships including those of eukaryotes [44]. This approach gave us the opportunity to characterize, in addition to bacterial communities, the diversity of parasites associated with different fish tissues. These results highlight the fact that there are probably data on eukaryotic parasites in bacterial metabarcoding projects that are as yet unexploited.
Our results reveal diverse eukaryotic communities within several teleosts fish species that are far more taxonomically complex than anticipated by the current literature. This indicates that a metabarcoding approach allows for a faster and more complete identification of all parasites than traditional methods in parasitology. In the field of parasitology, description and identification of parasitic eukaryotic species are still dominated by morphological analyses [11]. Moreover, these studies generally focus on one or a few parasitic species of a specific phylum (e.g. arthropods, nematodes, copepods) [58, 65,81,87,98]. In the past few years, gene metabarcoding approaches have been booming in the field of parasitology [37][38]. However, recent studies using this approach have focused mainly on characterizing a single compartment of Characterization of parasite communities of wild Mediterranean teleosts by a metabarcoding approach biodiversity, including the nemabiome (or more broadly the helminthic community) of terrestrial species [39,[107][108][109]. To our knowledge, this is the one of the first study to attempt to identify, without a priori, all parasitic eukaryotic taxa present on teleost fish species using a universal metabarcoding approach [110]. Even if this approach was initially intended to characterize the bacterial communities of some fish species, it has allowed us to detect parasitic species belonging to different phyla (Table 2) [29,35,41]. While inventories of parasitic eukaryotic species are generally limited to the study of macro-organisms or organisms visible with a stereomicroscope, the metabarcoding approach also provides access to a broader fraction of the parasitic diversity, including microorganisms, intracellular and intra-organellar parasites, and those that are difficult to identify using morphological criteria [29,39,111]. However, the metabarcoding approach applied to parasitic communities is still subject to limitations and some aspects need to be improved [39,108]. First, this approach allowed us to identify 24274 reads originating from potential parasites of teleosts among 146 samples. Some parasite genera have a large number of reads (Acanthocheilus, Contracaecum, . . .) while others have only a few dozen, such as Aspergillus or Cladosporium (Ascomycota). However, none of the parasitic genera identified in fish tissues were found in water samples. It reinforces the claim that there was no contamination among our samples and that the reads we identified do not correspond to eDNA [34] and are well associated with the different fish tissues. Even if this approach provides reliable information on the presence of parasitic species on a host organism [39,35,41], the presence of a species with a small number of reads should be taken cautiously. In addition, none of these data should be considered quantitative [112]. Parasitic load is defined as the number of parasites in a host individual and it can be measured directly by dissecting organisms to count the number of adult parasites of each species [113] or indirectly by quantifying the number of parasitic eggs in faeces, but this number is not necessarily related to the number of parasitic individuals in the host [39,114]. Moreover, in high-throughput sequencing technologies, many technical factors (DNA extraction, PCR primers suspected of amplifying the DNA of some species at the expense of others, and sequencing technique) but also biological factors (such as the amount of DNA that varies according to the species, size and stage of the individuals) can influence the number of times a sequence is observed [108,115]. Metabarcoding can assess a larger fraction of the biodiversity but abundances obtained from sequencing cannot be interpreted as with morphology-based methods, since it does not reflect quantitative data [13,112]. As is already the case in the microbial domain, methods must be developed to obtain and interpret quantitative data from high throughput sequencing of eukaryotic symbiotic communities [109,116]. In addition, high throughput sequencing may not be used to identify all species [35,41,107]. Indeed, reference databases for taxonomic assignment are mostly incomplete [50][51]117]. In this study, we used the SILVA 132 database, but despite regular updates [118], some sequences could not be assigned and, because they were in very low numbers, they were not included in our analysis. In the coming years, more taxa will be added to databases, allowing the scientific community to use the full potential of metabarcoding approaches to characterize broader symbiotic communities [107,[119][120].

Distribution of potentially parasitic genera
In this study, 30 genera (distributed among 7 phyla) were identified as potential parasites of skin mucus, gills mucus and intestine of 13 wild Mediterranean teleost fish species. Their distribution greatly differed among host species: 6 to 17 parasitic taxa were identified depending on the fish species (Table 2). In total, based on the 30 parasitic genera found, we listed 137 fish-parasite associations ( Table 2). To our knowledge and according to the literature, 89 of these associations were new to the field of parasitology (see S2 Text). In most cases, the distribution of the different parasitic genera in the different fish tissues (skin mucus, gills mucus and intestine) is consistent with what has been described in the literature, for fungi, copepods, cnidarians, monogeneans, ciliates and most of the Apicomplexa. Some differences were nevertheless found for some parasitic taxa, in particular for nematodes and Digenea. As expected, the distribution of all monogenean genera in skin mucus and gills mucus is consistent with their life cycle [8,14]. Monogeneans are hermaphroditic and predominantly oviparous (including all genera identified here). The adults release eggs, from which ciliated larvae (oncomiracidium) hatch and swim freely. These larvae are attracted by fish mucus and colonize their skin [121][122][123]. Oncomiracidia then loose their ciliature and migrate from the skin to the gills of the host [122,[124][125]. It is therefore not surprising to find monogeneans on both gills (adult organisms) and skin (larvae) of their sparid hosts.
Within the class Digenea, three genera, Lecithochirium, Diphtherostomum and Cardiocephaloides, differed slightly from previous studies. Indeed, Cardiocephaloides had only been identified previously in skin and gill mucus [84-85] and the two other genera were reported only in organs of the digestive system and it is not the case in our study. Several hypotheses can be formulated to explain this distribution. The digenean life cycle is complex and composed of different stages, typically with two intermediate hosts. When eggs hatch (released from adult worms present on the definitive host), they release free-swimming larvae (miracidia) that detect a first intermediate host and penetrate its tissues and skin [126][127]. Different life stages then develop in this host: sporocysts and/or rediae, which in turn release the cercarial stage, a free-swimming organism which infect the second intermediate host. It is therefore not surprising to find digenean parasites on the surface's tissue of their hosts (skin and gills). Alternatively, the internal anatomy of teleost fishes could also have been a factor, particularly the position of the intestine which begins at the posterior edge of the gills and ends at the anus [128]. Due to its location, the intestine may have contaminated other organs, including the gills and skin.
While all parasitic genera belonging to the phylum Nematoda should be identified in or on the internal organs of their fish hosts, such as the intestine, some genera were identified on the skin mucus (Acanthocheilus, Dichelyne, Cystidicola, Hysterothylacium and Aonchotheca) and also on gills (Contracaecum, Cucullanus, Cystidicola and Aonchotheca). Among parasitic nematodes, some have an indirect life cycle, which means that they need several successive hosts to complete their life cycle and reach the adult stage [70,76,129]. This is the case for seven genera of nematodes found in the present study, identified on the gills or skin of fish (Table 3): the intermediate host is usually a crustacean (isopods, amphipods, copepods, decapods) and the final host is a teleost [69,76,[130][131][132][133][134]. Sequences identified on the skin mucus or on gills may then correspond to larval stages or eggs of nematodes, contained in crustaceans that could have been on the fish's surfaces.

Conclusion
In this study, we analyzed the eukaryotic sequences of a universal gene metabarcoding approach, initially targeting bacterial DNA, in order to identify potential endo-and ectoparasitic eukaryotic species associated with several wild Mediterranean teleost fish species. We illustrated the strong potential of this approach to reveal and identify an unexpectedly large spectrum of eukaryotic parasites within a host and therefore significantly increase our knowledge about the distribution of these organisms. The results obtained are consistent, for the most, with the literature, which highlights that data sets from metabarcoding approaches should therefore be used to the maximum, even beyond what they were originally intended for, as they can give us an overview of other communities in the studied environment. Thanks to this approach, we were able to detect a total of 30 parasitic genera distributed among 7 different phyla. This study also highlighted that each host has a different parasitic diversity. Despite its limitations, the metabarcoding approach provides a powerful, fast and readily available toolbox that could be applied for parasitological surveys. In general, DNA metabarcoding has an enormous potential to increase data acquisition in biodiversity surveys and therefore deliver key information to address many fundamental and applied research questions in ecology.