Unveiling the Diet of Elusive Rainforest Herbivores in Next Generation Sequencing Era? The Tapir as a Case Study

Characterizing the trophic relationships between large herbivores and the outstanding plant diversity in rainforest is a major challenge because of their elusiveness. This is crucial to understand the role of these herbivores in the functioning of the rainforest ecosystems. We tested a non-invasive approach based on the high-throughput sequencing of environmental samples using small plant plastid sequences (the trnL P6 loop) and ribosomal ITS1 primers, referred to as DNA metabarcoding, to investigate the diet of the largest neotropical herbivore, the lowland tapir. Sequencing was performed on plant DNA extracted from tapir faeces collected at the Nouragues station, a protected area of French Guiana. In spite of a limited sampling, our approach reliably provided information about the lowland tapir's diet at this site. Indeed, 95.1% and 74.4% of the plant families and genera identified thanks to the trnL P6 loop, respectively, matched with taxa already known to be consumed by tapirs. With this approach we were able to show that two families and eight new genera are also consumed by the lowland tapir. The taxonomic resolution of this method is limited to the plant family and genera. Complementary barcodes, such as a small portion of ITS1, can be used to efficiently narrow identifications down to the species in some problematic families. We will discuss the remaining limitations of this approach and how useful it is at this stage to unravel the diet of elusive rainforest herbivores and better understand their role as engineers of the ecosystem.


Introduction
Ongoing environmental threats such as forest fragmentation and unsustainable game hunting, in addition to the pervasive role of global changes, all lead to a decline of the large tropical rainforest fauna [1], [2]. Understanding the role of the large vertebrate herbivores in the functioning of the rainforest ecosystems is an important issue in ecology and conservation because these animals are thought to be important ecosystem engineers [3][4][5][6][7][8][9]. In spite of significant advances in the study of large vertebrate herbivores (e.g. [7], [10], [11]), the lack of qualitative and quantitative information on the relationships with the other organisms of their environment remains a major impediment to fully identifying their ecological role and to foresee the consequences of their local extinction.
Large rainforest herbivores are elusive animals, and the study of their foraging strategies must rely on invasive analyses or other indirect methods. These methods classically include browsing signs analysis (e.g. [12]), macroscopic and microscopic analysis of gut (e.g. [13], [14], [15]) and of faecal contents (e.g. [16], [17]). These techniques are often limited to few species and individuals and usually yield an incomplete quantification of the diet [17], [18].
Consequently, the range of plants impacted by large rainforest herbivores is still improperly known, as well for instance how herbivores select their food, how this food affects their spatial behaviour and resource partitioning [12], [19].
The scientific community is very excited about the potential uses of NGS to understand ecological processes. As with any novel technique, the expectations are high but there is limited evidence of how useful this tool really is. This study aims at contributing to our understanding of the real potential and limitations of DNA metabarcoding to understand diets in large tropical rainforest herbivores.
New target DNA fragments, such as the plastid trnL intron and its P6 loop, have been recently identified as useful DNA barcodes in case DNA is highly degraded after the passage through the digestive tract of herbivores [34]. Indeed, the P6 loop marker has universal primers in flowering plants, a short size of the target fragment (typically less than 100 bp) and high inter-specific variation in size and sequence [28]. If some a priori knowledge of the herbivore's diet is available, plant group-specific primers may also be used to refine the resolution of the analysis [20], [31], [33], [35]. Moreover, the recent development of high throughput sequencing technology [20] now allows the production of thousands of short-length sequences (e.g. [21], [25]). Combining short-length DNA barcodes coupled with next-generation sequencing of environmental samples has been termed "DNA metabarcoding" [36], [37].
We applied the method on the largest neotropical herbivore, the lowland tapir (Tapirus terrestris Linnaeus, 1758). Quite recently, our knowledge on the tapir's diet has been significantly extended for both the fruit and foliar or "browse" diet [17], [18]. This provides a solid knowledge basis for comparison with metabarcoding. Field work was performed at the Nouragues Ecological Research Station, in French Guiana, where a local reference database of plant species DNA barcodes has been developed [38]. Hibert et al. [18] showed that it is feasible to use a classical DNA barcoding (using the plastid DNA trnH-psbA region, of 500 bp on average) of faecal material to identify items of the tapirs' diet. However, the test was conducted on a limited number of individual plant fibre samples and the molecular approach was far from providing as much information on the diet as the combination of the other classical approaches used.
In the present contribution, we report results based on DNA metabarcoding of tapir faeces, in the hope to yield a better coverage of the diversity of the tapirs' diet by targeting many more DNA sequences extracted from mixed material (and not only preselected individual plant fibre samples). The tapirs' diet is known to seasonally vary in relation to fruit availability [14], [17]. We tested whether DNA metabarcoding yields a diet variation between seasons. We finally discuss the consistency of the plant taxa identified through metabarcoding with those indicated by alternative approaches and the remaining limitations of this approach to the understanding of the diet of tropical rainforest herbivores.

Study site
The tapir faeces were collected in the Nouragues Ecological Research Station (4u05'N, 52u40'W), in French Guiana. This station is included in a 1000 km 2 protected area, the Nouragues Natural Reserve, 25-430 m above sea level, and is characteristic of the primary rainforest of Northeastern South America ( [39], www.nouragues.cnrs.fr). The local flora includes over 1700 angiosperm species, a large proportion of which are endemic of the Guiana shield [40], [41]. Annual rainfall averages 2880 mm. A distinct dry season occurs from September to November (,100 mm per month), followed by a shorter drier period around March. Tree fruiting peaks in March-May and is minimal in August-September [42], [43].

Faecal samples collection and preparation
Since tapirs are known to preferentially defecate in water in French Guiana (Hibert pers. obs. in 4 captive tapirs, local interviews), we focused our research along the reserve streams in order to maximize the chance to encounter fresh tapir dung piles.
The encounter rate was not regular over time and in some months we collected more samples than others. In total we analysed samples from 39 tapir dung piles. For collection methods and stocking conditions, the reader is referred to the details provided in Hibert et al. [18]. All necessary permits were obtained for the described field studies. The research program and sampling collection were approved and validated by the Conseil Scientifique Régional du Patrimoine Naturel (CSRPN) and as well by the Conseil Scientifique of CNRS for the studies in the Research area of the Reserve. On average, dry dung piles weighed 178.8 g.
(quantiles: 0%: 15.4 g, 25%: 62.8 g, 50%: 180.0 g, 75%: 276.0 g, 100%: 353.9 g). To test the influence of collection and preliminary sorting , we prepared two samples of about 60 mg per dung pile (N = 39; 78 samples). The first sample, hereafter named the "sorted" sample, included only large homogeneous fragments (mainly parts of leaf limbs), while the second, hereafter named the "unsorted" sample excluded them (random smaller fragments). These samples were then separately dry ground with a Mixer Mill MM 200 (Retsh ß, Haan, Germany, frequency: 22 s-1 for 1 min) to prepare a homogenized product mix for further DNA extraction.

Molecular analyses
DNA extraction from faeces. Total DNA was extracted from each of the 60 mg faecal samples with the DNeasy Blood and Tissue Kit (QIAgen GmbH, Hilden, Germany), following the manufacturer's instructions. The DNA extracts were recovered in a total volume of 250 mL. Mock extractions without samples were systematically performed to monitor possible contaminations.
DNA amplification. Quality of the DNA extractions has been checked by electrophoresis on agarose gel (1.4%) before performing DNA amplification on a 100 x diluted DNA extract (4 mL in a final volume of 50 mL) to avoid high rates of PCR inhibitors. The amplification mixture contained 1 U of Ampli-TaqH Gold DNA Polymerase (Applied Biosystems, Foster City, CA), 10 mM Tris-HCl, 50 mM KCl, 2 mM of MgCl2, 0.2 mM of each dNTP, 0.1 mM of each primer, and 0.005 mg of bovine serum albumin (BSA, Roche Diagnostic, Basel, Switzerland). The mixture was denatured at 95uC for 10 min, followed by 45 cycles of 30 s at 95uC, and 30 s at 55uC; as the target sequences are usually shorter than 100 bp, the elongation step was removed. Samples were amplified using two primer-pairs ( Table 1). The first pair (g and h) amplifies the P6 loop region of the trnL (UAA) intron across flowering plants [34].
In order to increase the resolution of the analysis, and test the feasibility of a two-tiered approach using metabarcoding techniques, we also used another primer pair. This new primer-pair targets the first internal transcribed spacer (ITS1) of nuclear ribosomal DNA, for the Sapotaceae (ITS1-F and ITS1Sap-R). This plant family was chosen, since it is abundant in Amazonia, and its fruits are known to be consumed by tapirs [18]. All primers were modified by the addition of specific tags on the 59 end to allow the assignment of sequence reads to the relevant sample [24]. All the PCR products were tagged identically on both ends. These tags were composed of CC on the 5' end followed by nine variable nucleotides that were specific to each sample. The nine variable nucleotides were designed using the oligoTag program (www.prabi.grenoble.fr/trac/OBITools) with at least three differences among the tags, without homopolymers longer than two, and avoiding a C on the 5' end. All the PCR products from the different samples were first titrated using capillary electrophoresis (QIAxel, QIAgen GmbH, Hilden, Germany) and then mixed together, in equimolar concentrations, before the sequencing. DNA sequencing. The sequencing was carried out on an Illumina/Solexa Genome Analyzer IIx (Illumina, San Diego, California), using the Paired-End Cluster Generation Kit V4 and the Sequencing Kit V4 (Illumina, San Diego, California), following manufacturer's instructions. A total of 108 nucleotides were sequenced on each extremity of the DNA fragments.
Sequence analysis and taxon assignation. The sequence reads were analyzed using the OBITools suite of bioinformatics routines (www.prabi.grenoble.fr/trac/OBITools). First, the forward and reverse reads corresponding to a single molecule were aligned and merged using the solexaPairEnd program, taking into account quality data during the alignment and the consensus computation. Then, primers and tags were identified using the ngsfilter program. Only sequences with perfect match on tags and a maximum of two errors on primers were taken into account. The amplified regions, excluding primers and tags, were kept for further analysis. Strictly identical sequences were clustered using the obiuniq program, keeping the information about their distribution among samples. Sequences shorter than 10 bp for the P6 loop and 40 bp for the ITS region specific of the Sapotaceae, or containing nucleotides other than A, C, G and T, or with occurrence lower or equal to 10 were excluded using the obigrep program. Taxonomic assignation of the clusters was achieved using the ecoTag program [28].
EcoTag relies on an exact global alignment algorithm [44] to find highly similar sequences in the reference databases. These databases were built by extracting the P6 loop of the trnL intron or the ITS1 fragment from the European Molecular Biology Laboratory (EMBL) nucleotide library (release 107) using the ecoPCR program [45]. The P6 loop and the ITS1 databases contain 11,905 and 2596 unique sequences extracted from 54,239 and 5650 EMBL entries, respectively. EcoTag assigned a unique taxon to each unique sequence amplified from tapirs' faeces. This unique taxon corresponds to the last common ancestor node in the National Center for Biotechnology Information (NCBI) taxonomic tree of all the taxids annotating the sequences of the reference database that best matched on the whole length of the query sequence.
Dataset trimming. The sequencing produced 70552 unique P6 loop sequences longer than 10 bp and 15162 unique ITS sequences longer than 40 bp (Table 2). A number of these sequences could not be assigned to any known taxon. The P6 loop sequences that could not be identified up to the family level (41.9%) were less frequent than those identified at a finer taxonomic resolution ( Figure 1) and probably resulted from PCR artefacts.
To eliminate the artefact sequences, we automatically removed sequences that did not correspond to any plant sequences present in the EMBL database (homology , 0.95) except the sequences that had a perfect match with at least one of the DNA barcodes of the Nouragues databases that contain 158 species for the trnL intron and 75 species for the ITS1 region [38]. We also removed every sequence for which there was a very similar sequence (only one base different) but always in higher occurrence, since it was assumed to be an erroneous copy of the latter. Finally, we only kept for further analysis the P6 loop and ITS sequences with a total number of occurrences higher or equal to 480 or 59, respectively. This corresponds to 1/1000 or 1/100 of the most common sequences for the P6 loop and for the ITS1, respectively.

Statistical analyses
Importance of preconditioning the samples. To test whether the pairs of samples from same dung piles had more similar compositions than random samples from any dung piles, we first calculated the proportional sequence turnover bP [46] between pairs of samples using the presence-absence table of the P6 loop sequences in the dung pile samples. bP is calculated as the total sequence diversity (gamma) minus the mean sequence diversity (alpha) divided by gamma [46]. Then, we used a Monte-Carlo procedure with 1000 simulations to compare the mean genetic distance between samples from same dung piles to the distribution of mean genetic distances calculated between a same number of samples from random dung piles. We also tested whether the sorted samples (fewer but larger homogeneous fragments) yielded less diverse sequences than the unsorted samples.
Diet richness and sampling effort assessment. We used accumulation models to assess the completeness of the inventory method relative to the sampling effort invested, in terms of number of dung piles sampled, and to estimate the richness of the tapirs' diet in the study area. The analysis was performed with the unsorted samples because they were more diverse (see Results). The richness of the diet was estimated in terms of number of both P6 loop sequences and identified families. We generated rarefaction curves using 500 Mao Tau randomizations [47] with EstimateS Version 8.2 [48]. Total richness was estimated by the Chao2 estimator [49].
Seasonal variation in the diet composition. We measured the relative dependency of the diet composition patterns with season by running a Between-Class Analyses (BCA) [50] (1) on the presence-absence table of all the P6 loop sequences and (2) on the presence-absence table of the sequences of the identified genus in the dung piles using the months of sample collection as the grouping classes. Both BCAs yielded a between-months inertia percentage. We further tested these ratios with 9999 Monte-Carlo randomizations.
Telemetry studies [51] and field observations of presence indices (Hibert pers. obs) suggest that tapirs may spend a few days in some part of their home-range before moving to exploit another part. In the study area, the vegetation structure and nature varies spatially at the landscape scale [52]. If tapirs do not travel much throughout their whole homerange within the digestion time, the composition of their dung may reflect to some extent the spatial variation in the availability of consumed plants. We used a Mantel test based on Pearson's product-moment correlation between the composition distance matrix and the matrix of Euclidian spatial distance between the sampled dung piles. Statistical significance was assessed by a Monte-Carlo permutation test with 9999 simulations. The composition distance matrix between samples was computed with the bP measure applied on the presence-absence table of the P6 loop sequences in the dung piles.
The seasonal variations of the occurrences of the P6 loop sequences identified at the genus level were then analysed in regard to the mean monthly abundance of macroscopic fruit residuals of the same genus found in the same dung piles [18]. The coincidence of the sequence abundance peaks in the faeces with the fructification period was tested with a Pearson's correlation test after a log+1 transformation of the data.
The statistical analyses were performed with the R software, using the ade4 package [53]. The Mantel test was calculated with the function mantel.rtest and the Between-Class Analysis with the function between.

Overall diversity of plant items retrieved in the faeces
The NGS run produced 70588 and 15183 unique sequences with the g/h and ITS1 primers respectively. After cleaning the supposed artefacts, we ended with 117 P6 loop sequences, with a total count of 2598249 sequences corresponding to 80% of the initial number of reads, and 12 ITS sequences (20% of the initial number of reads) from the 39 dung samples ( Table 2). The cumulated number of P6 loop sequences and corresponding identified taxa both reached an asymptote suggesting that continued sampling of tapir faeces would not bring much more information about the richness of the diet (Figure 2).
Importance of the faecal sample treatment DNA extraction, amplification and sequencing succeeded in every sample, irrespective of the treatment. Samples from same dung piles had more similar compositions than random pairs of samples, with a mean composition distance between samples from same dung piles was, indeed, significantly smaller than those simulated between random samples (Monte-Carlo permutation test, p = 0.003). This result is confirmed by a positive correlation between the richness of the P6 loop sequences from the unsorted and sorted samples (Pearson's product-moment correlation, r = 0.54, p = 0.001). As expected, the sequence richness was on average significantly higher in the unsorted samples (58.1 6 14.2 (sd)) than in the sorted ones (53.1 6 14.6 (sd)) (difference = 5.1, 95%CI = [0.2,10.0]) for a same amount of treated faecal material.

Main identified taxa
The barcode reference databases allowed to identify 113 (96.6%) and 47 (40.2%) of the 117 unique P6 loop sequences up to the family and genus levels, respectively. They were assigned to 41 families, 39 genera and 3 species (Table S1, Figure 3). In each Figure 1. Abundance distribution of the P6 loop sequences at different taxonomic levels. ", family": sequences identified at a finer taxonomic resolution than the family, "family": sequences identified at the family level only, ". family": sequences that could not be identified up to the family level. N = 5488 unique sequences longer than 10 bp and with occurrence higher or equal to 11. doi:10.1371/journal.pone.0060799.g001 Table 2. Number of sequences assembled by NGS of vegetal DNA extracted from tapir faeces (N = 39, 79660 mg of faecal product), using plant universal primers targeting the P6 loop region of the trnL (UAA) intron and primers specific for the Sapotaceae family, targeting the first internal transcribed spacer (ITS1) of nuclear ribosomal DNA.
Primer pair g / h (1) ITS1-F / ITS1Sap-R (2) Number of properly assembled sequences 3246036 77884 Number of unique sequences 70588 15183 Number of unique sequences longer than 10 (1) or 40 (2) bp 70552 15162 Number of unique sequences longer than 10 (1) or 40 (2) bp, and with occurrence higher or equal to 11 5488 257 Number of unique sequences after cleaning of the artefacts 117 12 Corresponding  (Table S1). The most repeated sequence matched perfectly with three different genera of the Araceae family. The most frequent P6 loop sequences also matched well the most frequent genera known to be found in the dung piles (Pearson's product-moment correlation between the number of repetitions and frequency of the sequences, r = 0.62, p , 0.001, Figure 3). As expected, the ITS1 primers specially designed to amplify the Sapotaceae family yielded a refined determination resolution, with 9 (75.0%) and 8 (66.7%) of the 12 unique sequences directly attributed to a genus and to a species, respectively, after confrontation to the EMBL database. Coupling with the local ITS database of the Nouragues allowed to further refine the identification of one unidentified sequence to one more species. The list of identified taxa is given in Table 3.

Seasonal variation of diet composition
We did not detect any relationship between the variation the composition and the spatial distance between the dung piles (Mantel test, r = 0.09, p = 0.167). However, the richness of sequences in dung piles peaked between March and July (Figure 4), eventhough no significant difference of sequence richness could be detected between months (Tukey multiple comparisons of means, p . 0.062). The BCAs indicated that 28.9% (p = 0.001) and 31.1% (p = 0.001) of the total variation in the diet composition measured by present P6 loop sequences and identified plant genera in the dung piles, respectively, was affected to the between-months variation, suggesting a seasonal shift in the diet. Both BCAs primarily (first axis) opposed samples collected in March, June and July, containing rather sequences attributed to Pradosia, Vouacapa, Lacunaria, Mouriri, Strychnos, Pourouma and Norantea to those collected from August to December rather with Ficus and Piper and, secondarily (second axis), distinguished samples mainly collected between February and April with Carapichea, Geissospermum, Machaerium, Vantanea, Argostemma, Dalbergia, Coussarea from those collected between July and September with Carapa, Vouacpa, Mouriri and Ficus ( Figure 5).
The sequence occurrence of the identified genera in the dung piles varied much along the year, most of them presenting seasonal peaks, whereas some others were found all year round (Faramea, Miconia, Aechmea, Norantea). In three genera (Geissospermum, Lacunaria, Spondias), the peaking in the P6 loop sequence frequency coincided with the presence of fruit residuals observed in the dung piles (Table 4). Among the 17 genera macroscopically identified by their diaspore residuals in the 39 dung samples, ten were identified using the P6 loop sequence barcode, one (Moutabea) was not retrieved at all and six (Byrsonima, Heisteria, Micropholis, Pouteria, Salacia, Swartzia) were observed in too low frequency (repetitions of P6 loop #33, #361, #163, #11, #163, #49 respectively) to be considered as significant. Nevertheless, the use of the ITS marker for the Sapotaceae confirmed the presence (. 1000 repetitions) of both Micropholis and Pouteria at the same time a case for which the macroscopic fruit residuals were also observed.

Diversity of the diet indicated by the P6 loop marker
The high throughput sequencing technology allowed to produce thousands DNA sequences from every tapir dung pile tested despite the degraded state of faecal material. From these sequences, we discarded a large proportion suspected to be erroneous and the approach seems to yield a lot of waste. Nevertheless, this is not surprising since the production of a very large number of unique sequences with a very low frequency is typical from this type of experiment [22], [27] and results either from degraded template DNA in the faeces, from nucleotide misincorporation during DNA amplification, or from the sequenc-     ing process itself [24]. We actually kept not less than 80% of the total properly assembled P6loop sequences and our very conservative way to clean the statistical artefacts during the sequence analysis minimized the risk of considering erroneous sequences. Treating only about 2.34 g of faecal material (39 samples 6 60 mg) of faecal material was sufficient to distinguish 117 different plant DNA sequences (P6 loop of the trnL intron) and to circumscribe most of the annual plant sequence richness of tapirs' diet in the study area that was measurable by this approach. Interestingly, the samples, of about only 0.03% of the whole dung pile dry mass, were also sufficiently representative of the whole dung pile composition to indicate the variability among the dung piles (random samples from same dung piles were much more similar in composition than samples from random dung piles). Preliminary sorting of larger homogeneous fragments from faeces did not bring any added value in terms of DNA extraction or sequencing success but unsorted samples yielded a greater richness as expected. We stress that it is not only useless but also counterproductive to preliminary sort the fibres from faeces to isolate the apparently less degraded ones for conducting herbivore diet analysis from faeces content sequencing.

Taxonomic resolution
Compared to Hibert et al. [18], the high throughput sequencing of the P6 loop sequences from the tapir faeces allowed the identification of 51.3% (19/37) of the families and 24.1% (13/54) of the genera identified in the tapirs' diet. Hibert et al. executed this through both classical methods such as the macroscopic analysis of fruit residual in some of the same dung piles analysed here and botanical identification of browsing signs observed in the same area of the Nouragues Reserve during the same period. Nevertheless, 55.6% (10/18) of the families not retrieved here had only been found by Hibert et al. [18] in French Guiana and might not be plants very frequently eaten by tapirs. Some plants, consumed as fruits, may have been undetected with the trnL P6 loop barcode because it targeted plastid DNA that is usually rarer in fruit organs [28].
Only 40.2% of the P6 loop sequences were identified to the genus level, which is much less well resolved than in the study by  Table 4. Correlation between the sequence (P6 loop trnL) occurrence and the mean monthly abundance of macroscopic fruit residuals of the same genus found in the same dung piles.  [18]. Most of the remaining families are small plants, that could hardly be a resource for the tapir, such as small, often epiphytic or epilithic, herbaceous ferns (Grammitidaceae, Hymenophyllaceae, Oleandraceae, Nephrolepidaceae, Schizaeaceae, Vittariaceae) and rare plants or localized trees or lianas (Agavaceae, Hernandiaceae, Thymeleaceae, Trigoniaceae, Ixonanthaceae, Rhabdodendraceae). The local DNA barcoding database [38] allowed us to refine the determination of 14 sequences to the genus level. The incompleteness of the reference databases, however, explained only part of the low taxonomic resolution since among the 19 plant families with undetermined genus, 15 included genera from the Reserve with the trnL (UAA) intron sequenced in the EMBL. The low taxonomic resolution for 11 of these families, and for at least 6 more families (among 41), was explained by shared sequences between several genus. As an example, five genera of the Sapotaceae family carried the same P6 loop sequence.
As a result, tapirs may consume plant species with a low or no variation of the P6 loop. In such plant groups, in temperate regions [24], only closely related species are usually not resolved [34]. In our study, however, only three P6 loop sequence could be identified below the genus level by comparison with the EMBL database. We could check that both Spondias and Hieronyma had only one species listed in the Nouragues reserve. Ludwigia has several species in the reserve but two different species are sequenced in the EMBL and they differ, so the molecular marker appears to have a sufficient resolution to distinguish between the species in this genus. We believe that the species identification in these three cases can then be validated.
The low taxonomic resolution seems not limited to our study since when testing their database developed for the tropical trees of the Nouragues Reserve, Gonzalez et al. [38] could unambiguously identify only 63% of the genera using the full trnL intron. This resolution power is much lower than expected for a comprehensive database in temperate regions (90% according to Valentini et al. [24]). Consequently, it is likely, that the shorter P6 loop of the trnL intron is even less efficient in a tropical context to discriminate plant genera. The shared sequences between genera may be explained by a higher rate of lineage diversification and frequent explosive radiations in the tropics compare to temperate ecosystem [38].
Our study stresses the interest of combining complementary markers (see the two-tiered approach of Newmaster et al. [54] such as the internal transcribed spacer of the nuclear ribosomal genes to identify the consumed species (e.g. [33]). Indeed, in our study 9 different Sapotaceae species could be attributed to 75% of the sequences targeted by the ITS marker. Although the primers were designed for Sapotaceae, they are not fully specific for this family and also allowed the amplification of one species in the Burseraceae family. Therefore, further development of markers such as in ITS appears promising for gaining precision on the consumed taxa.

Main plant taxa identified
We can not totally discard the possibility of detecting traces of external DNA contaminating the dung pile sample after defecation with a method as sensitive as NGS. There are different contamination ways such as faecal cross-contamination mediated by dung arthropods that are visiting different piles or contamination by pollen grains from plant species with biparental cytoplasmic inheritance (see [55]) carrying non-diet chloroplast DNA. Nevertheless, we feel that the risk of faecal crosscontamination mediated by phytophageous and coprophageous organisms may be higher in the case of dung deposited on terra firme where dung beetles and earth worms and other litter arthropods are very active and the water may also have limited the penetration from anemophylous pollen. Another possible artefact would be the detection of DNA from pollen desposited on the leaves browsed by the tapirs [56]. Nevertheless, the quantity of pollen that might have been deposited on the dung pile is very low in comparison to the faecal material and so is the amount of possible pollen DNA compared to the amount of plant DNA treated. We chose a highly conservative way to select the useful sequences and this included to not consider the less abundant sequences. If there were any pollen DNA sequences produced, we believe that these would certainly be among the discarded sequences.
Our metabarcoding approach identified two plant families to date not known to be consumed by the lowland tapir: the Marcgraviaceae, with the genus Norantea, and the Marattiaceae, with the genus Danaea, and seven additional new genera: Terminalia (Combretaceae), Hieronyma (Euphorbiaceae), Adelobotrys (Melastomataceae), Aechmea (Bromeliaceae), Dalbergia (Fabaceae), Vantanea (Humiriaceae), Chimarrhis and Argostemma (Rubiaceae). Argostemma, however, is a genus naturally absent from the Guianas and this misidentification may result from the absence of the actually consumed Rubiaceae genus in the EMBL database that carries the same P6 loop. The same risk of misidentifation is possible for all the plant taxa that have not been sequenced in the EMBL yet and this is a major limitation of the approach using only this incomplete reference database.
However and despite a relative low taxonomic resolution in our study case, the metabarcoding approach using the P6 loop approach was well validated by the high concordance with the literature on lowland tapir diet. Out of the families and genera identified here, 95.1% (39/41) and 74.4% (29/39), respectively, were already known to be consumed by tapirs [18]. Moreover, our study allowed to retrieve not less than 41.9% and 10.2% of the plant families and genera, respectively, known to be consumed by lowland tapirs throughout their wide range [18]. This efficiency increased when focusing on the plants consumed in French Guiana, since we identified 51.8% (28/54) of the families and 21.9% (25/114) of the genera known to be consumed in this region. In addition, our approach confirmed that tapirs in French Guiana eat plants from eight families known up to now to be consumed only elsewhere.
If we compare with local information brought by botanical analysis of browsing signs and of fruit residuals in dung samples (including some of the samples treated in the present study) [18], our approach identified 18 other plant families and 20 other genera. This confirmed the interest of the metabarcoding to complement classical approaches of elusive mammal diet.
The most frequent and abundant P6 loop sequences identified in the dung piles belong to taxa among the most often cited in the literature [18]. The top five most abundant families were the Araceae, the Moraceae (notably with genera Bagassa and Ficus), the Melastomataceae (with Miconia and Mouriri), the Sapotaceae (with Pradosia) and the Urticaceae (with Pourouma). These families, together with the Fabaceae (with Vouacapoua and Eperua), the Rubiaceae (with Faramaea), the Cyclanthaceae, and the Arecaceae were also found in most, if not every, dung pile, all year round. The ITS barcoding also confirmed the appetence of tapirs for Sapotaceae, known to be consumed both as fruits and browse [14], [17], [18]. They consumed at least five of the eight genera, and nine of the 64 species listed in the Nouragues Reserve. We could retrieve four of the 12 Sapotaceae species known to be eaten by the lowland tapir throughout its range and extended this list to five more species.

Signature of a seasonality in the diet
Several authors have stressed that the frequency of the occurrence of the prey DNA sequences in the faeces gives a biased indication of the quantity of ingested material [20], [24], [25], but may broadly reflect the relative contributions of the used items both in carnivore and herbivore species [25], [26], [30]. Our study of a free ranging wild and elusive animal, in the natural conditions of the rainforest, is far from meeting the minimum controlled conditions to validate the quantitative information on the diet brought by the metabarcoding of faecal material and it has not this pretention. However, we could check here that abundant sequences correspond to plant taxa among the best known to be used by tapirs as food (cf. above). Rayé et al. [21] used a two season comparison of the frequencies of P6 loop sequences in the faeces to conclude to a seasonal evolution of the diet of chamois. Our dung samples collected all year round also displayed clear seasonal variations in both the presence and the abundance of some plant sequences, while some others seemed to be consumed all year round, such as Faramea, Miconia, Aechmea, Norantea, and might constitute regular browse forage.
The seasonality of the tapirs' diet also clearly appeared when considering the monthly DNA sequence richness. The observed pattern is very similar to the variation of the relative number of fruiting species around the year [42], despite an important offset towards the end of the fruiting season. As most of the species have short delayed germination, this offset could be explained by a higher amount of young seedlings browsed by tapir. Interestingly, in the cases of Geissospermum, Lacunaria and Spondias, known to be eaten as fruits by the tapirs [18], the abundance of sequence peaked concomitantly with the peaking fruit season. This suggests that the P6 loop can, in a few cases, provide a good signature of some fruits consumed by tapirs, but it can also miss many of them. This is not surprising since we targeted chloroplast DNA which is usually much less abundant in fruits than in leaves and stems. The absence of significant correlation between the abundances of DNA sequences and observed fruit residuals in the faeces in some taxa may also be due to the fact that their seeds had been too much degraded by mastication and digestion or spat out by tapirs for the fruits to be macroscopically identified. If the P6 loop is certainly far to be the best marker to identify the plants eaten as fruits, it is, however, very interesting to point at possible browsed candidate taxa, the browsed part being the less investigated in tapirs and rainforest herbivores in general.

Conclusions
The DNA of plants found in the faeces is known to be highly degraded by digestion and we opted for using the short P6 loop of the plastid trnL intron associated to high throughput sequencing in order to maximise the chance to capture short plant DNA fragments in the faeces. This was successful since we could retrieve DNA from more than 50 different plants in every faecal sample.
However, a major limitation to this approach is that the taxonomic resolution is mainly limited to the family or the genus, both because several species can have the same sequence and because the reference database are not yet complete. As long as the sequence reference database will not be taxonomically covered, the best taxa candidates can not all be interpreted as strictly confident records of the diet. We draw the attention of the reader that this approach, at this stage, only provides a list of hypothetical but likely plant taxa that can be eaten by tapirs.
Nevertheless, our study case in the lowland tapir showed that the resulting information on the diet is well consistent with that from classical methods. It confirmed the consumption of many taxa, extended the range of use of some others and also allowed to discover new likely ones. Our study also showed that further development of the approach and refining of the taxonomical resolution is also possible thanks to the combination of complementary markers.
The genetic reference databases are still under development, including DNA sequences from new organisms in new regions of the world (e.g. [57]). As more taxa are being sequenced a refined appreciation of the composition and richness of the diets should be possible.
In the meantime, we feel that the metabarcoding of faecal DNA can already complement the classical approaches of the diet of tropical herbivores in a non-invasive way. Telemetry studies and analysis of the diet by stable isotopes can help focusing on the areas and habitats and large plant categories (e.g. [58], [59]) in which elusive rainforest mammals forage. If the fruit part of the diet can be well analysed up to the genus and species levels through macroscopic identification of the diasporas residuals in the faeces, the barcoding approach has the advantage to point to many likely taxa composing the less inventoried browsed part of the diet of the rainforest mammals. This can be up to 100% of the diet in tapirs during some periods of the year [18]). The barcoding approach can both extend and refine the list of items compared to classical visual histological approaches (see review in Yoccoz [60]). To date, the browse part in tapir diet has been investigated with best taxonomic resolution by the botanical analysis of browsing signs. Browsing signs can be attributed to tapirs thanks to their characteristic foot prints but in lighter animals not letting identifiable tracks around the found browsed plants, the identification of the browser may be more problematic. In certain cases, the plants may also have been browsed to such an extent that they are no more identifiable nor visible. The faecal DNA barcoding can interestingly complement the information on the taxa browsed by the rainforest animals.
Hence it appears promising to identify key resources and associated habitats for endangered species, better characterize the dynamics of the resource partitioning in elusive herbivore communities, and more generally better appreciate the role of the herbivores in the functioning and the diversity of the rainforest ecosystem. However, we stress that similar tests using diet information from other approaches should be conducted in ruminants and fore-gut fermenters in which the digestive efficiency of fibres is different than in the hind-gut fermenter tapirs [13]. This could also be tested in other endangered groups of elusive species of the rainforest, such as primates and birds, in which information about the diet has been gathered through classic approaches [61] to extend it to groups in which information on the diet is still missing.

Data accessibility
Fasta files and filtered data have been deposited in the Dryad repository: doi: 10.5061/dryad.8h7k1.

Supporting Information
Table S1 List of plant taxa identified in tapir faeces of the Nouragues Reserve after matching of the P6 loop of trnL (UAA) sequences with the EMBL reference and local databases. "Indet." stands for indeterminate. The figures into brackets indicate the number of different sequences for the unidentified genera. Species names followed by "?" have been inferred a posteriori for genera with only one species listed in the Reserve (www.nouragues.cnrs. fr/). All of these genera but Bagassa, however, have several species listed in French Guiana and in the region. Presence of the taxa in the region was checked in Funk et al. (2007). Only one species of Machaerium has been listed in the Nouragues but two different sequences were found in the tapir dung. (DOCX)