Beyond Biodiversity: Fish Metagenomes

  • Alba Ardura,

    Affiliation Department of Functional Biology, University of Oviedo, Oviedo, Spain

  • Serge Planes,

    Affiliations EPHE-URA CNRS 1453, Université de Perpignan, Perpignan Cedex, France, USR 3278 CNRS, EPHE Centre de Recherche Insulaire et Observatoire de l'Environnement (CRIOBE) BP, Papetoai, Moorea, Polynésie française

  • Eva Garcia-Vazquez

    Affiliation Department of Functional Biology, University of Oviedo, Oviedo, Spain


Biodiversity and intra-specific genetic diversity are interrelated and determine the potential of a community to survive and evolve. Both are considered together in Prokaryote communities treated as metagenomes or ensembles of functional variants beyond species limits.

Many factors alter biodiversity in higher Eukaryote communities, and human exploitation can be one of the most important for some groups of plants and animals. For example, fisheries can modify both biodiversity and intra-specific genetic diversity. Intra-specific diversity in particular can be drastically altered by overfishing: intense fishing pressure on one stock may drive some genetic variants to extinction, with a subsequent loss of intra-specific diversity. The objective of this study was to apply a metagenome approach to fish communities and explore its value for rapid evaluation of biodiversity and genetic diversity at the community level.

Here we applied the metagenome approach, employing the Barcoding target gene COI as a model sequence, to catches from four very different fish assemblages exploited by fisheries: freshwater communities from the Amazon River and northern Spanish rivers, and marine communities from the Cantabric and Mediterranean seas.

Treating all sequences obtained from each regional catch as a biological unit (exploited community), we found that the metagenomic diversity indices of the Amazonian catch sample examined here were lower than expected. Reduced diversity could be explained, at least partially, by overexploitation of the fish community, which had been independently estimated by other methods.

We propose using a metagenome approach for estimating diversity in Eukaryote communities and for early evaluation of losses of genetic variation at the multi-species level.


Biodiversity found on Earth today is the result of 3.5 billion years of evolution. Definitions of biodiversity vary, from “the totality of genes, species, and ecosystems of a region” to “the diversity of genes and organisms”. In 1992 the United Nations Conference on Environment and Development, also known as the “Earth Summit”, was held in Rio de Janeiro, Brazil. At this meeting the Convention on Biological Diversity (CBD), which focuses on the conservation and sustainable use of biodiversity, was signed. The CBD has three main goals: conservation of biodiversity, sustainable use of its components, and fair and equitable sharing of the benefits arising from genetic resources. Ensuring environmental sustainability is one of the United Nations Millennium Development Goals established to end poverty, which shows the great importance attached to the environment and biodiversity. Biodiversity is commonly understood as the combination of species present in an ecosystem. In addition to other types of diversity, each species within an ecosystem exhibits intra-specific genetic diversity. Genetic diversity is thus a crucial component of biodiversity, fundamental to species survival and to the appearance of new species. It is the basis of reproductive performance, resistance to diseases and capacity to adapt to environmental changes [1]–[3]. Biodiversity and genetic diversity depend on each other: diversity within a species is necessary to maintain diversity among species, and vice versa [4]. We thus infer that when ecosystems are subjected to exploitation or other alterations, complete estimates that combine both types of diversity are crucial for describing community conservation status.

The concept of metagenome includes both inter- and intra-specific genetic diversity because all the individual genomes present in an environmental sample are considered as a unit. Metagenomics is the culture-independent genomic analysis of a community of microorganisms, generally aimed at community-wide assessment of metabolic functions [5]. The analysis of metagenomic data provides a way to identify new organisms and isolate complete genomes from uncultured species that are present within an environmental sample [6]. Therefore, a metagenome could also be defined as an assemblage of genomes that occupies an ecological niche. To date, the metagenome concept has been largely reserved for microorganisms: ruminant metagenome [7], marine metagenome [8], metazoan metagenome [9], etc. Although it is difficult to extend the perspective to higher eukaryotes like vertebrates, given the huge size of some genomes, a shortcut is possible if we focus on one or a few genes. The international Barcoding initiative [10] can help in this task. DNA Barcoding is based on the use of a standard region employed to catalogue the world's biota, including fish: FISH-BOL is the campaign aimed at DNA barcoding all fish species [11]. The mitochondrial COI gene targeted by Barcoding projects is useful for species identification, and also exhibits intra-specific polymorphism. Because within-species genetic diversity is crucial for this approach, COI intra-specific variation could be an advantage over more conserved genes like the 18S rRNA, which has been employed in metagenome approaches applied to lower Eukaryotes [9]. The typical sequence information gathered for DNA barcoding can provide an early insight into the patterning of genomic diversity within a species, facilitating comparative studies of genetic diversity in different species or ecological settings [12].

Fisheries have been identified as one of the main causes of the loss of animal genetic diversity [13]. Fisheries exploit diversity, both species biodiversity and intra-specific diversity. Most targeted species are predators, which as a consequence are declining dramatically, with knock-on effects on the other species in the trophic chain [14], [15]. Intra-specific diversity can also be drastically altered by fisheries. For example, fishing-related changes in life histories result in the remaining breeders becoming increasingly smaller [16], [17]. In addition, overfishing of one stock may lead to the extinction of some genetic variants and a subsequent loss of intra-specific diversity, with unpredictable effects on species biodiversity [4].

The tropical Amazon River is an ecologically critical reservoir of the Earth's biodiversity [18]. Its fish community is exploited by artisanal fisheries, and basic control tools are only now being developed [19]. Recent population declines of targeted species [20] suggest fisheries overexploitation, and could be taken as a warning to assess the current levels of genetic diversity of the fish community and, if necessary, rapidly adopt conservation measures. Since our objective was a rapid evaluation of the diversity of the exploited species, we analyzed a sample of commercial fish representing 65% of the Amazonian catch (in annual tons) in the central region of Manaus (Table 1), applying a metagenome perspective with Barcoding sequences as a tool. To our knowledge this is the first time this procedure has been followed in fishery science. To expand this idea with contrasting case studies we chose three well-known fisheries from European regions, one continental (freshwater) and two marine. The freshwater fishery practiced in northern Spanish rivers is recreational (angling) and strongly targeted on salmonids. The two marine areas considered were the Mediterranean biodiversity hotspot (Roussillon, southern France), where populations are also declining [21], [22], and the Cantabric region (northern Spanish Atlantic coast near the Bay of Biscay), where fisheries target large predators, as do most marine commercial fisheries worldwide [15].


The Amazon River sample contained more species (seven) than the samples from the Mediterranean and Cantabric marine fisheries (four species each), whereas the Spanish rivers sample contained only two species (Table 1). The DNA sequences obtained for all species were submitted to GenBank and are available under the accession numbers shown in Table 2. The average trophic level of the catch (Table 3) was lower in the Amazon River than in the other locations (2.636 versus 3.058, 3.719 and 3.738 respectively). The species in our samples represented 0.56% to 12% of the total number of fish species inventoried from the respective ecosystems (Table 3).
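
As an illustration of how an average trophic level of a catch can be computed, the following minimal Python sketch takes per-species trophic levels (such as those listed in FishBase) and, optionally, catch weights. Whether the values in Table 3 are simple or catch-weighted means is not stated in the text, so both options are shown; the function name and the example numbers are hypothetical and are not data from this study.

```python
def mean_trophic_level(trophic_levels, catch_weights=None):
    """Average trophic level of a catch.

    trophic_levels: one trophic level per species.
    catch_weights:  optional catch (e.g. tons) per species; if omitted,
                    every species counts equally (simple mean).
    """
    if not trophic_levels:
        raise ValueError("at least one species is required")
    if catch_weights is None:
        catch_weights = [1.0] * len(trophic_levels)
    if len(catch_weights) != len(trophic_levels):
        raise ValueError("one weight per species is required")
    total = sum(catch_weights)
    if total <= 0:
        raise ValueError("total catch weight must be positive")
    weighted_sum = sum(t * w for t, w in zip(trophic_levels, catch_weights))
    return weighted_sum / total


# Hypothetical trophic levels and catch weights, for illustration only.
print(mean_trophic_level([2.0, 3.0, 4.0]))          # simple mean
print(mean_trophic_level([2.0, 4.0], [3.0, 1.0]))   # catch-weighted mean
```

A catch dominated by low-trophic-level species pulls the weighted mean down, which is the pattern usually read as a sign of "fishing down the food web".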

Table 2. Fish species analyzed in each region: common name; specific name; n° haplotypes of each species; region; GenBank Accession Number.

Table 3. Genetic diversity and biodiversity in the four case studies considered.

Considering all Barcoding sequences obtained from each catch as a unit (the metagenomic diversity), the highest number of haplotypes (and accordingly haplotypic diversity) and nucleotide diversity corresponded to the Cantabric catch, followed by the Mediterranean, Amazonian and Spanish rivers catches (Table 3; 17, 15, 14 and 4 haplotypes respectively, represented as yellow dots in Fig. 1R). From this perspective, the Amazonian catch, although containing more species, exhibited only moderate metagenomic diversity.

Figure 1. At right (R), haplotype networks obtained for the COI gene from Amazonian (a), Mediterranean (b), Cantabric (c) and Spanish freshwater (d) fish.

At left (L), phylogenetic trees constructed based on COI protein sequences of the same samples. In the haplotype networks, yellow dots correspond to real haplotypes and red dots are internal nodes representing hypothetical intermediate mutations. For Spanish freshwater fish a phylogenetic tree cannot be constructed because the protein sequence is identical for the two species (Salmo trutta and Salmo salar).

From the point of view of final gene products (polypeptides), because the COI protein is highly conserved, intraspecific diversity was mostly due to synonymous substitutions. Accordingly (Fig. 1L), only one putative protein was obtained per species (per genus in the case of Atlantic salmon Salmo salar and brown trout S. trutta, with identical proteins), except for the anchovy Engraulis encrasicolus, with two proteins corresponding to different lineages or cryptic species [23]. The number of DNA sequences (haplotypes) per protein variant, as an indicator of the degree of underlying genetic diversity, ranged from 2 for the Amazonian sample to 4.25 haplotypes per polypeptide for the Cantabric catch (Table 3), further highlighting the low genetic diversity in the exploited Amazonian fish examined here.
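
The haplotypes-per-protein ratio can be reproduced with simple arithmetic. The short Python sketch below uses the haplotype counts reported above (14 for the Amazonian catch, 17 for the Cantabric catch) and assumes one putative protein per species in these two catches (seven and four respectively). That assumption follows from the text but is not taken directly from Table 3, and the function name is illustrative.

```python
def haplotypes_per_protein(n_haplotypes: int, n_proteins: int) -> float:
    """Number of distinct DNA haplotypes per distinct protein variant."""
    if n_proteins <= 0:
        raise ValueError("at least one protein variant is required")
    return n_haplotypes / n_proteins


# (haplotypes, protein variants); protein counts are ASSUMED to equal the
# number of species in each catch (one putative protein per species).
catches = {
    "Amazonian": (14, 7),
    "Cantabric": (17, 4),
}

for region, (n_hap, n_prot) in catches.items():
    print(region, haplotypes_per_protein(n_hap, n_prot))
# Amazonian 2.0
# Cantabric 4.25
```

Under that assumption the two ratios match the extremes of the range quoted in the text.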

Limited genetic diversity was not associated with low biodiversity of the analyzed Amazonian catch. Ecological, taxonomic and phylogenetic diversity indices, as well as haplotype network complexity, reflect the exploited biodiversity as the number of species caught combined with their evolutionary interrelations. Biodiversity indices (ecological, taxonomic and phylogenetic) of the Amazonian catch were the highest of the four case studies considered (Table 3). The Amazonian haplotype network was also the most complex (Fig. 1R) because it contained more species belonging to different families, with considerable distances between sequences. This network contained more internal nodes (21, in red) separating haplotypes, while the Mediterranean and Cantabric marine networks exhibited only six and four internal nodes respectively, indicating that these haplotypes were connected by fewer mutational steps and were phylogenetically closer than the Amazonian genetic variants. Low metagenomic diversity despite high biodiversity can be considered a strong signal of depleted genetic diversity at the community level, and emphasizes the urgency of revising the management of Amazonian fisheries.


The examples presented in this study reveal the potential of Barcoding data for rapid evaluation of diversity and, more broadly, for comparative studies of genetic diversity in different ecological settings [17]. Although DNA barcoding may not be sufficient to rigorously address population-level questions [24], it may be an ideal tool for early detection of genetic depletion of exploited species.

Fisheries overexploitation could be one cause of the apparently reduced genetic diversity in the Amazonian catch [13], although likely not the only one. The patterns of commercial landings suggest overexploitation of Amazonian fisheries because large high-valued species declined significantly and were replaced by smaller, short-lived and lower-valued species [20]. The lower trophic level of the Amazonian catch could be interpreted as evidence of ‘fishing down the food web’ [25] and thus as further independent evidence that Amazonian fisheries are overexploited. However, fishing pressure may not be the only reason for biodiversity loss. Marine ecosystems are generally reported to possess higher levels of diversity than freshwater ones [e.g. 26], [27]; therefore, the differences found between the Amazon and the marine fisheries analyzed here could be a natural consequence of such ecosystem differences. Finally, this is an exploratory study and, as such, only a limited number of sequences have been analyzed from each fishery. A sampling artefact cannot be excluded as an explanation for the relatively low intra-specific diversity in the Amazon samples, nor can biases in the ecological, taxonomic and phylogenetic indices.

Treating Barcoding data [10], [11] like Eukaryote metagenomes will lead to a better understanding of the biodiversity exploited by humans. This type of metagenome-like approach is all the more important because the two levels of diversity are closely interrelated [4]. It could be applied not only to fishery science but also to a wide variety of ecological studies. We understand that further statistical and theoretical developments will be needed, and that the case studies depicted here are a simple example of the potential of this novel idea. Considering other genes with different degrees of conservation, large SNP datasets, and even whole genomes will surely be the next steps for analysing and understanding eukaryote metagenomes.

Materials and Methods

The number of species inventoried from each ecosystem in our study was taken from FishBase. The same database was consulted to obtain the trophic level of the species considered.

For each region, a total of 40 COI sequences were obtained from random samples of the species that represent 65% of the catch, as estimated from the official regional catch statistics for the Amazonian (Manaus, Brazil), Mediterranean (Perpignan (Gulf of Lyon), Roussillon, France), and Cantabric and north Spanish rivers (Narcea, Sella and Cares rivers, and different fish markets of the Asturian region) fisheries. Samples were obtained directly from local markets (or from fishermen in the case of north Spanish rivers, where sport catches are destined for personal consumption), sampled at random during at least five different weeks in 2010. In the case of Spanish rivers, Atlantic salmon catches are registered but the data for brown trout catches are based on surveys of anglers and are less reliable; for this reason we expanded the sampling to >90% of the catch and included Atlantic salmon. The number of samples of each species was proportional to the relative weight of that species in the catch. The Amazonian sequences were obtained in the context of the Barcoding project [19].
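
Sampling in proportion to catch weight can be sketched as follows. This is only an illustration: the rounding rule (largest remainder), the function name and the catch figures are assumptions made for the example, since the text states only that sample numbers were proportional to the weight of each species in the catch.

```python
def allocate_samples(catch_weights, total_samples=40):
    """Split a fixed number of samples among species in proportion to catch.

    catch_weights: dict mapping species name to catch weight (e.g. tons).
    Rounding uses the largest-remainder rule (an assumption of this sketch)
    so that the allocations always add up to total_samples.
    """
    total = sum(catch_weights.values())
    if total <= 0:
        raise ValueError("total catch weight must be positive")
    quotas = {sp: total_samples * w / total for sp, w in catch_weights.items()}
    allocation = {sp: int(q) for sp, q in quotas.items()}  # round down first
    leftover = total_samples - sum(allocation.values())
    # Hand the remaining samples to the species with the largest remainders.
    by_remainder = sorted(
        quotas, key=lambda sp: quotas[sp] - allocation[sp], reverse=True
    )
    for sp in by_remainder[:leftover]:
        allocation[sp] += 1
    return allocation


# Hypothetical catch weights in tons, not data from this study.
example_catch = {"species_A": 500, "species_B": 300, "species_C": 200}
print(allocate_samples(example_catch))
# {'species_A': 20, 'species_B': 12, 'species_C': 8}
```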

Ecological (Shannon H), taxonomic (TTD) and phylogenetic (sΦ+) indices of each regional catch (always the main species corresponding to 65% of the total catch in tons) were calculated using PRIMER 6 (Software package from Plymouth Marine Laboratory, UK). The total taxonomic distinctness TTD was applied because these communities are spatially independent and may vary in their phylogenetic composition [28]. sΦ+ is the total variance of pairwise path lengths and can be interpreted as an index of the complexity of the hierarchical tree.
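
For readers unfamiliar with the ecological index, the Shannon index H can be written as H = −Σ p_i ln p_i, where p_i is the relative abundance of species i. The Python sketch below is a minimal illustration and not a reimplementation of PRIMER 6; the use of the natural logarithm and the example abundances are assumptions, and the text does not state which abundance measure (tons or individuals) was used.

```python
import math


def shannon_index(abundances):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over species with p_i > 0."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("total abundance must be positive")
    h = 0.0
    for a in abundances:
        if a > 0:  # species with zero abundance contribute nothing
            p = a / total
            h -= p * math.log(p)
    return h


# Hypothetical abundances: four equally abundant species give H = ln(4).
print(shannon_index([10, 10, 10, 10]))
# A single-species catch has no diversity (H = 0).
print(shannon_index([25]))
```

H increases both with the number of species and with the evenness of their abundances, so a catch dominated by one species scores lower than an even catch with the same species count.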

To obtain the sequences we employed the primers and methodology described by Ward et al. [29]. Sequencing was performed by the DNA sequencing service GATC Biotech. Sequences were visualized and edited with the BioEdit Sequence Alignment Editor software [30]. Sequences were aligned with the ClustalW application [31] included in BioEdit.

Conventional measures such as haplotype diversity were employed because they can provide useful information for the study of the genetic diversity of species [12]. Sequence (nucleotide) diversity for each regional catch was estimated with the DnaSP software [32], considering all sequences together without separating species.
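
The pooled ("metagenomic") treatment of sequences can be illustrated with the standard estimators of haplotype diversity, h = n/(n−1) · (1 − Σ p_i²), and nucleotide diversity, π = mean number of pairwise differences per site. The Python sketch below is a simplified illustration and does not reproduce DnaSP's handling of gaps or missing data; the function names and the four toy sequences are hypothetical. As in the analysis described above, all sequences are pooled without separating species.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Haplotype diversity h = n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two sequences are required")
    counts = Counter(seqs)  # identical sequences form one haplotype
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


def nucleotide_diversity(seqs):
    """Mean proportion of differing sites over all pairs of aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("at least two sequences are required")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same non-zero length")
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total_diffs / (len(pairs) * length)


# Toy aligned sequences pooled across species (hypothetical, not study data).
pooled = ["ACGT", "ACGT", "ACGA", "TCGA"]
print(len(set(pooled)))              # number of haplotypes: 3
print(haplotype_diversity(pooled))   # 5/6, about 0.833
print(nucleotide_diversity(pooled))  # 7/24, about 0.292
```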

The phylogenetic analysis was performed with the software MEGA 4.0 [33]. This software was also used to infer the putative protein (amino acid sequence) obtained from each COI sequence, and to construct the phylogenetic trees based on those amino acid sequences. The neighbor-joining (NJ) methodology was applied for phylogenetic inference, as is common in DNA barcoding studies [10]. The best-suited model of protein sequence evolution and the accompanying evolutionary parameter values for the data were determined using ProtTest [34], [35]. The best-fit evolutionary model for the amino acid sequences analyzed was the JTT (Jones-Taylor-Thornton) matrix [36], [37], with a gamma shape value of 4.59 for the Amazonian, 4.60 for the Cantabric Sea and 0.27 for the Mediterranean Sea samples. Robustness of the NJ topology was assessed using 2,000 bootstrap replicates. Haplotype networks were constructed with the program Network, with default settings.


We are grateful to Ivan G. Pola for help with laboratory tasks, and to Ana R. Linde, Angel Rosales, Gema E. Adan, Vanessa Gomes, Ione Ginuino, Aida Dopico, Eduardo del Rosal and Nathalie Tolou for collaboration in market sampling. Prof. Francis Juanes (University of Massachusetts) kindly revised the manuscript. The authors state that there is no conflict of interest regarding this study. AA is responsible for laboratory work, and SP and EGV designed the study and analyzed the results. The three authors are jointly responsible for writing the manuscript and fully agree with its contents. We are grateful to two anonymous reviewers who kindly helped to improve this article.

Author Contributions

Conceived and designed the experiments: AA SP EG-V. Performed the experiments: AA SP EG-V. Analyzed the data: AA SP EG-V. Contributed reagents/materials/analysis tools: AA SP EG-V. Wrote the paper: AA SP EG-V. Responsible for laboratory work: AA. Designed the study and analyzed the results: SP EG-V.


  1. Frankham R (1995) Conservation genetics. Ann Rev Genet 29: 305–327.
  2. Hedrick PW (2001) Conservation genetics: where are we now? Trends Ecol Evol 16(11): 629–636.
  3. Wang SZ, Hard JJ, Utter F (2002) Genetic variation and fitness in salmonids. Conserv Genet 3(3): 321–333.
  4. Lankau RA, Strauss SY (2007) Mutual feedbacks maintain both genetic and species diversity in a plant community. Science 317: 1561–1563.
  5. Kennedy J, Marchesi JR, Dobson ADW (2008) Marine metagenomics: strategies for the discovery of novel enzymes with biotechnological applications from marine environments. Microb Cell Fact 7: 27.
  6. Rusch DB, Halpern AL, Sutton G, Heidelberg KB, Williamson S, et al. (2007) The Sorcerer II Global Ocean Sampling expedition: Northwest Atlantic through eastern tropical Pacific. Plos Biology 5(3): e77.
  7. Singh B, Gautam SK, Verma V, Kumar M, Singh B (2008) Metagenomics in animal gastrointestinal ecosystem: Potential biotechnological prospects. Anaerobe 14(3): 138–144.
  8. Landgridge G (2009) Testing the water: marine metagenomics. Nat Rev Microbiol 7: 552.
  9. Fonseca VG, Carvalho GR, Sung W, Jonson HF, Power DM, et al. (2010) Second-generation environmental sequencing unmasks marine metazoan biodiversity. Nature Comm 1: 98.
  10. Hebert P, Cywinska A, Ball S, deWaard J (2003) Biological identification through DNA barcodes. P Roy Soc B-Bio Sci 270: 313–321.
  11. Ward RD, Hanner R, Hebert PDN (2009) The campaign to DNA barcode all fishes, FISH-BOL. J Fish Biol 74: 329–356.
  12. Hajibabaei M, Singer GAC, Hebert PDN, Hickey DA (2007) DNA barcoding: how it complements taxonomy, molecular phylogenetics and population genetics. Trends Genet 23(4): 167–172.
  13. Worm B, Barbier EB, Beaumont N, Duffy JE, Folke C, et al. (2006) Impacts of Biodiversity Loss on Ocean Ecosystem Services. Science 314: 787–790.
  14. Myers R, Worm B (2003) Rapid worldwide depletion of predatory fish communities. Nature 423: 280–283.
  15. Myers RA, Worm B (2005) Extinction, survival or recovery of large predatory fishes. Philos Transactions of the Roy Soc B 360(1453): 13–20.
  16. Law R (2000) Fishing, selection, and phenotypic evolution. Ices J Mar Sci 57(3): 659–668.
  17. Brown CJ, Hobday AJ, Ziegler PE, Welsford DC (2008) Darwinian fisheries science needs to consider realistic fishing pressures over evolutionary time scales. Mar Ecol-Prog Ser 369: 257–266.
  18. Agostinho AA, Thomaz SM, Gomes LC (2005) Conservation of the Biodiversity of Brazil's Inland Waters. Cons Biol 19(3): 646–652.
  19. Ardura A, Linde AR, Moreira JC, Garcia-Vazquez E (2010) DNA Barcoding for conservation and management of Amazonian commercial fish. Biol Cons 143: 1438–1443.
  20. Garcia A, Tello S, Vargas G, Duponchelle F (2009) Patterns of commercial fish landings in the Loreto region (Peruvian Amazon) between 1984 and 2006. Fish Physiol Biochem 35: 53–67.
  21. Bearzi G, Politi E, Agazzi S, Azzellino A (2006) Prey depletion caused by overfishing and the decline of marine megafauna in eastern Ionian Sea coastal waters (central Mediterranean). Biol Conserv 127(4): 373–382.
  22. Ferretti F, Myers RA, Serena F, Lotze HK (2008) Loss of Large Predatory Sharks from the Mediterranean Sea. Conserv Biol 22(4): 952–964.
  23. Borsa P (2002) Allozyme, mitochondrial-DNA, and morphometric variability indicate cryptic species of anchovy (Engraulis encrasicolus). Biol J Linn Soc 75(2): 261–269.
  24. Moritz C, Cicero C (2004) DNA barcoding: Promise and pitfalls. Plos Biol 2(10): 1529–1531.
  25. Pauly D, Christensen V, Dalsgaard J, Froese R, Torres F Jr (1998) Fishing down marine food webs. Science 279(5352): 870–873.
  26. DeWoody JA, Avise JC (2000) Microsatellite variation in marine, freshwater and anadromous fishes compared with other animals. J Fish Biol 56(3): 461–473.
  27. Carr MH, Neigel JE, Estes JA, Andelman S, Warner RR, Largier JL (2003) Comparing marine and terrestrial ecosystems: implications for the design of coastal marine reserves. Ecol Appl 13(1): S90–S107.
  28. Schweiger O, Klotz S, Durka W, Kuhn IA (2008) A comparative test of phylogenetic diversity indices. Oecologia 157(3): 485–495.
  29. Ward RD, Zemlak TS, Innes BH, Last PD, Hebert PDN (2005) DNA barcoding Australia's fish species. P Roy Soc B-Biol Sci 360: 1847–1857.
  30. Hall TA (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucl Acids S 41: 95–98.
  31. Thompson JD, Higgins DG, Gibson TJ (1994) Clustal-W Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22(22): 4673–4680.
  32. Librado P, Rozas J (2009) DnaSP v5: A software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25: 1451–1452.
  33. Tamura K, Dudley J, Nei M, Kumar S (2007) MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol 24: 1596–1599.
  34. Abascal F, Zardoya R, Posada D (2005) ProtTest: selection of best-fit models of protein evolution. Bioinformatics 21: 2104–2105.
  35. Posada D (2008) jModelTest: Phylogenetic model averaging. Mol Biol Evol 25(7): 1253–1256.
  36. Jones DT, Taylor WR, Thornton JM (1992) The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosc 8(3): 275–282.
  37. Tamura K, Nei M (1993) Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol Biol Evol 10: 512–526.