26 Nov 2014: The PLOS ONE Staff (2014) Correction: Taxonomic Reference Libraries for Environmental Barcoding: A Best Practice Example from Diatom Research. doi:10.1371/journal.pone.0114758
DNA barcoding uses a short fragment of a DNA sequence to identify a taxon. After the target sequence has been obtained, it is compared to reference sequences stored in a database in order to assign an organism name to it. The quality of the data in the reference database is therefore key to the success of the analysis. In the study presented here, multiple types of data were combined and critically examined in order to create best practice guidelines for taxonomic reference libraries for environmental barcoding. 70 unialgal diatom strains from Berlin waters were established and cultured to obtain morphological and molecular data. The strains were sequenced for 18S V4 rDNA (the pre-barcode for protists) as well as rbcL, and identified by microscopy. Light microscopy (LM) and, for some strains, scanning electron microscopy (SEM) pictures were taken, and physical vouchers were deposited at the BGBM. 37 freshwater taxa from 15 naviculoid diatom genera were identified. Four taxa from the genera Amphora, Mayamaea, Planothidium and Stauroneis are described here as new. Names, molecular, morphological and habitat data as well as additional images of living cells are also available electronically in the AlgaTerra Information System. All reference sequences (or reference barcodes) presented here are linked to voucher specimens in order to provide a complete chain of evidence back to the formal taxonomic literature.
Citation: Zimmermann J, Abarca N, Enk N, Skibbe O, Kusber W-H, Jahn R (2014) Taxonomic Reference Libraries for Environmental Barcoding: A Best Practice Example from Diatom Research. PLoS ONE 9(9): e108793. doi:10.1371/journal.pone.0108793
Editor: Bernd Schierwater, University of Veterinary Medicine Hanover, Germany
Received: May 13, 2014; Accepted: August 14, 2014; Published: September 29, 2014
Copyright: © 2014 Zimmermann et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The authors confirm that all data underlying the findings are fully available without restriction. All relevant data are within the paper. The web link to AlgaTerra for all taxa, the DNA Bank number and the EMBL accession numbers are available in Table 2.
Funding: Deutsche Forschungsgemeinschaft (http://www.dfg.de/) grant INST 130/839-1 FUGG. Institution was funded. Deutsche Forschungsgemeinschaft (http://www.dfg.de/) grant GE 12 42/11-1 RJ; JZ is employed on this grant. German Federal Ministry of Education and Research (http://www.bmbf.de/) funding for microscopy data within the GBIF-D project 01 LI 1001 A. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Diatoms are unicellular and usually photoautotrophic microalgae which are responsible for about 25% of global CO2 fixation and contribute approximately 20% of the global net primary production.
Diatoms are important bioindicators for monitoring water quality because they are sensitive to changes in pollution, nutrient availability, acidity and salinity. They are the most ubiquitous group within the microscopic algae, as they occur in all types of water bodies and play an important part in benthic and planktonic biocoenoses. They are routinely used as bioindicators within the EU Water Framework Directive (WFD) as well as in water quality monitoring worldwide.
Each diatom cell is encased in a siliceous shell (frustule) consisting of two valves that are connected by girdle bands. Current identification of diatoms is based on a morphological and mostly descriptive species concept (Zimmermann et al. subm.) and relies exclusively on micro-characters of the frustule such as size, symmetry, shape, and sculpture, which can be seen by light microscopy. More detailed analyses of the siliceous structures, made possible by higher-resolution techniques such as electron microscopy, lead to an increasingly refined differentiation of species.
Identification via microscopy is challenging and time consuming, especially for routine use, and relies on individual taxonomic expertise. Different taxonomists can therefore arrive at different conclusions, depending, among other things, on the taxonomic concept applied, species with limited diagnostic morphological features, cryptic species, the available reference floras, the quality of the microscopes used by each researcher and the unavailability of adequate descriptions.
The application of molecular markers for taxon identification, known as DNA barcoding, is an emerging method which has the potential to be faster, to be universally applicable and to generate reliable identifications. Furthermore, as it uses DNA sequences for identification, it is independent of pre-existing morphological species concepts and can be linked to any taxonomic concept. However, correct identification relies fundamentally on the quality of the reference library against which the DNA barcodes are checked. DNA barcoding is based on the assumption that sequences of a certain marker locus exhibit enough variation between species to discriminate between them unambiguously. DNA barcoding is also a useful tool to access concealed diversity. In combination with next generation sequencing techniques, it also allows community compositions to be described through the large numbers of sequences generated by this approach. A schematic overview of environmental DNA barcoding of diatoms and the establishment of a reference library is given in Fig. 1.
The requirement for reliable taxon identification by DNA barcode(s) is an unambiguous link between the genotype and the phenotype (or morphotype) to which the name of the species is attached. This requires a reference library in which, for every single species, the taxon name is based on specimens identified by experts and is accompanied by a description and by barcode sequences derived from well documented strains. Documentation includes, for example, voucher deposition, sampling localities and collectors, basic environmental data, high-resolution LM pictures, morphometrics, taxonomy and nomenclature, maps, literature and references to the databases where these data are deposited. For unicellular diatoms, clonal cultures (strains) need to be established which offer enough material for sequencing as well as for identification by light and electron microscopy. Once established and linked to a taxonomic reference library, the DNA barcoding method could offer a time- and cost-efficient alternative or extension to microscopic identification for routine applications by limiting morphological taxonomy to critical groups that show a distinct genetic deviation from the known and identified organisms in the library.
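The documentation items listed above can be read as a record schema. The following is a minimal Python sketch of such a record with a completeness check. The class name, the field names and the choice of which items count as mandatory are illustrative assumptions, not a schema used by AlgaTerra or any other database, and the example values are placeholders rather than real strain data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ReferenceRecord:
    """One strain in a barcode reference library (hypothetical schema)."""
    strain: str
    taxon_name: str
    voucher_id: Optional[str] = None      # herbarium voucher code
    locality: Optional[str] = None        # sampling site
    collector: Optional[str] = None
    image_ids: List[str] = field(default_factory=list)      # LM/SEM pictures
    barcodes: Dict[str, str] = field(default_factory=dict)  # marker -> sequence

    def missing_items(self, markers=("18S V4", "rbcL")) -> List[str]:
        """Return the names of documentation items that are still empty."""
        missing = [name for name in ("voucher_id", "locality", "collector")
                   if not getattr(self, name)]
        if not self.image_ids:
            missing.append("image_ids")
        missing += [f"barcode:{m}" for m in markers if not self.barcodes.get(m)]
        return missing


# Placeholder values only; none of these are real strain data.
complete = ReferenceRecord(
    strain="EXAMPLE_001", taxon_name="Genus species",
    voucher_id="VOUCHER-0001", locality="example pond", collector="A. Collector",
    image_ids=["lm_001"], barcodes={"18S V4": "ACGT", "rbcL": "TTGA"},
)
incomplete = ReferenceRecord(strain="EXAMPLE_002", taxon_name="Genus sp.",
                             barcodes={"18S V4": "ACGT"})

print(complete.missing_items())
print(incomplete.missing_items())
```

A record is only usable as a reference barcode in this sketch when `missing_items()` returns an empty list, which mirrors the requirement that name, voucher and sequences stay linked.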
Recently, the CBOL Protist Working Group has designated the 18S V4 rDNA marker region as the first or pre-barcode for protists. In this paper, we follow the 18S V4 protocols designed for diatoms by Zimmermann et al. and present 70 strains for which this pre-barcode (18S V4) as well as a second widely used barcode, rbcL, have been generated. The reference library includes these two DNA barcodes, the respective taxon name, images, morphometric and geographic data as well as vouchers for further reference. Further data and additional images, including images of living cells, are available electronically through the AlgaTerra Information System. We demonstrate the benefits of a well documented reference library for DNA barcoding for identification, taxonomy, phylogeny, and further scientific analyses using an exemplary group. This paper focuses on naviculoid diatom strains from Berlin waters, since the diatom flora of Berlin has been well studied by light microscopy for almost two centuries and a recent diatom flora is available for water quality assessments.
Materials and Methods
Benthic samples from which the 70 strains were established were collected at 11 sites in the catchment area of Berlin (Fig. 2); one additional sample was from the River Elbe, downstream of the Berlin rivers Spree and Havel. The conductivity of Berlin water ranges mostly between 400 and 900 µS cm−1, and the pH is frequently between 6.5 and 9 (80% and 88%, respectively, of about 300 measurements of Berlin water samples; Kusber unpubl. data). For samples, sites, dates, collectors of the samples and isolators of the strains see Table 1. No specific permissions were required for the sampled locations/activities. The field studies did not involve endangered or protected species.
The diatom cells were isolated from environmental water samples under a stereo light microscope using glass capillary pipettes. Each cell was then transferred to a 5 cm diameter plastic petri dish containing autoclaved habitat water and/or culture medium (WC, Chu, AlgaGrow, Plagron, Weert, Netherlands) of adequate salinity and pH. In order to remove unwanted particles, this treatment was repeated several times until microscopic inspection confirmed that a culture derived from a single cell (unialgal, but not axenic) had been established. The cultures were grown at a temperature of 18–22°C with a 12 h day/night cycle.
Preparation of frustules
At the time of harvesting, one fraction of each culture was used for obtaining DNA (see below) and the other was cleaned with H2O2 at 80°C and rinsed several times with H2O. A few drops of the resulting suspension of diatom frustules were dried on a cover slip and embedded in Naphrax on slides for study in LM, or dried on stubs for SEM. Vouchers of each strain were deposited in the Herbarium Berolinense (B) (see Table 2).
Light and electron microscopy
The LM pictures were acquired with a Zeiss Axio Imager.M2 with an implemented AxioCam HRc (Zeiss, Oberkochen, Germany). SEM pictures were produced with a Philips SEM 515 operating at 30 kV (Philips, Eindhoven, The Netherlands) and a Hitachi 8010 Field Emission Electron Microscope (Hitachi, Tokyo, Japan).
The taxa were identified with Hofmann et al., Krammer & Lange-Bertalot (1997), Ettl & Gärtner (2013), Lange-Bertalot (2001), Levkov et al. (2009), Levkov et al. (2014) as well as particular papers (vide infra) for selected species. For strain numbers, taxon names, voucher codes in the Herbarium Berolinense (B), EMBL Accession Numbers, images, and morphometric data for all strains see Table 2.
The harvested cultures were transferred to 1.5 ml tubes. DNA was isolated using Dynal DynaBeads (Invitrogen Corporation; Carlsbad, CA, USA), NucleoSpin Plant II Mini Kit (Macherey-Nagel, Düren, Germany) or Qiagen DNeasy Plant Mini Kit (Qiagen Inc.; Valencia, CA, USA) following the respective product instructions. DNA concentrations were checked using gel electrophoresis (1.5% agarose gel) and Nanodrop (PeqLab Biotechnology LLC; Erlangen, Germany). DNA samples were stored at −20°C until further use. DNA material was deposited in the Berlin collection of the DNA bank network.
The V4 region of the 18S locus was amplified in all strains with the primer pair M13F-D512 for 18S/M13F-D978rev 18S. The rbcL locus was amplified in two overlapping parts for all strains using two different primer pairs: Diat-rbcL-F and Diat-rbcL-iR as well as Diat-rbcL-iF and Diat-rbcL-R. The polymerase chain reaction (PCR) for the V4 region was conducted after Zimmermann et al. (2011) and that for rbcL after Abarca et al. (2014). PCR products were visualised in a 1.5% agarose gel and cleaned with MSB Spin PCRapace (Invitek LLC; Berlin, Germany) following standard procedure. DNA content was measured using Nanodrop (PeqLab Biotechnology), and the samples were normalised to a total DNA content >100 ng/µl for further sequencing.
Sanger sequencing was conducted by Starseq (GENterprise LLC; Mainz, Germany). The M13 tails were used as sequencing primers for the V4 region. The sequences were edited in PhyDE and aligned using MUSCLE, and the alignments were manually improved in PhyDE.
The aligned sequences were compared to each other by calculating uncorrected p distances in PAUP. They were then compared by BLAST search against existing INSDC entries for the respective taxa (accessed July 2013). All INSDC accessions with references are given in Appendix S1. Base pair differences were counted in overlapping parts of the sequences in MEGA 5. Results are summarised in Table 3.
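As an illustration of the measure used here, the uncorrected p distance is the number of differing positions divided by the number of compared positions. The sketch below computes it for two aligned sequences in plain Python. Skipping positions with a gap or ambiguity character is an assumption made for this example; PAUP and MEGA offer their own options for handling such sites, and the function name and toy sequences are hypothetical.

```python
def p_distance(seq_a: str, seq_b: str, skip_chars: str = "-?N"):
    """Return (differences, compared_sites, p) for two aligned sequences.

    Positions where either sequence has a character in `skip_chars`
    are ignored (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = 0
    compared = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in skip_chars or y in skip_chars:
            continue
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return differences, compared, differences / compared


# Toy sequences, not real barcode data.
print(p_distance("ACGTACGT", "ACGTACGA"))  # (1, 8, 0.125)
print(p_distance("ACG-ACGT", "ACGTACGA"))  # gap column skipped: 1 of 7 sites
```

Returning the raw difference count alongside the proportion keeps the link between a percentage and the number of base pairs it represents for a given overlap length.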
To identify molecular relationships between the strains presented here, trees were calculated with MEGA 5 using the Neighbour Joining algorithm with gamma distributed rates among sites, followed by a statistical test of the tree topologies with 10 000 bootstrap replications. Trees were calculated for the individual 18S V4 and rbcL alignments as well as for a concatenated dataset.
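For readers unfamiliar with the bootstrap test, its resampling step can be sketched as follows: each replicate is built by drawing alignment columns with replacement until the original alignment length is reached, and a tree is then inferred from every replicate. The snippet shows only this column resampling, not tree inference, and is not how MEGA 5 is implemented; the function name and the toy alignment are hypothetical.

```python
import random
from typing import Dict


def bootstrap_alignment(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Return one bootstrap replicate of an alignment (name -> aligned sequence)."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all sequences must have the same aligned length")
    # Draw column indices with replacement; the same columns are used for every strain.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


# Toy alignment, not real barcode data.
toy = {"s1": "ACGTAC", "s2": "ACGTTC", "s3": "AGGTAC"}
replicate = bootstrap_alignment(toy, random.Random(0))
print(replicate)
```

The bootstrap support of a clade is then the fraction of replicate trees in which that clade is recovered.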
Furthermore, we created 18S V4 as well as rbcL datasets including INSDC sequences for the genera Amphora, Mayamaea, Planothidium and Stauroneis to exemplarily test the taxonomic consistency of available sequences as well as the placement of our new taxa. Each of these eight datasets was analysed under the aforementioned conditions.
The electronic version of this article in Portable Document Format (PDF) in a work with an ISSN or ISBN will represent a published work according to the International Code of Nomenclature for algae, fungi, and plants, and hence the new names contained in the electronic publication of a PLOS ONE article are effectively published under that Code from the electronic edition alone, so there is no longer any need to provide printed copies. The online version of this work is archived and available from the following digital repositories: PubMed Central, LOCKSS. http://edocs.fu-berlin.de/docs/content/below/index.xml.
The morphological identification of the 70 strains resulted in 37 taxa (see Table 2 and Figs. 3 and 4). 21 taxa were represented by only one strain, 10 taxa by two strains, three taxa by three strains, one taxon by four strains, one taxon by five strains and one taxon by 11 strains.
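The distribution of strains per taxon given above can be checked against the stated totals with a few lines of Python; the figures are taken directly from the paragraph, and only the variable names are introduced here.

```python
# strains per taxon -> number of taxa with that many strains (from the text)
taxa_by_strain_count = {1: 21, 2: 10, 3: 3, 4: 1, 5: 1, 11: 1}

total_taxa = sum(taxa_by_strain_count.values())
total_strains = sum(n_strains * n_taxa
                    for n_strains, n_taxa in taxa_by_strain_count.items())

print(total_taxa)     # 37
print(total_strains)  # 70
```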
Fig. 3.1. Achnanthidium saprophilum (H.Kobayasi & Mayama) Round & Bukht., Strain D06_036. Fig. 3.2.–3. Planothidium frequentissimum (Lange-Bert.) Lange-Bert., Strain D06_139. Fig. 3.4.–5. Planothidium lanceolatum (Bréb. ex Kütz.) Lange-Bert., Strain D06_047. Fig. 3.6. Karayevia ploenensis var. gessneri (Hust.) Bukht., Strain D03_034. Fig. 3.7. Luticola sparsipunctata Levkov, Metzeltin & Pavlov, Strain D06_029. Fig. 3.8. Amphora pediculus (Kütz.) Grunow, Strain D03_074. Fig. 3.9. Amphora sp. aff. atomoides Levkov, Strain D54_002. Fig. 3.10. Amphora cf. pediculus (Kütz.) Grunow, Strain D03_082. Fig. 3.11. Amphora ovalis (Kütz.) Kütz., Strain Amph4. Fig. 3.12. Cocconeis pediculus Ehrenberg, Epitype-Strain D36_020. Fig. 3.13. Cocconeis placentula Ehrenberg, Epitype-Strain D36_012. Fig. 3.14. Sellaphora seminulum (Grunow) D.G.Mann, Strain D06_006. Fig. 3.15. Eolimna minima (Grunow) Lange-Bert., Strain D03_030. Fig. 3.16. Sellaphora pupula (Kütz.) Mereschk., Strain D06_060. Fig. 3.17.–18. Mayamaea permitis (Hust.) Bruder & Medlin, Strain D06_106. Fig. 3.19. Caloneis amphisbaena (Bory) Cleve, Strain Navi1. Fig. 3.20. Navicula tripunctata (O.F.Müll.) Bory, Strain D03_139. Fig. 3.21. Navicula rhynchotella Lange-Bert., Strain D06_093. Fig. 3.22. Navicula radiosa Kütz., Strain D06_102. Fig. 3.23. Craticula cuspidata (Kütz.) D.G.Mann, Strain Navi4. Fig. 3.24. Navicula gregaria Donkin, Strain D06_122. Fig. 3.25. Craticula buderi (Hust.) Lange-Bert., Strain D06_069. Fig. 3.26. Navicula cryptocephala Kütz., Strain D06_059. Fig. 3.27. Navicula slesvicensis Grunow, Strain D06_038. Fig. 3.28. Stauroneis phoenicenteron (Nitzsch) Ehrenb., Strain Stau1. Fig. 3.29. Pinnularia neomajor Krammer, Strain PinnB. Fig. 3.30. Pinnularia sp., Strain PinnC. Fig. 3.31. Pinnularia viridiformis Krammer, Strain Pinn2. Fig. 3.32. Pinnularia viridiformis Krammer, Strain ElPin01. Fig. 3.33. Pinnularia sp., Strain Navi2. Fig. 3.34. Gomphonema saprophilum (Lange-Bert. & Reichardt) N.Abarca, R.Jahn, J.Zimmermann & Enke, Strain D36_003. Scale bar represents 10 µm.
Figs. 4.1a–i. Amphora berolinensis N.Abarca & R. Jahn sp. nov., Strain HSB02; Fig. 4.1d–i. Holotype B 40 0040823. Figs. 4.2a–g. Mayamaea terrestris N.Abarca et R.Jahn sp. nov., Strain D29_009b; Fig. 4.2d–g. Holotype B 40 0040847. Figs. 4.3a–h. Planothidium caputium J.Zimmermann & R.Jahn sp. nov., Strain D06_014; Fig. 4.3e–h Holotype B 40 0040871. Fig. 4.4a–h. Stauroneis schmidiae R.Jahn & N.Abarca sp. nov., Strain D28_008; Fig. 4.4f–h. Holotype B 40 0040882.
DNA sequence analyses
PCR and sequencing success for 18S V4 and rbcL was 100% for all strains, resulting in 140 reference sequences for 70 strains. We established 129 novel sequences (INSDC accession numbers KM084866-KM084994) and an additional 11 sequences that had been previously published in Abarca et al. and Zimmermann et al.
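The accession range quoted above can be expanded programmatically, which also confirms that it covers 129 entries. The helper below is a minimal sketch that assumes accessions consist of a letter prefix followed by a zero-padded number; the function name is hypothetical.

```python
import re
from typing import List


def expand_accession_range(first: str, last: str) -> List[str]:
    """Expand e.g. 'KM084866'..'KM084994' into the individual accession numbers."""
    pattern = re.compile(r"([A-Z]+)(\d+)")
    m_first, m_last = pattern.fullmatch(first), pattern.fullmatch(last)
    if not m_first or not m_last or m_first.group(1) != m_last.group(1):
        raise ValueError("accessions must share the same letter prefix")
    prefix, width = m_first.group(1), len(m_first.group(2))
    start, end = int(m_first.group(2)), int(m_last.group(2))
    if end < start:
        raise ValueError("range end precedes range start")
    # Keep the original zero padding of the numeric part.
    return [f"{prefix}{number:0{width}d}" for number in range(start, end + 1)]


novel = expand_accession_range("KM084866", "KM084994")
print(len(novel))            # 129
print(len(novel) + 11)       # 140 reference sequences, i.e. 2 markers x 70 strains
```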
There was little molecular variation within the sequence data generated here between the different strains representing one taxon: only up to 0.5% in 18S V4 (representing 2 bp) and 0.3% in rbcL (corresponding to 3 bp) (Appendix S2). The highest within-taxon variation was found, for example, in Mayamaea terrestris with 0.53% (18S V4) and in Navicula cryptocephala with 0.33% (rbcL). The uncorrected p distances for all genera and sequences are given in Appendix S2.
The results from sequence comparison with sequences published in the databases of the International Nucleotide Sequence Database Collaboration (INSDC, includes GenBank, EMBL and DDBJ) are shown in Table 3 and summarised in Fig. 5a, 5b. In the case of 18S V4, 22% of our taxa had entries with identical sequences in the INSDC (Fig. 5a), whereas for rbcL this number was 21% (Fig. 5b). This was the case e.g. for Caloneis silicula and Navicula cryptotenella (Table 3). 22% (18S V4, Fig. 5a) and 25% (rbcL, Fig. 5b) of our taxa had no entry in the INSDC databases, e.g. Amphora ovalis and Luticola sparsipunctata (Table 3). For 15% of our taxa an identical 18S V4 sequence (Fig. 5a) with a different taxon name was found in the INSDC databases (e.g. Gomphonema parvulum); the number was considerably lower in rbcL with only 4% (Fig. 5b). The remaining taxa, many of which showed sequence dissimilarities of over 15 bp, accounted for 41% for 18S V4 (Fig. 5a) and 50% for rbcL (Fig. 5b). The highest difference was found for Pinnularia viridiformis with 97 bp in 18S V4 (Table 3).
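The four outcome categories described above can be expressed as a small decision function. This is a sketch only: the function name, its inputs and in particular the order in which the conditions are tested are assumptions made for illustration, not the procedure used to compile Table 3. The percentages in the sanity check are those quoted in the text.

```python
from typing import Dict, List


def classify_taxon(same_name_bp_diffs: List[int],
                   identical_under_other_name: bool) -> str:
    """Assign one taxon to a comparison category (assumed decision order).

    same_name_bp_diffs: bp differences to database entries carrying the same name.
    identical_under_other_name: an identical sequence exists under another name.
    """
    if same_name_bp_diffs and min(same_name_bp_diffs) == 0:
        return "identical sequence, same name"
    if identical_under_other_name:
        return "identical sequence, different name"
    if not same_name_bp_diffs:
        return "no entry"
    return "sequence differences"


# Shares per category as quoted in the text (percent of taxa).
shares: Dict[str, Dict[str, int]] = {
    "18S V4": {"identical sequence, same name": 22, "no entry": 22,
               "identical sequence, different name": 15, "sequence differences": 41},
    "rbcL": {"identical sequence, same name": 21, "no entry": 25,
             "identical sequence, different name": 4, "sequence differences": 50},
}
for marker, by_category in shares.items():
    print(marker, sum(by_category.values()))  # each marker sums to 100

print(classify_taxon([0, 3], False))
print(classify_taxon([], False))
print(classify_taxon([17, 97], False))
```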
Inferred from data in Table 3.
The tree derived from the concatenated data set and calculated by the Neighbour Joining (NJ) algorithm, including only the strains presented here, is shown in Fig. 6; the trees of the individual analyses of both markers are given in the Appendix S2. The molecular clades are congruent between 18S V4 and rbcL, although the tree topology partly differs between the two markers (Appendix S3, S4); however, the conflicting nodes have bootstrap values below 0.85 and are therefore disregarded.
All bootstrap support values given above branches.
In the tree derived from the combined dataset, the sampled genera are monophyletic and well supported (>0.98 bootstrap support BS, Fig. 6), except for Caloneis, Craticula and Sellaphora.
Craticula buderi falls into a clade with the genera Stauroneis and Karayevia (0.48 BS; Fig. 6). Sellaphora falls into one group with Eolimna (0.98 BS; Fig. 6). The genus Caloneis is found in two distinct clades: Caloneis silicula clusters with Pinnularia (0.61 BS; Fig. 6), whereas Caloneis amphisbaena forms an independent clade on its own (1.00 BS; Fig. 6). The deeper bifurcations representing the relationships between the genera are generally not well supported by bootstrap values. All 37 subgeneric taxa included in this study are monophyletic (Fig. 6).
The trees for the genus Amphora including all available data from INSDC databases (this also includes accessions from the genus Halamphora) are shown in Fig. 7a (18S V4) and Fig. 7b (rbcL). The Amphora ovalis strains (Amph1, Amph4, Amph5, D45_003 and TeAm01) form a monophyletic clade that is well supported in both 18S V4 (0.99 BS) and rbcL (0.97 BS). The strain HSB02, identified as Amphora berolinensis, appears to be rather isolated within the Amphora tree, except for an affiliation with the unidentified strain C10 (INSDC accession number FJ002132) in the rbcL tree (0.89 BS; Fig. 7b). All strains identified as Amphora pediculus cluster in one clade in 18S V4 (0.90 BS; Fig. 7a) and rbcL (Fig. 7b). This also includes the strain D54_002, named Amphora sp. aff. atomoides. The tree derived from rbcL sequences also includes the strain AT-21.206 (INSDC accession number AN502022) identified as Amphora cf. fogediana (Fig. 7b), which forms a branch with strain s0992 named Amphora copulata (INSDC accession number AB754831) in 18S V4 adjacent to the Amphora pediculus clade (Fig. 7a). With respect to the other strains available from the INSDC databases, there is no topology consistent with the taxonomic identifications found in the trees (Fig. 7a, 7b). Several taxa, including the species Amphora coffeaeformis, Amphora normannii and Amphora montana, were recently transferred to the genus Halamphora; these taxa and also the two accessions listed as Halamphora in the INSDC databases (accession numbers AB754832, AB754833) form a loose cluster in the upper part of the 18S V4 tree (Fig. 7a). The rbcL data set supports an independent clade for the taxa of the genus Halamphora (Amphora coffeaeformis, Amphora normannii, Amphora montana; 0.97 BS; Fig. 7b). However, within the Halamphora clade the strains identified as Amphora coffeaeformis are not monophyletic (Fig. 7b).
Bootstrap support values >0.75 given at nodes. Red indicates data from new species, green information and conclusions derived from data in the AlgaTerra Information System.
The trees for the genus Mayamaea including all available data from INSDC databases are given in Fig. 7c (18S V4) and Fig. 7d (rbcL). All strains identified as Mayamaea terrestris form an independent clade in both trees (0.84 BS in 18S V4, 1.00 BS in rbcL; Fig. 7c, 7d). The strains D06_106 and D06_107 representing Mayamaea permitis (Syn.: Mayamaea atomus var. permitis) cluster together in one clade (in rbcL 1.00 BS); however, other strains named either Mayamaea atomus, Mayamaea permitis or Mayamaea atomus var. permitis show no clear pattern according to their names provided in the INSDC databases (Fig. 7c, 7d).
The trees for the genus Planothidium including all available data from INSDC databases are given in Fig. 8a (18S V4) and Fig. 8b (rbcL). 18S V4 supports three independent clades for the three included Planothidium taxa, namely Planothidium caputium, Planothidium frequentissimum and Planothidium lanceolatum (each taxon supported by 1.00 BS; Fig. 8a). Strain LCR-S18-1-1 (INSDC accession number JQ610164), listed in the INSDC databases as Planothidium sp., sits on another branch (Fig. 8a). The topology derived from rbcL sequences gives one clade (0.97 BS) for Planothidium caputium and strain LCR-S18-1-1 (INSDC accession number JQ610172) plus a second clade for a monophyletic group Planothidium frequentissimum (0.92 BS; Fig. 8b). The strains identified as Planothidium lanceolatum do not form an independent clade in the rbcL analysis (Fig. 8b).
Bootstrap support values >0.75 given at nodes. Red indicates data from new species.
The trees for the genus Stauroneis including all available data from INSDC databases are given in Fig. 8c (18S V4) and Fig. 8d (rbcL). The Stauroneis schmidiae strains (D28_002, D28_008) cluster in one clade, which is sister to the strain UTEX FD 51 (INSDC accession numbers HQ912579 (18S V4) and HQ912443 (rbcL)) in both analyses (1.00 BS; Fig. 8c, 8d). The strain Stau1 is identified as Stauroneis phoenicenteron and forms a monophyletic clade (1.00 BS for both markers) with all the other accessions with this name available from the INSDC databases (AT-18.207 (INSDC accession numbers AM502031 (18S V4) and AM710498 (rbcL)) and AT-11.704 (INSDC accession numbers AM501987 (18S V4) and AM710453 (rbcL))). The other taxa available from the INSDC databases also cluster in a taxonomically consistent way; however, there are differences in the overall topology recovered from 18S V4 and rbcL sequences, respectively (Fig. 8c, 8d).
Nomenclatural and taxonomical consequences
Two new taxa, Amphora berolinensis and Stauroneis schmidiae, were first discovered by morphological means. The analysis of molecular data suggested the existence of two more previously undetected taxa that could later also be confirmed morphologically (Mayamaea terrestris, Planothidium caputium). For yet another two taxa the morphological data are incomplete (teratological outline, micro-morphological data missing), but the molecular data show that both differ from any identified taxon in the genus; these strains are named sp. (Amphora sp. aff. atomoides), and in one case we used the term cf. (Amphora cf. pediculus) to show that the strain is closely related to a known taxon.
Amphora cf. pediculus
The strains D03_063 & D03_082 are morphologically very similar to our Amphora pediculus D03_074 but have double areolae in each ventral stria and not only a single elongated areola like A. pediculus. The specimens of these strains have a similar valve outline as A. indistincta, but in SEM the differences are more distinct because in A. indistincta the width of the central and dorsal side is almost equal and the striae are composed of elongated areolae.
Amphora sp. aff. atomoides Levkov
The strain D54_002 has a valve semi elliptical with arched dorsal margin, concave ventral margin and narrowly rounded valve ends. Valve length is 10–12.4 µm, breadth 4.6–5 µm. The central area on dorsal side is a rectangular fascia almost extending to the dorsal margin; on the ventral side the much broader fascia is expanding towards the valve margin. Raphe branches linear, filiform. Proximal raphe endings straight, distal raphe endings ventrally deflected. Dorsal striae radiate throughout, 16 in 10 µm.
This species closely resembles A. atomoides, but differences can be observed in the shape of the central area and in valve breadth (7–11 µm in A. atomoides). In A. atomoides the central area on the dorsal side is small or absent and does not extend to the valve margin, whereas in our Amphora sp. aff. atomoides the central area forms a rectangular fascia almost extending to the dorsal margin. D54_002 also resembles A. pediculus with respect to its valve shape and size. However, D54_002 can be differentiated by the valve width (A. pediculus is narrower with 2.5–4 µm), the central area (A. pediculus has a distal raphe dorsally deflected and a central area with a rectangular fascia extending to the dorsal valve margin) and the stria density (A. pediculus has more striae, 18–24/10 µm). D54_002 can also be differentiated from A. minutissima by the shape of the valve apices (ventrally bent in A. minutissima). Additional observations of more specimens by SEM would be necessary to establish the proper identity of this population from Heiligensee, Berlin.
Four taxa in the genera Amphora, Mayamaea, Planothidium, and Stauroneis do not fall within the description of any previously known taxa and are therefore described here as new.
Amphora berolinensis N.Abarca & R.Jahn sp. nov. (Figs. 4.1a–i)
Holotype: B 40 0040823 from strain HSB02; the holotype is represented by Fig. 4.1d.
Type locality: Germany, Berlin, Heiligensee, N 52.60394°, E 13.21499°, leg. and isolated by O. Skibbe, August 2011.
Amphora berolinensis differs from A. copulata (Kützing) Schoeman & Archibald in that the latter has larger valves (19–42 µm length, 5–7.5 µm breadth). In SEM the differences are more distinct: they can be observed in the shape of the central area (bordered by striae close to the valve margin in A. copulata), the raphe (biarcuate in A. copulata) and the morphology of the dorsal striae (crossed by longitudinal bars in A. copulata). A. berolinensis also differs from A. neglectiformis Levkov & Edlund by the larger valves of the latter (18–53 µm length, 5–7 µm breadth) and by the ventral striae, which are composed of two areolae near the valve ends in A. neglectiformis.
The valves of Amphora berolinensis are semi-lanceolate to semi-elliptical, with a smoothly arched dorsal margin, a straight to slightly concave ventral margin and rounded valve ends. Valve length is 9.5–18.9 µm, breadth 4.7–5.2 µm. The axial area is narrow and slightly arched. The central area on the dorsal side has a rectangular fascia extending to the dorsal margin; on the ventral side the fascia is wider, expanding towards the valve margin. The raphe is filiform and more or less straight; the proximal raphe endings are straight in some valves and dorsally bent in others, and the distal raphe endings are straight in some valves and ventrally bent in others. Dorsal striae are coarsely punctate and radiate throughout, 12–14 in 10 µm. Ventral striae are radiate and composed of one areola.
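The morphometric comparison with A. copulata can be made explicit with a simple range-overlap test. The ranges below are those given in the text; the function and variable names are hypothetical, and treating the published ranges as closed intervals is an assumption of this sketch.

```python
from typing import Dict, Tuple

Range = Tuple[float, float]


def ranges_overlap(a: Range, b: Range) -> bool:
    """True if two closed intervals (min, max) share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


# Valve dimensions in µm as stated in the text.
berolinensis: Dict[str, Range] = {"length": (9.5, 18.9), "breadth": (4.7, 5.2)}
copulata: Dict[str, Range] = {"length": (19.0, 42.0), "breadth": (5.0, 7.5)}

for character in ("length", "breadth"):
    print(character, ranges_overlap(berolinensis[character], copulata[character]))
# length False   (18.9 µm < 19 µm)
# breadth True   (both include 5.0–5.2 µm)
```

In this reading, valve length separates the two taxa while valve breadth alone does not, which is consistent with the additional SEM characters cited for the distinction.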
Amphora copulata (Kütz.) Schoeman & R.E.M. Archibald (concept syn. Amphora libyca Ehrenberg, sensu post auct.) is morphologically the closest fit to Amphora berolinensis; however, the latter forms a distinctly different clade according to both 18S V4 and rbcL (Fig. 7a, 7b). The sequence difference to strain AT-117.10, belonging to Amphora libyca sensu post auct., is e.g. 5 bp for 18S V4 and 35 bp for rbcL.
Mayamaea terrestris N.Abarca & R.Jahn sp. nov. (Fig. 4.2a–g)
Holotype: B 40 0040847 from strain D29_009b; the holotype is represented by Fig. 4.2e.
Type locality: Germany, Berlin-Dahlem, agricultural soil, N 52.460833°, E 13.296944°, leg. L. Buhr, 21 April 2004, cultures isolated by J. Bansemer.
Mayamaea terrestris differs from Mayamaea atomus var. atomus in that the latter is longer and wider and has fewer striae (8.5–13 µm length, 4–5.5 µm breadth, 19–22/10 µm striae). The molecular data also differ from the Mayamaea atomus var. atomus entries in the INSDC databases, by 5 bp for 18S V4 and 25 bp for rbcL of strain AT-115Gel07 and even by 69 bp for rbcL of strain AT-199Gel01 (AM710510).
The valves of Mayamaea terrestris are narrow linear-elliptical with obtusely rounded ends. Valve length is 7–8.7 µm, breadth 3–4.5 µm. Striae are radiate throughout, 22–24 (–26) in 10 µm with c. 50 areolae in 10 µm. Raphe is filiform, the two branches are gently arcuate with distinct central pores. Axial area is slightly broad, widening lanceolately towards the middle of the valve. Central raphe ends are expanded by depressions around the central pores and deflected, while the ends of the terminal raphe fissures are deflected to the opposite side.
This new species lives in soil, which is reflected in the epithet.
10 further strains (D27_003, D27_006, D27_009, D28_001, D28_004b, D28_007b, D29_003b, D30_003, D30_006b and D30_009) have only low sequence differences for 18S V4 and rbcL (Appendix S2) and form a clade clearly different from all the other available Mayamaea strains (Fig. 7c, 7d).
Planothidium caputium J.Zimmermann & R.Jahn sp. nov. (Fig. 4.3a–h)
Holotype: B 40 0040871; strain D06_014; the holotype is represented by Fig. 4.3f.
Type locality: Germany, Berlin, small river Wuhle, N 52.52079°, E 13.57781°, leg. O. Skibbe, 21 April 2004, cultures isolated by J. Bansemer.
Morphologically, Planothidium caputium has a similar outline to Planothidium lanceolatum but differs from it by a hood over the depression on the rapheless valve, as in P. frequentissimum. The difference from P. frequentissimum lies in the form and size of the hood, which is bigger, longer and wider in P. caputium than in P. frequentissimum and has a wider opening; this results in a line-like instead of a horseshoe appearance when focusing through the hood. The uncorrected p-distances show that Planothidium caputium sequences differ by at least 2.4% (18S V4) and 2% (rbcL) from Planothidium frequentissimum, and by 6% (18S V4) and 4% (rbcL) from Planothidium lanceolatum (Appendix S2); this is also represented in the trees including all available Planothidium strains (Fig. 8a, 8b).
Valves are elliptical to elliptic-lanceolate, with rounded apices. Valve length is 20–22.9 µm, breadth 5.5–6.4 µm. The striae are radiate on both valves, becoming more radiate towards the apices, with 13–14 in 10 µm. Striae are multiseriate with three to five rows of areolae per stria. The axial area is narrow and linear to lanceolate in both valves. A weak central area on the raphe valve and a horseshoe-shaped collar on one side of the rapheless valve which by focusing in LM another line less arched can be recognized (see also Straub 1990 ).
Also strain D06_113 belongs to this species.
Stauroneis schmidiae R.Jahn & N.Abarca sp. nov. (Fig. 4.4a–h)
Holotype: B 40 0040883 from strain D28_008; the holotype is represented by Fig. 4.4f.
Type locality: Germany, Berlin-Dahlem, agricultural soil, N 52.460833°E 13.296944°, leg. L. Buhr, 21 April 2004, cultures isolated by J. Bansemer.
Morphologically, Stauroneis schmidiae differs from Stauroneis borrichii (Petersen) Lund, which has a similar valve outline but protracted ends, in that the latter is shorter and more slender and has more striae (18–25 µm length, 4.0–5.0 µm breadth, 20–22 striae and 25–28 puncta per 10 µm; see Van de Vijver et al. 2004). It differs from Stauroneis pseudomuriella Van de Vijver & Lange-Bert. (2004), which has similar morphometrics to our new species but more striae (21–42 µm length, 5–6.5 µm breadth, 20–22 striae and 25 puncta per 10 µm), in that the latter has no pseudosepta.
Valves are linear-lanceolate with very slightly rounded non-protracted ends. Valve length is 27–28.2 µm, breadth 5.5–6 µm. Striae are radiate throughout the entire valve, 15–18 in 10 µm. Puncta of the striae are discernible in LM and are 24–28 in 10 µm. Pseudosepta present.
Strain D28_002 also belongs to this species.
Compared to the other available Stauroneis strains, Stauroneis schmidiae clusters independently for both markers (Fig. 8c, 8d).
This species is named in honor of Prof. Dr. AnnaMaria Schmid who was an inspiring diatom teacher to Regine Jahn.
The 37 naviculoid diatom taxa for which reference barcodes are published here represent only about 7% of the total diatom flora, which corresponds to 14% of the naviculoid taxa, recorded for Berlin waters (539 taxa). Nevertheless, this is a first milestone in characterising diatoms not only by morphological but also by molecular means, and it represents the start of a taxonomic reference library for diatoms.
Identification via DNA sequences is an important tool, especially for microorganisms. Many of the large-scale environmental DNA barcoding studies on protists have so far relied on higher taxonomic levels of family and above; they only rarely reach a resolution at genus level. In diatoms, assignment to genus level is unproblematic. Even identification to species level is possible, but strongly depends on the quality of the reference database. Here we tested the taxonomic consistency of naviculoid diatom taxa at the species level by comparing our identified sequences with the published sequences in the repositories of the INSDC. We found that the taxonomic assignment in INSDC is currently unsatisfactory because it is often erroneous. For the two commonly used DNA barcoding markers for diatoms that we analysed, 18S V4 and rbcL, we found that 26% of the rbcL sequences listed under the same name as our strains differed by more than 15 bp (Fig. 5b); for 18S V4 this was 12% (Fig. 5a). For the 800 bp long rbcL fragment, a 15 bp difference amounts to roughly 2% sequence difference; in the shorter (400 bp) 18S V4 fragment, it corresponds to as much as 4%. The relatively high percentage of differences in these short DNA fragments suggests that the sequences belong to a different taxon. This implies morphology-related misidentification, mislabelling or cross-contamination. In an additional 16% (rbcL, Fig. 5b) and 5% (18S V4, Fig. 5a) of the cases, sequences with the same taxon name showed differences of between 6 and 15 bp; here it is unclear whether these strains belong to a different taxon, such as a closely related cryptic species, or whether they reflect natural intraspecific variation. Furthermore, we found that in 4% (rbcL, Fig. 5b) and 15% (18S V4, Fig. 5a) of the cases, identical sequences in the repositories of the INSDC were annotated with a different taxon name than the strains of this study. These sequences therefore provide an erroneous identification. In summary, the unevaluated use of information deposited in the INSDC leads to wrong identifications in at least 30% of the cases; in only about 20% of our cases did the identifications coincide unambiguously.
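The arithmetic behind these thresholds can be made explicit. The following is a minimal Python sketch, not the procedure used in this study: it counts differing positions between two aligned sequences, converts the count to an uncorrected p-distance, and sorts a same-name comparison into the classes discussed above. The function names, the treatment of gaps (columns with a gap or N in either sequence are skipped), and the label given to differences of 0 to 5 bp, which the text does not classify explicitly, are assumptions made for illustration.

```python
def count_differences(seq_a, seq_b):
    """Return (differing sites, compared sites) for two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"-", "N"}  # assumption: gaps and ambiguous bases are ignored
    diffs = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip or b in skip:
            continue
        compared += 1
        if a != b:
            diffs += 1
    return diffs, compared


def p_distance(seq_a, seq_b):
    """Uncorrected p-distance: proportion of compared sites that differ."""
    diffs, compared = count_differences(seq_a, seq_b)
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def classify_same_name(diff_bp):
    """Bin a comparison between two sequences carrying the same taxon name."""
    if diff_bp > 15:
        return "probably a different taxon"
    if diff_bp >= 6:
        return "unclear: cryptic species or intraspecific variation"
    return "consistent"  # assumption: 0-5 bp is not classified in the text


# Approximate fragment lengths quoted in the text.
for marker, length in (("rbcL", 800), ("18S V4", 400)):
    print(f"{marker}: 15 bp of {length} bp = {15 / length:.1%}")
```

The same 15 bp cut-off therefore represents a stricter relative threshold for the longer rbcL fragment than for 18S V4, which is worth keeping in mind when comparing the percentages reported for the two markers.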
Unfortunately, in most cases it is not possible to trace the DNA sequence to the specimen from which it originated and, because voucher specimens are lacking, taxonomic evaluation is not possible; hence there is no means to verify whether a faulty taxon assignment has occurred or an interesting biological phenomenon has been found. Such sequences are therefore of no future use, and valuable information is lost to science. Assessment of diatom community composition through environmental DNA barcoding could greatly benefit from better documented reference libraries, especially because biodiversity in general should be evaluated at least at the species level.
Furthermore, the linkage between historically and morphologically described taxa and molecular sequences is not very strong. A possible threat is that two independent data clouds might develop: one including large amounts of molecular data from environmental sequencing, the other species-specific data (e.g. paleontological and recent distribution, ecology, phylogeny) linked to morphological descriptions. For organism groups where next to no morphology-based data exist (e.g. many groups of bacteria), there is little harm if the information in the two clouds cannot be correlated. However, in groups like diatoms, where two centuries of data collection linked to morphologically described species exist, it would be a waste of painstakingly acquired data not to link these two groups of data. At the moment, this link would be a reference sequence that is connected to a morphological voucher (and DNA sample) deposited in a natural history collection and therefore available for multiple testing and verification of results as well as for long-term studies.
We here define a taxonomic reference library as an entity combining molecular data (in our case DNA sequence data of two markers) with morphological documentation of important features as well as a valid name. Environmental information on the collecting site should also be provided in a standardised format.
Documentation should also include the deposition of DNA in a curated repository. To ensure traceability of a name/sequence back to the specimen it originated from, morphological details important for identification should be provided in an online photographic documentation; this includes high-resolution photographs giving an overview of the cell as well as details produced by electron microscopy or comparable techniques. Another special aspect for diatoms (and some other microorganism groups) is that many sequences derive from cultured clonal strains, especially if they are linked to morphological entities. Therefore, the strain number and other strain specifications are valuable information that should be presented along with the sequence.
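As a rough illustration of this definition, a reference library entry can be modelled as a record whose required parts are checked before it is accepted. The sketch below is hypothetical: the class name, field names and labels are assumptions and do not describe the schema of INSDC, BOLD or the AlgaTerra Information System. The example values are taken from the type information of Planothidium caputium given above; accession numbers, DNA bank number and image links are deliberately left empty because they are not reproduced in this section.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReferenceRecord:
    """One entry of a taxonomic reference library (hypothetical layout)."""
    taxon_name: str = ""
    strain: str = ""
    sequences: dict = field(default_factory=dict)    # marker -> accession
    voucher_id: str = ""                             # physical voucher
    dna_sample_id: str = ""                          # DNA in a curated repository
    image_urls: list = field(default_factory=list)   # LM/SEM documentation
    locality: str = ""
    latitude: Optional[float] = None
    longitude: Optional[float] = None

    def missing_parts(self):
        """List the parts still needed for a traceable record."""
        missing = []
        if not self.taxon_name:
            missing.append("valid name")
        if not self.strain:
            missing.append("strain number")
        if not self.sequences:
            missing.append("sequence data")
        if not self.voucher_id:
            missing.append("morphological voucher")
        if not self.dna_sample_id:
            missing.append("DNA sample")
        if not self.image_urls:
            missing.append("photographic documentation")
        if self.latitude is None or self.longitude is None:
            missing.append("geo-reference")
        return missing


record = ReferenceRecord(
    taxon_name="Planothidium caputium",
    strain="D06_014",
    voucher_id="B 40 0040871",
    locality="Germany, Berlin, small river Wuhle",
    latitude=52.52079,
    longitude=13.57781,
)
print(record.missing_parts())
# ['sequence data', 'DNA sample', 'photographic documentation']
```

A check of this kind only confirms that each part is present; whether the name is valid and the voucher matches the sequence still requires taxonomic evaluation.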
Ideally, all the information necessary for traceable taxonomic classification should be available in a single data portal; however, at the moment several technological limitations prevent depositing and retrieving all the information in one location. The Consortium for the Barcode of Life (CBOL) aims at compiling DNA barcode records in a public library (Barcode of Life Data System, BOLD) and even designed a Barcode Submission Tool for submitting sequences to the INSDC databases. However, this tool is limited to one marker, namely the mitochondrial cytochrome oxidase subunit I (COI). For many groups, e.g. plants but also diatoms, this barcoding marker is not routinely applicable, although there are BOLD-supported activities to implement alternative solutions for some organism groups. The Barcode Submission Tool at least allows a pherogram (the output of Sanger sequencing) to be uploaded, but no pictures of the organisms can be stored. This tool therefore does not require a link to a morphological voucher (digital and physical), which would allow for subsequent taxonomic validation. A link to a herbarium specimen is also only indirectly possible, namely if the accession number of the specimen collection is given and the respective collection makes its specimen pictures available online. Although it seems generally possible to deposit pictures and other data along with the DNA sequence in BOLD, the data deposited within BOLD are unfortunately often not open access, depending on the rights given by the administrator. We have also heard reports that data are not released to the public even if requested by the author. In conclusion, it would be preferable if INSDC extended its service, as it is the most commonly used platform for depositing sequence data.
Here we present our strategy for documenting data in order to build a comprehensive reference database for diatoms, even where the available IT infrastructure is limited. The materials and data presented here have been documented as follows. The physical vouchers (microscopic slides and SEM stubs) have been deposited in the Berlin Herbarium (B), and the DNA in the DNA bank network of the Botanic Garden and Botanical Museum Berlin-Dahlem. The data for both items are made available through the Global Genome Biodiversity Network (GGBN) and the Global Biodiversity Information Facility (GBIF). The sequences have been submitted to an INSDC database (EMBL) along with strain numbers, voucher numbers from the Berlin Herbarium (B) and DNA bank numbers. Primer details and geo-references have also been deposited there. Photographic documentation is available online from the AlgaTerra Information System, linked through the INSDC accession number and the accession number from the Berlin Herbarium. Morphological characters, cultivation details and sampling data of the collecting sites beyond the geo-references (e.g. ecological specifications) have also been deposited in the AlgaTerra Information System.
A carefully documented reference sequence could be considered as something similar to a molecular type of the name of a species. Biological taxon types should be documented with a maximum amount of data, which makes it possible for every researcher to determine whether a specific specimen belongs to the concept of the designated type. In the botanical and zoological codes of nomenclature, the basis for species description is the deposition of a physical specimen. A reference sequence or reference barcode should be similarly well documented.
Biological identification systems are in constant development; therefore a continuous process of confirmation, validation and updating in relation to alpha taxonomy is required to build a comprehensive and accurate reference library. Protocols for data curation and revision are indispensable for new species discovery as well as taxonomic revisions. Entries in a taxonomic reference library (e.g. in an extended INSDC-like system) therefore need to be curated and updated in order to be in line with current taxonomy. However, a huge impediment to curation of the data by the respective author, once they are submitted, is that there is no reward system for researchers who curate their data. It has been shown that incentives for publishing thoroughly documented datasets, similar to those for publishing the conclusions drawn from them, could greatly increase researchers' motivation to publish datasets. Another approach would be to have data curation carried out by professional personnel employed for this purpose, or to combine both approaches.
Not only DNA barcoding approaches but also taxonomic and phylogenetic studies of diatoms would benefit from well documented and referenced molecular data; such studies could integrate published data more efficiently if better documentation linked to physical objects were available. For example, the clusters found for the genus Mayamaea, based on available 18S V4 and rbcL sequences, show low taxonomic consistency (Fig. 7c, 7d). The INSDC data suggest that there are different groups of Mayamaea (atomus var.) permitis, and among the Mayamaea atomus (var. atomus) sequences there is one sequence named Mayamaea fossalis var. fossalis (Fig. 7c, 7d, black and red). For two of the AT strains included in the Mayamaea analysis, additional data are available from the AlgaTerra Information System (Fig. 7c, 7d, green): (a) more taxonomic detail is given than was deposited alongside the sequence in INSDC (strain AT-115Gel07 is identified as Mayamaea atomus var. atomus and AT-101Gel04 as Mayamaea atomus var. permitis), and (b) photographs with morphological details are provided. The identification of both strains could therefore be checked and verified. Even though additional data for only two strains are available from the AlgaTerra Information System, this already aids the interpretation of the trees given in Fig. 7c and 7d, especially of the tree based on 18S V4. There is a cluster of Mayamaea permitis (Syn. Mayamaea atomus var. permitis), incl. strain AT-101Gel04, and one strain (AT-115Gel07) belonging to Mayamaea atomus var. atomus (Fig. 7c, green). As Mayamaea permitis (Syn. Mayamaea atomus var. permitis) has been raised to species rank for morphological reasons (see above), this allows the interpretation that Mayamaea fossalis could be an independent taxon (Fig. 7c, green). For the tree based on rbcL, however, only an informed guess can be made: for two strains, namely (Wes2)f and AT-199Gel01, no additional data are available to check the identification (Fig. 7d). If it could be assumed that (Wes2)f was misidentified and AT-199Gel01 belongs to Mayamaea permitis, again four independent taxa could be assumed: Mayamaea atomus, Mayamaea fossalis, Mayamaea permitis and Mayamaea terrestris. This example, particularly the different interpretation possibilities between the 18S V4 and rbcL trees, clearly shows how valuable additional data can be for the interpretation of sequence-based analyses.
Because species descriptions in diatoms are based on morphology derived from microscopic pictures (of variable quality) of a single valve, or a limited number of valves, from a presumed population in mixed samples, it is often difficult to identify a strain unambiguously. Even within a single clonal culture, morphological variation sometimes fits in part different species circumscriptions. In addition, clonal cultures are often at the lower end of the size range given in a taxon description; if cultured for too long without auxosporulation taking place, diatom valves tend to lose their typical morphological features because they get smaller with each cell division. This leads to the problem of how to link sequences derived from cultures to a type specimen or at least to a current species concept. If a type specimen is designated, this can be achieved e.g. through epitypification, as has been done for Cocconeis pediculus and C. placentula. In most cases, however, this will be done in the context of a taxonomic revision of a species group, as e.g. for Gomphonema saprophilum, and it needs to be done for the two unidentified Pinnularia species of this study. For the purpose of a reference library, if no unambiguous identification seems possible, the sequence could be designated as belonging to a certain “formenkreis” (taxon group) and marked as affine (e.g. Amphora sp. aff. atomoides), or as not exactly fitting the original description and marked as confer (e.g. Amphora cf. pediculus) [http://bionomenclature-glossary.gbif.org/]; alternatively, a new taxon has to be described formally along with the reference sequence (e.g. Amphora berolinense). The first two options are a practical way to make re-users of the data aware of an “uncertainty level” concerning the taxonomic identification; this is better than providing no guidance to the species group by giving just the genus name, such as Amphora sp.
As we documented in this study, the marine or halophilic species of Amphora sensu lato have recently been moved into the genus Halamphora; for a freshwater reference library, this is important ecological data. In addition, this information might become valuable for the interpretation of taxonomic discrepancies.
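The naming options for uncertain identifications described above could be encoded explicitly in a reference library so that re-users can filter records by certainty. The following Python sketch shows one hypothetical way to do this; the function name and the certainty keywords are assumptions, while the example names are those used in the text.

```python
# Name patterns per certainty level (keywords are illustrative assumptions).
QUALIFIERS = {
    "certain": "{genus} {epithet}",
    "cf": "{genus} cf. {epithet}",        # does not exactly fit the description
    "aff": "{genus} sp. aff. {epithet}",  # belongs to the species group
}


def format_identification(genus, epithet=None, certainty="certain"):
    """Build a name string that carries the level of taxonomic uncertainty."""
    if epithet is None:
        # Genus only: no guidance to the species group.
        return f"{genus} sp."
    if certainty not in QUALIFIERS:
        raise ValueError(f"unknown certainty level: {certainty}")
    return QUALIFIERS[certainty].format(genus=genus, epithet=epithet)


print(format_identification("Amphora", "atomoides", "aff"))  # Amphora sp. aff. atomoides
print(format_identification("Amphora", "pediculus", "cf"))   # Amphora cf. pediculus
print(format_identification("Amphora"))                      # Amphora sp.
```

Storing the certainty level as a separate field, rather than only inside the name string, would let a query exclude or include affine and confer records as needed.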
As shown here by way of example for some naviculoid diatoms, taxonomic reference libraries could serve as an online accessible and algorithmically searchable equivalent to commonly used printed identification literature. They are needed to link molecular-based identification technologies with correct organism references. However, up to now searchable databases often include large percentages of wrongly annotated sequences and provide no possibility to trace the identification back to the respective specimen, often leaving molecular-based techniques with identifications only to family or genus level. While for some studies this level of taxonomic depth seems to suffice (e.g. large scale biodiversity assessments), there are many studies that could profit from well documented molecular data (e.g. species inventories, monitoring, taxonomy, phylogeny). Therefore, it would be worth the effort to provide all material needed for identification of an organism.
Table of INSDC accessions with strain numbers and references.
Table with uncorrected p distances given for individual genera. Intra-taxon variability is highlighted.
Neighbour Joining Trees (10 000 bootstrap replicates) derived from individual datasets 18S V4 including all sequences from this study. All bootstrap support values given above branches.
Neighbour Joining Trees (10 000 bootstrap replicates) derived from individual datasets rbcL including all sequences from this study. All bootstrap support values given above branches.
The help of Jana Bansemer and Ilona Danßmann with cultivation and retrieval of molecular data and of Monika Lüchow at the SEM is gratefully acknowledged. We thank Gabriele Dröge for providing counsel concerning the DNA Bank Network and Birgit Gemeinholzer for fruitful discussions.
Conceived and designed the experiments: JZ NE NA RJ WHK OS. Performed the experiments: JZ NA NE. Analyzed the data: JZ NE. Wrote the paper: JZ NA NE RJ. Conceived the idea: RJ. Cultivation: OS. Composition of the manuscript: NE JZ. Taxonomical and morphological analyses: NA RJ. Data curation: WHK RJ.
- 1. Falkowski PG, Barber RT, Smetacek V (1998) Biogeochemical Controls and Feedbacks on Ocean Primary Production. Science 281: 200–206. doi: 10.1126/science.281.5374.200
- 2. Field CB, Behrenfeld MJ, Randerson JT, Falkowski P (1998) Primary Production of the Biosphere: Integrating Terrestrial and Oceanic Components. Science 281: 237–240. doi: 10.1126/science.281.5374.237
- 3. Smetacek V (1999) Diatoms and the Ocean Carbon Cycle. Protist 150: 25–32. doi: 10.1016/s1434-4610(99)70006-4
- 4. Mann DG (1999) The species concept in diatoms. Phycologia 38: 437–495. doi: 10.2216/i0031-8884-38-6-437.1
- 5. Poulíčková A, Špačková J, Kelly M, Duchoslav M, Mann D (2008) Ecological variation within Sellaphora species complexes (Bacillariophyceae): specialists or generalists? Hydrobiologia 614: 373–386. doi: 10.1007/s10750-008-9521-y
- 6. Vanelslander B, Créach V, Vanormelingen P, Ernst A, Chepurnov VA, et al. (2009) Ecological differentiation between sympatric pseudocryptic species in the estuarine benthic diatom Navicula phyllepta (Bacillariophyceae). Journal of Phycology 45: 1278–1289. doi: 10.1111/j.1529-8817.2009.00762.x
- 7. Jahn R, Zetzsche H, Reinhardt R, Gemeinholzer B (2007) Diatoms and DNA barcoding: A pilot study on an environmental sample. In: Kusber W, Jahn R, (ed.) Proceedings of the 1st Central European Diatom Meeting 2007 Berlin: Botanic Garden and Botanical Museum Berlin-Dahlem. 63–68.
- 8. Schaumburg J, Schranz C, Hofmann G, Stelzer D, Schneider S, et al. (2004) Macrophytes and phytobenthos as indicators of ecological status in German lakes — a contribution to the implementation of the water framework directive. Limnologica - Ecology and Management of Inland Waters 34: 302–314. doi: 10.1016/s0075-9511(04)80003-3
- 9. Schaumburg J, Schranz C, Stelzer D, Hofmann G, Gutowski A, et al. (2005) Bundesweiter Test: Bewertungsverfahren “Makrophyten & Phytobenthos” in Fließgewässern zur Umsetzung der WRRL. Bayerisches Landesamt für Umwelt, Endbericht im Auftrag der LAWA (Projekt Nr O204).
- 10. Watanabe T, Asai K, Houki A (1988) Numerical water quality monitoring of organic pollution using diatom assemblages. In: Round, FE (ed.), Proceedings of the 9th Diatom Symposium, Biopress Ltd,: 123–141.
- 11. Kelly MG, Cazaubon A, Coring E, Dell'Uomo A, Ector L, et al. (1998) Recommendations for the routine sampling of diatoms for water quality assessments in Europe. Journal of Applied Phycology 10: 215–224.
- 12. European-Committee-for-Standardization (2003) European Standard. EN 14407. Water Quality – Guidance Standard for the Identification, Enumeration and Interpretation of Benthic Diatom Samples from Running Waters. CEN.
- 13. Kusber WH (2001) Mikroalgen und Naturschutz - Rote Listen, Bewertungsinstrumentarium und Auswertungsansätze. Ökologie & Umweltsicherung 21: 197–228.
- 14. Mann DG, Droop SJM (1996) 3. Biodiversity, biogeography and conservation of diatoms. Hydrobiologia 336: 19–32. doi: 10.1007/bf00010816
- 15. Mann D, Sato S, Trobajo R, Vanormelingen P, Souffreau C (2010) DNA barcoding for species identification and discovery in diatoms. Cryptogamie Algologie 31: 557–577.
- 16. Rach J, DeSalle R, Sarkar IN, Schierwater B, Hadrys H (2008) Character-based DNA barcoding allows discrimination of genera, species and populations in Odonata. Proceedings of the Royal Society B: Biological Sciences 275: 237–247. doi: 10.1098/rspb.2007.1290
- 17. Blaxter ML (2004) The promise of a DNA taxonomy. Philosophical Transactions of the Royal Society of London Series B: Biological Sciences 359: 669–679. doi: 10.1098/rstb.2003.1447
- 18. Hebert PDN, Gregory TR (2005) The Promise of DNA Barcoding for Taxonomy. Systematic Biology 54: 852–859. doi: 10.1080/10635150500354886
- 19. Zimmermann J, Jahn R, Gemeinholzer B (2011) Barcoding diatoms: evaluation of the V4 subregion on the 18S rRNA gene, including new primers and protocols. Organisms Diversity & Evolution 11: 173–192. doi: 10.1007/s13127-011-0050-6
- 20. Hamsher SE, Evans KM, Mann DG, Poulíčková A, Saunders GW (2011) Barcoding Diatoms: Exploring Alternatives to COI-5P. Protist 162: 405–422. doi: 10.1016/j.protis.2010.09.005
- 21. MacGillivary ML, Kaczmarska I (2011) Survey of the Efficacy of a Short Fragment of the rbcL Gene as a Supplemental DNA Barcode for Diatoms. Journal of Eukaryotic Microbiology 58: 529–536. doi: 10.1111/j.1550-7408.2011.00585.x
- 22. Moniz MBJ, Kaczmarska I (2009) Barcoding diatoms: Is there a good marker? Molecular Ecology Resources 9: 65–74. doi: 10.1111/j.1755-0998.2009.02633.x
- 23. Moniz MBJ, Kaczmarska I (2010) Barcoding of Diatoms: Nuclear Encoded ITS Revisited. Protist 161: 7–34. doi: 10.1016/j.protis.2009.07.001
- 24. Evans KM, Wortley AH, Mann DG (2007) An assessment of potential diatom “barcode” genes (cox1, rbcL, 18S and ITS rDNA) and their effectiveness in determining relationships in Sellaphora (Bacillariophyta). Protist 158: 349–364. doi: 10.1016/j.protis.2007.04.001
- 25. Evans KM, Wortley AH, Simpson GE, Chepurnov VA, Mann DG (2008) A molecular systematic approach to explore diversity within the Sellaphora pupula species complex (Bacillariophyta). Journal of Phycology 44: 215–231. doi: 10.1111/j.1529-8817.2007.00454.x
- 26. Eiler A, Drakare S, Bertilsson S, Pernthaler J, Peura S, et al. (2013) Unveiling Distribution Patterns of Freshwater Phytoplankton by a Next Generation Sequencing Based Approach. PLoS ONE 8: e53516. doi: 10.1371/journal.pone.0053516
- 27. Shokralla S, Spall JL, Gibson JF, Hajibabaei M (2012) Next-generation sequencing technologies for environmental DNA research. Molecular Ecology 21: 1794–1805. doi: 10.1111/j.1365-294x.2012.05538.x
- 28. Pawlowski J, Audic S, Adl S, Bass D, Belbahri L, et al. (2012) CBOL Protist Working Group: Barcoding Eukaryotic Richness beyond the Animal, Plant, and Fungal Kingdoms. PLoS Biol 10: e1001419. doi: 10.1371/journal.pbio.1001419
- 29. Kermarrec L, Franc A, Rimet F, Chaumeil P, Frigerio JM, et al. (2014) A next-generation sequencing approach to river biomonitoring using benthic diatoms. Freshwater Science 33: 349–363. doi: 10.1086/675079
- 30. Jahn R, Kusber WH (2005+) AlgaTerra Information System (online). Botanic Garden and Botanical Museum Berlin-Dahlem, Freie Universität Berlin. http://www.algaterra.org.
- 31. Geissler U, Kies L (2003) Artendiversität und Veränderungen in der Algenflora zweier städtischer Ballungsgebiete Deutschlands: Berlin und Hamburg. Nova Hedwigia, Beihefte. pp. 777.
- 32. Hofmann G, Werum M, Lange-Bertalot H (2011) Diatomeen im Süßwasser - Benthos von Mitteleuropa. Bestimmungsflora Kieselalgen für die ökologische Praxis. Über 700 der häufigsten Arten und ihre Ökologie. Lange-Bertalot H (ed.), Gantner, Ruggell, Liechtenstein, Gantner.
- 33. Guillard RRL, Lorenzen CJ (1972) Yellow-green algae with chlorophyllide C. Journal of Phycology 8: 10–14. doi: 10.1111/j.0022-3646.1972.00010.x
- 34. Bold H, Wynne M (1978) Cultivation of algae in the laboratory. Introduction to the algae: structure and reproduction. Englewood Cliffs, N. J.: Prentice-Hall INC. pp. 571–578.
- 35. Krammer K, Lange-Bertalot H (1997) Bacillariophyceae. 1. Teil: Naviculaceae; Ettl H, Gerloff J, Heynig H, Mollenhauer D, editors. Jena, Germany: Fischer.
- 36. Ettl H, Gärtner G (2013) Syllabus der Boden-, Luft- und Flechtenalgen. 2nd ed. Springer: 773.
- 37. Levkov Z (2009) Amphora sensu lato. In: Lange-Bertalot H (ed.), Diatoms of Europe: Diatoms of the European Inland Waters and Comparable Habitats. pp. 5–916.
- 38. Levkov Z, Metzeltin D, Pavlov A (2014) Luticola and Luticolopsis. In: Diatoms of Europe: Diatoms of the European Inland Waters and Comparable Habitats Lange-Bertalot H, editor. 7: 7–697.
- 39. Gemeinholzer B, Droege G, Zetzsche H, Knebelsberger T, Raupach M, et al. (2009+) DNA Bank Network Webportal.
- 40. Abarca N, Enke N, Zimmermann J, Jahn R (2014) Does the cosmopolitan diatom Gomphonema parvulum (Kützing) Kützing have a biogeography? PLOS One 9: e86885. doi: 10.1371/journal.pone.0086885
- 41. Messing J (1983)  New M13 vectors for cloning. In: Ray Wu LGKM, ed. Methods in Enzymology: Academic Press. pp. 20–78.
- 42. Ivanova NV, Zemlak TS, Hanner RH, Hebert PDN (2007) Universal primer cocktails for fish DNA barcoding. Molecular Ecology Notes 7: 544–548. doi: 10.1111/j.1471-8286.2007.01748.x
- 43. Müller JMK, Neinhuis C, Quandt D (2010) PhyDE–Phylogenetic Data Editor. Computer program
- 44. Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32: 1792–1797. doi: 10.1093/nar/gkh340
- 45. Swofford DL (2002) Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4 ed: Sinauer Associates, Sunderland, Massachusetts.
- 46. Tamura K, Peterson D, Peterson N, Stecher G, Nei M, et al. (2011) MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol 28: 2731–2739. doi: 10.1093/molbev/msr121
- 47. Lange-Bertalot H (ed.) (2001) Navicula sensu stricto. 10 genera separated from Navicula sensu lato. Frustulia. Diatoms of Europe 2. Gantner, Ruggell.
- 48. Bruder K, Medlin L (2007) Molecular assessment of phylogenetic relationships in selected species/genera in the Naviculoid diatoms (Bacillariophyta). Nova Hedwigia 85. doi: 10.1127/0029-5035/2007/0085-0331
- 49. Straub F (1990) Compared variability of Achnanthes lanceolata (Bréb.) Grunow. 2. Biometrical approach of several races of the sub-species frequentissima Lange-Bertalot. In: Ricard M, editor. Ouvrage dédié à la mémoire du Professeur Henry Germain (1903–1989). Königstein, Germany; Champaign, Illinois, USA: Koeltz Scientific Books.
- 50. Van de Vijver B, Beyens L, Lange-Bertalot H (2004) The genus Stauroneis in the Arctic and (Sub-)Antarctic Regions. Bibliotheca Diatomologica 51.
- 51. Kermarrec L, Franc A, Rimet F, Chaumeil P, Humbert JF, et al. (2013) Next-generation sequencing to inventory taxonomic diversity in eukaryotic communities: a test for freshwater diatoms. Molecular Ecology Resources 13: 607–619. doi: 10.1111/1755-0998.12105
- 52. Zimmermann J, Glöckner G, Jahn R, Enke N, Gemeinholzer B (subm.) Metabarcoding vs. morphological identification to assess diatom diversity in environmental studies. Molecular Ecology Resources
- 53. Ratnasingham S, Hebert PDN (2007) BOLD: The Barcode of Life Data System (http://www.barcodinglife.org). Molecular Ecology Notes 7: 355–364. doi: 10.1111/j.1471-8286.2007.01678.x
- 54. Becker S, Hanner R, Steinke D (2011) Five years of FISH-BOL: Brief status report. Mitochondrial DNA 22: 3–9. doi: 10.3109/19401736.2010.535528
- 55. Yoccoz NG (2012) The future of environmental DNA in ecology. Molecular Ecology 21: 2031–2038. doi: 10.1111/j.1365-294x.2012.05505.x
- 56. Bortolus A (2008) Error Cascades in the Biological Sciences: The Unwanted Consequences of Using Bad Taxonomy in Ecology. AMBIO: A Journal of the Human Environment 37: 114–118. doi: 10.1579/0044-7447(2008)37[114:ecitbs]2.0.co;2
- 57. Collins RA, Cruickshank RH (2013) The seven deadly sins of DNA barcoding. Molecular Ecology Resources 13: 969–975. doi: 10.1111/1755-0998.12046
- 58. Kvist S (2013) Barcoding in the dark?: A critical view of the sufficiency of zoological DNA barcoding databases and a plea for broader integration of taxonomic knowledge. Molecular Phylogenetics and Evolution 69: 39–45. doi: 10.1016/j.ympev.2013.05.012
- 59. Carpenter SR, Mooney HA, Agard J, Capistrano D, DeFries RS, et al. (2009) Science for managing ecosystem services: Beyond the Millennium Ecosystem Assessment. Proceedings of the National Academy of Sciences 106: 1305–1312. doi: 10.1073/pnas.0808772106
- 60. Marakeby H, Badr E, Torkey H, Song Y, Leman S, et al. (2014) A System to Automatically Classify and Name Any Individual Genome-Sequenced Organism Independently of Current Biological Classification and Nomenclature. PLoS ONE 9: e89142. doi: 10.1371/journal.pone.0089142
- 61. Hebert PDN, Cywinska A, Ball SL, deWaard JR (2003) Biological identifications through DNA barcodes. Proceedings of the Royal Society of London Series B: Biological Sciences 270: 313–321. doi: 10.1098/rspb.2002.2218
- 62. Ward RD, Zemlak TS, Innes BH, Last PR, Hebert PDN (2005) DNA barcoding Australia's fish species. Philosophical Transactions of the Royal Society B: Biological Sciences 360: 1847–1857. doi: 10.1098/rstb.2005.1716
- 63. Saunders GW (2005) Phil Trans R Soc B 360: 1879.
- 64. Robba L, Russell SJ, Barker GL, Brodie J (2006) Assessing the use of the mitochondrial cox1 marker for use in DNA barcoding of red algae (Rhodophyta). American Journal of Botany 93: 1101–1108. doi: 10.3732/ajb.93.8.1101
- 65. Hebert PDN, Stoeckle MY, Zemlak TS, Francis CM (2004) Identification of Birds through DNA Barcodes. PLoS Biol 2: e312. doi: 10.1371/journal.pbio.0020312
- 66. Hollingsworth PM, Forrest LL, Spouge JL, Hajibabaei M, Ratnasingham S, et al. (2009) From the Cover: A DNA barcode for land plants. Proceedings of the National Academy of Sciences 106: 12794–12797. doi: 10.1073/pnas.0905845106
- 67. Droege G, Barker K, Astrin JJ, Bartels P, Butler C, et al. (2013) The Global Genome Biodiversity Network (GGBN) Data Portal. Nucleic Acids Research doi: 10.1093/nar/gkt928
- 68. GBIF (2014) Global Biodiversity Information Facility. Published on the Internet at http://www.gbif.org, accessed 2014 April 29.
- 69. McNeill J, Barrie FR, Buck WR, Demoulin V, Greuter W, et al. (2012) Regnum Vegetabile. 154: 1.
- 70. International Commission on Zoological Nomenclature, Ride WDL, International Trust for Zoological Nomenclature, International Union of Biological Sciences, Natural History Museum (1999) International code of zoological nomenclature = Code international de nomenclature zoologique. London: International Trust for Zoological Nomenclature, c/o Natural History Museum.
- 71. Enke N, Thessen A, Bach K, Bendix J, Seeger B, et al. (2012) The user's view on biodiversity data sharing — Investigating facts of acceptance and requirements to realize a sustainable use of research data —. Ecological Informatics 11: 25–33. doi: 10.1016/j.ecoinf.2012.03.004
- 72. Cocquyt C, Jüttner I, Kusber W-H (2013) Reinvestigation of West African Surirellaceae (Bacillariophyta) described by Woodhead and Tweed from Sierra Leone. Diatom Research 28: 121–129. doi: 10.1080/0269249x.2012.752411
- 73. Jahn R, Kusber W-H, Romero OE (2009) Cocconeis pediculus Ehrenberg and C. placentula Ehrenberg: Typification and taxonomy. Fottea 9: 275–288.
- 74. Romero OE, Jahn R (2013) Typification of Cocconeis lineata and Cocconeis euglypta (Bacillariophyta). Diatom Research 28: 175–184. doi: 10.1080/0269249x.2013.770801