Identification of benthic diatoms isolated from the eastern tidal flats of the Yellow Sea: Comparison between morphological and molecular approaches

Benthic diatoms isolated from tidal flats on the west coast of Korea were identified by both traditional morphological and molecular phylogenetic methods for methodological comparison. For the molecular phylogenetic analyses, we sequenced the 18S rRNA gene and the ribulose bisphosphate carboxylase large subunit coding gene, rbcL. The comparative analysis also allowed us to assess the suitability of each gene as a genetic marker for identifying closely related benthic diatom species and as a potential barcode gene. Based on the traditional morphological identification system, the 61 isolated strains were classified into 52 previously known taxa from 13 genera. However, 17 strains could not be classified as known species by morphological analyses, suggesting a hidden diversity of benthic diatoms. BLAST searches against NCBI's GenBank indicated that reference sequences were absent for most of the benthic diatom species. Of the two genetic markers, the rbcL genes were more divergent than the 18S rRNA genes. Furthermore, a long branch attraction artefact was found in the 18S rRNA phylogeny. These results suggest that the rbcL gene is a more appropriate genetic marker for identification and classification of benthic diatoms. Given the high diversity and simple shapes of benthic diatoms, and thus the difficulty of classifying them morphologically, a molecular approach could provide a relatively easy and reliable classification system. However, this study suggests that more effort should be made to construct a reliable database containing polyphasic taxonomic data for diatom classification.


Introduction
Diatoms are the most dominant taxa among the various microalgae and are known to account for ca. 40% of the total primary production in the ocean [1,2]. Diatoms also play an important role in the biogeochemical cycles of carbon and silica [3]. In tidal flats especially, benthic diatoms are the most dominant and diverse group and are key organisms that contribute to the preservation of the ecological functions of tidal flats, such as primary production, nutrient cycling, and sediment stabilization [4][5][6][7]. Thus, the ecology and diversity of diatoms in tidal 46.73" N 126˚36' 32.64" E) along the west coast of Korea (Fig 1). The numbers of strains obtained were 53 in Geunso Bay, four in Sihwa, and four in Eulwang-ri. Most samples were obtained in Geunso Bay, where regular monthly surveys had been conducted since 2009. Geunso Bay is a semi-enclosed bay with an area of 87 km², and the water depth at high tide is 2-4 m depending on the area. No river flows into the bay, and the facies are predominantly sandy silt. The Oi tidal flat, where Sihwa station is located, has an area of 0.025 km², and the facies are predominantly silty sand. Eulwang-ri has sandy facies, and there is a beach near the sampling station.
To obtain sediment samples containing diatoms, the surface of the tidal flat was scratched to a depth of ca. 2 mm and the sediment was collected in a conical tube. Samples were transported to the laboratory under refrigerated conditions and then incubated within ±2˚C of the in situ temperature. Diatom strains were isolated within 1 day of sampling. A single diatom cell was isolated under an inverted microscope (Eclipse Ti-U; Nikon, Tokyo, Japan) using a glass Pasteur pipette and placed into a 24-well plate containing f/2 medium with silicate (Sigma-Aldrich, St. Louis, MO, USA). After confirmation of monoclonal growth, each culture was transferred to a new tissue culture flask (Falcon, Cockeysville, MD, USA) containing 35 ml of fresh medium for one week. Several cultures suspected to be mixtures were further isolated by a dilution method [29]. All strains were incubated at 15˚C under a 12:12 h light-dark cycle. Illumination was provided by a fluorescent lamp with an irradiance of ca. 100 μmol photons m⁻² s⁻¹. The strains were transferred to fresh medium every 2 or 3 weeks. Research activities at the sampling areas of this study did not require specific permission because the areas are neither restricted nor designated as protected ecosystems. Endangered and protected species do not live in the study area and thus were not included in the survey.

Morphological observations
Monoclonal cultures of benthic diatom strains were identified to the genus or species level by morphological features observed under light and scanning electron microscopy. For light microscopy, diatom cultures were treated with acid to prepare cleaned frustules [30], and permanent slides were then made using Mountmedia (Wako Pure Chemical Industries, Osaka, Japan). The slides were examined using light microscopy under a ×100 oil immersion objective lens (Eclipse 80i; Nikon). For scanning electron microscopy, diatom cells fixed with Lugol's solution were filtered onto a polycarbonate filter (diameter of 25 mm; pore size of 1 or 2 μm) and then washed with distilled water. The filters were dehydrated in a graded ethanol series (10%, 25%, 50%, 75%, 90%, and 100%) and dried using tetramethylsilane (Sigma-Aldrich, St. Louis, MO, USA). Finally, the samples were mounted onto stubs and sputter-coated with platinum. Observations were performed with a Hitachi S-4300 scanning electron microscope (Hitachi, Tokyo, Japan). Morphological comparisons were based on previous studies [31][32][33][34][35][36][37][38][39][40][41]. Strains that did not match those in the published literature were treated as unidentified species.

DNA extraction, PCR and sequencing
For DNA extraction, 100 μl of each cultured strain was harvested by centrifugation at 14,000 × g for 1 min, and the cell pellet was resuspended in 1 ml of sterilized STE (sodium chloride-Tris-EDTA, pH 7.8) buffer solution. Two cycles of freezing (-80˚C) and thawing (95˚C) were followed by vigorous vortexing with sterilized silica/zirconium beads to break the cells. To remove cell debris, the lysate was centrifuged at 8,000 × g for 1 min. The supernatant was dispensed into a clean tube and used as template DNA for PCR. PCR amplification was performed using two primer sets: Diatom9F (5′-TGTGGGAGAGGGGAAATCAAG-3′) [42] and EukB-R (5′-TGATCCTTCTGCAGGTTCACCAC-3′) [15] for 18S rDNA, and DPrbcL1 (5′-AAGGAGAAATHAATGTCT-3′) and DPrbcL7 (5′-AARCAACCTTGTGTAAGTCTC-3′) for the rbcL gene [43]. These primers produced PCR products of approximately 1,600 bp and 1,550 bp, respectively. PCR was performed in a total volume of 30 μl, containing 1.0 μl of template DNA, 3 μl of 10× Ex Taq buffer, 2.4 μl of dNTPs (10 mM), 0.5 μl of each primer (10 μM), and 0.2 μl of TaKaRa Ex Taq polymerase (5 U μl⁻¹; Takara, Otsu, Japan). The cycling conditions were as follows. For 18S rRNA: initial denaturation at 94˚C for 5 min, 34 cycles of main amplification (94˚C for 45 sec, 55˚C for 55 sec, 72˚C for 2 min), and final extension at 72˚C for 10 min. For rbcL: initial denaturation at 94˚C for 3 min, 35 cycles of main amplification (94˚C for 1 min, 55˚C for 1 min, 72˚C for 1.5 min), and final extension at 72˚C for 10 min. PCR products were purified using the Accuprep PCR Purification Kit (Bioneer, Daejeon, South Korea) and sent for commercial sequencing at Macrogen (Seoul, South Korea). The electropherogram outputs for each product were edited and assembled using the ChromasPro v.1.45 program (www.technelysium.com.au/chromas.html) and Vector NTI Advance 11 (Invitrogen Corp., Carlsbad, CA, USA).
The sequences obtained in this study were deposited in GenBank and the accession numbers of the sequences are shown in Table 1.
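As a bookkeeping aid, the reaction mix and cycling programs described above can be written down as plain data. The following Python sketch is illustrative only: all names in it are hypothetical, the part of the 30 μl reaction not covered by the listed components is assumed to be water (the text does not say), and the time estimate ignores ramping between temperatures.

```python
# Reaction components as listed in the text (volumes in microlitres).
REACTION_TOTAL_UL = 30.0
COMPONENTS_UL = {
    "template DNA": 1.0,
    "10x Ex Taq buffer": 3.0,
    "dNTPs (10 mM)": 2.4,
    "forward primer (10 uM)": 0.5,
    "reverse primer (10 uM)": 0.5,
    "Ex Taq polymerase": 0.2,
}

# Cycling programs as listed in the text (all durations in seconds).
PROGRAMS = {
    "18S rRNA": {"initial": 5 * 60, "cycles": 34, "steps": (45, 55, 120), "final": 10 * 60},
    "rbcL": {"initial": 3 * 60, "cycles": 35, "steps": (60, 60, 90), "final": 10 * 60},
}


def remainder_volume_ul(total_ul, components_ul):
    """Volume not covered by the listed components (assumed here to be water)."""
    return total_ul - sum(components_ul.values())


def hold_time_min(program):
    """Sum of programmed hold times in minutes; temperature ramping is ignored."""
    per_cycle = sum(program["steps"])
    seconds = program["initial"] + program["cycles"] * per_cycle + program["final"]
    return seconds / 60


print(f"Unlisted volume: {remainder_volume_ul(REACTION_TOTAL_UL, COMPONENTS_UL):.1f} ul")
for marker, program in PROGRAMS.items():
    print(f"{marker}: {hold_time_min(program):.1f} min of programmed holds")
```

Under these assumptions the listed components sum to 7.6 μl, leaving 22.4 μl of the reaction volume unspecified, and the programmed holds come to about 140 min for 18S rRNA and 135.5 min for rbcL.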

Sequence alignment and phylogenetic analyses
For phylogenetic analysis, 18S rRNA and rbcL sequences from diatoms were retrieved from GenBank (www.ncbi.nlm.nih.gov). After excluding uncultured and environmental clone sequences, 1,853 sequences of the 18S rRNA gene and 1,473 sequences of the rbcL gene were aligned with the sequences obtained in the present study using the ARB program [44] and corrected manually. Two Ochrophyta species (Nannochloropsis salina D.J. Hibberd and Ochromonas danica E.G. Pringsheim) were used as an outgroup. Neighbor-joining (NJ) and maximum-parsimony (MP) trees were constructed using MEGA 5.2 [45]. Maximum-likelihood (ML) trees were constructed using Randomized Axelerated Maximum Likelihood (RAxML) v.8.2.1 [46]. We used the "-f a" option for rapid bootstrap analysis and the best likelihood tree search using "-# 100" with default settings, namely, "-m GTRGAMMA" for the substitution model with rate heterogeneity, "-i" for the automatically optimized SPR rearrangement for heuristic search, and "-c" for 25 distinct rate categories. The robustness of each clade was assessed by further bootstrap analyses (1,000 replications) under the NJ and MP criteria using MEGA v.5.2 [45].
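The RAxML options named above can be assembled into a command line as in the following Python sketch. It is not the authors' actual invocation: the binary name `raxmlHPC`, the file and run names, and the two random seeds (`-x`, `-p`, which RAxML needs for a rapid bootstrap run but which the text does not report) are assumptions made for illustration. The `-i` flag is left unset here on the understanding that RAxML then chooses the rearrangement setting automatically.

```python
import shlex


def build_raxml_command(alignment, run_name, bootstrap_seed, parsimony_seed,
                        replicates=100):
    """Return an argument list for a rapid-bootstrap plus best-ML-tree search."""
    return [
        "raxmlHPC",                  # assumed binary name; builds differ
        "-f", "a",                   # rapid bootstrap and best-scoring ML tree
        "-#", str(replicates),       # number of bootstrap replicates
        "-m", "GTRGAMMA",            # GTR model with gamma rate heterogeneity
        "-c", "25",                  # 25 distinct rate categories (the default)
        "-x", str(bootstrap_seed),   # rapid bootstrap seed (hypothetical value)
        "-p", str(parsimony_seed),   # parsimony seed (hypothetical value)
        "-s", alignment,             # input alignment (hypothetical file name)
        "-n", run_name,              # suffix for output files (hypothetical)
    ]


cmd = build_raxml_command("rbcL_alignment.phy", "rbcL_ml", 12345, 12345)
# Print a shell-safe version of the command instead of running it.
print(shlex.join(cmd))
```

Keeping the arguments as a list means the command can be passed directly to `subprocess.run` without shell quoting problems, should one wish to execute it.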

Morphological observations
The 61 diatom isolates were identified by morphometric characteristics using light and scanning electron microscopy and their detailed information is shown in Table 2. All strains were raphid diatoms and classified into 3 orders, 6 families, 13 genera, and 52 taxa (36 known and 16 unknown taxa; Fig 2). Forty-two strains could be morphologically identified to the species level (

Molecular-based identification
Both 18S rRNA and rbcL genes from 61 culture strains were sequenced successfully. The BLASTn results of each 18S rRNA and rbcL sequence are given in Table 2 according to the best matched species and sequence identity. For many strains, the closest relative based on the BLAST search differed from the identification based on morphology. The morphological and genetic classification results were consistent for only nine strains with >98.7% identity to their closest relatives based on their 18S rRNA gene sequences (Table 2). Similarly, morphological and genetic identification using the rbcL sequences was consistent in only six strains with relatively high sequence identities, ranging from 94.3% to 99.5% (Table 2). The phylogenetic trees show the relationships among the isolates (Figs 7-9). In total, 110 sequences of the 18S rRNA gene and 93 sequences of the rbcL gene were used for the phylogenetic analysis. In the phylogenetic trees of the rbcL gene, most of the strains were separated in accordance with their taxonomic positions. In contrast, some strains were not consistent with the morphological classification in the 18S rDNA phylogenies. Petrodictyon gemma TA201, belonging to Surirellaceae, clustered with Entomoneis ornata strain 14A, belonging to Entomoneidaceae, with a long branch in the ML tree of 18S rDNA (Fig 7). Additionally, two Entomoneis paludosa strains, TA208 and TA263, showed another long branch (Fig 7). Unlike in the ML tree, however, P. gemma and the two E. paludosa strains clustered together with a long branch in the NJ and MP phylogenies. Thus, in the 18S rDNA tree, the phylogenetic positions of these species were unstable. In the Naviculales, although their morphological features were similar to those of naviculoids, the tube-dwelling diatoms Berkeleya and Parlibellus did not cluster in the naviculoid group, but rather with asymmetrical biraphid diatoms with a low bootstrap value in the 18S rDNA phylogenies (Fig 7).
In addition, several different species were not clearly differentiated in the 18S rDNA phylogenies, such as Berkeleya rutilans TA440 and Berkeleya fennica TA424, which had a very low sequence distance (Fig 7, Table 2). A similar low resolution was also found among Navicula salinarum TA402, Navicula trivialis TA83, and N. cf. trivialis TA407 (Fig 8).
Using the sequences obtained in this study, we analyzed divergence levels of the 18S rRNA and rbcL genes (Table 3). Although the divergence levels of 18S rRNA genes were higher than those of rbcL genes in the genus Entomoneis due to long branches, the within-genus genetic distance of the rbcL gene was, on average, double that of the 18S rRNA gene. Furthermore, the genetic distance of rbcL was three times higher than that of 18S rRNA in two dominant benthic genera, Navicula and Nitzschia.
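To make the notion of within-genus divergence concrete, the Python sketch below computes an uncorrected pairwise distance (p-distance) and its mean over all pairs in a group. The text does not state which distance measure underlies Table 3, so the choice of p-distance is an assumption, and the sequences are toy strings rather than data from this study.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def mean_pairwise_distance(sequences):
    """Mean p-distance over all pairs of aligned sequences in one group."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Toy aligned sequences (not real data), standing in for strains of one genus.
toy_genus = ["AAAA", "AAAT", "AATT"]
print(mean_pairwise_distance(toy_genus))
```

Comparing this mean for the same set of strains under each marker is one simple way to express a statement such as "rbcL is twice as divergent as 18S rRNA within a genus".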

Discussion
In this study, we attempted to identify and classify benthic diatoms by a polyphasic approach using both morphological characteristics and molecular markers, and our results suggest that a molecular approach using the rbcL gene could become a better alternative to the traditional morphological classification approach. Despite a long history of taxonomic studies on benthic diatoms, their identification and classification remain a major challenge because of their small size and morphological similarities. Owing to these difficulties, many strains obtained in this study could not be morphologically identified at the species level. Although more strains might be identifiable by a thorough literature review and some may be confirmed to be new species, it is evident that morphometric classification is a laborious and time-consuming procedure. Some previous studies avoided identification at the species level or dealt only with the community dynamics of benthic diatoms [12,13]. Therefore, the community structure of diatoms and their distribution in tidal flats have not been clearly elucidated [48]. To reveal easily and quickly the hidden diversity of benthic diatoms, which is largely attributed to their very small size and similar morphologies, the development of molecular barcoding techniques is urgently needed. To enable this, it is necessary to construct a reliable genetic database.
The quality of a database directly influences the applicability and efficiency of DNA barcoding techniques [49]. Currently, genetic information on most species cannot be found in GenBank, indicating that the database is still insufficient and that molecular taxonomic studies on benthic diatoms are limited. At the time of writing, the numbers of 18S rDNA and rbcL gene sequences deposited in GenBank are 4,775 and 3,099, which represent only 811 and 709 species, respectively. Although extant diatoms are estimated to include 30,000-100,000 species [50], there is no genetic information on the majority of these species. Owing to the limited data available in GenBank, the closest relatives of most 18S rDNA sequences did not match the classifications by morphological identification (Table 2). These inconsistencies were more apparent in the case of the rbcL gene.
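The size of this gap can be made explicit with a short calculation. The Python sketch below uses only the figures quoted in the preceding paragraph (species represented in GenBank per marker and the estimated range of extant diatom species); the function and variable names are hypothetical.

```python
# Species represented by deposited sequences, as quoted in the text.
SPECIES_WITH_SEQUENCES = {"18S rDNA": 811, "rbcL": 709}

# Estimated number of extant diatom species (lower and upper bound) [50].
ESTIMATED_SPECIES = (30_000, 100_000)


def coverage_percent(n_sequenced, n_estimated):
    """Percentage of estimated species that have at least one sequence."""
    return 100.0 * n_sequenced / n_estimated


for marker, n_species in SPECIES_WITH_SEQUENCES.items():
    low = coverage_percent(n_species, ESTIMATED_SPECIES[1])
    high = coverage_percent(n_species, ESTIMATED_SPECIES[0])
    print(f"{marker}: {low:.2f}% to {high:.2f}% of estimated species")
```

On these figures, reference sequences exist for roughly 0.8% to 2.7% of estimated diatom species for 18S rDNA and roughly 0.7% to 2.4% for rbcL.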
In this study, six groups of diatoms, namely, Bacillariaceae, Naviculaceae, Pleurosigmataceae, Berkeleyaceae, Entomoneidaceae, and Surirellaceae, were clearly distinguished and formed monophyletic groups in the phylogenetic trees of the rbcL gene. In the 18S rDNA analyses, despite morphological differences, some diatom sequences showed high similarity (more than 99%) to those of other species. These relatively high sequence similarities might have been due to either misidentification of records deposited in GenBank or low resolution of the 18S rDNA gene [18,19]. However, the relatively low sequence distance within a genus shows that 18S rDNA is not an appropriate genetic marker to differentiate diatom species clearly, as seen in the lower resolution among species and the polyphyletic characteristics of several species (Table 2). For example, Navicula salinarum TA402, N. cf. salinarum TA407, and N. trivialis TA83 are similar but morphologically different species. N. trivialis TA83 has subrostrate apices and a central area that is bound by mostly shortened striae, whereas N. salinarum TA402 has rostrate apices and a central area that is formed by alternating long and short striae [31,33]. However, the 18S rDNA sequences of these species are almost identical, and therefore cannot be clearly distinguished from each other (Fig 8). Similarly, Berkeleya fennica, which can be distinguished from B. rutilans by its smaller and denser striae (over 30/10 μm) [40], was not clearly differentiated from B. rutilans in the 18S rDNA phylogenetic tree. In addition, the surirelloid diatom Petrodictyon gemma clustered with Entomoneis by a long branch in the 18S rDNA phylogeny. This long branch attraction artefact was also found in the 18S rDNA phylogenies of Haslea nipkowii and Neidium affine [51], indicating that unusually rapid evolutionary events have occurred in the 18S rRNA genes of some benthic diatoms [52].
Thus, although 18S rRNA has been widely used in phylogenetic studies on diatoms and has the largest database compared with other genetic markers [20,22,23], it is unsuitable as a marker for the study of diatom biodiversity because of its low resolution [20].
Conversely, the rbcL gene varies markedly compared with 18S rDNA [16]. Consistent with this, the rbcL gene in this study showed higher divergence levels than the 18S rRNA gene, with a few exceptions in Entomoneis and Haslea, whose 18S rDNA is presumed to have undergone rapid evolutionary changes (Figs 7 and 8). Furthermore, long branch artefacts were not found in the rbcL phylogeny. In addition, the rbcL gene, a plastid-encoded gene, is advantageous as a genetic marker because of its high PCR success rate (i.e., ease of amplification), simplicity of alignment, and low susceptibility to interference by heterotrophic contaminants [53]. However, the deficiencies in databases must still be addressed. Hamsher et al. [54] reported that the range of divergence in the rbcL gene sequence among species in the genus Sellaphora was 0.14-0.73%. Also, Kermarrec et al. [55] suggested 99% and 98% rbcL gene sequence identities as the thresholds for species- and genus-level classifications, respectively. However, most strains obtained in this study shared a sequence identity of 97% or less with sequences in the GenBank database. These results indicate that much of the necessary reference information is still missing. Nevertheless, the rbcL gene appears more appropriate than 18S rDNA for the molecular taxonomy and phylogenetic analyses of benthic diatoms.
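The identity thresholds attributed to Kermarrec et al. [55] can be read as a simple decision rule, sketched below in Python. The function name and the example identity values are hypothetical, and treating the thresholds as inclusive (greater than or equal to) is an assumption, since the text does not specify it.

```python
SPECIES_THRESHOLD = 99.0  # % rbcL identity suggested for species-level calls
GENUS_THRESHOLD = 98.0    # % rbcL identity suggested for genus-level calls


def supported_rank(identity_percent):
    """Return the taxonomic rank a best-hit identity would support."""
    if not 0.0 <= identity_percent <= 100.0:
        raise ValueError("identity must be a percentage between 0 and 100")
    if identity_percent >= SPECIES_THRESHOLD:
        return "species"
    if identity_percent >= GENUS_THRESHOLD:
        return "genus"
    return "unresolved"


# Illustrative identities only, not values taken from Table 2.
for identity in (99.5, 98.3, 97.0):
    print(identity, supported_rank(identity))
```

Under this rule, a best hit at 97% identity or less, as reported for most strains in this study, would support neither a species-level nor a genus-level assignment.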
Despite the ecological importance of benthic diatom communities, their identification and classification systems still need to be improved. In this study, we showed that a large proportion of diatoms could not be identified by morphological characteristics and that genetic information should be expanded for molecular phylogenetic analyses. Furthermore, the rbcL gene is suggested as a genetic marker superior to the 18S rRNA gene for identifying and phylogenetically classifying benthic diatoms. The huge number of diatom species estimated in various environments suggests a need for more effort to construct a reliable database containing polyphasic taxonomic data.