PCR Primers for Metazoan Nuclear 18S and 28S Ribosomal DNA Sequences

Background Metagenetic analyses, which amplify and sequence target marker DNA regions from environmental samples, are increasingly employed to assess the biodiversity of communities of small organisms. Using this approach, our understanding of microbial diversity has expanded greatly. In contrast, only a few studies using this approach to characterize metazoan diversity have been reported, despite the fact that many metazoan species are small and difficult to identify or are undescribed. One of the reasons for this discrepancy is the availability of universal primers for the target taxa. In microbial studies, analysis of the 16S ribosomal DNA is standard. In contrast, the best gene for metazoan metagenetics is less clear. In the present study, we have designed primers that amplify the nuclear 18S and 28S ribosomal DNA sequences of most metazoan species with the goal of providing effective approaches for metagenetic analyses of metazoan diversity in environmental samples, with a particular emphasis on marine biodiversity. Methodology/Principal Findings Conserved regions suitable for designing PCR primers were identified using 14,503 and 1,072 metazoan sequences of the nuclear 18S and 28S rDNA regions, respectively. The sequence similarity of both these newly designed and the previously reported primers to the target regions of these primers were compared for each phylum to determine the expected amplification efficacy. The nucleotide diversity of the flanking regions of the primers was also estimated for genera or higher taxonomic groups of 11 phyla to determine the variable regions within the genes. Conclusions/Significance The identified nuclear ribosomal DNA primers (five primer pairs for 18S and eleven for 28S) and the results of the nucleotide diversity analyses provide options for primer combinations for metazoan metagenetic analyses. Additionally, advantages and disadvantages of not only the 18S and 28S ribosomal DNA, but also other marker regions as targets for metazoan metagenetic analyses, are discussed.


Introduction
Human activities pose severe threats to planetary biodiversity e.g. [1][2][3], and it is thus critically important to be able to rapidly estimate biodiversity across space and through time. Species richness is the most widely used index of diversity, but it is difficult to estimate diversity comprehensively using traditional morphological approaches because for many groups most species remain undescribed [4][5][6], and even when described, are often difficult to identify. In the marine environment, species that comprise the majority of metazoan biodiversity are also often small and thus difficult to sample and analyze individually (for example Nematoda, Copepoda, Ostracoda, Rotifera, Kinorhyncha, Loricifera, and Tardigrada). In this context, the ability to rapidly assess biodiversity at various spatio-temporal scales without assigning formal taxonomic names to all samples is urgently needed.
Analyses based on second-generation sequencing have many advantages in this regard, as they can produce very large numbers of sequences from single samples, either by targeting single or multiple genes using PCR (metagenetics: [7]) or by targeting entire genomes (metagenomics). However, most metagenetic studies to date have focused on microbes and protozoans e.g. [8][9][10][11], and only a limited number of studies have been carried out for metazoans [12][13][14][15][16][17].
Unlike the situations with microbes, where analysis of the 16S ribosomal DNA sequence is standard [18], the appropriate gene for metazoan metagenetic studies is less clear [13]. Over the past 20 years, several studies have reported universal primers for metazoan nuclear 18S and 28S ribosomal DNA sequences [19][20][21], but the extent of compatibility between the primers and the target regions of these primers has not been estimated thoroughly. In 1994, Hillis and Dixon [19] reported universal primers for nuclear ribosomal DNA regions using up to seven reference sequences. Since then, much larger numbers of reference sequences have been made available in databases, and the numbers are expected to continue to increase.
In the present study, we aimed to perform (i) discovery of new metazoan universal primers, (ii) estimation of compatibility of the newly designed and also previously reported primers, and (iii) identification of regions with higher variability within the genes. By integrating results obtained from these analyses, we propose combinations of primers that are likely to retrieve a more complete representation of the taxonomic diversity of metazoans present in environmental samples.
Our primers might be applicable not only for environmental samples but also for individual samples. However, the goal of this study was to design primers that amplify a short portion of target genes for metagenetic analyses, rather than for phylogenetic analyses for which longer sequences of these genes would be preferred.

Results and Discussion
We identified five and eight conserved regions that were suitable for designing PCR primers against nuclear 18S and 28S ribosomal DNA (Table 1). In addition to the newly designed primers, we also tested the efficacy of previously published primers by comparing sequence similarity between the primers and SILVA datasets [19][20][21] (see materials and methods, Fig. 1, Tables S1, S2). From these, we identified three additional primers for the 28S ribosomal DNA region (primer numbers 22, 24, and 26 of [21]) that were also used in the compatibility test described below.

Compatibility test between the primers and SILVA datasets
To test the potential efficacy of primers, the similarity between primer sequences and target regions from the SILVA database sequences were compared for each phylum (Tables 2, 3). These sequences represent the 18S and 28S ribosomal DNA sequences for 34 and 27 phyla, respectively. However, only 24 of 34 phyla for the 18S ribosomal DNA (Platyhelminthes, Chaetognatha, Chordata, Echinodermata, Hemichordata, Annelida, Brachiopoda, Bryozoa, Entoprocta, Mollusca, Myzostomida, Nemertea, Ar-thropoda, Tardigrada, Sipuncula, Acanthocephala, Cycliophora, Gastrotricha, Nematoda, Nematomorpha, Rotifera, Cnidaria, Ctenophora, and Porifera) and 11 of 27 phyla for the 28S ribosomal DNA (Platyhelminthes, Chordata, Annelida, Bryozoa, Mollusca, Arthropoda, Acanthocephala, Nematoda, Rotifera, Cnidaria, and Porifera) had 10 or more sequences in the database at the time of the analysis. The remaining phyla typically had five or fewer sequences in the database, and the generality of our findings for these groups is thus limited.
A maximum of 16 fold-degeneracy was required to design the primers for the 18S ribosomal DNA (Table 1). For those phyla with ten or more sequences, at primer positions #1 and #5, all phyla had one or no mismatches for more than 90% of the sequences ( Table 2). For primer position #2, this level of matching was exceeded in all of the phyla except the Chaetognatha (85% of sequences with one or no mismatches). For primer position #4, this level of matching was exceeded in all of the phyla except the Gastrotricha (71% of the sequences with one or no mismatches). For primer position #3, only the Chordata (89%) and the Gastrotricha (88%) had fewer than 90% of sequences with one or no mismatches.
A maximum of 48-fold degeneracy was required for designing primers for the 28S ribosomal DNA region (Table 1). For primer positions #3, #6, #8, #9, #10 and #11, all of the phyla had more than 90% of their sequences with one or no mismatches (Table 3). For four additional primer positions, this level of the matching was exceeded in all the phyla except the Chordata (90% for positions #2 and #7), Nematoda (86% for position #4), and Bryozoa (86% for position #5). For primer position #1, the only phyla with less than this level of matching were the Bryozoa (86%) and the Nematoda (88%).

Sliding window analyses of nucleotide diversity
Sliding window analyses of the 18S ribosomal DNA region showed a similar nucleotide diversity pattern for all the taxa analyzed in this study (Fig. 2). The highest nucleotide diversity was observed between nucleotide positions 50-250. The next peak was observed for the region between primers 3 and 4. More peaks were observed in the 28S ribosomal DNA region than in the 18S ribosomal DNA region, but again a similar nucleotide diversity pattern was observed across all the taxa (Fig. 3). A relatively high peak was observed for the region between primers 1 and 2, and subsequent peaks were observed for the region between primers 2 and 3 as well as between primers 3 and 4. For the region between primers 4 and 5, two to four peaks were observed. No peaks were observed for the region between primer positions 5 to 9. The last clear peak was observed for the region between primers 9 and 10. All of the primer regions were depicted on a secondary structure model of Apis mellifera ribosomal RNA genes derived from [22] (Figs. S1, S2, S3). This showed that the regions with higher nucleotide diversity were all around variable regions estimated from the secondary structure model (Figs. 1, 2, 3). All newly designed primers were located outside of these variable regions.   [21], respectively. Primers used in Fonseca et al. [12] were also drawn. Relative positions of variable region (V) and expansion segments (D) [22] based on the secondary structure model of Apis mellifera ribosomal RNA genes are also depicted. doi:10.1371/journal.pone.0046180.g001 Table 2. Percentages of sequences, which showed mismatches between the primer and target regions of the nuclear 18S ribosomal DNA sequences downloaded from the SILVA database.  (5) 1.32 (13) 0.71 (7) 1.02 (10) 1.12 (11) 1.32 (13) 1.53 (15) 0.71 (7) 1.53 (15) 0.92 (9)
Although we propose primer pairs 18S #1/#2RC and 28S #8/ #11RC as good candidates for metazoan metagenetic analyses based on the likelihood that they will successfully amplify the target regions in most metazoans, some of the alternative primers in Table 1 might be more suitable in some cases. For example, two or more mismatches between the 18S primer (18S #2) sequences and their target regions were observed in Chaetognatha (Table 2). Because groups with higher mismatches will be less effectively amplified when the primers are used for metagenetic analysis [23], application of other primer combinations might sometimes be appropriate, depending on the community and taxa being analyzed. Another reason to use alternative primers is because of limitations of the sequencers. Commonly used second-generation sequencers, such as Roche 454, Illumina Solexa, and Applied Biosystems SOLiD, have length limitation not only for reading but also for library processing (length limitation for emulsion PCR and bridge amplification). Those limitations vary among the machines, although those limitations are getting smaller of late. From this standpoint, alternative primer combinations yielding shorter products might be better solutions depending on the machine being used. The approximate length of the amplicons can be estimated from Figures 1, 2, 3.

Comparison of suitability of different gene regions for metagenetic analyses
In the present study we have designed and tested the compatibility of primers for nuclear 18S and 28S ribosomal DNA sequences. However, other regions might be good targets for the metagenetic analyses depending on the goals of the study. We have listed five regions [mitochondrial COI, 12S, nuclear ITS (Internal Transcribed Spacer), 18S and 28S] as potential candidates and compared advantages and disadvantages of these regions as targets for metagenetic analyses (Table 4).
Closely related taxa are most reliably distinguished using regions with fast evolutionary rates. In this regard, the mitochondrial COI gene has an advantage over the other four genes [24,25]. The nuclear 18S and 28S ribosomal genes have slower evolutionary rates [19] so that the ability to distinguish closely related taxa using these primers will generally be lower than for primers that target COI [25]. On the other hand, ease of designing universal primers is inversely related to evolutionary rate, and universality is important for metagenetic analyses of environmental samples. In this regard, the 18S and 28S genes have an advantage over COI.
The five regions also differ with respect to whether or not they are coding sequences. The nuclear ITS region is non-coding, making it difficult to identify informatically whether any sequence obtained is from the ITS region. Furthermore, because of the frequent occurrence of insertions and deletion in the nuclear ITS region, it is difficult to assign sequences to taxa by sequence similarity, especially to higher taxonomic levels. Nevertheless, the nuclear ITS region can potentially be a useful marker if a highdensity of reference sequences are available for the target community and species.
Of the five regions, only the mitochondrial COI region is a protein-coding gene, which results in different rates of evolution for the different codon positions and thus the potential for information to be obtained for a wide range of taxonomic levels. For example, second codon information can be used to make assignments at higher taxonomic levels, whereas third codon information will be good for population or species level estimation [24].
Alignment of sequences is the first analytic step after sequence data are available. However, it is not always an easy procedure because of occurrence of indels. In this regard, the mitochondrial COI region has an advantage over the other regions because insertions and deletions are relatively uncommon in protein-coding gene sequences. In contrast, lengthy and numerous insertions and deletions can be expected within the mitochondrial 12S and nuclear ITS regions, making it difficult to get good alignments, especially across a large taxonomic span such as all metazoans.
Another criterion in the choice of gene region for metagenetics is the availability of reference sequence data. If a goal of a study is to assign sequences to taxonomic groupings, then the density of the reference sequence data will be important. In this regards, mitochondrial COI and nuclear 18S and 28S regions have advantages because of the availability of datasets [26,27].

Materials and Methods
Designing PCR primers for the nuclear 18S and 28S ribosomal DNA Two datasets containing nuclear 18S and 28S ribosomal DNA sequences (SSURef_106_full_align_tax_silva_trunc.fasta and LSURef_106_full_align_tax_silva_trunc.fasta) were downloaded from the SILVA database (www.arb-silva.de) [26] and metazoan sequences were extracted. No new sequences were generated in this study. Sequences without taxonomic information, such as those determined from environmental DNA, were removed. Sequences with ambiguities and some positions where all characters were ''-'' (gaps) were also removed using Mothur command screen.seqs and filter.seqs [28]. These filtered sequences were then imported to Geneious (Biomatters Ltd), and conserved regions suitable for designing primers were identified. In total, five and eight primer sites were found in the 18S and 28S ribosomal DNA regions, respectively (Table 1). Next, SILVA sequences that did not extend to those primer sites (short sequences) were removed from the dataset by using the Mothur command screen.seqs. We did not extend the primer search beyond the identified primer regions to keep the number of SILVA sequences as large as possible. As a result, we retained 14,503 18S and 1,073 28S metazoan nuclear ribosomal DNA sequences. Because of the limitations of the SILVA database, some regions of the 39 and 59 ends of genes were not included the analyses. The newly identified primers in the prepared SILVA metazoan datasets were searched for each phylum using three criteria: no mismatches, one mismatch, and two or more mismatches. If .20% of the sequences of each phylum showed $1 mismatches, all the sequences of the phylum were extracted, and the sites with mismatches were inspected. If the mismatch at a single site was shared by .20% of the sequences of that phylum, the degeneracy of the primer was increased and the primer site search was repeated. This step was repeated until the prevalence of mismatches was ,20% for all the phyla having .10 sequences. Some primer sites still had a mismatch rate of .20% after this procedure; in these cases, the indels or substitutions were shared by ,20% of the sequences at multiple sites.
In addition to the developing these newly designed primers, we also performed this same compatibility test for previously reported universal primers [19][20][21] (Table S1, S2). After the first search, the degeneracy of the primers that had good compatibility was increased until the prevalence of the mismatches was ,20% for the phyla having .10 sequences. Primers that anneal to regions outside of the prepared SILVA datasets were not considered. As a result, primer numbers 22, 24, 26 from [21] were retained as good candidates and numbered 28S #6, 28S #8, and 28S #10, respectively, in the present study. Primer number 28v [19] also had good compatibility with the SILVA datasets, but it was removed from the list because of degeneracy at both end positions of the primer. Other than these primers, good compatibility was not observed between previously published primers and the metazoans sequences we obtained from the SILVA datasets (Table  S1, S2).
During preliminary analyses, we increased the degeneracy of the primers to decrease the mismatches by up to 10%. However, the highly degenerate primers failed to amplify individually extracted DNA because the degeneracy was too high. Therefore, we maintained a percentage of mismatches at a maximum of 20%.

Sliding window analyses of nucleotide diversity
Sliding window analysis of nucleotide diversity was performed to determine a suitable region for metagenetic analyses. In total, 11 lower taxonomic groups, ranging from genus to class, were selected for this analysis (Figs. 2 and 3; the name of the taxon and number of sequences used for the analysis are listed in the figure legend). Although some of these taxa are terrestrial or parasitic, they were included because ancestors of these phyla were marine. First, sequences for these taxonomic groups were extracted from the datasets, and sequence alignment was performed using MAFFT E-INS-i [29]. Sequences outside the newly designed primer regions were removed from the datasets. Sliding window analysis of these datasets was performed using the Drosophila Polymorphism Database, SNP Graphics (http://dpdb.uab.es/ dpdb/diversity.asp), with window length: 99 and step size: 10. During the analysis, we found some sequences with very large indels (sequence ID, 18S: AANH01015347.5743.7741, AANH01010553.44707.47729, and FJ196122.1.1563; 28S: AF154052.1.3517 and DQ790024.1.3760), which were removed from the datasets.

Supporting Information
Table S1 Percentages of sequences, which showed mismatches between the previously reported primer [17] and target regions of the nuclear 18S ribosomal DNA sequences downloaded from the SILVA database. Comparisons were made for each phylum. One and two or more mismatches were estimated independently. The numbers in parentheses indicate the number of sequences that had the mismatches. The hierarchy of the NCBI taxonomy database is followed in this table. (DOC)

Table S2
Percentages of sequences, which showed mismatches between the previously reported primer [17][18][19] and target region of the nuclear 28S ribosomal DNA sequences downloaded from the SILVA database. Comparisons were made for each phylum. One and two or more mismatches were estimated independently. The numbers in parentheses indicate the number of sequences that had the mismatches. The hierarchy of the NCBI taxonomy database is followed in this table. (DOC)