Comparison of DNA Extraction Methods in Analysis of Salivary Bacterial Communities

Culture-independent high-throughput sequencing-based methods are widely used to study bacterial communities. Although these approaches are superior to traditional culture-based methods, they introduce bias at the experimental and bioinformatics levels. We assessed the diversity of the human salivary microbiome by pyrosequencing of the 16S rDNA V1–3 amplicons using metagenomic DNA extracted by two different protocols: a simple proteinase K digestion without a subsequent DNA clean-up step, and a bead-beating mechanical lysis protocol followed by column DNA purification. A high degree of congruence was found between the two extraction methods, most notably in regard to the microbial community composition. The results showed that for a given bioinformatics pipeline, all the taxa with an average proportion >0.12% in samples processed using one extraction method were also detected in samples extracted using the other method. The same taxa tended to be abundant and frequent for both extraction methods. The relative abundance of sequence reads assigned to the phyla Actinobacteria, Spirochaetes, TM7, Synergistetes, and Tenericutes was significantly higher in the mechanically-treated samples than in the enzymatically-treated samples, whereas the phylum Firmicutes showed the opposite pattern. No significant differences in diversity indices were found between the extraction methods, although the mechanical lysis method revealed higher operational taxonomic unit richness. Differences between the extraction procedures outweighed the variations due to the bioinformatics analysis pipelines used.


Introduction
Changes in the salivary microbiota are associated with various oral and systemic conditions, including caries, periodontal disease, cancer, arthritis, cardiovascular disease, and obesity [1][2][3][4]. Studies of salivary bacterial communities initially used culture-based techniques [5,6]. However, the presence of numerous unculturable bacteria in the mouth, currently estimated to represent about one third of the 600 inventoried species in the curated Human Oral Microbiome Database [7], has necessitated the development of culture-independent approaches. These techniques include DNA-DNA hybridization [8] and high-throughput sequencing (HTS) of 16S rDNA amplicon libraries [1,[9][10][11][12] or metagenome fragments [13]. The HTS-based methods, now widely used to study bacterial communities, allow the analysis of a small or large number of samples with the desired depth of coverage. Although significantly better than culture-based approaches, the culture-independent methods may introduce bias related to the DNA extraction procedure and the downstream molecular and informatics analyses.
Enzymatic lysis of samples collected using oral swabs [14][15][16] has been used in the study of salivary bacterial communities. This protocol includes treatment with proteinase K, Tween, and EDTA, after which the lysate is used in PCR analyses, either directly [11] or following a clean-up step [13]. In several studies, salivary bacteria were disrupted mechanically using glass, zirconia, or zirconia/silica beads in the presence or absence of phenol, and DNA was purified using different procedures [1,9,10,12].
Here we evaluated two DNA extraction protocols for human saliva samples. DNA extracts obtained by enzymatic or mechanical cell lysis were used to assess bacterial community diversity based on 16S rDNA amplicon pyrosequencing.

Ethics Statement
This study was approved by the Ethics Committee of the Geneva University Hospitals (09-078). Written informed consent was obtained from the participant.

Sample Collection
Unstimulated human saliva was obtained as described previously [13] from a 32-year old female non-smoker, without obvious signs of oral disease.

DNA Extraction
After vigorous mixing by vortex, the saliva sample was divided into six aliquots. Three 100-mL aliquots were mixed with the same volume of 26 lysis buffer (20 mM Tris, 2 mM EDTA pH8, 1% Tween, and 400 mg/mL proteinase K; Fermentas) and then incubated at 55uC for 2.5 h, followed by heating at 95uC for 10 min to inactivate the proteinase K. Three 200-mL saliva sample aliquots were mixed with 700 mL of lysis buffer SL1 (SDScontaining) and 120 mL of Enhancer SX from the NucleoSpin Soil Kit (Macherey-Nagel). The mixture was shaken in a NucleoSpin Bead Tube for 2 min at maximum speed on a Vortex-Genie 2 with a horizontal tube holder (Scientific Industries). From this point, we followed the NucleoSpin Soil Kit protocol (Macherey-Nagel). DNA was eluted in 100 mL of elution buffer SE. DNA extracts from both protocols were stored at 220uC.

PCR and Sequencing
PCR amplification mixtures included 5 mL of DNA extract, 25 pmol each of forward primer (59-ctatgcgccttgccagcccgctcag-ac-GAGTTTGATCMTGGCTCAG-39) and a barcoded reverse primer (59-cgtatcgcctccctcgcgccatcag-NNNNNNNN-at-CCGCGGCTGCTGGCAC-39) in 50 mL Primestar HS Premix (Takara). The composite PCR primers were designed as described previously [11] except that they contained Titanium adaptor sequences (plain lowercase). For each of the six DNA extractions (three enzymatic and three mechanical), two separate PCRs were carried out using reverse primers with a different barcode. All PCRs were performed in duplicate using the following parameters: 30 cycles of 98uC for 10 s, 56uC for 15 s, and 72uC for 1 min. The V1-3 amplicons corresponded to bases 28-514 (excluding primer sequences) in the Escherichia coli 16S rRNA gene. Amplicon sizes were checked on a 2100 Bioanalyzer (Agilent). Two replicate PCRs were then pooled and purified using a QIAquick PCR Purification Kit (Qiagen). DNA quantity was assessed using a NanoDrop ND-8000 spectrophotometer (NanoDrop Technologies). Three hundred nanograms of each of the purified samples were then pooled and sequenced from the reverse primer on a Genome Sequencer FLX system with Titanium chemistry (Roche) at LGC Genomics (Berlin, Germany).

Real-time PCR
To determine the concentration of bacterial DNA, real-time PCR was carried out on an Mx3005P Stratagene cycler using a Brilliant II SYBR Green QPCR Master Mix Kit (Stratagene). Reaction mixtures contained 1 mL of DNA extract, 7.5 pmol each of forward (59-ACTCCTACGGGAGGCAGCAGT-39) and reverse (59-ATTACCGCGGCTGCTGGC-39) HPLC-purified primers [17] and 0.75 mL of 1/250-diluted reference dye, in a total volume of 25 mL. The cycling conditions included an initial denaturation of 10 min at 95uC, followed by 40 cycles of 95uC for 30 s and 68uC for 1 min. The reference curve for DNA quantitation was created using known concentrations of Staphylococcus aureus strain MW2 genomic DNA.
To determine the concentration of human DNA, qPCR was conducted on an Mx3005P Stratagene cycler using ABsolute QPCR Mix (ABgene) and a TaqMan b-actin Control Reagents kit (PE Applied Biosystems). Reaction mixtures contained 2.5 mL of DNA extract, 9 pmol each of b-actin forward and reverse primers, and 6 pmol of b-actin probe (PE Applied Biosystems), in a total volume of 15 mL. The cycling conditions included an initial denaturation of 15 min at 95uC, followed by 42 cycles of 95uC for 15 s and 60uC for 1 min. The reference curve for DNA quantitation was created using known concentrations of human genomic DNA provided with the kit.
Bioinformatic Analysis Pipeline 1. The fastq file containing 199,205 row sequence reads was deposited in MG-RAST under accession number 4499916.3. Sequence reads were processed using the Mothur (v 1.28) software package [18]. The reads were removed (Mothur's command trim.seqs) if they met any of the following criteria: (i) contained ambiguous bases; (ii) contained more than one mismatch in the reverse primer (excluding the barcode sequence for which no mismatches were allowed); (iii) contained homopolymer runs longer than 9 bases; (iv) had a minimum quality score of ,35 over a 50-base window; (v) had a sequence length ,350 or .600 bases after primer sequence trimming. Sequences were aligned (command align.seq) against the Greengenes [19] reference core alignment set (available in Mothur as core_set_aligned.imputed.fasta) and truncated to include bases corresponding to E. coli 16S rRNA gene positions 101-514. Sequences that did not cover this region were removed. The commands pre.cluster (with option -diff = 4) and chimera.uchime (default parameters) were used for denoising [20] and removal of potentially chimeric sequences [21], respectively. Samples were normalized to an equal sampling depth corresponding to the size of the smallest sample (command sub.sample). Taxonomic assignment of individual sequences was based on the Naïve Bayesian method [22] and the reference Greengenes taxonomy database (Mothur's files gg_99.pds.ng.fasta and gg_99.pds.tax), with a confidence score threshold of 80% (command classify.seqs). Sequences were assigned to representative operational taxonomic units (OTUs) using the furthest neighbor method and a 3% dissimilarity cut-off (command dist.seqs and cluster). The consensus taxonomy for the majority of sequences within an OTU was determined using the command classify.otu. A multiple alignment of representative OTUs was imported into FastTree [23] to construct a tree that was then used as the input file for the Fast UniFrac web interface [24].
For Pipelines 2-5, only differences with respect to Pipeline 1 are indicated.
Pipeline 2. In this pipeline, potential chimeras were removed using Chimera Slayer (command chimera.slayer) with default parameters [25].
Pipeline 6. Sequences were pre-processed as described in Pipeline 1 (steps i-v), except that the minimum sequence length was set at 200 bases, and residual sequences of the forward PCR primer were removed. Each sequence was mapped to a Greengenes reference OTU by BLASTN, as described in Pipeline 3. Downstream analyses were carried out using the matching fulllength 16S rRNA gene sequences and their taxonomic information. The pre-constructed Greengenes reference OTU tree (Greengenes file gg_97_otus_4feb2011.tre) was used as the input file for the Fast UniFrac web interface.

Clustering of Bacterial Communities
Bacterial community comparisons were carried out with (i) UniFrac, which exploits different degrees of similarity between 16S sequences, and (ii) Bray-Curtis similarity [29], which is a nonphylogenetic metric. Principal coordinates analysis (PCoA) of Bray-Curtis similarities, was performed in PRIMER-E (Primer-E Ltd., Plymouth, UK). Ecological indices were calculated from OTU relative abundance in PRIMER-E.

Statistical Analysis
Indicator taxa were determined by indicator species analysis [30] (for each taxonomic level individually) using a Monte Carlo test of significance with 1,000 iterations. The P-values were adjusted for multiple testing according to Benjamini and Hochberg [31].
Error values are given as standard deviation unless otherwise indicated. Permanova (Primer-E Ltd., Plymouth, UK) and Mann-Whitney U tests were used to assess statistical significance.

Sample Preparation and Bioinformatics Analysis Pipelines
DNA was extracted in triplicate from a saliva sample using either a mechanical or enzymatic lysis procedure. Real-time PCR revealed a 5.3-fold and 2.5-fold higher yield of bacterial and human DNA, respectively, using enzymatic lysis compared with the mechanical lysis procedure (Table S1). We generated V1-3 16S rDNA amplicon libraries using the same volume of each DNA extract, which corresponded to 0.6-1.3 and 2.3-2.4 ng of bacterial DNA for the mechanical and enzymatic lysis procedures, respectively.
A total of 199,205 raw sequence reads were generated by pyrosequencing of the amplicon libraries. Following quality control, the sequences corresponding to positions 101-514 in the reference E. coli 16S rRNA gene were extracted and processed by bioinformatics Pipelines 1-5, which use different tools for chimera removal, OTU clustering, or taxonomy assignments. In Pipeline 6, the quality-filtered sequences with a length $200 bases were retained and a reference-based OTU-picking was performed. The number of sequence reads obtained after different preprocessing steps (i.e. DNA quality check, chimera removal, and normalization steps) and the number of OTUs determined across different pipelines are presented in Table 1.

Abundance of Taxa
Assessment of the salivary samples using two extraction procedures revealed a high degree of congruence in terms of the composition and abundance of species in the microbiota. From the phylum level down to the OTU level, all of the taxa with an average proportion .0.12% in samples processed using one extraction method were also detected with the other method, when applying the same bioinformatics analysis pipeline (Figure 1). The most frequent taxa tended to be the most abundant, and the same taxa tended to be abundant and frequent for both extraction methods. These trends, at the OTU level, are exemplified for Pipelines 1 and 6 in Figure 2. The correlation between the extraction methods in terms of OTU abundance and prevalence reached the highest values in Pipeline 6 ( Table 1).
Only very low-abundance (average proportion ,0.08%) OTUs were present in all six samples derived from one extraction method but absent in all samples obtained with the other method. Two OTUs assigned to the genus Treponema (Spirochaetes), along with an OTU belonging to the phylum TM7, were confined to the bead-beating method, whereas an OTU from the genus Haemophilus was identified only in samples lysed by enzymatic treatment (Table S2).
Indicator species analysis revealed significant differences in the relative abundance of most major phyla that was related to the extraction procedure (Figure 1). In samples processed by enzymatic lysis, the relative abundance of sequence reads assigned to the phylum Firmicutes was significantly higher than in the mechanically-treated samples, whereas the phyla Actinobacteria, Spirochaetes, and TM7 showed the opposite pattern across all pipelines. The phyla Synergistetes and Tenericutes also had significantly higher relative abundance in mechanically-processed samples, except in Pipeline 5, in which members of this phylum were not identified.
As shown previously [33], different members of the same taxon may present opposite directions of change in relative abundance (increase or decrease) when comparing one extraction method to the other. For instance, the relative abundance of the orders Burkholderiales and Neisseriales, both belonging to the class Betaproteobacteria, were higher and lower, respectively, in mechanically-lysed samples ( Figure S1 and Table S2). Similarly, one Streptococcus OTU was present at a significantly higher proportion in samples treated mechanically, whereas the abundance of other OTUs from the same genus was significantly greater in enzymatically-lysed samples.

Clustering of Bacterial Communities
Similarities between bacterial communities in terms of their phylogeny were assessed by measuring UniFrac distance [34]. This metric takes into account the relative abundance of OTUs (weighted data) or their presence/absence (unweighted data). The between-method weighted UniFrac distance was significantly higher (P,10 215 ) than the within-method distance across all pipelines ( Figure 3A). The within-method distances for the mechanical and enzymatic lysis methods were not significantly different.
Comparisons of bacterial community membership using unweighted UniFrac ( Figure 3B) showed that intra-method distances were significantly smaller for the mechanically-treated samples than for the enzymatically-treated samples in the pipelines in which OTUs were clustered using Mothur (Pipelines 1/5, 2 and 3). In these pipelines, as well as in Pipeline 4, UniFrac distances between the mechanically-lysed samples were also smaller than inter-method distances. In contrast, in Pipeline 6, a statistically significant decrease was observed when unweighted UniFrac distances among enzymatically-processed samples were compared with inter-method distances.
Permanova of Bray-Curtis similarities confirmed the effect of the extraction method on the structure of the salivary microbiota (Table S3). This test was based on square-root transformed abundance of OTUs and did not take into account the phylogenetic distance, as this was the case when using UniFrac. Similarly, PCoA of the Bray-Curtis similarity matrix based on square-root-transformed OTU abundance showed that samples processed using the same extraction method clustered together (Figure 4). We used this non-phylogenetic approach to analyze smaller size libraries simulated by sampling a given number of OTUs from each sample. The PCoA plots show that as few as 100 randomly chosen OTUs per sample results in a clear separation of bacterial community profiles according to the extraction method ( Figure 4). Further reducing this number to 50 still showed the between-method separation, although with some overlap.

Shared OTUs and Diversity Estimates
Pipeline 2 generated the highest total number of OTUs, as well as the highest average number of OTUs per sample ( Table 1). The proportion of chimeras detected using Chimera Slayer (Pipeline 2) was lower than that obtained with UCHIME (Pipeline 1) or a BLASTN-based method (Pipeline 3) (Table 1). When the samples derived from the mechanical disruption were compared with those obtained by enzymatic lysis, the following trends were observed across all analysis pipelines: (i) lower average fraction of detected chimeras; (ii) higher average number of OTUs per sample; and (iii) greater average fraction of OTUs shared among samples. The Chao 1 richness estimator predicted a higher number of OTUs for mechanically-lysed samples in all pipelines except Pipeline 2. However, these differences exceeded the threshold of statistical significance in only a few cases. Differences in Shannon and Simpson diversity indices between the extraction methods were not statistically significant (Table 1).

Discussion
The current study showed that the same bacterial taxa tended to be abundant and frequent in human saliva samples processed by either the mechanical or enzymatic lysis methods, but that their relative abundance, determined by pyrosequencing of 16S rDNA amplicon libraries, differed between the extraction methods. These differences could be distinguished with as few as 100 sequences per sample, i.e. far below the number of reads routinely analyzed in community studies based on HTS of 16S rDNA amplicon libraries. Similarly, Kuczynski et al. [35] showed that ,100 sequences per sample were sufficient to discriminate between microbiota from dissimilar habitats, but underlined that such a low coverage does not allow the determination of differences in the rare biosphere.
The observed lower DNA yield, which was associated with higher relative standard deviation, in the mechanically-lysed samples points to the possibility of DNA loss during the column purification. Relatively small differences in input DNA amounts likely had very little impact on the profiling of the microbiota because the values were much higher than the 1 pg/mL threshold under which salivary microbiota profiling is significantly impacted by template DNA levels [36].
Differences in the cell wall composition and structure may explain variations in bacterial susceptibility to different lysis procedures. For instance, variation in peptide cross-links in peptidoglycan renders bacteria more or less susceptible to proteinase K lysis [37]. Several studies have shown that disruption of bacteria with tough cell walls, such as those belonging to the phyla Firmicutes and Actinobacteria, is more efficient by a mechanical approach than by an enzymatically-based protocol [36]. Our data confirmed the trend of a higher proportion of Table 1. Number of sequence reads, number of OTUs, and ecological indices in enzymatically-and mechanically-processed samples using different bioinformatic analysis pipelines.   Actinobacteria in mechanically processed samples, whereas Firmicutes showed the opposite trend. However, some Firmicutes from the order Clostridiales (Catonella, Eubacterium, and Selenomonas) as well as a Streptococcus OTU had higher relative abundance when mechanical cell disruption was applied.
Hierarchical clustering of datasets based on the relative abundance of genera resulted in two main clusters corresponding to the extraction method ( Figure S2). Within each of these two clusters similar trends were observed: (i) for each sample, datasets obtained using Pipelines 1-5 generally clustered together; (ii) datasets obtained for different samples using Pipeline 6, which uses reference-based OTU picking, were more similar to each other than to the datasets obtained using other pipelines. Therefore, differences between the extraction procedures outweighed the variations due to the bioinformatics analysis pipelines used.
In our study, the exact species composition/abundance of the salivary microbiota and the proportion of non-lysed bacterial cells were unknown, and therefore, it remains unclear which protocol yielded a more accurate description. Also, to determine whether differences due to the DNA extraction method outweighed interindividual differences, as has been recently shown for bronchoalveolar lavage samples [38], it would be necessary to analyze samples from multiple individuals.
The advantage of the enzymatically-based DNA extraction method resides in its simple procedure including few steps and direct use of the lysate in PCR. This approach may be suited for samples with limited amounts of DNA template or other cases where DNA loss associated with DNA clean-up is to be avoided. However, in the absence of a DNA purification step, more PCR inhibitors may remain in the sample.
Variations in DNA extraction methods, such as the use of different cell wall-degrading enzymes, chemical agents, bead sizes, bead-beating time, and DNA purification procedures [33,36,38,39] may affect profiling of the microbiota. In addition, we found that the magnitude of differences in the microbiota profiles obtained by different extraction methods is somewhat sensitive to the parameters used in bioinformatics pipelines. Therefore, caution is needed when comparing microbial community data from different studies.  Supporting Information Figure S1 Microbial community profiles at the class (A), order (B), family (C), and OTU (D) levels. Only taxa found at an average frequency .0.25% by at least one extraction method in at least one analysis pipeline are presented. The indicator values .0.5, determined by indicator species analysis, associated with the Benjamini-Hochberg corrected P values ,0.05, were used to define indicators. Symbols ''.'' and '','' correspond to such indicator taxa and denote increasing and decreasing trends in the relative abundance for enzymatic vs. mechanical lysis. OTUs with an average abundance ,0.25% for both extraction procedures (of a given analysis pipeline) were filtered out to reduce the number of comparisons in the indicator species analysis. For abbreviations, refer to Figure 1A. (ZIP) Figure S2 Hierarchical clustering of the 16S profiles obtained using different DNA extraction protocols and bioinformatic analysis pipelines. Group-average clustering was based on the Bray-Curtis similarity matrix computed from square-root transformed relative abundance of genera. Dataset IDs: Extraction method (E, enzymatic; M, mechanical)_Extraction # (1-3)_PCR # (1 and 2 stand for different barcode sequences in the reverse PCR primer)_Bioinformatics pipeline # (P_1-P_6). The genera Lautropia and TG5, absent in the RDP taxonomy, were excluded from the analysis. (TIF)