DNA extraction and primer choice have a large effect on the observed community structure in all microbial amplicon sequencing analyses. Although the biases are well known, no comprehensive analysis has been conducted in activated sludge communities. In this study we systematically explored the impact of a number of parameters on the observed microbial community: bead beating intensity, primer choice, extracellular DNA removal, and various PCR settings. In total, 176 samples were subjected to 16S rRNA amplicon sequencing, and selected samples were investigated through metagenomics and metatranscriptomics. Quantitative fluorescence in situ hybridization was used as a DNA extraction-independent method for qualitative comparison. In general, an effect on the observed community was found on all parameters tested, although bead beating and primer choice had the largest effect. The effect of bead beating intensity correlated with cell-wall strength as seen by a large increase in DNA from Gram-positive bacteria (up to 400%). However, significant differences were present at lower phylogenetic levels within the same phylum, suggesting that additional factors are at play. The best primer set based on in silico analysis was found to underestimate a number of important bacterial groups. For 16S rRNA gene analysis in activated sludge we recommend using the FastDNA SPIN Kit for Soil with four times the normal bead beating and V1-3 primers.
Citation: Albertsen M, Karst SM, Ziegler AS, Kirkegaard RH, Nielsen PH (2015) Back to Basics – The Influence of DNA Extraction and Primer Choice on Phylogenetic Analysis of Activated Sludge Communities. PLoS ONE 10(7): e0132783. https://doi.org/10.1371/journal.pone.0132783
Editor: Ramy K. Aziz, Cairo University, EGYPT
Received: March 16, 2015; Accepted: June 19, 2015; Published: July 16, 2015
Copyright: © 2015 Albertsen et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited
Data Availability: All raw 16S rRNA sequence data is available in the European Nucleotide Archive (ENA) under the project PRJEB8668 (http://www.ebi.ac.uk/ena/data/view/PRJEB8668). Raw metagenome and metatranscriptome data is available in ENA under the project PRJEB8669 (http://www.ebi.ac.uk/ena/data/view/PRJEB8669). All data analysis and figures described in the paper can be reproduced from the original processed data using R Markdown files available in the online documentation on GitHub (http://madsalbertsen.github.io/ampvis/).
Funding: This study was funded by the Innovation Fond Denmark (EcoDesign-MBR). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Culture-independent methods have revolutionized our understanding of the microbiology of activated sludge communities [1,2] and especially the DNA-based methods for phylogenetic analysis are increasingly applied, due to the exponential decrease in sequencing costs and increased read length . However, you only see what you sequence and only sequence what you can extract and amplify. Hence, the choice of DNA extraction and amplification protocols is pivotal to the outcome of any amplicon sequencing study.
Several studies have investigated the influence of different DNA extraction protocols in activated sludge. However, the comparisons have mostly been made on yield and the ability to obtain reproducible PCR products [4–10]. Only in a few cases have the results been compared to DNA extraction independent methods such as qFISH . Recently, Gou & Zhang  used next generation sequencing to evaluate seven DNA extraction kits and showed that methods without bead beating underestimate the presence of bacteria that are typically hard to lyse, such as Gram-positive Actinobacteria. The general conclusions have been that a bead beating step is needed, and the FastDNA spin kit for soil has been recommended due to general high yield and purity [10,12]. Only two papers have investigated the influence of primer choice in activated sludge [13,14] and the often theoretically best primer-sets are applied , or they are selected based on recommendations for use in other environments. While the primer choice is known to introduce bias, the input template and specific parameters used in the PCR amplification protocol also influence the observed microbial community [16–18].
Another potential bias on the observed microbial community is the influence of DNA from dead cells . Pre-treatment with propidium monoazide may remove most of the DNA from dead cells by penetrating cells with degraded cell walls and binding to DNA upon photoactivation, which then interferes with the subsequent PCR amplification . However, the effect in activated sludge is presently unknown. In addition to DNA from dead cells, extracellular DNA (eDNA) is a component in the biofilms of activated sludge [21,22]. Currently it is unknown if eDNA is composed of specific sequences, although eDNA has been suggested to be similar to whole genome DNA in Pseudomonas aeruginosa biofilms . Interestingly, eDNA has been shown to be in high abundance in the vicinity of selected bacterial groups in activated sludge , which might result in a biased view of the microbial community structure.
In this study, we systematically investigated parameters influencing the observed community composition in activated sludge, from sampling and DNA extraction to primer choice and PCR settings, using Illumina MiSeq sequencing of 16S rRNA gene amplicons. Reproducibility was investigated by sampling four separate locations within the same wastewater treatment plant and resolution by sequencing a time-series spanning five months of operation. The effect of short-term storage was investigated by sequencing libraries prepared from samples stored at 4°C for 4 hours, 4°C for 24 hours, or 20°C for 24 hours. In addition, fresh samples were also investigated for the effect of eDNA removal. Instead of surveying a number of different DNA extraction methods, we chose to systematically explore the effect of bead beating intensity and amount of input material, using the commonly applied FastDNA spin kit for soil and MoBio PowerLyzer PowerSoil DNA isolation kit. The effect of primer choice was investigated using three popular primer-sets spanning different variable regions of the 16S rRNA gene; V1-3 (used by Human microbiome project ), V3-4 (recommended based on theoretical analysis , and the V4 (used by the Earth microbiome project ). To explore the effect of the PCR settings, we varied the amount of template, number of cycles and annealing temperature. For comparison, selected samples were also investigated by PCR-free metagenomics and stranded metatranscriptomics. Finally, the results were qualitatively compared with qFISH, using 23 oligonucleotide probes covering many of the important and abundant bacterial groups in activated sludge .
Materials and Methods
Activated sludge was sampled from aeration tanks at Aalborg West WWTP (57.049422° N, 9.864735° E) and Aalborg East WWTP (57.036973° N, 10.016569° E) in Denmark from October 2012 through March 2013. Sampling was carried out with the permission of Aalborg Forsyning, Kloak A/S. The plants primarily receive domestic wastewater (avg. 195 000 PE and 45 000 PE), and employ nitrogen and enhanced biological phosphorus removal. The aeration tanks were well mixed, and 1 L activated sludge was sampled 1 m below the surface. Sample material was transported on ice and processed within 4 h. The samples were homogenized (1 min, 1650 rpm) using a mortar-pestle, glass/teflon tissue homogenizer (30 mL) mounted on a Heidolph RZR 2020 (Heidolph, Germany) and stored as 2 mL aliquots at -80°C. To investigate the heterogeneity of the aeration tanks, four samples were obtained from different locations in the aeration tanks. To simulate the effect of longer transport time before processing, a subsample was stored at 20°C and shaken at 100 RPM for 24 h and another stored at 4°C for 24 h before processing. Subsamples of all samples were fixed for qFISH analysis . Total solid (TS) content was measured in duplicate for all samples .
Removal of DNA from dead cells and extracellular DNA
Fresh activated sludge samples were spiked with extracted E. coli DNA in 0 to 50% concentrations relative to the TS content of the sludge. Each sample was then divided into six 2 mL tubes, and three were treated with propidium monoazide (PMA). Samples not treated with PMA were kept on ice until DNA extraction. PMA (Biotium, USA) was added to a final concentration of 100 μM, and the samples were incubated in the dark for 10 min with occasional shaking. After incubation, the samples were subjected to a strong visible light (650 W halogen light bulb) for 10 min at a distance of approx. 20 cm, while keeping the samples on ice.
The standard protocols for DNA extraction with FastDNA Spin kit for soil (MP Biomedicals, USA) and PowerLyzer PowerSoil DNA isolation kit (MoBio, USA) were used with the following exceptions. Samples were thawed at room temperature, and varying amounts of biomass (0.9–2.2 mg TS) were spun down (5 min at >10,000 x g) and re-suspended in the respective lysis or bead solutions before extraction. Bead beating was performed on a FastPrep FP120 (MP Biomedicals, USA) at different intensity settings (20–400 s at 4–6 m/s), following general operation recommendations. All extraction conditions were tested in triplicate.
Purity of the extracted DNA was evaluated spectrophotometrically with Nanodrop1000 using A260/230nm and A260/280nm (Thermo Fisher Scientific, USA). The quality of the extracted DNA was evaluated with agarose gel electrophoresis, using the Tapestation 2200 and Genomic DNA screentapes (Agilent, USA). Finally, the concentration was measured fluorometrically with Quant-iT HS DNA Assay (Thermo Fisher Scientific, USA) on an Infinite M1000 PRO (Tecan, Switzerland).
16S rRNA amplicon sequencing
The procedure for bacterial 16S rRNA amplicon sequencing targeting the V1-3, V3-4, and V4 variable regions was modified from Caporaso et al. . Amplicon library PCR was performed on all replicate extractions independently. Ten ng of extracted DNA was used as template and the PCR reaction (25 μL) contained dNTPs (400nM of each), MgSO4 (1.5 mM), Platinum Taq DNA polymerase HF (2mU), 1X Platinum High Fidelity buffer (Thermo Fisher Scientific, USA), and a pair of barcoded library adaptors (400 nM). V1-3 primers: 27F AGAGTTTGATCCTGGCTCAG and 534R ATTACCGCGGCTGCTGG. V3-4 primers: 314F CCTACGGGNGGCWGCAG and 805R GACTACHVGGGTATCTAATCC. V4 primers: 515F GTGCCAGCMGCCGCGGTAA and 806R GGACTACHVGGGTWTCTAAT. Thermocycler settings for V1-3 and V4 PCR: Initial denaturation at 95°C for 2 min, 30 cycles of 95°C for 20 s, 56°C for 30 s, 72°C for 60 s and final elongation at 72°C for 5 min. Thermocycler settings for V3-4 PCR: Initial denaturation at 95°C for 2 min, 30 cycles of 95°C for 20 s, 50°C for 30 s, 72°C for 30 s and final elongation at 72°C for 5 min. The effect of different annealing temperatures (52-56-58°C), template amounts (1-5-50 ng DNA) and the number of PCR cycles (25-30-35 cycles) was tested, all performed with V1-3 primers. All PCR reactions were run in duplicate and pooled afterwards. The amplicon libraries were purified using the Agencourt AMpure XP bead protocol (Beckmann Coulter, USA) with the following exceptions: the sample/bead solution ratio was 5/4, and the purified DNA was eluted in 33 μL nuclease-free water. Library concentration was measured with Quant-iT HS DNA Assay (Thermo Fisher Scientific, USA) and quality validated with a Tapestation 2200, using D1K ScreenTapes (Agilent, USA). Based on library concentrations and calculated amplicon sizes, the samples were pooled in equimolar concentrations and diluted to 4 nM. The library pool was sequenced on a MiSeq (Illumina, USA), using a MiSeq Reagent kit v3 (2x300 PE) following the procedure in Caporaso et al. , with the exception of 10% PhiX control library (Illumina, USA) spike-in and final library loading concentration of 20 pM.
Amplicon bioinformatic processing
All sequenced sample libraries were subsampled to 50,000 raw reads and screened for PhiX contamination using bowtie2 v. 2.1.0  with standard settings and all matching reads removed. The potential PhiX contamination is due to the use of an un-indexed PhiX as a quality control, which can result in index carryover from nearby clusters with indexes.
Forward and reverse reads from the V4 amplicons were trimmed for quality by truncating reads when the median quality of the reads dropped below a Phred score of 20. The trimmed V4 reads were merged using FLASH v. 1.2.7 . For the V1-3 and V3-4 amplicons, only the first 225 bp of each amplicon were used as the quality of the reverse reads from some sequencing runs prevented a decent merging of reads.
The reads were dereplicated and formatted for use in the UPARSE workflow . The dereplicated reads were clustered, using the usearch v. 7.0.1090-cluster_otus with default settings. OTU abundance was estimated using the usearch v. 7.0.1090-usearch_global with-id 0.97. Taxonomy was assigned using the RDP classifier  as implemented in the parallel_assign_taxonomy_rdp.py script in QIIME , using MiDAS taxonomy version 1.20 , which is based on the SILVA taxonomy .
The results were analysed in R  through the Rstudio IDE (http://www.rstudio.com/). To facilitate effortless analysis and visualization of the amplicon data, we build the R package ampvis v.1.9.2 (https://github.com/MadsAlbertsen/ampvis) which builds on the R packages phyloseq , ggplot2 , reshape2 , dplyr , vegan , knitr , Biostrings , data.table , DESeq2 , ggdendro  and stringr .
Principal component analysis (PCA) was conducted, using vegan  with square root transformed OTU counts. Significance of treatments was tested, using the envfit parametric test on the first two principal components and on the Bray-Curtis dissimilarity matrix, using adonis as implemented in vegan. In addition, sample clustering was visualized using hierarchical clustering of the Bray-Curtis dissimilarity matrix. OTUs in differential abundance between treatments were identified, using DESeq2  with test = “wald” and fitType = “parametric”.
Three replicate DNA extractions used for amplicon sequencing were selected for metagenome sequencing and prepared according to the recommendations in the Illumina TruSeq PCR free protocol (Illumina Inc.). The prepared libraries were paired-end sequenced, using 2x301 bp MiSeq Reagent kits v3 on an Illumina MiSeq. The reads were trimmed, using CLC genomics workbench v.7.03 (CLCbio, Qiagen) by requiring a minimum Phred score of 20, a minimum length of 200 bp and removing any adaptors, if found. In addition, the reverse read was discarded due to relatively low quality, compared to the forward read and in order not to bias the subsequent count-based analysis. The trimmed metagenome reads were mapped to the MiDAS database version 1.20 , using the map reads to reference function in CLC genomics workbench v. 7.03, requiring 95 percent similarity over the full read length and random assignments of reads, which mapped to two sequences equally well. The results were exported as.csv files, imported to R and converted to phyloseq objects for easy manipulation and visualization. See the online documentation for the exact workflow applied.
Three replicate samples were subjected to RNA extraction, using the RiboPure-Bacteria Kit (Thermo Fisher Scientific, USA) according to the protocol, except that the input was 2.2 mg TS activated sludge biomass, and bead beating (160s at 6 m/s) was performed with FastPrep FP120 (MP Biomedicals, USA). Extracted total RNA purity was evaluated with Nanodrop1000 (Thermo Scientific Fisher, USA), quality with Tapestation 2200, using High Sensitivity screentapes (Agilent, USA) and concentration, using Qubit RNA BR Assay Kit (Thermo Scientific Fisher, USA). The extracted RNA was used for library preparation, using the TruSeq stranded mRNAseq protocol (Illumina Inc., USA) according to the manufacturer’s recommendations and sequenced, using 2x75bp MiSeq Reagent kits v3 on an Illumina MiSeq. The reads were trimmed, using CLC genomics workbench v.7.03 (CLCbio, Qiagen), requiring a minimum Phred score of 20 and a minimum length of 75 bp. In addition, the reverse reads were discarded in order not to bias the subsequent count based analysis. The trimmed metatranscriptome reads were mapped to the MiDAS database version 1.20 , using the map reads to reference function in CLC genomics workbench v.7.03, requiring 95 percent similarity over the full read length and random assignments of reads, which mapped to two sequences equally well. The results were exported as.csv files, imported to R and converted to phyloseq objects for easy manipulation and visualization. See online documentation for the exact workflow applied.
Theoretical primer coverage
The theoretical coverage of each primer-set was calculated using the TestPrime 1.0 function implemented at www.arb-silva.de  against the SILVA SSU r119 database. The analysis was conducted with 0, 1, 2, or 3 mismatches for each primer-set and otherwise default settings.
All experiments were conducted as independently sampled biological triplicates and had at least 25,000 non-unique quality filtered sequences per sample after sequencing and bioinformatic processing. If nothing else is stated, the optimized DNA extraction and amplification protocol was applied (four times the kit-recommended bead beating and V1-3 primers).
Reproducibility, resolution, and short-term storage
To investigate the reproducibility of the sampling at the wastewater treatment plant, two different aeration tanks were sampled in both ends of the basins. There was no significant effect of sampling position (padonis = 0.1, n = 12, Fig 1A), indicating that the aeration basins were completely mixed and sampling reproducible if handled properly. The variation within the biological replicates was small enough to distinguish between samples taken only weeks apart (padonis = 0.006, n = 9, S1 Fig). At OTU level, the variation was highly dependent on sequencing depth. Using three biological replicates, the variation in estimation of the mean decreased as a function of sequencing depth, until it reached a 95% confidence interval of approximately 20% of the mean, above 100 reads (Fig 1B, nsd = 11, nse = 3).
(A) No clustering by sampling site was seen in the PCA analysis of square root transformed OTU abundances (padonis = 0.53, nseq = 17000, nsample = 12). (B) OTU level variation when sequencing three biological replicates as a function of sequencing depth. Variation was measured as the 95% confidence interval in percentage of the mean OTU read count (nseq = 32000, nsd = 11, nse = 3).
In order to mimic the process of sampling and short-term storage from the wastewater treatment plant to the laboratory, samples were subjected to incubation at either 4 h at 4°C, 24 h at 4°C or 24 h at 20°C before being deposited in -80°C storage and subsequently sequenced. While there was a small effect of the short-term storage methods (padonis = 0.01, n = 9) on the overall community composition, it was small, compared to the variation between time-points months apart, but could potentially influence the conclusions drawn from samples within a weekly timeframe (S2 Fig).
The effect of removing extracellular DNA was investigated by treating samples with PMA. The PMA treatment removed all spiked-in E. coli DNA (S3 Fig), showing that PMA was able to remove naked DNA. The PMA treatment significantly changed the overall community profile (padonis = 0.001, n = 24) and reduced alpha diversity, measured as the number of observed OTUs (pt.test = 7.1e-5, n = 24). In total, 197 of 1643 OTUs were significantly affected by the PMA treatment (padj(DESeq2) < 0.001, n = 24; Fig 2). The most significantly changed OTU was OTU_11 related to the genus Trichococcus within the phylum Firmicutes (padj(DESeq2) = 5.8e-81, n = 24). Compared to the time series, the effect of the PMA treatment was large, although treated and untreated samples from the same time point were more similar to each other than the rest of the time series (S3 Fig).
(A) OTUs determined to be in significant different abundance after PMA treatment using DESeq2. Only species with a padj < 0.001 are visualized as being in significant different abundance. (B) The 10 OTUs with the lowest p-value. For all OTUs a phylum and genus classification is shown along with the OTU number.
Amount of input material
The effect of different amounts of input material was investigated by using 0.9, 2.2, or 22 mg of activated sludge (TS) as input to the DNA extraction protocol. The 0.9 and 2.2 mg samples clustered together, while the samples extracted using 22 mg of activated sludge as input clustered separately (S4 Fig).
To explore the effect of bead beating on the observed community composition, the bead beating intensity was varied around the recommended setting (40 s at 6 m/s) in the FastDNA SPIN Kit for Soil protocol. Increased bead beating had a dramatic influence on the observed community composition (Fig 3A). Compared to the time series samples, the effect of bead beating was larger than the effect of sampling 5 months apart (Fig 3B). However, despite the large effect of bead beating, it was still possible to distinguish samples from different wastewater treatment plants (S5 Fig).
(A) Percentage read abundance of the 11 most abundant phyla as function of bead beating intensity (Proteobacteria are show at class level). The data is visualized as a table with the underlying colors visualizing the changes. (B) Principal component analysis of the samples extracted with different bead beating intensities compared to the samples taken at different dates, but extracted with the same bead beating settings (160 s at 6 m/s).
The change in community composition with increased bead beating could be an effect of degradation of DNA from fragile bacterial groups (Fig 3A). However, by accounting for yield, all phylum level bacterial groups increased in absolute numbers (Fig 4). The effect of increased bead beating was highest for bacteria associated with the Gram-positive phylum Actinobacteria with an average of four times higher absolute abundance, compared to the standard settings, and a relative increase from 10.0% to 21.7% of all reads (Fig 3A). The large effect of bead beating was also observed using the V4 primers (S6 Fig) and another DNA extraction kit (MoBio, S7 Fig). The phylum level differences in extraction efficiencies were also seen at order level, although notable differences were present within each group (S8 Fig). For example, within the abundant orders of Actinobacteria, the absolute change varies from 2.7 to 6.3 fold and within Alphaproteobacteria from 1.2 to 3.7 fold (S9 Fig).
(A) The increase in yield as a function of bead beating intensity. While yield increased with bead beating, the DNA also became more fragmented. (B) Comparison of absolute phylum level differences as function of bead beating intensity. The relative read abundances were scaled by the DNA yield to obtain absolute counts and normalised relative to the standard settings (40 s at 6 m/s) to facilitate direct comparison between different phyla. The data is visualized as a table with the underlying colors visualizing the changes.
The three sets of primers showed very different community profiles, but similar diversity indices (Fig 5). For direct comparison, we sequenced PCR-free metagenomes (Fig 5, MG) from the same DNA as was used for the amplicon samples. In general, the dominant phyla in the metagenomes were also dominant in the amplicon samples. However, the V1-3 primers seemed better at capturing bacteria from the Actinobacteria and Chloroflexi, while Gammaproteobacteria were underestimated. The V4 primers seemed furthest from the observed metagenome diversity. While much of the difference observed between the primers is due to different amplification efficiencies, some orders were not amplified at all (S10 Fig). For example, the V1-3 primers successfully amplified Xanthomondales within the Gammaproteobacteria, but failed to amplify both Pseudomondales and Alteromondales from the same class.
(A) Comparison of the phylogenetic composition at phylum level using three different sets of primers (V1-3, V3-4, V4), PCR-free metagenomics (MG), stranded metatranscriptomics (MT), and the theoretical primer-set coverage with 1 mismatch allowed. The data is visualized as a table with the underlying colors visualizing the changes. (B) Comparison of alpha-diversity between the different sets of primers using diversity indices and rank abundance curves.
For qualitative comparisons we also sequenced stranded transcriptomes, which showed that the bacterial groups abundant in the DNA-based methods were also the most abundant in the transcriptomes. In addition, the qFISH analysis of a number of important probe-defined bacterial groups (S1 File) also showed a high abundance of Actinobacteria and Chloroflexi.
The theoretical analysis of the primer-set coverage showed that the V4 primers had the best coverage of both Bacteria and Archaea, while the V1-3 primers had the lowest coverage (Fig 5).
The effect of different PCR settings was tested by varying the amount of template DNA, annealing temperature, and the number of cycles. There was a significant effect of different PCR settings on the overall community composition (Fig 6, n = 20, padonis = 0.001). However, for both the amount of template and the number of cycles only small changes were seen on the individual OTUs. Interestingly, lowering the annealing temperature from 58°C to 52°C increased the abundance of some OTUs that were barely detected at 58°C, while not changing the abundance of the majority of OTUs (Fig 6).
(A) Principal component analysis of all PCR settings tested. (B) Differential abundance analysis of OTUs as an effect of different annealing temperatures. (C) The 10 most significantly differential abundant OTUs as a result of a change in annealing temperature.
Many studies have evaluated parameters that influence the observed microbial communities in DNA-based studies, but often only few parameters are evaluated, and hence their individual contributions are difficult to assess. Here we focused on the major sources of variation from sampling to DNA extraction and primer choices in order to evaluate their relative contribution to the observed variance and hence their influence on the resolution power of the analysis.
Even though wastewater treatment plants consist of large process tanks that may seem heterogeneous, we did not observe any significant effect when sampling in four separate locations of the aeration basins. The resolution with three replicates enables comparisons of samples on a weekly timescale within the same treatment plant. However, our results indicate that the weekly resolution is lost if the samples are not transported appropriately. Shipping samples at room temperature with overnight delivery should be avoided if a weekly resolution is warranted.
The influence of DNA from dead cells and potential extracellular DNA was investigated through pre-treatment of the samples with PMA that can penetrate disrupted cell walls and bind to DNA through photoactivation. PMA treatment is rarely applied in environmental samples, but has gained a lot of interest in clinical samples, where it is essential to get viable cell counts . The PMA treatment had a large effect on the beta-diversity when seen in the context of time series samples, but still enabled clustering with untreated samples from the same time-point. The most significantly affected OTU belonged to the genus Trichococcus, which has been found to be abundant in the wastewater influent , and this might indicate that a substantial proportion of Trichococcus arriving via the influent cannot survive in the treatment plant. However, even though the PMA treatment seems to alter the abundance of some community members, the majority of OTUs were unaffected. Considering that PMA treatment needs to be conducted on fresh samples, we think the additional effort to process every sample when sampled outweighs the additional benefit of removing a few potentially dead cells. However, PMA treatment might be useful in environments where the carbon turnover is less than in wastewater treatment plants.
DNA extraction in activated sludge is often carried out by the FastDNA Spin Kit for Soil [10,12,48]. Like many other methods, it relies on chemical lysis in combination with physical disruption, using bead beating. The bead beating intensity recommended by the manufacturer (6 m/s for 40 s) was insufficient in extracting DNA from many bacterial groups that are known to be difficult to lyse, e.g. Gram-positive bacteria. The increase in recovery reached a plateau for most bacterial groups at 6 m/s for 160 s. Further increase of bead beating led to a marked reduction in DNA integrity and only increased the recovery of Actinobacteria. It is unknown if a substantial proportion of bacteria remains undetected, however a qualitative comparison with the qFISH results suggests that DNA from the majority of the community members is extracted. In addition, the plateau of DNA yield and the abundance of most bacterial species after bead beating for 160 s at 6 m/s indicate that the limit was reached regarding what can be extracted by the applied method. Compared to Gram-negative bacteria, the Gram-positive species, in general, seemed more difficult to extract DNA from, although there were great differences even within orders of the same phylogenetic class. This points to other sources of variation such as micro-colony strength preventing extraction of DNA from specific microbial groups. The variation in the observed microbial composition from the bead beating experiment was larger than the variation in samples taken 5 months apart, which highlights the need for standardized DNA extraction methods, as echoed by all previous papers investigating the topic. In activated sludge, four times the normal bead beating duration (i.e. 4x40 s at 6 m/s) seems to be an optimal compromise between maintaining DNA integrity and representative extraction of bacteria that are difficult to lyse. The effect of increased bead beating was not only observed with the FastDNA Spin Kit for Soil, but also using the MoBio PowerLyzer PowerSoil DNA isolation kit, which is commonly applied in many different environments. While similar results were obtained regarding the microbial community composition, the yield was significantly lower in the MoBio PowerLyzer PowerSoil DNA isolation kit (data not shown).
The effect of primer choice was investigated using three popular primer-sets spanning different variable regions of the 16S rRNA gene; V1-3 (used by the Human microbiome project ), V3-4 (recommended based on theoretical analysis ) and V4 (used by the Earth microbiome project ). As there is no absolute truth and the primers target different variable regions, the evaluation of the “best” primer-set relies on a qualitative evaluation. The three primer sets resulted in very different community structures, even though they were applied to the exact same extracted DNA. Only the V1-3 primers captured a reasonable number of Chloroflexi, which are often abundant in treatment plants and responsible for much of the dynamic changes and potential settling problems . However, it was particularly bad at detecting specific orders within the Gammaproteobacteria, although detection of the functional important genus Ca. Competibacter  seemed fine. The V4 primer set has often been highlighted as a good general primer set as it has one of the highest phylogenetic coverages based on in silico analysis . However, in activated sludge, it greatly underestimates the abundance of Chloroflexi and Actinobacteria, which are often the predominating members of the community. Interestingly, the V1-3 primer-set performs poorly in the in silico test, but seems quite good when applied in situ. For activated sludge, we recommend the use of the V1-3 primer-set as it has a good overall agreement with the PCR-free metagenomes and captures the highest percentage of the Chloroflexi.
Finally, we tested the impact of template concentration, number of cycles, and annealing temperature during the PCR step on the observed community composition. Although some studies have shown that the number of cycles has a profound effect on the relative abundance of community members in simple communities  we did not observe any effect, which might be attributed to a more complex community, where template reannealing is limited. Similarly, we did not observe any major differences due to different template concentrations, although recent studies have shown that a higher template concentration decreases the variance . Lowering the annealing temperature resulted in amplification of several OTUs that were undetected at higher annealing temperatures; interestingly, there did not seem to be any negative effect of lowering the annealing temperature. While the experimental design employed in this study indicate the effect of individual parameters, some effects might be hidden due to interaction among parameters. These could be explored in future studies using a factorial experimental design of selected variables.
For 16S rRNA amplicon analysis in activated sludge, we recommend the use of the FastDNA spin kit for soil with four times the standard bead beating (160s at 6 m/s) and the V1-3 primer set. Complete step-by-step protocols can be found at www.midasfieldguide.org. Although we only investigated the influence of the selected parameters in activated sludge, similar effects might be expected in other environmental samples. We also recommend that the amplicon analysis be used with the MiDAS curated taxonomy  to provide a solid joint foundation for the study of microbial ecology of the activated sludge process and related treatment processes.
S1 Fig. Resolution using time-series data.
(A) PCA analysis of square root transformed OTU abundances. (B) Hierarchical clustering using Bray-Curtis dissimilarity. The variation within the biological replicates is small enough to distinguish between samples taken only weeks apart (padonis = 0.006, n = 9).
S2 Fig. Effect of storage method.
(A) PCA analysis of square root transformed OTU abundances. (B) Hierarchical clustering using Bray-Curtis dissimilarity. (C and D) The effect of storage seen in the context of the time series samples. While there was a significant effect of the short-term storage methods (padonis = 0.01, n = 9) on the overall community composition, it was small compared to the variation between samples months apart, but could potentially influence the conclusions drawn from samples within a weekly time frame.
S3 Fig. Effect of PMA treatment on alpha and beta diversity.
(A) PMA treatment removed all spiked-in DNA from E. coli. In the subsequent analysis, E.coli OTUs were removed and samples subsampled to the same number of sequences (17000) to facilitate comparisons using the whole dataset. (B) Impact of PMA tratment on the observed number of OTUs (pt.test = 7.2e-5, n = 24). (C) PCA analysis of square root transformed OTU abundances. (D) Hierarchical clustering using Bray-Curtis dissimilarity. The PMA treatment significantly changed the overall community profile (padonis = 0.001, n = 24). (E and F) The effect of PMA treatment seen in the context of the time series samples.
S4 Fig. Effect of the amount of input material.
(A) PCA analysis of square root transformed OTU abundances. (B) Hierarchical clustering using Bray-Curtis dissimilarity.
S5 Fig. Time series vs. bead beating and another WWTP.
While the resolution is lost within months using different bead beating settings, the two different WWTPs can still be separated (AAW and AAE).
S6 Fig. Effect of bead beating using the V4 primers.
Absolute abundance was calculated by accounting for DNA yield and then normalised to the standard bead beating setting (40s at 6 m/s) to facilitate comparison between phyla (Proteobacteria are show at class level).
S7 Fig. Effect of bead beating using the MoBio PowerLyzer PowerSoil DNA isolation kit.
S8 Fig. Effect of bead beating at order level (relative).
Both phylum and order level taxonomic classifications are shown (Proteobacteria are shown using classes instead).
S9 Fig. Effect of bead beating at order level (absolute).
Absolute abundance was calculated by accounting for DNA yield and then normalised to the standard bead beating setting (40 s at 6 m/s) to facilitate comparison between groups. Both phylum and order level taxonomic classifications are shown (Proteobacteria are shown using classes instead).
S10 Fig. Effect of primers at order level.
Both phylum and order level taxonomic classifications are shown (Proteobacteria are shown using classes instead). MG = Metagenome; MT = Metatranscriptome.
Conceived and designed the experiments: MA SMK ASZ PHN. Performed the experiments: MA SMK ASZ RHK. Analyzed the data: MA SMK ASZ RHK PHN. Contributed reagents/materials/analysis tools: MA SMK ASZ RHK. Wrote the paper: MA SMK ASZ RHK PHN.
- 1. Nielsen PH, Saunders AM, Hansen AA, Larsen P, Nielsen JL. Microbial communities involved in enhanced biological phosphorus removal from wastewater—a model system in environmental biotechnology. Curr Opin Biotechnol. 2012; 10: 452–9.
- 2. Sheik AR, Muller EEL, Wilmes P. A hundred years of activated sludge: time for a rethink. Front Microbiol. 2014; 5: a47.
- 3. Caporaso JG, Lauber CL, Walters WA, Berg-Lyons D, Huntley J, Fierer N, et al. Ultra-high-throughput microbial community analysis on the Illumina HiSeq and MiSeq platforms. ISME J. 2012; 6: 1621–4. pmid:22402401
- 4. Bourrain M, Achouak W, Urbain V, Heulin T. DNA extraction from activated sludges. Curr Microbiol. 1999; 38: 315–9. pmid:10341070
- 5. Yu Z, Mohn WW. Killing two birds with one stone: simultaneous extraction of DNA and RNA from activated sludge biomass. Can J Microbiol. 1999 45: 269–272.
- 6. Gabor EM, de Vries EJ, Janssen DB. Efficient recovery of environmental DNA for expression cloning by indirect extraction methods. FEMS Microbiol Ecol. 2003; 44: 153–63. pmid:19719633
- 7. Purohit HJ, Kapley A, Moharikar AA, Narde G. A novel approach for extraction of PCR-compatible DNA from activated sludge samples collected from different biological effluent treatment plants. J Microbiol Methods. 2003; 52: 315–23. pmid:12531500
- 8. Roh C, Villatte F, Kim B-G, Schmid RD. Comparative study of methods for extraction and purification of environmental DNA from soil and sludge samples. Appl Biochem Biotechnol. 2006; 134: 97–112. pmid:16943632
- 9. Shan G, Jin W, Lam EKH, Xing X. Purification of total DNA extracted from activated sludge. J Environ Sci. 2008; 20: 80–7.
- 10. Vanysacker L, Declerck SAJ, Hellemans B, De Meester L, Vankelecom I, Declerck P. Bacterial community analysis of activated sludge: an evaluation of four commonly used DNA extraction methods. Appl Microbiol Biotechnol. 2010; 88: 299–307. pmid:20652692
- 11. McIlroy SJ, Porter K, Seviour RJ, Tillett D. Extracting nucleic acids from activated sludge which reflect community population diversity. Antonie van Leeuwenhoek. 2009; 96: 593–605. pmid:19768568
- 12. Guo F, Zhang T. Biases during DNA extraction of activated sludge samples revealed by high throughput sequencing. Appl Microbiol Biotechnol. 2013; 97: 4607–16. pmid:22760785
- 13. Cai L, Ye L, Tong AHY, Lok S, Zhang T. Biased diversity metrics revealed by bacterial 16S pyrotags derived from different primer sets. PLoS One. 2013: 8: e53649. pmid:23341963
- 14. Guo F, Ju F, Cai L, Zhang T. Taxonomic precision of different hypervariable regions of 16S rRNA gene and annotation methods for functional bacterial groups in biological wastewater treatment. PLoS One. 2013; 8: e76185. pmid:24146837
- 15. Klindworth A, Pruesse E, Schweer T, Peplies J, Quast C, Horn M, et al. Evaluation of general 16S ribosomal RNA gene PCR primers for classical and next-generation sequencing-based diversity studies. Nucleic Acids Res. 2013; 41: e1. pmid:22933715
- 16. Suzuki MT, Giovannoni SJ. Bias caused by template annealing in the amplification of mixtures of 16S rRNA genes by PCR. Appl Environl Microbiol. 1996 62: 625–30.
- 17. Pinto AJ, Raskin L. PCR biases distort bacterial and archaeal community structure in pyrosequencing datasets. PLoS One 2012; 7: e43093. pmid:22905208
- 18. Kennedy K, Hall MW, Lynch MDJ, Moreno-Hagelsieb G, Neufeld JD. Evaluating bias of Illumina-based bacterial 16S rRNA gene profiles. Appl Environ Microbiol. 2014; 80: 5717–22. pmid:25002428
- 19. Cangelosi GA, Meschke JS. Dead or alive: molecular assessment of microbial viability. Appl Environ Microbiol. 2014; 80: 5884–91. pmid:25038100
- 20. Nocker A, Cheung C, Camper AK. Comparison of propidium monoazide with ethidium monoazide for differentiation of live vs. dead bacteria by selective removal of DNA from dead cells. J Microbiol Methods. 2006; 67: 310–20. pmid:16753236
- 21. Flemming HC, Wingender J. The biofilm matrix. Nat Rev Microbiol. 2010; 8: 623–33. pmid:20676145
- 22. Dominiak DM, Nielsen JL, Nielsen PH. Extracellular DNA is abundant and important for microcolony strength in mixed microbial biofilms. Environ Microbiol. 2011; 13: 710–21. pmid:21118344
- 23. Allesen-Holm M, Barken KB, Yang L, Klausen M, Webb JS, Kjelleberg S, et al. A characterization of DNA release in Pseudomonas aeruginosa cultures and biofilms. Mol Microbiol. 2006; 59: 1114–28. pmid:16430688
- 24. Jumpstart Consortium Human Microbiome Project Data Generation Working Group. Evaluation of 16S rDNA-based community profiling for human microbiome research. PLoS One.2012; 7: e39315. pmid:22720093
- 25. Nielsen PH, Mielczarek AT, Kragelund C, Nielsen JL, Saunders AM, Kong Y, et al. A conceptual ecosystem model of microbial communities in enhanced biological phosphorus removal plants. Water Res. 2010; 44: 5070–88. pmid:20723961
- 26. Morgan-Sagastume F, Larsen P, Nielsen JL, Nielsen PH. Characterization of the loosely attached fraction of activated sludge bacteria. Water Res. 2008; 42: 843–54. pmid:17889248
- 27. Tchobanoglous G, Burton FL, Stensel HD. Wastewater Engineering: Treatment and Reuse, 4th ed. McGraw-Hill Education, ISBN-13: 978–0070418783; 2002.
- 28. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012; 9: 357–9. pmid:22388286
- 29. Magoč T, Salzberg SL. FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics. 2011; 27: 2957–63. pmid:21903629
- 30. Edgar RC. UPARSE: highly accurate OTU sequences from microbial amplicon reads. Nat Methods 2013; 10: 996–998. pmid:23955772
- 31. Wang Q, Garrity GM, Tiedje JM, Cole JR. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol. 2007; 73: 5261–7. pmid:17586664
- 32. Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, et al. QIIME allows analysis of high-throughput community sequencing data. Nat Methods 2010; 7: 335–6. pmid:20383131
- 33. McIlroy SJ, Saunders AM, Albertsen M, Nierychlo M, McIlroy B, Hansen AA, et al. MiDAS: the Field Guide to the Microbes of Activated Sludge. Database 2015; Accepted.
- 34. Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, et al. The SILVA ribosomal RNA gene database project: Improved data processing and web-based tools. Nucleic Acids Res. 2013; 41: 590–596.
- 35. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/. 2015.
- 36. McMurdie PJ, Holmes S. phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data. PLoS One 2013; 8: e61217. pmid:23630581
- 37. Wickham H. ggplot2: elegant graphics for data analysis. Springer New York. ISBN: 978–0387981406. 2009.
- 38. Wickham H (2007). Reshaping Data with the reshape Package. J Stat Softw. 2007; 21: 1–20.
- 39. Wickham H, Francois R. dplyr: A Grammar of Data Manipulation. R package version 0.4.0. http://CRAN.R-project.org/package=dplyr. 2015.
- 40. Oksanen JF, Blanchet G, Kindt R, Legendre P, Minchin PR, O'Hara RB, et al. vegan: Community Ecology Package. R package version 2.0–10. http://CRAN.R-project.org/package=vegan. 2013.
- 41. Xie Y. knitr: A Comprehensive Tool for Reproducible Research in R. In Stodden Victoria, Leisch Friedrich and Peng Roger D., eds., Implementing Reproducible Computational Research. Chapman and Hall/CRC. ISBN 978–1466561595. 2014.
- 42. Pages H, Aboyoun P, Gentleman R, DebRoy S. Biostrings: String objects representing biological sequences, and matching algorithms. R package version 2.32.1. 2014.
- 43. Dowle M, Short T, Lianoglou S, Srinivasan A. data.table: Extension of data.frame. R package version 1.9.2. http://CRAN.R-project.org/package=data.table. 2014.
- 44. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-Seq data with DESeq2. Genome Biol. 2014; 15: 550. pmid:25516281
- 45. de Vries A, Ripley BD. ggdendro: Tools for extracting dendrogram and tree diagram plot data for use with ggplot. R package version 0.1–14. http://CRAN.R-project.org/package=ggdendro. 2013.
- 46. Wickham H. stringr: Make it easier to work with strings. R package version 0.6.2. http://CRAN.R-project.org/package=stringr. 2012.
- 47. McLellan SL, Huse SM, Mueller-Spitz SR, Andreishcheva EN, Sogin ML. Diversity and population structure of sewage-derived microorganisms in wastewater treatment plant influent. Environ Microbiol. 2010; 12, 378–92. pmid:19840106
- 48. Albertsen M, Hansen LBS, Saunders AM, Nielsen PH, Nielsen KL. A metagenome of a full-scale microbial community carrying out enhanced biological phosphorus removal. ISME J. 2012; 6: 1094–106. pmid:22170425
- 49. Kragelund C, Thomsen TR, Mielczarek AT, Nielsen PH. Eikelboom’s morphotype 0803 in activated sludge belongs to genus Caldilinea in the phylum Chloroflexi. FEMS Microbiol Ecol. 2011; 76: 451–462. pmid:21299573
- 50. McIlroy SJ, Albertsen M, Andresen EK, Saunders AM, Kristiansen R, Stokholm-Bjerregaard M, et al. “Candidatus Competibacter”-lineage genomes retrieved from metagenomes reveal functional metabolic diversity. ISME J 2014; 8: 613–24. pmid:24173461