Analysis, Optimization and Verification of Illumina-Generated 16S rRNA Gene Amplicon Surveys

doi:10.1371/journal.pone.0094249

Table 1.

Library construction primer sequences.

Figure 1.

Comparison of 454 and Illumina sequence quality.

Plot depicting the median per-base PHRED quality scores (Q scores) for the full-length 454 reads and merged Illumina reads from the six natural community samples. The V4 data are shown in orange, the first V4-V5 Illumina run (V4V5.Ia) in light green, the second run (V4V5.Ib) in dark green, and the 454 data in blue. The sizes and overlapping regions of the V4 and V4-V5 Illumina amplicons are shown in black below the quality plots. Illumina sequencing read 1 is depicted as a solid line and read 2 as a dashed line, with arrowheads indicating the direction of each read relative to the E. coli base position given along the X axis.
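As a rough illustration of the quantity plotted in Figure 1, the sketch below computes a median per-base Q score profile from FASTQ quality strings. It assumes Phred+33 encoding, which the caption does not state, and the function names and toy reads are hypothetical; it is not the authors' pipeline.

```python
from statistics import median


def phred_scores(quality_line, offset=33):
    """Decode one FASTQ quality string into integer Q scores.

    The offset of 33 (Phred+33) is an assumption, not taken from the paper.
    """
    return [ord(ch) - offset for ch in quality_line]


def median_per_base_quality(quality_lines, offset=33):
    """Return the median Q score at each base position across reads.

    Reads may differ in length; each position uses only the reads that
    actually cover it.
    """
    columns = {}
    for line in quality_lines:
        for pos, q in enumerate(phred_scores(line, offset)):
            columns.setdefault(pos, []).append(q)
    return [median(columns[pos]) for pos in sorted(columns)]


# Toy quality strings (hypothetical data): 'I' decodes to Q40, '5' to Q20, '!' to Q0.
reads = ["IIII", "II5", "I55!"]
profile = median_per_base_quality(reads)
```

Plotting such a profile against base position for each library type would give curves of the kind described in the caption.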

Figure 2.

The RDS processing method replicates de novo OTU clustering better than reference-based clustering.

The correlation between OTU clustering methods is shown by plotting the number of raw (a) and filtered (b) OTUs observed when using de novo OTU clustering versus reference-based clustering or the RDS method. The reference-based OTU clustering results are depicted with squares, while the RDS OTU clustering results are depicted with circles. Open markers indicate samples where the Greengenes 2012 reference was used, while closed markers indicate samples where the Greengenes 2013 reference was used. De novo results are depicted as gray diamonds. Linear regression lines are shown for the reference and RDS datasets, with dashed lines fitted to datasets processed using the Greengenes 2012-10 reference and solid lines fitted to datasets processed using the Greengenes 2013-08 reference.

Table 2.

Comparisons of alpha diversity metrics produced from different processing methods.

Table 3.

Alpha diversity measures of RDS processed samples after normalization.

Figure 3.

Beta diversity analysis of all datasets.

Three-dimensional principal coordinates analysis plots showing the relatedness of datasets using either the Bray-Curtis (A) or the weighted UniFrac (B) metric. Individual datasets are represented as spheres, which are colored according to their sample source as follows: human stool – brown, leech intestinum – purple, mouse small intestine – orange, mock community – blue, non-adherent rumen contents – red, mixed liquor – green, termite hindgut – gold.
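For readers unfamiliar with the Bray-Curtis metric named in panel A, the following minimal sketch computes the dissimilarity between two OTU count tables using the standard definition, 1 - 2C/(S_a + S_b), where C is the sum of the smaller count for each OTU and S_a, S_b are the sample totals. The dictionaries, OTU names, and function name are hypothetical and are not data or code from the study.

```python
def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two samples.

    Each argument maps an OTU identifier to its count in that sample.
    Returns 0.0 for identical samples and 1.0 when no OTUs are shared.
    """
    taxa = set(counts_a) | set(counts_b)
    # C: sum of the lesser count for every OTU seen in either sample.
    shared = sum(min(counts_a.get(t, 0), counts_b.get(t, 0)) for t in taxa)
    total = sum(counts_a.values()) + sum(counts_b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    return 1 - 2 * shared / total


# Hypothetical counts: only otu1 is shared, with a lesser count of 3,
# and the two samples hold 20 reads in total, so the result is 1 - 6/20.
sample_a = {"otu1": 6, "otu2": 4}
sample_b = {"otu1": 3, "otu3": 7}
dissimilarity = bray_curtis(sample_a, sample_b)
```

A matrix of such pairwise values across all datasets is the usual input to a principal coordinates analysis. Weighted UniFrac additionally requires a phylogenetic tree and is not sketched here.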

Figure 4.

Effect of processing method on the taxonomic composition of the mock community datasets.

Plot comparing the taxonomic composition of the mock community sample for the three different library types sequenced when processed three different ways. The replicate V4 and V4-V5 Illumina datasets were combined into one representative dataset for each library type. All taxonomic assignments were made using the RDP Classifier after retraining with the 2013-08 Greengenes reference. Taxonomic ranks are noted by letters preceding the taxon name as follows: genus – g, family – f, order – o.

Figure 5.

Effects of processing method and Greengenes database version on the taxonomic composition of the termite datasets.

Plot comparing the taxonomic composition of the termite hindgut sample for the three different library types sequenced when processed using three different methods. The replicate V4 and V4-V5 Illumina datasets were combined into one representative dataset for each library type. Taxonomic assignments were made using the RDP Classifier after retraining with either the 2012-10 or 2013-08 Greengenes references. Taxonomic ranks are noted by letters preceding the taxon name as follows: genus – g, family – f, order – o, class – c, phylum – p, domain – d.
