Estimating Bacterial Diversity for Ecological Studies: Methods, Metrics, and Assumptions

doi:10.1371/journal.pone.0125356

Fig 1.

Conceptual figure of the study design.

We sampled 20 Swiss lakes, performed bioinformatic analyses, and applied several ecological concepts to evaluate the microbial communities, using both a fingerprinting method (ARISA) and next-generation sequencing (Illumina) of three variable regions of the 16S rRNA gene.

Fig 2.

Number of observed species (SR; left side) and phylogenetic diversity (PD; right side) of the rarefied dataset from Illumina OTU data of the lake samples.

A: SR and PD estimates for the three variable regions. Points show the mean SR/PD across all lake samples; lines show the standard error of the mean.

B: SR of individual lakes from the V3 region plotted against SR of the same lake from the V4 and V5 region datasets. The solid central line is the 1:1 line; dashed lines show the major axis (MA) regressions of the two comparisons.

C: SR (x-axis) plotted against PD (y-axis) for each of the three regions, where each dot represents one lake sample. The different symbols indicate the three different regions. Lines show the MA regression lines for each variable region dataset.
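
The caption refers to rarefied data and to the number of observed species (SR). As a minimal sketch of what those two steps usually involve, assuming a sample is stored as a dictionary of OTU read counts: the function names, the example counts, and the rarefaction depth below are all hypothetical, and the study's own pipeline may differ.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from a dict of OTU counts."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    # Expand the counts into one label per read, then draw without replacement.
    reads = [otu for otu, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    rarefied = {}
    for otu in rng.sample(reads, depth):
        rarefied[otu] = rarefied.get(otu, 0) + 1
    return rarefied


def observed_richness(counts):
    """Number of OTUs with at least one read (SR in the caption)."""
    return sum(1 for n in counts.values() if n > 0)


# Hypothetical lake sample, for illustration only.
lake = {"otu_a": 50, "otu_b": 30, "otu_c": 15, "otu_d": 4, "otu_e": 1}
rarefied_lake = rarefy(lake, depth=40)
richness = observed_richness(rarefied_lake)
```

Rarefying every sample to the same depth before counting OTUs keeps SR comparable between lakes and regions that were sequenced to different depths.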

Table 1.

Major axis (MA) regression results.
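
For readers unfamiliar with MA regression, the following is a minimal sketch of the standard closed-form major axis fit, which treats both variables as subject to error and minimises perpendicular distances to the line. The function name and the example values are hypothetical, and this is not necessarily the software routine used in the study.

```python
import math


def major_axis(x, y):
    """Return (slope, intercept) of the major axis regression of y on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products; the 1/(n-1) factor cancels in the slope.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxy == 0:
        raise ValueError("slope is undefined when x and y are uncorrelated")
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical values lying exactly on y = 2x + 1.
slope, intercept = major_axis([1, 2, 3, 4], [3, 5, 7, 9])
```

Unlike ordinary least squares, the MA fit is symmetric: swapping the two variables gives the reciprocal slope, which suits comparisons where neither region is the "independent" variable.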

Fig 3.

Comparison of Jaccard and Bray-Curtis dissimilarities between the variable 16S rRNA regions from the lake survey dataset.

A: Median Jaccard dissimilarities of rarefied data for the three different variable regions. Each dot represents the median Jaccard dissimilarity from the pairwise comparisons of one lake to the other 19 lakes for one variable region. Lines connect the median dissimilarities of the same lake for the three different regions.

B: Pairwise Bray-Curtis dissimilarities from V3 plotted against those from V4 (stars) and V5 (circles) for the same pair of lakes. The central line is the 1:1 line.
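
As a rough illustration of the two dissimilarities compared in this figure, the sketch below computes Jaccard dissimilarity on presence/absence data and Bray-Curtis dissimilarity on abundances, plus the per-lake median used in panel A. Treating Jaccard as presence/absence is an assumption here, and the function names and example counts are hypothetical.

```python
from statistics import median


def jaccard(a, b):
    """Jaccard dissimilarity on presence/absence: 1 - |shared| / |union|."""
    present_a = {otu for otu, n in a.items() if n > 0}
    present_b = {otu for otu, n in b.items() if n > 0}
    union = present_a | present_b
    if not union:
        raise ValueError("both samples are empty")
    return 1 - len(present_a & present_b) / len(union)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    otus = set(a) | set(b)
    numerator = sum(abs(a.get(o, 0) - b.get(o, 0)) for o in otus)
    denominator = sum(a.get(o, 0) + b.get(o, 0) for o in otus)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return numerator / denominator


def median_dissimilarity(focal, others, metric):
    """Median dissimilarity of one lake to every other lake."""
    return median(metric(focal, other) for other in others)


# Hypothetical OTU counts for three lakes.
lake_1 = {"otu_a": 10, "otu_b": 5, "otu_c": 0}
lake_2 = {"otu_a": 5, "otu_b": 5, "otu_d": 10}
lake_3 = {"otu_a": 10, "otu_b": 5}
jac_12 = jaccard(lake_1, lake_2)
bc_12 = bray_curtis(lake_1, lake_2)
```

Jaccard responds only to which OTUs are shared, whereas Bray-Curtis also weights how many reads each OTU has, so the two can disagree when samples share taxa at very different abundances.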

Fig 4.

Raup-Crick (RC) comparisons between the three variable 16S rRNA regions from the lake survey dataset.

A: Modified RC probability comparison of V3 and V4 (rarefied data). Each dot represents the RC value of one pairwise lake comparison from the V3 region plotted against the RC value of the same comparison from the V4 region. Values between -1 and -0.975 indicate that communities are significantly less dissimilar than expected by chance, values between +0.975 and +1 indicate that they are significantly more dissimilar than expected by chance, and values between -0.975 and +0.975 indicate that they do not differ from the random expectation of the null model. Dashed lines mark these significance boundaries (-0.975 and +0.975). Dark areas in the plot represent high densities of points.

B: Same as A, but for V3 plotted against V5 values.

C: Conceptual figure illustrating the four possible outcomes when two RC matrices are compared. a (white area): both regions reach the same conclusion about the dissimilarity among communities; b (dark grey): one region estimates the β-diversity of a lake pair to be significantly more similar than expected by chance, while the other region estimates it to be no different from the random null-model distribution; c (light grey): one region estimates the β-diversity of a lake pair to be significantly more dissimilar than expected by chance, while the other region estimates it to be no different from the random null-model distribution; d (black): one region estimates the β-diversity of a lake pair to be significantly more similar than expected by chance, while the other region estimates it to be significantly more dissimilar than expected by chance.

D: Barplot showing the number of cases where the compared regions come to the same (a) or different (b, c, d) conclusions about β-diversity. Coding is illustrated in panel C.
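
The a/b/c/d coding of panels C and D can be sketched as a small classification routine. This is only an illustration: the function names and RC values are hypothetical, and treating values exactly at ±0.975 as significant is an assumption, since the caption does not say which side the boundaries fall on.

```python
from collections import Counter

THRESHOLD = 0.975  # significance boundary given in the caption


def rc_outcome(rc):
    """Classify a single Raup-Crick value against the null model."""
    if not -1 <= rc <= 1:
        raise ValueError("RC values must lie between -1 and +1")
    if rc <= -THRESHOLD:  # assumption: the boundary itself counts as significant
        return "similar"
    if rc >= THRESHOLD:
        return "dissimilar"
    return "random"


def compare_regions(rc_first, rc_second):
    """Return the category (a, b, c or d) for one lake pair in two regions."""
    outcomes = {rc_outcome(rc_first), rc_outcome(rc_second)}
    if len(outcomes) == 1:
        return "a"  # both regions reach the same conclusion
    if outcomes == {"similar", "random"}:
        return "b"
    if outcomes == {"dissimilar", "random"}:
        return "c"
    return "d"  # one region says similar, the other dissimilar


# Hypothetical RC values for the same lake pairs in two regions.
pairs = [(-0.99, -0.98), (-0.99, 0.10), (0.99, 0.20), (-1.0, 1.0), (0.3, -0.4)]
tally = Counter(compare_regions(first, second) for first, second in pairs)
```

Counting the categories over all lake pairs, as `tally` does, gives the kind of summary shown in the bar plot of panel D.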

Fig 5.

Rank-abundance evaluation of the variable 16S rRNA regions from the lake survey dataset.

A: Rank-abundance plot of the complete dataset for each of the three variable regions, with abundances summed across all 20 lakes and plotted on a log-log scale. Vertical dashed lines mark the range of ranks (12 to 440) over which the V4 rank-abundance distribution differed significantly from those of V3 and V5 (Kolmogorov-Smirnov (KS) test: p < 0.05). Over the same range of ranks, the V3 and V5 rank-abundance distributions did not differ significantly from each other.

B: Example rank-abundance plot of the rarefied data for one lake (Murtensee), plotted on log-log scale. X-axis: OTU rank, y-axis: OTU abundance.

C: Results of the KS test using rank-abundance data of the individual lakes. X-axis: compared regions, y-axis: p-value distribution of the KS test; the dashed line is plotted at a p-value of 0.05. Each dot represents the comparison of rank-abundance curves from two regions of the same lake. P-values below 0.05 indicate a significant difference between the rank-abundance distributions, whereas p-values above 0.05 indicate no significant difference between the two rank-abundance distributions.
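
To make the KS comparison concrete, the sketch below computes the two-sample KS statistic D (the largest gap between two empirical cumulative distributions) and checks it against the usual large-sample critical value at a given significance level. How the authors fed rank-abundance data into the test is not stated here, so comparing two lists of OTU abundances is an assumption, as are the function names and example data.

```python
import math
from bisect import bisect_right


def ks_statistic(sample_1, sample_2):
    """Two-sample KS statistic: maximum distance between the two ECDFs."""
    sorted_1, sorted_2 = sorted(sample_1), sorted(sample_2)
    d = 0.0
    # The ECDFs only change at observed values, so checking those is enough.
    for value in sorted(set(sorted_1) | set(sorted_2)):
        ecdf_1 = bisect_right(sorted_1, value) / len(sorted_1)
        ecdf_2 = bisect_right(sorted_2, value) / len(sorted_2)
        d = max(d, abs(ecdf_1 - ecdf_2))
    return d


def ks_rejects(sample_1, sample_2, alpha=0.05):
    """Large-sample approximation: reject if D exceeds the critical value."""
    n, m = len(sample_1), len(sample_2)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))  # about 1.358 for alpha = 0.05
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(sample_1, sample_2) > critical


# Hypothetical abundance vectors for two regions of one lake.
region_1 = [1, 2, 3, 4]
region_2 = [3, 4, 5, 6]
d_value = ks_statistic(region_1, region_2)
```

The critical-value check is only an approximation for large samples; a statistics library routine such as `scipy.stats.ks_2samp` would normally be used to obtain p-values like those plotted in panel C.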

Fig 6.

Species richness (SR) estimates from ARISA Fingerprinting plotted against SR estimates from Illumina sequencing.

Each symbol represents the SR estimates of one lake for the two different methods, clustered at an SST of 97%. Different symbols represent Illumina estimates from the three different regions. Lines show major linear regressions for each variable region (regression slopes: S5 Table).

Fig 7.

Barchart of the most abundant bacterial classes.

Relative abundances of the ten most abundant bacterial classes across the V3, V4 and V5 datasets. Each bar represents the relative class distribution in one lake, and each group of bars represents the relative abundances for one of the three variable regions (V3, V4, V5). Bars are ordered alphabetically from left to right (see S1 Fig and S3 Table for more information about the lakes). The corresponding paired t-test results are shown in S5 Table. Square brackets indicate candidate class names.
