Table 1.
List of functional groups in activated sludge.
Figure 1.
Percentages of pyrotags assigned at the genus level (A) and to the functional genera of concern (B) by three annotation methods: RDP Classifier, Best-hit and LCA.
Figure 2.
Percentages of pyrotags identified as genera of concern for the various multi-variable regions by the three methods.
Columns were clustered on the mean Bray-Curtis distances calculated from the percentage of each genus relative to total pyrotags. The outer, middle and inner rings correspond to remediators, operational groups and potential pathogens, respectively.
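The Bray-Curtis distance between two columns can be sketched as follows. This is a minimal illustration, not the script used for the figure: the function names, the region labels used as keys and the percentage values are all hypothetical, and the legend does not state how the mean over distances was taken, so only the pairwise distance is shown.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance profiles."""
    if len(u) != len(v):
        raise ValueError("profiles must have the same length")
    total = sum(u) + sum(v)
    if total == 0:
        # Both profiles are empty; treating them as identical is a convention
        # chosen for this sketch.
        return 0.0
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def distance_matrix(profiles):
    """Pairwise distances between named profiles, keyed by (name_a, name_b)."""
    names = sorted(profiles)
    return {(a, b): bray_curtis(profiles[a], profiles[b])
            for a in names for b in names}


# Hypothetical percentages of three genera relative to total pyrotags.
profiles = {
    "V1&V2": [2.0, 1.0, 0.5],
    "V3&V4": [1.0, 1.0, 0.5],
    "V5&V6": [0.0, 0.5, 0.5],
}
dist = distance_matrix(profiles)
```

A distance of 0 means two columns have identical genus percentages, and a distance of 1 means they share no genus.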
Figure 3.
Agreement among the RDP Classifier, BH and LCA methods on pyrotags assigned to functional bacteria.
The inner, middle and outer rings correspond to the remediators, operational groups and potential pathogens, respectively. The percentages in the pies are the ratios of pyrotags assigned by all three methods to total tags. A higher shared-tag portion and a higher percentage of assigned pyrotags suggest higher efficiency of the variable region.
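The ratio shown in the pies can be sketched as below. The sketch assumes that each method's result has already been parsed into a dictionary mapping tag IDs to genus names, and that "assigned by all three methods" means all three gave the same genus. Both are assumptions, and the function name, tag IDs and genus assignments are hypothetical.

```python
def shared_fraction(rdp, bh, lca, total_tags):
    """Fraction of all tags given the same genus by all three methods.

    Each argument maps tag ID -> genus name; unassigned tags are absent.
    """
    if total_tags <= 0:
        raise ValueError("total_tags must be positive")
    shared = [
        tag for tag, genus in rdp.items()
        if genus is not None and bh.get(tag) == genus and lca.get(tag) == genus
    ]
    return len(shared) / total_tags


# Hypothetical assignments for five tags (t1..t5).
rdp = {"t1": "Nitrospira", "t2": "Zoogloea", "t3": "Thauera"}
bh = {"t1": "Nitrospira", "t2": "Zoogloea", "t4": "Thauera"}
lca = {"t1": "Nitrospira", "t2": "Dechloromonas"}

fraction = shared_fraction(rdp, bh, lca, total_tags=5)
```

Here only `t1` receives the same genus from all three methods, so one tag in five is shared.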
Figure 4.
Comparison among sequencing regions of tags consistently assigned by the three methods for the moderately abundant genera.
In the left box chart, abundance is the percentage of consistently assigned tags for the four regions (V1&V2, V3&V4, V5&V6 and V7&V8&V9; the abundance for V1&V2 is the mean of V12 and V21). Only the 25 genera whose abundance exceeded 0.1% by at least one method are listed. Azoarcus had no consistently assigned tags, although its abundance was high according to the LCA annotation. Red dots show the logarithm of the ratio of the maximum to the minimum number of identified tags among the regions. No dots are shown for Nitrosomonas, Caldilinea, Curvibacter and Azoarcus because each had at least one region with zero hits. The right box chart compares, for each region, the percentage of tags consistently assigned by all three methods relative to total assigned tags. Boxes show the minimum, 25% quantile, median, 75% quantile and maximum; stars indicate the mean. For each genus, colored lines link to the region with the most abundant or most consistently assigned tags (green) and to the region with the least (red). More green links indicate a better-performing region, such as V1&V2, while more red links indicate poor performance of a region (such as V7&V8&V9) for these abundant genera.
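The red-dot value and the V1&V2 averaging can be sketched as follows. The legend does not give the base of the logarithm, so base 10 is an assumption here, and the function names and tag counts are hypothetical.

```python
import math


def merge_v12_v21(abundance):
    """Replace the V12 and V21 entries with their mean, stored under 'V1&V2'."""
    merged = {k: v for k, v in abundance.items() if k not in ("V12", "V21")}
    merged["V1&V2"] = (abundance["V12"] + abundance["V21"]) / 2
    return merged


def log_max_min_ratio(counts_by_region):
    """log10(max / min) across regions, or None if any region has zero hits."""
    values = list(counts_by_region.values())
    if not values or min(values) <= 0:
        return None  # no dot is drawn for a genus with a zero-hit region
    return math.log10(max(values) / min(values))


# Hypothetical consistently assigned tag counts for one genus.
raw = {"V12": 180, "V21": 220, "V3&V4": 20, "V5&V6": 50, "V7&V8&V9": 2}
regions = merge_v12_v21(raw)
dot = log_max_min_ratio(regions)

# A genus with a zero-hit region gets no dot.
no_dot = log_max_min_ratio({"V1&V2": 10, "V3&V4": 0})
```

A larger value means the number of tags identified for a genus depends more strongly on the region sequenced.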
Figure 5.
Consistency between the RDP Classifier and the other two methods under different confidence thresholds for the different multi-variable regions.
Raw results from the RDP Classifier were filtered at different confidence thresholds with a Python script and then compared with the results of the other two methods with a second script. The percentage of consistently classified tags increased with stricter thresholds, while the percentage of solely classified tags showed the opposite trend.
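The filter-and-compare procedure might look like the sketch below. It is not the authors' script: the input is assumed to be already parsed into dictionaries rather than read from RDP Classifier output files, and the function names, tag IDs and confidence values are hypothetical. It also assumes that "consistently classified" means the same genus in both methods, that "solely classified" means classified by the RDP Classifier only, and that percentages are relative to the tags passing the threshold.

```python
def filter_by_confidence(rdp_raw, threshold):
    """Keep RDP genus assignments whose confidence is at least `threshold`.

    rdp_raw maps tag ID -> (genus, confidence in [0, 1]).
    """
    return {tag: genus for tag, (genus, conf) in rdp_raw.items()
            if conf >= threshold}


def compare(rdp, other):
    """Percentages of RDP-kept tags that are consistent with, or absent from, `other`."""
    if not rdp:
        return {"consistent": 0.0, "solely": 0.0}
    consistent = sum(1 for tag, genus in rdp.items() if other.get(tag) == genus)
    solely = sum(1 for tag in rdp if tag not in other)
    return {"consistent": consistent / len(rdp) * 100,
            "solely": solely / len(rdp) * 100}


# Hypothetical parsed results.
rdp_raw = {
    "t1": ("Nitrospira", 0.95),
    "t2": ("Zoogloea", 0.60),
    "t3": ("Thauera", 0.85),
    "t4": ("Azoarcus", 0.40),
}
best_hit = {"t1": "Nitrospira", "t2": "Dechloromonas", "t3": "Thauera"}

results = {t: compare(filter_by_confidence(rdp_raw, t), best_hit)
           for t in (0.0, 0.5, 0.8)}
```

With these made-up values the consistent percentage rises as the threshold tightens, and the solely classified percentage falls to zero once the low-confidence tag `t4` is dropped.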
Table 2.
Evaluation of the taxonomic precision of the RDP Classifier for the moderately abundant genera, with the Best-hit and lowest common ancestor (LCA) methods as references.