Table 1.
Selection of existing software packages available for amplicon sequencing data analysis.
Fig 1.
Overview of the CoMA pipeline workflow.
Colors represent the four sub-sections of the CoMA workflow: data pre-processing and quality checking (orange), clustering of operational taxonomic units (OTUs) and taxonomic assignment (green), data post-processing (blue), and data visualization and statistical appraisal (yellow). Labelled arrows indicate the order of events and name the file types required as input for each step. Taxonomic assignment is done with BLAST, Lambda or RDP, using either one of the available databases (e.g. SILVA [23]) or a custom database provided by the user. Numbers indicate the third-party tools used for the specific CoMA step: 1 = PANDAseq, 2 = PRINSEQ, 3 = LotuS/sdm, 4 = QIIME, 5 = Mothur. TDOT = Tab-delimited OTU-table. PER = Paired-end reads. SER = Single-end reads. PCoA = Principal coordinates analysis.
Table 2.
Sites selected for soil analysis to demonstrate the functionality of CoMA.
Fig 2.
Community composition of the mock-13 dataset, revealed with four different analysis platforms.
The set point (SP) depicts the theoretically expected distribution and serves as reference. The dataset comprised 18 bacterial genera, targeted with 16S rRNA amplicon sequencing.
Table 3.
Results of the benchmark test for four different analysis platforms.
Fig 3.
Community composition of the mock-16 dataset, revealed with four different analysis platforms.
The set point (SP) depicts the theoretically expected distribution and serves as reference. The dataset comprised 46 archaeal and bacterial genera, targeted with 16S rRNA amplicon sequencing.
Fig 4.
Community composition of the mock-26 dataset, revealed with four different analysis platforms.
The set point (SP) depicts the theoretically expected distribution and serves as reference. The dataset comprised 11 fungal genera, targeted with ITS amplicon sequencing.
Fig 5.
Shannon-Wiener diversity (H’) of the three different soils after sequencing data analysis with CoMA, Mothur and QIIME.
Four replicates for each habitat are shown. Letters indicate significant differences across the analysis tools for each habitat. F = forest. GR = grassland. S = swamp.
Fig 6.
Principal component analysis based on archaeal and bacterial families of soil samples from three different habitats: Forest, grassland and swamp.
The color code indicates the applied data analysis tool: CoMA, Mothur or QIIME. Q1–Q4 = quadrants of the coordinate system.
Fig 7.
Venn plots showing the shared phyla, classes, orders, families and genera found with CoMA, Mothur and QIIME in the soil samples.
Data include all of the three investigated habitats (forest, grassland, swamp).
Table 4.
Key microbial families found in three different habitats: Forest, grassland and swamp.
Table 5.
Unclassified reads for each taxonomic level in three different soils: Forest, grassland and swamp.