CoMA – an intuitive and user-friendly pipeline for amplicon-sequencing data analysis

doi:10.1371/journal.pone.0243241

Table 1.

Selection of existing software packages available for amplicon sequencing data analysis.

Fig 1.

Overview of the CoMA pipeline workflow.

Colors mark the four sub-sections of the CoMA workflow: Data pre-processing and quality checking (orange), clustering of operational taxonomic units (OTUs) and taxonomic assignment (green), data post-processing (blue), and data visualization and statistical appraisal (yellow). Labelled arrows indicate the order of steps and name the file types required as input for each step. Taxonomic assignment is done with Blast, Lambda or RDP, using either one of the available databases (e.g. Silva [23]) or a custom database provided by the user. Numbers indicate the third-party tools used for the respective CoMA step: 1 = PANDAseq, 2 = PRINSEQ, 3 = LotuS/sdm, 4 = QIIME, 5 = Mothur. TDOT = Tab-delimited OTU-table. PER = Paired-end reads. SER = Single-end reads. PCoA = Principal coordinates analysis.

Table 2.

Sites that were selected for soil analysis in order to demonstrate the functionality of CoMA.

Fig 2.

Community composition of the mock-13 dataset, revealed with four different analysis platforms.

The set point (SP) depicts the theoretically expected distribution and serves as reference. The dataset comprised 18 bacterial genera, targeted with 16S rRNA amplicon sequencing.

Table 3.

Results of the benchmark test for four different analysis platforms.

Fig 3.

Community composition of the mock-16 dataset, revealed with four different analysis platforms.

The set point (SP) depicts the theoretically expected distribution and serves as reference. The dataset comprised 46 archaeal and bacterial genera, targeted with 16S rRNA amplicon sequencing.

Fig 4.

Community composition of the mock-26 dataset, revealed with four different analysis platforms.

The set point (SP) depicts the theoretically expected distribution and serves as reference. The dataset comprised 11 fungal genera, targeted with ITS amplicon sequencing.

Fig 5.

Shannon-Wiener diversity (H’) of the three different soils after sequencing data analysis with CoMA, Mothur and QIIME.

Four replicates for each habitat are shown. Letters indicate significant differences across the analysis tools for each habitat. F = forest. GR = grassland. S = swamp.
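
For readers unfamiliar with the index, the Shannon-Wiener diversity is commonly defined as H' = -Σ p_i ln(p_i), where p_i is the relative abundance of taxon i in a sample. The following is a minimal sketch of that calculation, not CoMA's own implementation. The function name `shannon_wiener`, the use of the natural logarithm, and the example counts are assumptions made for illustration only.

```python
import math


def shannon_wiener(counts):
    """Return H' = -sum(p_i * ln(p_i)) over taxa with non-zero counts.

    `counts` is a sequence of non-negative read counts per taxon (e.g. per OTU).
    The natural logarithm is an assumption; other bases only rescale H'.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    # Taxa with zero reads contribute nothing to the sum and are skipped.
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical read counts per OTU for one sample (not data from the study).
example_counts = [40, 30, 20, 10]
h_prime = shannon_wiener(example_counts)
```

A perfectly even community of n taxa yields the maximum value ln(n), and a sample dominated by a single taxon yields 0, which helps when comparing H' values across analysis tools.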

Fig 6.

Principal component analysis based on archaeal and bacterial families of soil samples from three different habitats: Forest, grassland and swamp.

The color code indicates the applied data analysis tool: CoMA, Mothur and QIIME. Q1 to Q4 = quadrants of the coordinate system.

Fig 7.

Venn plots showing the shared phyla, classes, orders, families and genera found with CoMA, Mothur and QIIME in the soil samples.

Data include all three investigated habitats (forest, grassland, swamp).

Table 4.

Key microbial families found in three different habitats: Forest, grassland and swamp.

Table 5.

Unclassified reads for each taxonomic level in three different soils: Forest, grassland and swamp.
