Table 1.
Summary of available datasets by body habitat sorted by the number of WGS reads.
Figure 1.
Screenshot of the METAREP Compare Page.
The Compare page allows users to filter, compare and visualize annotation attributes across multiple datasets. As illustrated in the upper panel, the user can find and select datasets of interest (here pooled body habitats were selected). The middle panel illustrates filter and compare options (here datasets were filtered for the pyruvate dehydrogenase complex and the heatmap plot option was selected). The bottom panel shows the compare results and allows users to switch between annotation attributes and set the level of granularity (here the taxonomy attribute and phylum level were selected).
Figure 2.
Heatmap plots of three enzymatic markers.
Marker abundance is contrasted across phyla (columns) and body habitats (rows) using Morisita-Horn distances in combination with the average linkage clustering method. Colors encode the relative abundance of the selected feature-dataset combination (dark red 0% to white 100%), while the dendrograms at the top and left show annotation feature and dataset differences, respectively.
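The two techniques named in this caption, the Morisita-Horn distance and average linkage clustering, can be sketched in plain Python as follows. This is a minimal illustration, not METAREP's own implementation (which the caption does not show); the function names, the dictionary input layout and the naive merge loop are assumptions chosen for readability. The distance used here is one minus the Morisita-Horn similarity index, so it depends only on the relative abundance profiles of two datasets.

```python
from itertools import combinations

def morisita_horn_distance(x, y):
    """1 - Morisita-Horn similarity for two equal-length abundance vectors."""
    X, Y = sum(x), sum(y)
    if X == 0 or Y == 0:
        raise ValueError("abundance vectors must have a positive total")
    dx = sum(v * v for v in x) / (X * X)  # Simpson-type dominance of x
    dy = sum(v * v for v in y) / (Y * Y)  # Simpson-type dominance of y
    similarity = 2 * sum(a * b for a, b in zip(x, y)) / ((dx + dy) * X * Y)
    return 1.0 - similarity

def average_linkage(profiles):
    """Naive average linkage (UPGMA) on a dict name -> abundance vector.

    Returns the merge steps as (cluster_a, cluster_b, distance) tuples.
    """
    names = list(profiles)
    dist = {frozenset((a, b)): morisita_horn_distance(profiles[a], profiles[b])
            for a, b in combinations(names, 2)}
    clusters = {name: (name,) for name in names}  # cluster label -> members
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            ma, mb = clusters[a], clusters[b]
            # average of all pairwise member distances between the two clusters
            d = sum(dist[frozenset((i, j))] for i in ma for j in mb) / (len(ma) * len(mb))
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        clusters["(" + a + "," + b + ")"] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
    return merges
```

Running `average_linkage` on the rows (datasets) and, after transposing, on the columns (features) of an abundance table yields the two merge orders from which dendrograms like those in the figure are drawn.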
Figure 3.
Hierarchical cluster plots of 48 samples taken from 12 females and 12 males at two different time points.
Hierarchical clustering analysis of a random subset of human microbiome samples taken from five human body regions, clustered by NCBI taxonomy at the family level (a) and by KEGG pathways (b). Clusters were generated with the average linkage clustering method from a distance matrix computed with the Morisita-Horn index (distances are shown on the x-axis). Dataset labels encode the following information: [donor ID]-[habitat]-[gender]-[time point]-[sample ID]-[annotation-type]. For example, the dataset label 159814214-an-m-2-SRS047225-mtr encodes a sample from a male donor (ID 159814214) taken from the anterior nares site at time point 2 with sample ID SRS047225, annotated by the metabolic reconstruction (HUMAnN) pipeline (mtr). The dotted line represents the level at which the tree was cut for analysis. The resulting clusters are labeled as follows: AN (anterior nares), BM (buccal mucosa), SP (supragingival plaque), ST (stool), and PF (posterior fornix).
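The dataset label scheme described above can be decoded mechanically. The sketch below splits a label into its six fields; the `DatasetLabel` and `parse_label` names are hypothetical, and the habitat code table is an assumption: only "an" is confirmed by the example label, and the other codes are guessed as lowercase versions of the cluster abbreviations.

```python
from typing import NamedTuple

# Assumption: habitat codes are lowercase versions of the cluster labels in
# the caption; only "an" (anterior nares) is confirmed by the example label.
HABITAT_NAMES = {
    "an": "anterior nares",
    "bm": "buccal mucosa",
    "sp": "supragingival plaque",
    "st": "stool",
    "pf": "posterior fornix",
}

class DatasetLabel(NamedTuple):
    donor_id: str
    habitat: str
    gender: str
    time_point: int
    sample_id: str
    annotation_type: str

def parse_label(label: str) -> DatasetLabel:
    """Split '[donor]-[habitat]-[gender]-[time]-[sample]-[annotation]'."""
    parts = label.split("-")
    if len(parts) != 6:
        raise ValueError(f"expected 6 dash-separated fields, got {len(parts)}: {label!r}")
    donor, habitat, gender, time_point, sample, annotation = parts
    return DatasetLabel(donor, habitat, gender, int(time_point), sample, annotation)

info = parse_label("159814214-an-m-2-SRS047225-mtr")
print(info.donor_id, HABITAT_NAMES.get(info.habitat, info.habitat), info.time_point)
# prints: 159814214 anterior nares 2
```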
Figure 4.
Screenshots of METAREP statistical result panels.
List of phyla and pathways that are differentially abundant between the buccal mucosa (n = 116) and supragingival plaque (n = 89) habitats. Taxonomic differences reported by Metastats, with confidence intervals, are shown in (a); differences in KEGG pathway abundance detected by the Wilcoxon rank-sum test are shown in (b).
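For readers unfamiliar with the Wilcoxon rank-sum test used in panel (b), the sketch below computes it for one pathway from per-sample abundances in two habitats. It is an illustration only: the caption does not say how METAREP computes the test, and this version uses the large-sample normal approximation without tie or continuity correction, which is an assumption (reasonable for group sizes like 116 and 89). Metastats, used in panel (a), is a different procedure and is not shown.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def wilcoxon_rank_sum(x, y):
    """Two-sided rank-sum test via the normal approximation.

    No tie or continuity correction is applied. Returns (W, z, p).
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])  # rank sum of the first group
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return w, z, p
```

When many pathways are tested at once, as in this comparison, the resulting p-values would normally also be corrected for multiple testing before reporting.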
Table 2.
Number of pathways that are differentially abundant for each statistical test and oral habitat combination.
Table 3.
Selection of KEGG pathways found to be differentially abundant in three oral habitats sorted by the ratio of the median abundances.
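The ordering used in Table 3 can be illustrated for one pair of habitats: for each pathway, take the median abundance across samples in each habitat and sort by their ratio. The input layout (a dict from pathway name to per-sample abundances) and the function name are hypothetical, and skipping pathways whose denominator median is zero is an assumption made to avoid division by zero.

```python
from statistics import median

def rank_by_median_ratio(abundances_a, abundances_b):
    """Sort pathways by median(a) / median(b), largest ratio first.

    abundances_a and abundances_b map a pathway name to its per-sample
    abundances in habitat A and habitat B (hypothetical input layout).
    Pathways whose median in B is zero are skipped.
    """
    ratios = {}
    for pathway, values_a in abundances_a.items():
        med_b = median(abundances_b[pathway])
        if med_b > 0:
            ratios[pathway] = median(values_a) / med_b
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)
```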
Figure 5.
Software architecture overview.
The METAREP software integrates several open source tools to import, store and analyze metagenomics annotations. Users can analyze stored data using a variety of web-based tools. A subset of the web functionality is available via a programmatic access module that allows data retrieval directly from the MySQL database and Lucene index files.
Table 4.
Column descriptions of the METAREP tab-delimited import format.
Figure 6.
Comparison of query response time for two weighted search approaches.
Each data point marks the query response time (y axis) for a query that returned x entries (x axis). The blue line indicates the linear fit for the weighted search approach, while the red line indicates the linear fit for the distributed weighted search approach. Parameter estimates for the linear regression models are given in the boxes above the fitted lines.
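The fitted lines in this figure are simple linear regressions of response time on the number of returned entries. A minimal ordinary least-squares sketch is shown below; the function name is hypothetical, and the caption does not state which software produced the reported estimates.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # added response time per returned entry
    intercept = mean_y - slope * mean_x
    return intercept, slope
```

Fitting each search approach separately and comparing the two slopes shows how quickly response time grows with result size under each approach.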