Compositionally aware estimation of cross-correlations for microbiome data

doi:10.1371/journal.pone.0305032

Table 1.

Cases A, B, and C, along with explanations and an overview of applicable methods.

Table 2.

An overview of the transformations used to assess cross-correlations.

In VST, k is a replicate index.

Fig 1.

Results on simulated data in case B.

MAE of different cross-correlation methods for correlation matrices generated by the cluster method (left column) and the loadings method (right column). For the cluster method, different p (number of OTUs) and c (the proportion of OTUs in a cluster) are used. For the loadings method, threshold values u = 0, 0.1, …, 0.8 (cf. (7)) and different p are used. The lines show the mean accuracy, and the edges of the envelopes show ±1.96 standard errors (SE). The results are based on 1000 simulated datasets where each simulated dataset has 50 replicates.
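The summary described in this caption (a mean error per setting with an envelope of ±1.96 SE) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes MAE means the mean absolute error between estimated and true correlations, and the names `mae` and `mean_with_envelope`, as well as the toy data, are hypothetical.

```python
import numpy as np


def mae(estimate, truth):
    """Mean absolute error between two correlation arrays of equal shape."""
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.mean(np.abs(estimate - truth)))


def mean_with_envelope(errors, z=1.96):
    """Return the mean of per-dataset errors and the mean -/+ z * SE edges."""
    errors = np.asarray(errors, dtype=float)
    mean = errors.mean()
    # Standard error of the mean across simulated datasets.
    se = errors.std(ddof=1) / np.sqrt(errors.size)
    return float(mean), float(mean - z * se), float(mean + z * se)


# Toy stand-in for 1000 simulated datasets (assumption: Gaussian estimation noise).
rng = np.random.default_rng(0)
truth = rng.uniform(-1.0, 1.0, size=20)
errors = [mae(truth + rng.normal(0.0, 0.1, size=20), truth) for _ in range(1000)]
mean, lower, upper = mean_with_envelope(errors)
```

With 1000 datasets the envelope is narrow, because the standard error shrinks with the square root of the number of simulated datasets.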

Fig 2.

Results for simulated data with differing diversity in case B.

MAE of different cross-correlation methods for correlation matrices generated by the cluster method (left column) and the loadings method (right column). For the cluster method, different p_eff (effective number of OTUs) and c (the proportion of OTUs in a cluster) are used. For the loadings method, threshold values u = 0, 0.1, …, 0.8 (cf. (7)) and different p_eff are used. The lines show the mean accuracy, and the edges of the envelopes show ±1.96 SE. The results are based on 1000 simulated datasets where each simulated dataset has 50 replicates.
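To make the notion of an effective number of OTUs concrete, here is a small sketch using Hill numbers, a common family of diversity measures. Whether the paper defines p_eff this way is an assumption; the function name `effective_number` and its `order` parameter are hypothetical.

```python
import numpy as np


def effective_number(counts, order=1.0):
    """Hill number of the given order for a vector of non-negative counts.

    order=1 gives exp(Shannon entropy); order=2 gives the inverse Simpson index.
    Assumption: this may differ from the definition of p_eff used in the paper.
    """
    counts = np.asarray(counts, dtype=float)
    # Relative abundances of the OTUs that are actually present.
    p = counts[counts > 0] / counts.sum()
    if np.isclose(order, 1.0):
        return float(np.exp(-np.sum(p * np.log(p))))
    return float(np.sum(p ** order) ** (1.0 / (1.0 - order)))


even = effective_number([5, 5, 5, 5])        # perfectly even community
skewed = effective_number([97, 1, 1, 1])     # one dominant OTU
```

A perfectly even community has an effective number equal to its number of OTUs, while a community dominated by one OTU has an effective number close to 1, regardless of how many rare OTUs are present.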

Fig 3.

Correlations between microbial abundances and the severity of atopic dermatitis.

Results from a correlation analysis of atopic dermatitis data from Byrd et al. [38]. A: All correlations exceeding the permutation threshold m = 0.59, colored by the sign of the correlation, with error bars given by the empirical bootstrap 95% confidence interval. B: Scatter plot of the effective number of families against the objective SCORAD. The blue line is a smooth curve fitted to the data, with 95% confidence intervals derived from the standard deviation. C: Scatter plot of the correlations estimated by log-TSS against those estimated by SparCEV. The straight line has slope 1 and intercept 0. D: Scatter plot of the correlations estimated by SparCEV base against those estimated by SparCEV iterative. The straight line has slope 1 and intercept 0.
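Panel A combines a permutation threshold with bootstrap intervals. The sketch below shows one common way to build both; it is not the paper's procedure. It assumes plain Pearson correlations as a stand-in for SparCEV, a threshold taken as a quantile of the maximum absolute correlation under permutation, and the empirical (pivotal) bootstrap interval. All function names and the toy data are hypothetical.

```python
import numpy as np


def column_correlations(X, y):
    """Pearson correlation between each column of X and the vector y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))


def permutation_threshold(X, y, n_perm=200, quantile=0.95, seed=0):
    """Quantile of the largest absolute correlation when y is permuted."""
    rng = np.random.default_rng(seed)
    max_abs = [
        np.max(np.abs(column_correlations(X, rng.permutation(y))))
        for _ in range(n_perm)
    ]
    return float(np.quantile(max_abs, quantile))


def empirical_bootstrap_ci(X, y, n_boot=200, level=0.95, seed=0):
    """Empirical (pivotal) bootstrap interval: 2 * estimate - bootstrap quantiles."""
    rng = np.random.default_rng(seed)
    n = len(y)
    estimate = column_correlations(X, y)
    draws = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample replicates with replacement
        draws[b] = column_correlations(X[idx], y[idx])
    alpha = (1.0 - level) / 2.0
    q_low = np.quantile(draws, alpha, axis=0)
    q_high = np.quantile(draws, 1.0 - alpha, axis=0)
    return 2.0 * estimate - q_high, 2.0 * estimate - q_low


# Toy data: 40 replicates, 5 features, the first one related to the phenotype.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5))
y = X[:, 0] + 0.3 * rng.normal(size=40)

corr = column_correlations(X, y)
threshold = permutation_threshold(X, y)
ci_low, ci_high = empirical_bootstrap_ci(X, y)
selected = np.abs(corr) > threshold  # correlations that would be displayed
```

Taking the maximum over all features in each permutation is one way to control for the number of correlations tested at once; a per-feature threshold would be less conservative.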

Fig 4.

Results for simulated data in case C.

MAE of different cross-correlation methods for correlation matrices generated by the cluster method (left column) and the loadings method (right column) in case C. For the cluster method, different p (number of OTUs), q (number of genes) and c (the proportion of OTUs in a cluster) are used. For the loadings method, threshold values u = 0, 0.1, …, 0.8 (cf. (7)) and different p and q are used. The lines show the mean accuracy, and the edges of the envelopes show ±1.96 standard errors (SE). The results are based on 200 simulated datasets where each simulated dataset has 50 replicates.

Fig 5.

Correlation network between bacterial and fungal abundances in the root of Lotus japonicus.

Results from applying SparXCC to 16S and ITS sequencing data from the root microbiome of Lotus japonicus, from Thiergart et al. [20]. Each circular vertex represents a bacterial OTU from the 16S data, and each square vertex represents a fungal OTU from the ITS data. Each vertex is colored according to the phylum of the OTU it represents. Two vertices are connected by an edge if their estimated correlation is above the permutation threshold. The analysis is carried out separately for the genotypes Gifu, ram1, nfr5, ccamk, and symrk. Only cross-correlations are shown.
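The edge rule in this caption can be illustrated with a short sketch that turns a cross-correlation matrix into a bipartite edge list. It assumes the comparison is made on the absolute value of the correlation, which the caption does not state; the function name, the OTU labels, and the numbers are hypothetical and are not taken from the Lotus japonicus data.

```python
import numpy as np


def cross_correlation_edges(cross_corr, threshold, row_names, col_names):
    """List (row, column, correlation) triples whose |correlation| exceeds threshold.

    Rows stand for bacterial OTUs and columns for fungal OTUs, so only
    cross-correlations (never within-dataset correlations) become edges.
    """
    cross_corr = np.asarray(cross_corr, dtype=float)
    rows, cols = np.nonzero(np.abs(cross_corr) > threshold)
    return [
        (row_names[i], col_names[j], float(cross_corr[i, j]))
        for i, j in zip(rows, cols)
    ]


# Hypothetical 3 x 2 cross-correlation matrix and threshold.
cross_corr = np.array([[0.70, 0.10],
                       [-0.65, 0.20],
                       [0.05, 0.30]])
edges = cross_correlation_edges(
    cross_corr,
    threshold=0.5,
    row_names=["bOTU1", "bOTU2", "bOTU3"],
    col_names=["fOTU1", "fOTU2"],
)
```

Running this once per genotype, each with its own matrix and threshold, would give one edge list per network panel.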

Table 3.

Average running time, in seconds, of the cross-correlation estimation methods in case B.

Table 4.

Average running time, in seconds, of the cross-correlation estimation methods in case C.
