Fig 1.
Representation of the workflow.
In an interaction network, single green and red arrows represent a commensalistic and an amensalistic interaction, respectively, whereas double green arrows represent mutualism and double red arrows represent competition. A green arrow paired with a red arrow signifies an exploitative interaction. See S1 Fig for more details. (A) A random interaction matrix i. (B) This interaction matrix is implemented in the gLV model together with the intrinsic growth rates and carrying capacities of the species. (C) All timeseries differ slightly because of the variation in the interaction strengths. (D) The partial correlations are calculated from the abundances per species sampled from the 300 different hosts at equilibrium. Only the significant correlations in the lower triangle of the matrix are used for the comparison with the original interaction matrix i. Variations of the workflow were studied by adding, for example, a perturbation or process noise.
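Step (D) relies on partial rather than marginal correlations. The following is a minimal sketch of that idea, assuming only three species and synthetic values in place of simulated abundances. The helper names `pearson` and `partial_corr` are hypothetical and this is not the study's code. With more than three species, partial correlations are usually obtained from the inverse of the covariance matrix instead of this first-order formula.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Synthetic values for 300 hypothetical hosts (assumption, not gLV output):
# species 2 tracks species 1 and species 3 tracks species 2, so species 1
# and 3 are linked only indirectly, through species 2.
rng = random.Random(0)
n1 = [10 + rng.gauss(0, 1) for _ in range(300)]
n2 = [a + rng.gauss(0, 1) for a in n1]
n3 = [b + rng.gauss(0, 1) for b in n2]

marginal = pearson(n1, n3)          # inflated by the indirect path via n2
partial = partial_corr(n1, n3, n2)  # much smaller once n2 is controlled for
print(round(marginal, 2), round(partial, 2))
```

In this toy chain the marginal correlation between species 1 and 3 is clearly positive, while the partial correlation is close to zero, which is why partial correlations are the natural candidate for separating direct from indirect associations.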
Table 1.
The confusion matrix as used in this study.
The inferred partial correlation coefficient ρ (from the lower part of the partial correlation matrix) must have the same sign as one of the interactions in the interaction matrix A to be considered a true positive finding in the base case analysis.
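The sign-matching rule can be sketched as a small classifier. Only the true-positive rule is taken from the caption above. The handling of false positives, false negatives and true negatives is an assumption about how the remaining cells of Table 1 are filled, and the function names are hypothetical. A non-significant correlation is encoded here as `rho == 0`.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def classify_pair(rho, a_ij, a_ji):
    """Classify one species pair as TP, FP, FN or TN.

    rho  : inferred partial correlation, 0 if not significant
    a_ij : effect of species j on species i in the interaction matrix
    a_ji : effect of species i on species j
    """
    # Signs of the interactions that are actually present in this pair.
    true_signs = {sign(a_ij), sign(a_ji)} - {0}
    if rho != 0:
        # Base case rule: matching either direction counts as a hit.
        return "TP" if sign(rho) in true_signs else "FP"
    # Assumed: a missed interaction is FN, a correctly empty pair is TN.
    return "FN" if true_signs else "TN"

print(classify_pair(0.4, 0.7, 0.0))    # commensalism, positive correlation
print(classify_pair(0.3, 0.5, -0.5))   # exploitative pair, either sign matches
```

Under this rule an exploitative pair is matched by a correlation of either sign, which makes the base case more lenient than the strict inference variant.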
Fig 2.
Scatter plots of the abundances of two bacterial species under different interaction mechanisms: (A) mutualism, (B) competition, (C) commensalism, (D) amensalism and (E, F) exploitative interactions. The equilibrium abundances of the two species, N1 and N2, were obtained by running the two-species Lotka-Volterra model with K1 = 1.5, K2 = 1.1, r1 = 1, r2 = 2 and αij drawn randomly from normal distributions with identical means and standard deviations (α12 ~ N(|0.7|, 0.2), α21 ~ N(|0.7|, 0.2)). In the case of commensalism and amensalism: α12 ~ N(|0.7|, 0.2) and α21 = 0. The two species can co-exist only under certain combinations of αij (S1 Text). The grey polygon indicates the area where co-existence is possible. Note that the axes have different ranges in each subplot. Because the two species have different carrying capacities, the two exploitative situations differ: in exploitative interaction type 1, species 1 is exploited by species 2, and in exploitative interaction type 2, species 2 is exploited by species 1.
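To illustrate why only some combinations of αij allow co-existence, the sketch below computes the interior equilibrium of a two-species model in closed form. The model form and sign convention (positive α is beneficial) are assumptions made for illustration. The exact equations are given in S1 Text and may differ. The function name is hypothetical.

```python
def coexistence_equilibrium(K1, K2, a12, a21):
    """Stable interior equilibrium (N1, N2), or None if there is none.

    Assumed model (not confirmed by the source):
        dN1/dt = r1 * N1 * (K1 - N1 + a12 * N2) / K1
        dN2/dt = r2 * N2 * (K2 - N2 + a21 * N1) / K2
    The growth rates r1 and r2 do not affect where the equilibrium lies.
    """
    det = 1.0 - a12 * a21
    if det <= 0:
        # Unbounded growth (strong mutualism) or an unstable equilibrium
        # (strong competition) in this assumed model.
        return None
    n1 = (K1 + a12 * K2) / det
    n2 = (K2 + a21 * K1) / det
    return (n1, n2) if n1 > 0 and n2 > 0 else None

# Carrying capacities from the caption, mean interaction strength 0.7.
K1, K2 = 1.5, 1.1
for label, a12, a21 in [("mutualism", 0.7, 0.7),
                        ("competition", -0.7, -0.7),
                        ("commensalism", 0.7, 0.0)]:
    print(label, coexistence_equilibrium(K1, K2, a12, a21))
```

Drawing α12 and α21 repeatedly and plotting the resulting pairs (N1, N2) would give scatter plots of the kind shown in the figure, again under the assumed model form.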
Fig 3.
The percentage of significant partial correlations (with a sign matching the interaction in either direction), as recovered from the base case model.
(A) For different types of pairwise interactions and (B) for the different correlations.
Fig 4.
Inference under various sources of process variability.
For the different scenarios we show the precision, recall and F1-score. (A) The base case model. (B) Host-specific variation in the carrying capacities and intrinsic growth rates. (C) Decreased and increased amounts of measurement noise (υ) and the effect of process noise (W) (S2 Fig). (D) Interaction strengths drawn from a uniform and unimodal distribution (S3 Fig). (E) The results for a 30-species system, a network based on a producer-consumer structure and a network with hub interactions (S4 Fig). (F) The effect on network inference of specifying the intended sign in the correlation analysis, either as the sign of the strongest interaction in each pair of species or by requiring that the signs of both interactions be matched by the inferred correlation coefficient (strict inference). (G) Three scenarios with 3000 hosts: the base case with random interaction networks and the two scenarios with structured (i.e., producer-consumer and hub-species) networks. Network inference was assessed by the F1-score, which measures agreement between the interaction matrix in the gLV model and the inferred partial correlation matrix on a scale from 0 (no agreement) to 1 (perfect agreement), according to the rules of Table 1. The dashed line indicates the median result from the base case model. The bars of the boxplots indicate the variability of the data outside the middle 50% (i.e., the lower 25% of scores and the upper 25% of scores).
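For reference, precision, recall and the F1-score follow from the confusion-matrix counts by their standard definitions. The sketch below uses made-up counts. The function name and the convention of returning 0 when a denominator is empty are assumptions.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Illustrative counts only, not results from the study.
p, r, f1 = precision_recall_f1(tp=30, fp=10, fn=20)
print(p, r, round(f1, 3))
```

Because true negatives do not enter the F1-score, it is unaffected by the many species pairs that neither interact nor show a significant correlation.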
Fig 5.
The effect of a perturbation on correlation-based network inference.
(A) Example of a timeseries. Dashed lines represent sampling timepoints. Sampling was performed during the perturbation (t1 = green, t2 = yellow, t3 = blue and t4 = grey) and at equilibrium (t5 = dark blue). Alternatively, sampling was performed at random timepoints between t = 100 and t = 1000 (random = pink). (B) Results (F1-scores) of network inference for sampling at the various timepoints. After a perturbation, all species return to their original equilibrium. The bars of the boxplots indicate the variability outside the middle 50% (i.e., the lower 25% of scores and the upper 25% of scores). Dashed lines represent the median results of sampling at equilibrium.