Fig 1.
A) Distributions of K for each individual. Colored points and lines are normalized histograms of the data. The black line is a maximum likelihood fit of a truncated lognormal distribution, with truncation at 10^−4.5, to the values from all individuals (parameters of the fitted lognormal: μ = −19.85, s = 4.93). The dashed line marks the cutoff due to finite sampling: below this value, sampling effects produce deviations from the lognormal distribution. B) Distributions of σ² for each individual (colored points and lines are normalized histograms of the data). The black line is a maximum likelihood fit of an exponential distribution to the values from all individuals (mean = 0.93). All individuals are characterized by the same distributions of carrying capacity K and variability σ.
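For reference, the two maximum likelihood fits mentioned in this caption can be written out as follows. This is a sketch under stated assumptions, not the authors' exact procedure: it assumes natural logarithms, a lower truncation point c = 10^−4.5, and independent observations. The symbols c, Φ (the standard normal cumulative distribution function), n, x_j and m are introduced here for illustration only.

```latex
% Density of a lognormal truncated from below at c (assumed form)
f(K \mid \mu, s, c) =
  \frac{1}{K s \sqrt{2\pi}}
  \exp\!\left[-\frac{(\ln K - \mu)^2}{2 s^2}\right]
  \left[1 - \Phi\!\left(\frac{\ln c - \mu}{s}\right)\right]^{-1},
  \qquad K \ge c

% Log-likelihood maximized over (\mu, s), using only values K_i \ge c
\ell(\mu, s) = \sum_{i} \ln f(K_i \mid \mu, s, c)

% For an exponential distribution, the maximum likelihood estimate
% of the mean is the sample mean of the n observed values x_j
\hat{m} = \frac{1}{n} \sum_{j=1}^{n} x_j
```

Because the truncation point lies well above the fitted μ, only the upper tail of the lognormal is observed, which is why the normalization term involving Φ cannot be dropped.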
Fig 2.
Relationships between dissimilarity measures for communities generated with the model.
The different dissimilarity measures (A: Jaccard similarity, B: Sørensen index, C: Whittaker index, D: Bray-Curtis dissimilarity, E: Horn similarity, F: Morisita-Horn similarity) are plotted against the carrying capacity correlation ρK. Grey circles represent the 200 pairs of communities generated with the model. Each community has S = 10^4 OTUs. Each pair of communities has the same σ, drawn from an exponential distribution with mean 0.9. Values of K are drawn from a lognormal distribution with parameters μ = −19 and s = 5. 100 pairs have the same values of K, to mimic samples from the same community at different times. The remaining 100 pairs have correlated values of K, with ρK ranging between 0.5 and 1, obtained by exponentiating values drawn from a bivariate Gaussian distribution. For each community, abundances are drawn from a Gamma distribution with parameters K and σ. For the pairs with the same values of K, Gamma-distributed abundances have a correlation ranging from 0 to 0.5. Reads are obtained from the real abundances by simulating multinomial sampling with a number of reads equal to 3 × 10^4. Black lines are the binned averages of the grey circles.
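The generative procedure described in this caption can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the authors' code. Several choices are assumptions: the Gamma distribution is parameterized with mean K and squared coefficient of variation σ (shape 1/σ, scale Kσ), σ is drawn once per OTU and shared by the two communities, and `rho_gauss` is the correlation of the underlying Gaussian variables, which is not identical to ρK. The function names are hypothetical.

```python
import math
import random
from collections import Counter


def sample_reads(abundances, n_reads, rng):
    """Multinomial sampling: draw n_reads reads with probabilities
    proportional to the (unnormalized) abundances."""
    draws = rng.choices(range(len(abundances)), weights=abundances, k=n_reads)
    counts = Counter(draws)
    return [counts[i] for i in range(len(abundances))]


def simulate_pair(n_otus=10_000, mu=-19.0, s=5.0, mean_sigma=0.9,
                  rho_gauss=0.8, n_reads=30_000, seed=None):
    """Simulate read counts for one pair of communities with correlated K."""
    rng = random.Random(seed)
    abund_a, abund_b = [], []
    for _ in range(n_otus):
        # Correlated standard Gaussians; exponentiating gives lognormal K.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho_gauss * z1 + math.sqrt(1.0 - rho_gauss ** 2) * rng.gauss(0.0, 1.0)
        k_a = math.exp(mu + s * z1)
        k_b = math.exp(mu + s * z2)
        # Exponentially distributed sigma, shared by the pair (assumption).
        # The floor guards against an exact zero draw.
        sigma = max(rng.expovariate(1.0 / mean_sigma), 1e-12)
        # Assumed parameterization: mean K, squared CV sigma.
        shape = 1.0 / sigma
        abund_a.append(rng.gammavariate(shape, k_a * sigma))
        abund_b.append(rng.gammavariate(shape, k_b * sigma))
    return sample_reads(abund_a, n_reads, rng), sample_reads(abund_b, n_reads, rng)


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity computed on relative read frequencies."""
    tx, ty = sum(x), sum(y)
    return 0.5 * sum(abs(a / tx - b / ty) for a, b in zip(x, y))


def jaccard_dissimilarity(x, y):
    """One minus the fraction of observed OTUs that appear in both samples."""
    shared = sum(1 for a, b in zip(x, y) if a > 0 and b > 0)
    union = sum(1 for a, b in zip(x, y) if a > 0 or b > 0)
    return 1.0 - shared / union


if __name__ == "__main__":
    reads_a, reads_b = simulate_pair(rho_gauss=0.8, seed=0)
    print("Bray-Curtis:", bray_curtis(reads_a, reads_b))
    print("Jaccard dissimilarity:", jaccard_dissimilarity(reads_a, reads_b))
```

Setting `rho_gauss=1.0` gives both communities identical values of K, which corresponds to the pairs meant to mimic the same community sampled at different times. In this sketch the Gamma draws of such a pair are independent, so the correlated abundance fluctuations mentioned in the caption are not reproduced.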
Fig 3.
Comparison of the relationships between dissimilarity measures in empirical data and in the model.
The different dissimilarity measures (A: Jaccard similarity, B: Sørensen index, C: Whittaker index, D: Bray-Curtis dissimilarity, E: Horn similarity, F: Morisita-Horn similarity) are plotted against the carrying capacity correlation ρK. Black circles correspond to pairs of empirical samples from the same host, while grey squares correspond to pairs of empirical samples from different hosts (but from the same dataset). The value of ρK for individual samples is inferred as specified in the Methods. Red dots are the binned averages of the model predictions. The model is simulated with the distributions of K and σ fitted from the data, and with a number of species equal to the estimated one (see Section B in S1 Text). The number of reads is equal to the average number of reads of the empirical samples, 3 × 10^4.
Fig 4.
Overlap-Dissimilarity curves in the model and in empirical data.
A) Relationship between overlap and dissimilarity for communities generated with the model. The color of the circles corresponds to the correlation ρK of the values of K of the community pair, ranging from 0.6 to 1. The two insets show scatter plots of the abundances in two pairs of communities, one with a high ρK and one with a low ρK. Blue circles represent OTUs sampled in both communities, orange circles OTUs sampled in only one community, and red circles OTUs sampled in neither. The dotted lines mark 1/Nreads. B) Relationship between overlap and dissimilarity in empirical data (black circles: samples from the same host, grey squares: samples from different hosts) and according to the model (red circles: binned average of the model prediction). The inset shows the same plot with a logarithmic scale on the x axis. For the main plot, the binned average of the model prediction is computed along the y axis, to better capture the pattern at high overlap values.