The authors have declared that no competing interests exist.
Conceived and designed the experiments: JPO SWK JLG. Performed the experiments: JPO SWK JLG. Analyzed the data: JPO SWK JLG. Contributed reagents/materials/analysis tools: JPO SWK JLG. Wrote the paper: JPO SWK JLG.
Microbial communities are typically large, diverse, and complex, and identifying and understanding the processes driving their structure has implications ranging from ecosystem stability to human health and well-being. Phylogenetic data give us new insight into these processes, providing a more informative perspective on functional and trait diversity than taxonomic richness alone. But the sheer scale of high-resolution phylogenetic data also presents a new challenge to ecological theory. We bring a sampling theory perspective to microbial communities, considering a local community of co-occurring organisms as a sample from a larger regional pool, and apply our framework to make analytical predictions for local phylogenetic diversity arising from a given metacommunity and community assembly process. We characterize community assembly in terms of quantitative descriptions of clustered, random and overdispersed sampling, which have been associated with hypotheses of environmental filtering and competition. Using our approach, we analyze large microbial communities from the human microbiome, uncovering significant variation in diversity across habitats relative to the null hypothesis of random sampling.
Microbial diversity analyses have revolutionized our knowledge of the microscopic world, from terrestrial and marine to human and urban environments. This growing field rests on the evolutionary relatedness of organisms, and at its frontier is the inference of ecological processes from phylogenetic diversity. However, as the cost of sequencing falls rapidly and datasets grow, computational analysis of phylogenetic data is becoming increasingly intractable. We develop a new analytical method to address this issue, providing a computationally efficient way to compare local phylogenetic diversity to a sample from a regional pool of organisms, under a given ecological process. Our approach has both pragmatic and far-reaching applications. Until now, investigators have lacked even an analytical method to compare the diversity of unequally sized communities without throwing data away, while on a deeper level our theory provides a new framework for connecting phylogenetic data to a wide range of ecological processes. As an application, we use our methods to distinguish between random, clustered and overdispersed sampling for human microbiome habitats. Finally, we identify a new, phylogenetic analogue of the widely used taxonomic measure of diversity, the Species Abundance Distribution, and we find that it has consistent behavior across microbiome habitats.
Microbial ecology has been advancing at a rapid pace, but understanding the processes driving microbial community structure remains a challenge
Despite this success in documenting patterns of phylogenetic diversity in a broad range of contexts, our ability to translate these patterns into processes has been hampered by a lack of phylogenetic theory. The difficulty in formulating quantitative hypotheses impacts even the simplest of questions: which of two microbial communities is more phylogenetically diverse? If one community is larger than the other, we cannot answer this basic question without a theoretical hypothesis for the way we
Phylogenetic theory can address both of these issues: the pragmatic problem of comparing phylogenetic diversity in different communities, and the larger question of inferring ecological processes from phylogenetic patterns. In this manuscript we develop a way to cast many different assembly processes in a common framework, centered on the comparison of a local community of co-occurring organisms with a sample from a regional metacommunity of organisms. We draw from the sampling theory of taxonomic diversity
As a proof-of-principle application of our framework, we focus on publicly available human microbiome data
Phylogenetic Diversity (PD) has been defined as the total branch length connecting all organisms in a phylogenetic tree, and provides a natural phylogenetic analogue of taxonomic diversity
Comparing observed patterns of phylogenetic diversity to the patterns expected under various null models provides both a normalization to take into account the difference in sample sizes across different habitats or treatments
We have developed a new, analytical approach to address these problems. Our conceptual framework links a local sample of individual organisms, a regional pool or metacommunity, and processes connecting these two scales. This perspective has a long history in ecological theory
In (A), (B) and (C) we adapt this framework for a microbial community, the human microbiome, for different definitions of the local community and the reference metacommunity. (A) shows microbiota from a single body habitat, thought of as a sample from the pool of microbiota found in the same habitat on different human subjects. (B) shows the same local community, but thought of as a sample from all microbiota across multiple humans. Finally, the local community in (C) is all microbiota from a single human subject, while the metacommunity is again microbiota from multiple habitats across multiple humans.
Our central result is an analytical method to obtain the expected phylogenetic diversity of a local sample from a larger community. In the Supplementary Information we derive the following expression for expected phylogenetic diversity,
All edges are labeled by their length, and the table shows the total edge-length summed across all edges with a given number of upstream tips. Number of tips appears in the left column, and the total edge-length associated with that number of tips,
The EAD plays an identical role in our sampling framework to the Species Abundance Distribution (SAD) in taxonomic sampling theory
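To make the tabulation behind the EAD concrete, the following minimal sketch sums edge lengths according to the number of tips each edge subtends. It is an illustration only, not the authors' implementation: the edge-list representation and the function name `edge_length_abundance` are assumptions, and the toy tree is invented for the example.

```python
from collections import defaultdict


def edge_length_abundance(edges):
    """Tabulate total edge length by number of subtended tips.

    `edges` is a list of (child, parent, length) tuples describing a rooted
    tree (a hypothetical input format chosen for this sketch). Returns a
    dict mapping k -> summed length of all edges that subtend exactly k tips.
    """
    children = defaultdict(list)
    length = {}
    for child, parent, edge_len in edges:
        children[parent].append(child)
        length[child] = edge_len

    tip_count = {}

    def count_tips(node):
        # A node with no recorded children is a tip.
        if node not in children:
            tip_count[node] = 1
        else:
            tip_count[node] = sum(count_tips(c) for c in children[node])
        return tip_count[node]

    # Roots are nodes that appear as a parent but never as a child.
    for root in set(children) - set(length):
        count_tips(root)

    ead = defaultdict(float)
    for child, edge_len in length.items():
        ead[tip_count[child]] += edge_len
    return dict(ead)


# Toy tree ((A:1,B:1)X:2,C:3)R, invented for illustration.
toy_edges = [("A", "X", 1.0), ("B", "X", 1.0), ("X", "R", 2.0), ("C", "R", 3.0)]
toy_ead = edge_length_abundance(toy_edges)
total_pd = sum(toy_ead.values())
```

The recursion is adequate for small trees; for trees with many thousands of tips an iterative traversal would avoid Python's recursion limit.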
Here we compute the EAD for trees inferred from 16S bacterial microbiome sequences using FastTree
Random sampling has frequently been used as a null model in phylogenetic community ecology, but with the disadvantage of needing to take random samples computationally. For a binomial sampling scheme each individual in the metacommunity has the same probability
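The reasoning behind an analytical alternative to repeated random draws can be sketched as follows. If each tip is retained independently with probability p, an edge subtending k tips appears in the sample's tree whenever at least one of those tips is retained, which happens with probability 1 - (1 - p)^k. Summing this over the EAD gives an expected PD without simulation. The code below is a sketch under those assumptions, not necessarily the expression derived in the Supplementary Information; it also assumes that sample PD is measured within the metacommunity tree, including the path back to its root, and the function name is hypothetical.

```python
def expected_pd_binomial(ead, p):
    """Expected PD of a binomial sample, given an EAD.

    `ead` maps k (number of tips subtended by an edge) to the total length
    of all such edges; `p` is the per-tip sampling probability. An edge with
    k tips is retained with probability 1 - (1 - p)**k (assumes independent
    sampling of tips and PD measured down to the metacommunity root).
    """
    return sum(total_len * (1.0 - (1.0 - p) ** k) for k, total_len in ead.items())


# Toy EAD, invented for illustration: 5 units of edge length subtend one tip,
# 2 units subtend two tips.
toy_ead = {1: 5.0, 2: 2.0}
pd_half = expected_pd_binomial(toy_ead, 0.5)  # 5*0.5 + 2*0.75
pd_all = expected_pd_binomial(toy_ead, 1.0)   # every edge retained
```

Because the sum runs over distinct values of k rather than over individual edges or repeated random draws, the cost depends on the number of distinct tip counts in the tree, not on the number of resamples.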
We demonstrate our approach in
As an approximation to the real communities in
For binomial samples from a power-law EAD, we observe a power-law increase in expected PD with sample size, consistent with the scaling of PD with sample size observed in other kinds of community
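A quick numerical check of this kind of scaling can be sketched as below. Everything specific here is an assumption made for illustration: the EAD is taken to be exactly proportional to k^(-alpha) with alpha = 1.5, truncated at 20,000 tips, and the sampling probabilities are arbitrary. Under binomial sampling the expected sample size is proportional to p, so the log-log slope against p equals the slope against expected sample size.

```python
import math


def expected_pd(ead, p):
    # Edge subtending k tips is retained with probability 1 - (1 - p)**k.
    return sum(s * (1.0 - (1.0 - p) ** k) for k, s in ead.items())


def power_law_ead(max_tips, alpha):
    # Hypothetical EAD: total edge length with k tips proportional to k**(-alpha).
    return {k: k ** (-alpha) for k in range(1, max_tips + 1)}


def loglog_slope(xs, ys):
    # Ordinary least-squares slope of log(y) against log(x).
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


ead = power_law_ead(20000, 1.5)           # alpha = 1.5 is an arbitrary choice
probs = [0.001, 0.003, 0.01, 0.03, 0.1]   # arbitrary sampling probabilities
pds = [expected_pd(ead, p) for p in probs]
slope = loglog_slope(probs, pds)
```

Expected PD increases with p but sublinearly, so the fitted slope lies between 0 and 1; how close the curve is to a pure power law depends on the exponent and on the truncation of the EAD.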
We can express the expected shared branch length for two local samples as:
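As an illustrative sketch only (the authors' own expression is not reproduced here), suppose the two local samples are drawn independently from the same metacommunity by binomial sampling with probabilities p1 and p2. An edge subtending k tips is then present in both samples with probability (1 - (1 - p1)^k)(1 - (1 - p2)^k), and the expected total branch length of the pair follows by inclusion-exclusion. The function names and the ratio-based score are assumptions; the ratio of expectations is used here as a simple normalization and is not the expectation of UniFrac itself.

```python
def retain_prob(k, p):
    # Probability that an edge subtending k tips has at least one tip sampled.
    return 1.0 - (1.0 - p) ** k


def expected_shared_length(ead, p1, p2):
    """Expected branch length present in both of two independent binomial samples."""
    return sum(s * retain_prob(k, p1) * retain_prob(k, p2) for k, s in ead.items())


def expected_total_length(ead, p1, p2):
    """Expected branch length present in at least one sample (inclusion-exclusion)."""
    single1 = sum(s * retain_prob(k, p1) for k, s in ead.items())
    single2 = sum(s * retain_prob(k, p2) for k, s in ead.items())
    return single1 + single2 - expected_shared_length(ead, p1, p2)


def unshared_fraction(ead, p1, p2):
    # Illustrative score: 1 - E[shared] / E[total]. This is a ratio of
    # expectations, not the expected value of UniFrac.
    return 1.0 - expected_shared_length(ead, p1, p2) / expected_total_length(ead, p1, p2)


toy_ead = {1: 5.0, 2: 2.0}  # toy EAD, invented for illustration
shared = expected_shared_length(toy_ead, 0.5, 0.5)  # 5*0.25 + 2*0.5625
total = expected_total_length(toy_ead, 0.5, 0.5)    # 4 + 4 - shared
score_equal = unshared_fraction(toy_ead, 0.5, 0.5)
score_unequal = unshared_fraction(toy_ead, 0.5, 0.1)
```

Comparing `score_equal` with `score_unequal` shows how such a score shifts when only the sample sizes differ, which is the effect a normalization needs to account for.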
In one sense UniFrac distance is
We can avoid this if we have a normalization for how UniFrac changes with unequal sample sizes. Our sampling framework provides one, through the expected shared and total branch length. In
The score is calculated in terms of expected shared branch length and expected total branch length for the two samples. The black line is for samples of the same size, and colored lines are for various differences in sample size,
The human body is host to multiple microbial communities, whose combined total has commonly been estimated to outnumber our own cells by at least a factor of 10
We now focus on the deviation of microbiome communities from randomly assembled communities. The reason that random sampling is such a crucial baseline is that it lies between two distinct phylogenetic community assembly processes: local communities with lower PD than random are phylogenetically clustered, a signature of environmental filtering, while local communities with higher than random PD are phylogenetically overdispersed, indicating competitive interactions. The true relationship between clustering and overdispersion and these ecological mechanisms is likely more complicated than this simple description
To evaluate the structure of a local community with respect to a given community assembly hypothesis, we need to make an assumption about the pool of individuals or taxa that this local community can draw on: the metacommunity. The ‘true’ regional pool of organisms that a given habitat from a given subject draws upon is difficult to define unambiguously, and so our approach is to allow for various definitions of the metacommunity, and examine the sensitivity of our results to these definitions. The relevance of metacommunity definition has been highlighted before
First, we find that PD increases as a power-law function of sample size. Our definition of a local community is all reads sampled from a single habitat, from a single subject on a single sample day. We plot the results for three habitats in
We show a comparison of actual PD with expected PD under random sampling for three habitats: nose, hair and forehead. Solid lines represent expected PD from random sampling as a function of expected number of tips; dashed lines represent an upper bound on the 95% confidence interval, computed using the variance in sampled PD (see Supplementary Information). Points represent actual PD from each of seven human subjects on one sampling day, with subjects labeled by a letter (M or F indicating male or female) and an identifying number. On the top row the reference metacommunity is all reads from
Next, we see that larger metacommunities make local samples more likely to appear clustered. There are clear differences between these two example habitats in terms of the range of PD across subjects. For metacommunity (1) we see that most samples are phylogenetically clustered, but several of them are overdispersed. In contrast, from metacommunity (2)
Finally, increasing local community resolution reveals variation in local clustering and overdispersion. In
We show a comparison of actual PD with expected PD for multiple habitats and subjects. The left-hand panel shows the PD of all habitats sampled from subject F1, with the solid line indicating the expected phylogenetic diversity of a random sample from a metacommunity comprising all reads taken from all subjects, and the dotted line representing an upper bound on the corresponding 95% confidence interval. The right-hand panel shows the total branch length for all reads sampled from each subject (again including F1 in black), with solid and dotted lines as in the left-hand panel. When local samples are grouped into a single community for each microbiome, all microbiome communities are consistent with clustered sampling. On the other hand, we observe a wide range of variation in habitat PD relative to random sampling.
In this manuscript we have developed a new, analytical method to quantify and distinguish different hypotheses for the phylogenetic structure of ecological communities. Our approach centers on a new characterization of phylogenetic tree shape, which we term the Edge-length Abundance Distribution (EAD), and we find that this distribution is analogous and complementary to the Species Abundance Distribution (SAD) in taxonomic sampling theory. We observe that the EAD follows a roughly power-law distribution across a number of communities within the human microbiome. Power-law patterns in the distribution of branch lengths have been observed before in phylogenetic trees
We applied this theoretical framework to investigate whether phylogenetic diversity (PD) of local communities from the human microbiome is greater or less than the expected PD for randomly drawn samples from a metacommunity. Local community PD lower than random has been associated with the hypothesis of environmental filtering, while local community PD greater than random (phylogenetic overdispersion) has been associated with competition and competitive exclusion
Our results set the scene for a much more rigorous investigation of these issues, and we see three main future directions. First, our framework has, for the first time, made the phylogenetic analysis of large microbial metacommunities analytically tractable, and we find that metacommunity size is highly relevant in our proof-of-principle analysis of human microbiome communities. This confirms the expectation that our conclusions about community assembly depend crucially on the definition of the metacommunity, and indicates the need for a very careful definition of the metacommunity to fully understand the processes structuring local phylogenetic diversity. Second, we have adapted tools from taxonomic sampling and applied them in a phylogenetic context, but this provides just the first steps towards developing a comprehensive theoretical toolbox for distinguishing hypotheses and predicting patterns. Directly characterizing ecological mechanisms, for example dispersal limitation, in terms of our phylogenetic sampling theory will provide a clearer connection between ecological process and phylogenetic patterns.
Finally, we have focused here on the total phylogenetic diversity of local communities
Our methods are integrated into the body of the manuscript, primarily in the
We thank Helene Morlon for comments on the manuscript.