Table 1.
Characteristics of participants.
Figure 1.
Comparison of bacterial community composition reveals that the upper airway microbiota is primarily structured by body habitat.
Unweighted UniFrac was used to generate distances between oropharynx (red), nasopharynx (pink) and fecal (blue) microbiome samples, and scatterplots were then generated using Principal Coordinate Analysis (PCoA). The percentage of variation explained by each principal coordinate is indicated on the axes. The differences among communities from different body sites were significant with p<0.001 (t-test with permutation). Fecal microbial communities were from [27].
Figure 2.
Analysis of abundances of bacterial lineages demonstrates that oro- and nasopharyngeal bacterial communities cluster based on smoking status.
The relative abundance of each genus (rows) is shown by the key to the left of the figure. Communities are clustered by hierarchical clustering using complete linkage of Euclidean distance matrices. The number of times each split in the tree is seen in 1,000 bootstrapped samples is indicated at each node. The tree to the left of the heatmap groups genera together based on similarity of abundance profiles (i.e. if two genera are close in the tree, their abundance profiles across each airway site are similar).
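The clustering step described in this caption can be illustrated with a minimal sketch. It assumes each community is a vector of genus relative abundances; the function name `complete_linkage` and the abundance values are hypothetical and are not taken from the study data. Complete linkage defines the distance between two clusters as the largest Euclidean distance between any pair of their members.

```python
from itertools import combinations
from math import dist  # Euclidean distance between two points (Python 3.8+)


def complete_linkage(profiles):
    """Agglomerative clustering with complete linkage.

    Returns the merges in order as (cluster_a, cluster_b, height),
    where each cluster is a tuple of sample indices.
    """
    clusters = [(i,) for i in range(len(profiles))]
    merges = []

    def linkage(a, b):
        # Complete linkage: the largest pairwise distance between members.
        return max(dist(profiles[i], profiles[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Merge the pair of clusters with the smallest linkage distance.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
    return merges


# Hypothetical relative abundances of three genera in four samples.
samples = [
    [0.70, 0.20, 0.10],
    [0.65, 0.25, 0.10],
    [0.10, 0.30, 0.60],
    [0.20, 0.20, 0.60],
]
merges = complete_linkage(samples)
for a, b, height in merges:
    print(a, b, round(height, 3))
```

In this toy input, samples 0 and 1 join first, then samples 2 and 3, and the two pairs merge last. Bootstrap support of the kind reported at each node would be obtained by repeating the clustering on resampled data, which this sketch does not attempt.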
Table 2.
Distance-based ANOVA analysis: differences in bacterial community composition between smokers and nonsmokers.
Table 3.
Bacterial taxa that distinguish airway microbial communities of smokers from nonsmokers.
Figure 3.
Partitioning airway microbial communities by smoking status using Random Forests.
Bacterial communities from each airway site were sorted by smoking status using the trained Random Forests algorithm and compared to guessing. Misclassification frequencies are plotted by airway site and side of body. RF = Random Forests. Guess = guessing alone. The lower- and upper-most bars designate the lowest and highest value excluding outliers (defined as >1.5*IQR). The bottom and top of the green boxes denote the lower and upper hinge (close to the 25% and 75% quantiles). The heavy black line designates the median misclassification frequency. The distribution of misclassification errors is significantly different between the two algorithms (P-value < 2.2E-16 for all airway sites, Friedman Rank Sum test), and in all airway sites Random Forests performs better than guessing (95% Confidence Interval: oropharynx right (−0.15 to −0.13), oropharynx left (−0.20 to −0.18), nasopharynx right (−0.23 to −0.22), nasopharynx left (−0.22 to −0.20)).
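The boxplot elements named in this caption can be computed as in the sketch below. It assumes the common Tukey convention, in which hinges are the medians of the lower and upper halves of the sorted data and outliers lie more than 1.5*IQR beyond a hinge; the caption does not state which plotting software or convention was used. The function name `box_stats` and the input values are hypothetical and are not study data.

```python
from statistics import median


def box_stats(values):
    """Return hinges, median, whisker ends, and outliers for one boxplot."""
    xs = sorted(values)
    n = len(xs)
    # Each half includes the middle value when n is odd (Tukey's hinges).
    half = (n + 1) // 2
    lower_hinge = median(xs[:half])
    upper_hinge = median(xs[n - half:])
    iqr = upper_hinge - lower_hinge
    low_fence = lower_hinge - 1.5 * iqr
    high_fence = upper_hinge + 1.5 * iqr
    # Whiskers extend to the most extreme values that are not outliers.
    inside = [x for x in xs if low_fence <= x <= high_fence]
    return {
        "lower_whisker": inside[0],
        "lower_hinge": lower_hinge,
        "median": median(xs),
        "upper_hinge": upper_hinge,
        "upper_whisker": inside[-1],
        "outliers": [x for x in xs if x < low_fence or x > high_fence],
    }


# Hypothetical misclassification frequencies from repeated runs.
errors = [0.10, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.45]
stats = box_stats(errors)
print(stats)
```

With these values the single extreme run (0.45) is reported as an outlier, so the upper whisker stops at 0.18 rather than at the maximum.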