Extensive novel diversity and phenotypic associations in the dromedary camel microbiome are revealed through deep metagenomics and machine learning

Abstract

The dromedary camel, also known as the one-humped camel or Arabian camel, is iconic and economically important to Arabian society. Its contemporary importance in commerce and transportation, along with the historical and modern use of its milk and meat products for dietary health and wellness, makes it an ideal subject for scientific scrutiny. The gut microbiome has recently been associated with numerous aspects of health, diet, lifestyle, and disease in livestock and humans alike, as well as serving as an exploratory and diagnostic marker of many physical characteristics. Our initial pilot analysis of 55 camel gut microbiomes from the Fathi Camel Microbiome Project uses deep metagenomic shotgun sequencing to reveal substantial novel species-level microbial diversity, for which we have generated an extensive catalog of prokaryotic metagenome-assembled genomes (MAGs) as a foundational microbial reference database for future comparative analysis. Exploratory correlation analysis shows substantial correlation structure among the collected subject-level metadata, including physical characteristics. Machine learning using these novel microbial markers, as well as statistical testing, demonstrates strong predictive performance of microbial taxa to distinguish between multiple dietary and lifestyle characteristics of dromedary camels. We present strongly predictive machine learning models for camel age, diet (especially wheat intake), and level of captivity. These findings and resources represent substantial strides toward understanding the camel microbiome and pave the way for a deeper understanding of the nuanced factors that shape camel health.

Introduction

The gut microbiome has recently been subjected to intensive study, and has been found to be associated with numerous aspects of host physiology, including health and disease in humans and many domestic animals. These associations have improved our understanding of the interplay between environment and host in the context of outcomes of interest, and have resulted in biomarkers for conditions as varied as diseases and behavioral traits. In the context of agriculturally significant livestock, the microbiome may provide valuable insights into animal well-being, which may have significant downstream economic implications [1–3]. As demonstrated by the Human Microbiome Project (HMP), which extensively cataloged the microbial populations within and upon numerous body sites of healthy humans, substantial differences in microbial taxonomy and population structure exist even within healthy individuals [4]. Notably, each body site exhibits distinct functional repertoires and ecological properties, meriting further study of specific body habitats for their respective contributions to overall host physiology. For example, each body site has its own natural diversity of microbes within it (alpha diversity), which in turn may be altered dramatically in response to health, environment, or lifestyle factors. The degree to which different microbiomes differ with respect to their microbial composition (beta diversity) also reveals ecological trends among hosts. This diversity, along with the taxonomic composition of the microbiome and its predicted functional capacity, has been shown to be associated with host health and environmental conditions [5]. The apparent host-specificity of the microbiome, particularly in the gut where diversity is high, along with its responsiveness to various conditions, suggests both descriptive and predictive potential of the microbiome, as well as a path toward individualized approaches to managing health and disease in livestock.

The role of the diet in shaping the microbiome of ruminant livestock has recently been explored. One recent study by De Menezes et al. highlighted significant differences in the rumen microbiome of dairy cows between cows that were fed via pasture grazing as opposed to a total mixed ration diet. Although this shows a clear influence of diet on one aspect of the gastrointestinal microbiome, the hindgut (fecal) microbiome was not investigated [6]. In another study, Tapio et al. present complex microbiome alterations induced by different forage:concentrate ratios with or without sunflower oil supplementation in dairy cows, revealing diet-specific shifts in diversity, taxonomic composition, and microbial co-occurrence patterns [7]. The implications of these shifts highlight the potential utility of the microbiome as a reporter or biomarker for the effects of diet and the environment, with practical ramifications for agricultural practices and animal nutrition. A better understanding of these associations may in turn enable detection of anomalies (such as disease) and optimization of feeding strategies to improve livestock health and productivity.

Although the rumen in true ruminants is anatomically distinct from the pseudo-ruminant foregut [8] of the dromedary camel, they play similar functional roles as primary degraders of complex plant material via microbial symbiosis in the upper digestive tract. A broad survey of the rumen microbiome across animal species [9], although lacking camels, includes microbiome samples from the rumens of pseudo-ruminant camelids due to this functional similarity, and additionally highlights the substantial role played by diet in shaping the microbiome of the rumen across all species studied. The effect size of dietary association was found to be larger than that of all other host factors measured. This close diet-microbiome relationship in the rumen is justified considering the rumen’s specialization in primary fermentative microbial digestion of complex plant material.

The rumen microbiome of dromedary camels has itself been studied previously in at least two instances with shotgun metagenomic sequencing. In one study of 3 dromedary camels (7.7 Gbp per sample), Gharechahi and Salekdeh [10] demonstrate the potential of metagenomic shotgun sequencing to analyze these poorly characterized communities, including limited genome reconstruction and carbohydrate-active enzyme (CAZyme) characterization using DNA assemblies. A high number of CAZyme annotations was found, as might be expected of microbiota specializing in complex plant fiber degradation. The authors also note the biases and limitations of previous studies using only 16S rRNA gene amplicon sequencing. In another, broader study [11] of 48 rumen samples which underwent shotgun metagenomic sequencing (<2.47 Gbp per sample), Hinsu et al. recapitulate in dromedary camels the major finding of the aforementioned broad survey of rumen microbiomes [9]; namely, that diet is a primary driver of rumen microbiome composition. The study also confirms a high proportion of CAZymes in the shotgun sequencing data, as did Gharechahi and Salekdeh. Interestingly, Hinsu et al. go on to note the high prevalence of novel lineages of microbes detected, both in terms of uncharacterized genera identified by placeholders in existing databases, as well as novel genera that lack any representative genomes.

Despite these early efforts to characterize the rumen microbiome of dromedary camels, the fecal microbiome remains largely unstudied at the resolution of shotgun metagenomic sequencing, apart from a single older study by Dande et al. [12] sequenced at extremely low depth (<0.03 Gbp per sample). In that study, only 2 fecal samples were sequenced, each corresponding not to an individual camel but to a pooled conglomerate of 2 camels. This small sample size, low shotgun sequencing depth, and inability to distinguish subject-level microbiota due to the pooling process all make it difficult to infer a statistical baseline against which to directly compare subsequent studies. Indeed, all of the camel metagenomic studies mentioned, including those studying the rumen, suffer from severe limitations in sample size, which preclude robust statistical comparisons due to lack of power.

Furthermore, it was also difficult to locate any study directly comparing the rumen to the fecal metagenome in dromedary camels, which might have otherwise established the extent to which studies of the rumen microbiome could generalize to the feces. However, some hints exist in the microbiome of a proximal species. The Bactrian camel, Camelus bactrianus, known for its distinctive pair of dorsal humps, has been the subject of microbiome research where the rumen and the fecal microbiome, along with multiple other regions along the digestive tract, were directly compared [13] using 16S amplicon sequencing. This effort revealed remarkable differences in microbial communities between the rumen and the stool, where the rumen had the lowest concordance with the colon/feces among all other gastrointestinal regions with which it was compared (abundance correlation of approximately 0.43 within animal at the genus level), along with a significantly higher alpha diversity than the feces. Taken together, this highlights the lack of readily comparable previous work in the fecal shotgun metagenomics of dromedary camels. Furthermore, amplicon-based studies (16S rRNA) have limitations in taxonomic resolution, as well as potential microbial abundance biases, as they focus on a single hypervariable 16S region, potentially providing a substantively incomplete picture of the gastrointestinal microbiota, limiting direct comparisons to shotgun metagenomics especially in cases where a substantial proportion of the community is novel [10, 14].

Likewise, few microbiome studies in camels have attempted to cross over into machine learning. Methodologically, while machine learning has contributed to measurable improvements in data analysis across various fields, its application to microbiome research has been largely concentrated on human subjects [15]. This leaves a significant gap in the understanding of animal microbiomes. One study demonstrated the utility of machine learning by training several models to predict colonic diseases using human microbiomes [16]. Machine learning models trained on fecal 16S rRNA [17] and metagenomic shotgun [18] data have shown significant potential in predicting diseases such as colorectal cancer. Another study employed metagenomic sequencing to probe the relationship between the gut genes of premature infants and their survival strategies in response to specific clinical and environmental conditions, revealing that formula feeding correlates with an increased presence of certain antibiotic resistance genes in the infant gut microbiome [19]. These examples underscore the potential of machine learning to reveal patterns that might not be discernible through traditional univariate statistical analyses, and hence may be important for better understanding the microbiome’s associations with health and other phenotypes. Therefore, we aim to bring these tools to the study of the dromedary camel as well.

This study aims to provide a comprehensive characterization of the microbial composition of dromedary camels, advancing the current understanding of the microbial diversity in the camel hindgut. By combining deep metagenomic shotgun sequencing, data mining, statistical analysis, and machine learning, we explore the camel microbiome and its host associations more comprehensively than previous work. This integrative approach is expected to pave the way toward an appreciation of the numerous factors influencing camel wellbeing and, by extension, the welfare of the human communities, economies, and ecosystems that depend on them.

Materials and methods

Ethics statement

This research was conducted in accordance with ethical standards set by the Research Ethics Committee at the University of Tabuk. Approval for the study was granted by the Research Ethics Committee under approval number UT-347-177-2022. Site owners (and farm staff, if the site was a farm) supervised all sampling activities. Governmental and local ordinances in Tabuk, Saudi Arabia do not require additional approval for passive collection and investigation of animal excrement from the ground.

Camel fecal specimen collection and handling

Freshly excreted stool was sampled from 55 camels from the Tabuk region of Saudi Arabia from a variety of lifestyles and regional topographies. One fecal sample was collected from one fecal deposit from each individual animal. In brief, the collection protocol consisted of waiting until a camel dropped feces onto the ground, identifying the largest pellet without visible sand or dust on its surface, then immediately collecting the clean inner portion of the freshly passed stool with forceps and suspending 0.5-0.7g of material into 99% ethanol buffer (to preserve the specimen and prevent microbial growth [20]) within a pre-labeled specimen tube. Sampling was performed under supervision of local site managers. 27 distinct herds of camels were selected for our investigation, and at most 4 camels were sampled per herd (avg 2.04/herd). Camels were not interacted with during the sampling process.

Sample storage and processing

Samples were stored at -20 °C while collection took place over a period of 30 days. Samples were shipped to BGI Hong Kong for DNA extraction and sequencing using the standard BGI complete DNA microbiome extraction kit. The extracted DNA was then sequenced on the DNBseq platform at 2 x 150 bp to a depth of 55 M read pairs (110 M total reads) per sample. Data were transferred via AWS S3 from BGI to a 64-core AMD Ryzen Threadripper Pro 5995WX server with 1.5 terabytes of RAM for downstream analysis.
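For orientation, the nominal per-sample yield implied by these sequencing parameters can be worked out directly. The short Python sketch below is illustrative arithmetic only (the variable names are ours, and the figures are nominal values before any quality filtering); it is not part of the study's pipeline.

```python
# Nominal sequencing yield implied by the stated parameters (before QC).
read_pairs_per_sample = 55_000_000   # 55 M read pairs per sample
read_length_bp = 150                 # 2 x 150 bp paired-end reads
n_samples = 55                       # one fecal sample per camel

bases_per_sample = read_pairs_per_sample * 2 * read_length_bp
gbp_per_sample = bases_per_sample / 1e9
total_gbp = gbp_per_sample * n_samples

print(f"{gbp_per_sample:.1f} Gbp per sample, {total_gbp:.1f} Gbp in total")
```

This nominal depth of 16.5 Gbp per sample can be set against the per-sample depths quoted in the Introduction for earlier camel studies (7.7 Gbp, <2.47 Gbp, and <0.03 Gbp).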

Raw data analysis

The metagenomics workflow, from sample receipt through extraction, library prep and QC, sequencing, and read QC, was performed by BGI as part of their standard metagenomic shotgun sequencing offering. We also ran SHI7 quality control [21] to confirm that the quality control performed by the provider was sufficient; this resulted in less than 0.2% additional read filtering in the worst case (sample 31A), with average read length greater than 149.5 bp. Host read filtering was not performed, as there were no concerns around de-identification and minimal concern about reads mapping to contaminated reference genomes, both due to use of a custom database (see below) and the use of genome coverage thresholds in filtering the relative abundance table (see statistical and machine learning analysis methods below). Both single-sample and pooled assemblies were performed; in order to combine data from both methods, species-level representative genomes from the pooled assembly were only retained if a genome of the same species was not assembled from any single-sample assembly. Assembly was performed using megahit v1.2.9 [22]. MAGs were identified and binned using metabat2, and quality was assessed using CheckM2 v1.0.2 [23]. MAGs with assessed completeness >50% and contamination <5% were retained and clustered at 95% ANI in R from a distance matrix formed using aKronyMer v1.0 [24] using ANI GC LOCAL distance parameters with k-mer size 13. One representative was selected from each cluster by maximizing a score that rewards high completeness and penalizes contamination.
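The filtering and representative-selection step described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation (which was done in R): cluster labels are assumed to be precomputed from the 95% ANI clustering, the `Mag` record and `pick_representatives` helper are hypothetical names, and `placeholder_score` is a stand-in because the exact scoring formula is not given in the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class Mag:
    name: str
    cluster: int          # 95% ANI cluster label (assumed precomputed)
    completeness: float   # CheckM2 completeness, in percent
    contamination: float  # CheckM2 contamination, in percent


def placeholder_score(mag: Mag) -> float:
    # Hypothetical stand-in, NOT the authors' formula: it only needs to
    # increase with completeness and decrease with contamination.
    return mag.completeness - mag.contamination


def pick_representatives(
    mags: Iterable[Mag],
    score: Callable[[Mag], float] = placeholder_score,
    min_completeness: float = 50.0,
    max_contamination: float = 5.0,
) -> Dict[int, Mag]:
    """Keep MAGs passing the quality filter; return the best one per cluster."""
    best: Dict[int, Mag] = {}
    for mag in mags:
        # Quality filter: completeness > 50% and contamination < 5%.
        if mag.completeness <= min_completeness or mag.contamination >= max_contamination:
            continue
        current = best.get(mag.cluster)
        if current is None or score(mag) > score(current):
            best[mag.cluster] = mag
    return best


# Toy input with made-up bin names and quality values.
mags = [
    Mag("bin_a", cluster=1, completeness=72.0, contamination=3.0),
    Mag("bin_b", cluster=1, completeness=95.0, contamination=1.0),
    Mag("bin_c", cluster=2, completeness=48.0, contamination=0.5),  # fails completeness
    Mag("bin_d", cluster=2, completeness=60.0, contamination=4.0),
    Mag("bin_e", cluster=3, completeness=99.0, contamination=7.5),  # fails contamination
]
representatives = pick_representatives(mags)
print({cluster: mag.name for cluster, mag in representatives.items()})
```

Passing the score as a function keeps the sketch agnostic to the actual formula; any score with the same monotonic behavior can be substituted.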

Taxonomy was assigned with GTDB-tk v2.3.0 with GTDB R214 [25]. As most MAGs were unable to be placed within existing species-level designations, we opted to use a MAG profiling approach rather than existing microbiome databases used by popular tools which lack these novel species. Therefore, the resulting set of MAGs was used to create an XTree database for downstream analysis and profiling [26]. Read counts and unique coverage profiles were generated by XTree for all 55 samples, and the resulting representative species-level profiles were compiled into a species-level taxonomy table. An average of 74% of reads were able to be mapped back to the resulting MAG database. To assign a unique species-level placeholder name when GTDB was unable to assign a reference species name, we used an arbitrary genome ID as a placeholder species name. Genus-level aggregation was also performed to assign a consistent placeholder name to genus-level representatives from species that clustered together at 90% ANI (and <95%) using the same approach outlined above.

Statistical and machine learning analysis

Correlations were performed in R (v4.3.0). Species richness was computed as the number of species present per sample, counting a species as present when more than 25% of its genome was uniquely covered in that sample. Beta diversity was ordinated using R’s cmdscale function on Euclidean distances between log10-scaled species relative abundances (log-Euclidean). Bray-Curtis dissimilarity was also used for comparison. Machine learning models were fitted with the randomForest package in R (using 5000 trees and default parameters without hyperparameter tuning), using the relative abundance data to predict covariates in the metadata, including dietary and lifestyle features. Reported ML performance scores (AUC for binary features) were calculated using random forest out-of-bag predictions; feature importance was assessed using the Gini index. For increased generalizability in this 55-sample dataset, genus-level taxonomy was used for all machine learning and differential analysis.
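To make the diversity definitions above concrete, the following Python sketch computes coverage-thresholded richness and the two between-sample distances on toy data. It is an illustration only: the study used R, the function names are ours, and the pseudocount added before the log transform is an assumption, since the handling of zero abundances is not stated. The ordination step (classical multidimensional scaling, as performed by cmdscale) is omitted.

```python
import math
from typing import Sequence


def species_richness(unique_coverage: Sequence[float], min_coverage: float = 0.25) -> int:
    """Count genomes whose uniquely covered fraction exceeds the threshold."""
    return sum(1 for cov in unique_coverage if cov > min_coverage)


def log_euclidean(a: Sequence[float], b: Sequence[float], pseudocount: float = 1e-6) -> float:
    """Euclidean distance between log10-scaled relative abundance vectors.

    The pseudocount is an assumption made here to avoid log10(0).
    """
    return math.sqrt(
        sum((math.log10(x + pseudocount) - math.log10(y + pseudocount)) ** 2
            for x, y in zip(a, b))
    )


def bray_curtis(a: Sequence[float], b: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity: sum |a_i - b_i| / sum (a_i + b_i)."""
    denominator = sum(x + y for x, y in zip(a, b))
    if denominator == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


# Toy data: per-genome unique coverage for one sample, and relative
# abundances of three taxa in two samples (made-up values).
coverage = [0.90, 0.30, 0.10, 0.00]
sample_1 = [0.5, 0.5, 0.0]
sample_2 = [0.5, 0.0, 0.5]

richness = species_richness(coverage)
bc = bray_curtis(sample_1, sample_2)
le = log_euclidean(sample_1, sample_2)
print(richness, bc, round(le, 3))
```

Because the log transform compresses dominant taxa and expands rare ones, the two distances can rank sample pairs differently, which is one reason the two metrics may yield different sets of significant associations.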

Results

The Fathi Camel Microbiome Project resulted in the collection and metagenomic sequencing of fecal material from 55 camels, along with a number of metadata fields (covariates) per camel. This allowed for the generation of a robust database of metagenome-assembled genomes (MAGs), as well as the ability to perform statistical associations.

Demographic summary highlights the diversity of sampling

To reflect potential metagenomic diversity in camel microbiomes, we sampled from a diverse set of camels from the Tabuk region of Saudi Arabia. The demographic summary of animal characteristics is presented in Table 1 below, highlighting a diversity of ages, diets, and habitats of these 55 camels across 27 herds.

Table 1. Demographic characteristics of the FCMP, separated into numerical variables (A) and categorical variables (B). Coding was determined as specified in (C). More collected variables are provided in the supplementary metadata table.

https://doi.org/10.1371/journal.pone.0328194.t001

A comprehensive prokaryotic genome database for camel microbiome analysis

An important outcome of this effort was the generation of a prokaryotic reference genome database from direct assembly of deeply sequenced camel fecal metagenomes. A total of 3,165 species-level prokaryotic (bacterial and archaeal) genomes were produced with CheckM2 completeness >50% and contamination <5%, per standard MiMAG quality criteria [27]. After taxonomy assignment, 726 genera were found, with 55 containing > 10 species. Up to 151 genera were novel (as they contained no CheckM2 identification). 2,740 of the 3,165 species-level genomes (87%) represented novel species without any species-level database reference or representative genome in the GTDB. This high level of novelty is expected given the lack of deep metagenomic sequencing efforts in dromedary (Arabian) camels apart from the current study. The resulting tree of genomes, visualized using iTOL [28], is shown in Fig 1.

Fig 1. An extensive set of reference genomic MAGs spanning 3,165 new species-level representatives.

Average nucleotide identity-based ladderized circular cladogram depicting approximate relationship among the genomes, generated by iTOL. Narrow, deeply-branched regions indicate phylogenetic singleton genomes, while wider clades near the edges indicate the presence of more closely-related species. The leaves are colored by phylum designation per GTDB-tk R214.

https://doi.org/10.1371/journal.pone.0328194.g001

On average, 74% of the raw reads were mappable back to the FCMP MAG database, indicating reasonable representation of the prokaryotic microbiome in this set of camels. Incidentally, a BLAST analysis using the NCBI nt database on a random subset of 100 contigs (of length under 200 kbp and low multiplicity of 1-2x coverage) not binned into MAGs revealed an assortment of unknown DNA (no matches with default or sensitive blastn parameters), plant matter from wheat (94–98% identity matches) and barley (85% matches), unknown plant material (perhaps local desert grasses or shrubs; including a 70% match of <10% sequence length by blastn to various plants), and unknown eukaryotic DNA (including 2 matches with 75% identity to various insects). However, our analysis focuses on the prokaryotic members of the camel hindgut microbiome, and further analysis of the non-prokaryotic “dark matter” in the stool will be left to future work.

Taxonomic visualization across the dataset

To visually ascertain the degree to which microbes are prevalent and abundant across the dataset and the variability associated with them, we generated stacked barplots of each camel’s microbiome in the dataset, sorted by most abundant microbes. An aggregated summary of the prevalence and abundance of the novel taxa is presented in each figure as “FCMP novel”, representing all the novel taxa at a given taxonomic rank. At the class level (Fig 2A), novel classes do not enter into the 25 most abundant classes in the dataset, so they are not displayed. At this level of taxonomic summarization, it is apparent that classes Bacteroidia and Clostridia together make up 70–75% of the dromedary camel microbiome by relative abundance. At the family level (Fig 2B), we see novel families (indicated in maroon) appear throughout the dataset. The overall microbial composition at the family level is dominated by CAG-272, UBA932, Paludibacteraceae, Lachnospiraceae, and Bacteroidaceae.

Fig 2. Class and family relative abundance across the dataset.

(A) Class-level relative abundance, sorted by fraction of “Other” (classes not within the top 25 by abundance across the dataset). (B) Family-level relative abundance, sorted by decreasing fraction of “Other” taxa (taxa not within the top 25 by relative abundance across the dataset). Plots are truncated to the minimum level of unknown taxa for visualization purposes, hence the truncated y-axis.

https://doi.org/10.1371/journal.pone.0328194.g002

At the genus level (Fig 3A), there is substantially more diversity than can be represented by the top 25 genera (the “Other” category, encompassing all genera that are not in the top 25, occupies the majority of most microbiomes), but here we see that the novel genera discovered by the present study (in green) are collectively the third most abundant members overall (following Cryptobacteroides and RF16). At the species level (Fig 3B), it is apparent that novel species (in blue) occupy the majority of the microbiome. Notably, sample 42A is an extreme outlier here, as it is in the beta diversity analysis. 42A is the youngest camel in the dataset at just 3 months of age, versus the dataset average of 6 years, and the only camel that was exclusively breastfed. This difference becomes more distinct at finer taxonomic levels.

Fig 3. Genus and species level abundance across the dataset.

(A, B) Genus- and species-level relative abundance, each sorted by decreasing fraction of “Other” taxa (taxa not within the top 25 by relative abundance across the dataset). Plots are truncated to the minimum level of unknown taxa for visualization purposes, hence the truncated y-axis.

https://doi.org/10.1371/journal.pone.0328194.g003

The present study’s finding of Cryptobacteroides (formerly a member of Bacteroides), Alistipes, and Treponema among the most abundant (named) genera in dromedary camel feces is consistent with the findings of Dande et al. [12], but the much larger number of novel and uncharacterized genomes discovered here dominates the microbiome and highlights the importance of building out infrastructure to quantify the microbial novelty within the dromedary camel fecal microbiome. Indeed, most of the microbes discovered here are completely missed by older, limited databases consisting solely of well-known microbes.

Associations between metadata and diversity

A pairwise all-vs-all Spearman correlation among continuous variables showed some expected correlations between collection-related variables, such as time of day and temperature (Fig 4A). Some biological variables also clustered by correlation, such as Bristol index (stool consistency) with stool darkness in one cluster, and species richness with age in another. As expected, level of captivity correlated inversely with the amount of grazing reported, as well as with species richness and age (younger camels are more likely to be kept in controlled auction environments).

Fig 4. Principal components plots reveal beta diversity distribution colored by various metadata variables (covariates).

(A) Symmetric heatmap of pairwise Spearman correlations between numerical metadata. (B) The entire set of 55 camels colored by herd ID. The outlier in the bottom right is from a camel that was in its third month of life, and was removed (for plotting purposes only) for subsequent beta diversity plots in (C). (C) Beta diversity (log-Euclidean distance) colored by significantly associated covariates as determined by PERMANOVA p < 0.05. The R² value (effect size of association) is also displayed under the x axis along with the p-value of the association. For continuous variables, the legend shows the minimum, maximum, and midpoint of the distribution of values plotted. Percentages in axis labels give the percent variance explained by each axis. The points are in the same place in all plots; only the colors and statistical results differ by metadata variable. (D) Similar to (C) but using Bray-Curtis beta diversity and the significant associations found using this metric.

https://doi.org/10.1371/journal.pone.0328194.g004

A visual inspection of the herd-labeled beta diversity ordination (Fig 4B) appears to confirm the expectation that camels from the same herd have more similar microbiomes, with a close clustering of like-colored points. A clear outlier microbiome sample is also apparent in the same plot; it was collected from a very young calf (in its third month of life) which was exclusively breastfeeding, unlike any other camel in the dataset (the next youngest is twice its age and consuming solid food).

We tested each metadata variable in the binary and numerical sets using PERMANOVA on the beta diversity distance matrix to determine the significance of the association, and displayed all significant results as colored annotations in Fig 4C. Notably, species richness was a significant driver of beta diversity, forming a clear gradient along PC1, and age followed a similar trajectory (age is also associated with species richness directly; Spearman rho = 0.61, p = 7.4e-7; Pearson r = 0.48, p = 0.0002). Bristol index and stool darkness followed similar gradients along PC1.

Captivity factor (the degree of captivity), however, exhibits the reverse relationship. Of note, although dietary diversity does not produce a significant association, individual dietary components (milk, barley) show some evidence of microbiome clustering. Pregnancy status and gender also produce significant separation. Notably absent from the significantly associated beta diversity results are disease status (we return to this point in later analyses), weight, and number of co-housed camels. We also evaluated Bray-Curtis dissimilarity for discriminating between camels by covariates, but found that it discriminated less well between microbiome samples with respect to metadata, with only 3 significant associations found (wheat diet, captivity status, and color; Fig 4D).
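For readers unfamiliar with PERMANOVA, the test applied above can be sketched in a few lines. The Python code below is a minimal illustration of the one-way (categorical) case only, written from the standard definition of the pseudo-F statistic; it is not the implementation used in this study, it does not cover continuous covariates, and the function name, toy distances, and group labels are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def permanova(dist: Sequence[Sequence[float]], labels: Sequence[str],
              n_perm: int = 999, seed: int = 0) -> Tuple[float, float, float]:
    """One-way PERMANOVA on a square distance matrix.

    Returns (pseudo_F, R2, permutation p-value).
    """
    n = len(labels)
    n_groups = len(set(labels))
    # Total sum of squares from all pairwise squared distances.
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n

    def ss_within(lab: Sequence[str]) -> float:
        total = 0.0
        for group in set(lab):
            idx = [i for i, value in enumerate(lab) if value == group]
            pair_sum = sum(dist[i][j] ** 2
                           for k, i in enumerate(idx) for j in idx[k + 1:])
            total += pair_sum / len(idx)
        return total

    def pseudo_f(lab: Sequence[str]) -> float:
        ss_w = ss_within(lab)
        return ((ss_total - ss_w) / (n_groups - 1)) / (ss_w / (n - n_groups))

    f_obs = pseudo_f(labels)
    r2 = (ss_total - ss_within(labels)) / ss_total  # effect size

    # Null distribution: shuffle group labels, keep distances fixed.
    rng = random.Random(seed)
    shuffled: List[str] = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(shuffled) >= f_obs:
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)
    return f_obs, r2, p_value


# Toy example: ten samples on a line, two well-separated groups.
points = [0, 1, 2, 3, 4, 10, 11, 12, 13, 14]
dist = [[abs(a - b) for b in points] for a in points]
labels = ["group_1"] * 5 + ["group_2"] * 5

f_obs, r2, p_value = permanova(dist, labels)
print(round(f_obs, 2), round(r2, 3), p_value)
```

The R² returned here is the fraction of total distance-based variance attributable to the grouping, which corresponds to the effect size reported alongside each p-value in Fig 4.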

Differential analysis and machine learning

Relative abundances of genera were used as features to train a random forest model for each metadata variable (Fig 5). Additionally, the top-scoring genera by feature importance were separately visualized and analyzed using univariate regression and associated statistics. Strikingly, some variables, including all binary dietary features, yielded strong predictive performance (OOB ROC AUC > 0.72), including some features that were not significantly distinguished by earlier community-wide analyses of diversity metrics, or only with specific beta diversity metrics. This is especially apparent with the prediction of dietary wheat, which produced a non-significant association with beta diversity using log-Euclidean distances (PERMANOVA p = 0.13) and a weakly significant association using Bray-Curtis (p = 0.032), yet an AUC of 0.9 by random forest prediction using genus relative abundances. Considering that the wheat feature in particular is fairly class-balanced (29 camels are not wheat consumers, and 26 are), this predictive performance is less likely to be artifactual than results for less balanced classes such as milk or grass, where only 3 camels consumed milk and only 3 did not consume grass. Milk and grass consumption are also confounded by age and wheat consumption, both of which are represented by highly predictive models.
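The AUC values quoted above summarize how well out-of-bag (OOB) class probabilities rank positive samples above negative ones. The following Python sketch computes ROC AUC through its rank-based (Mann-Whitney) interpretation; it is an illustration rather than the study's code, and the labels and OOB probabilities shown are made-up values.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample from each class")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical example: 1 = consumes wheat, 0 = does not; scores stand in
# for out-of-bag predicted probabilities of the positive class.
wheat = [1, 1, 0, 0]
oob_probability = [0.9, 0.4, 0.5, 0.1]

auc = roc_auc(wheat, oob_probability)
print(auc)
```

With very unbalanced classes the denominator contains few positive-negative pairs, so a handful of samples can move the AUC substantially; this is the reason for caution about the milk and grass models noted above.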

Fig 5. Visual summary of machine learning and statistical association testing.

The left-most plot in each panel shows the random forest performance of a model trained on the given variable. Binary variables were used to run random forest models in classification mode, whose performance is conveyed using a Receiver Operating Characteristic (ROC) curve, colored by the class probability threshold used for predicting class membership and labeled with the predictive performance as Area Under the Curve (AUC). The identity (y=x) line has been added, denoting the performance of a hypothetical random model (AUC = 0.5). Continuous and ordinal variables were subjected to regression by random forest, which is shown as a scatterplot of the actual value against the out-of-bag prediction. The identity (y=x) line has been added, denoting hypothetically perfect predictive performance (all predicted values exactly match the true values). Points are colored by their distance from the identity line; more intense red signifies poorer prediction for those values. To the right of each machine learning performance summary plot are two plots showing the log10 relative abundance of the top 2 genera (highest Gini index for the random forest), as either boxplots (classification) or univariate scatterplots (regression) of the genus vs the variable. Each univariate genus plot is labeled with the Gini importance score (“Imp”), as well as both the p-value and coefficient from a univariate model fit (logistic model for binary variables, linear models for the rest).

https://doi.org/10.1371/journal.pone.0328194.g005

The biological interpretation of these results can be challenging, particularly where the taxonomy is poorly defined, but insights emerge regardless. The top genera predicting wheat consumption, CAG-194 and UBA3766, are both from family Lachnospiraceae, a family which has been shown to be associated with wheat diets in dogs [29]. Comparatively little is known about the role of Candidatus Avigastranaerophilus, a genus which is depleted in camels eating barley diets, other than that it was initially discovered in chickens [30]. As another example, the association of a Malacoplasma genus (of family Mycoplasmoidaceae) with age, particularly within the first 3 years of life, is notable if a bit unexpected. One possible explanation is that the fermentative capacity of young camels may be developing rapidly in the first few years of life, and this microbe may increase in abundance alongside it, feeding off of the fermentation process more effectively over time (some adjacent clades like Mycoplasma contain well-known saprotrophs). Genus Odoribacter’s dual top associations (positive with captivity and negative with dietary diversity) make sense in the context of the (weak) negative correlation between these two factors themselves. Odoribacter is known to be associated with dietary and environmental factors in humans [31], and might plausibly rise in conjunction with a more controlled and potentially less diverse diet in captivity.

Univariate nonparametric statistical association testing

In addition to machine-learning-based random forest modeling, classical non-parametric statistical tests (Spearman rank correlation, Wilcoxon rank-sum test) were conducted [see Table 2] to evaluate the univariate association of each microbial genus with each metadata outcome variable, followed by multiple hypothesis testing correction using FDR. Notably, the Bifidobacterium genus was found to be positively associated with dietary diversity (number of distinct food sources in a camel’s diet), a number of uncharacterized (but GTDB-recognized) genera were negatively associated with the Bristol stool scale, and a number of novel genera (FCMP prefix) were found to be associated with camel age. Many results recapitulate the features found most important in the random forest model. For example, genus FCMP.c-21.1964, a member of Gastranaerophilales family RUG14156, was found to be more abundant in female camels in both approaches. Likewise, RUG626, a member of family Oscillospiraceae, was found to be associated with pregnancy, and FCMP.c-30.4792, a member of Clostridia order TANB77 family UBA1234, was found to be associated with age. However, some of the top organisms do not align perfectly between the two methods. For instance, the top 2 predictors of diet diversity in the machine learning model do not overlap with the 2 most associated genera by univariate testing. Univariate association testing also revealed a possible slight depletion of genus FCMP-50A 17.82 of family Peptococcaceae in diseased camels (FDR p = 0.077), even though the machine learning model showed no predictive ability for disease status.
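
The two ingredients of this testing scheme that are easiest to illustrate are the rank-based correlation coefficient and the Benjamini-Hochberg adjustment applied after all tests. The sketch below is a minimal, standard-library-only illustration and is not the authors' code; the function names (`average_ranks`, `spearman_rho`, `benjamini_hochberg`) and the example numbers are hypothetical, and p-value computation for the individual tests is deliberately left out.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (vx * vy) ** 0.5


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical genus abundances against a hypothetical age variable.
    print(spearman_rho([0.1, 0.4, 0.2, 0.9], [1, 3, 2, 7]))
    # Hypothetical raw p-values from four univariate tests.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Adjusting only after all genus-by-variable tests have been run, as described for Table 2, means the value of `m` in the adjustment reflects the full number of hypotheses tested.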

Table 2. Univariate association testing reveals genera associated with various outcomes (top associations). Spearman correlation was used for continuous variables, and Wilcoxon rank-sum tests were used for binary variables. P values were adjusted after all testing using Benjamini-Hochberg FDR. Variables in italic indicate binary variables (Gender, Pregnancy, Diet_barley, Diet_wheat). The Direction column indicates for continuous variables (or ordered factors) whether the genus relative abundance is positively (+) or inversely (-) correlated with the outcome variable, and for binary variables displays which outcome has the higher relative abundance (e.g., “Yes” means the genus is higher in the “Yes” class). Magnitude indicates rho (Spearman correlation coefficient; for continuous variables) or log10 fold change between the two classes (for binary variables).

https://doi.org/10.1371/journal.pone.0328194.t002

Discussion

The Fathi Camel Microbiome Project pilot provides early insights into the diversity and ecological drivers of the dromedary (Arabian) camel microbiome, and lays the groundwork for subsequent analyses and expansion. Although some significant limitations exist at this stage, such as small sample size and limited geographical range of sampling, the study design compensates for lack of breadth with greater depth: the use of ultra-deep whole-metagenome shotgun sequencing at 110 million total read depth per sample is unprecedented in the literature for the study of camels and most other livestock. Accordingly, this work adopts a metagenomic analysis approach more common in human metagenome studies. Furthermore, the genomic reference databases generated here have considerable potential to characterize future (including shallower) metagenomic studies in camels more comprehensively, as alignment of metagenomic data against an existing reference database (such as that produced by this study) can substantially reduce the barrier to entry for future work in this agriculturally and culturally important area.

Other limitations include the potential for environmental contamination, a risk inherent in the study of animal fecal communities. This risk was somewhat mitigated by sampling only the inner portion of freshly dropped fecal material, devoid of visible sand and dust. Another limitation is the reliance on out-of-bag prediction performance for our random forest models, rather than an explicit cross-validation framework, which is largely due to sample size constraints. Random forests have previously been reported to work well in the high-dimensional, low-sample-size domain for microbiome data [32], but out-of-bag prediction performance, while often representative, may in some cases be optimistic or inaccurate. Multiple steps were taken to mitigate overfitting risk, including using 5000 individual trees in the random forest ensemble (averaging over more decision trees to reduce variance and stabilize test error), avoiding hyperparameter tuning, not comparing multiple models in pursuit of any “best” model, and, most importantly, including a secondary, independent statistical validation using classical univariate tests with p-values, which adds confidence in the results. Nevertheless, the purpose of this approach as implemented in this study is not to create a robust production model, but to serve as a proof of concept that highlights areas of biological interest warranting follow-up and secondary validation with future sampling efforts.
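
To make the out-of-bag idea concrete, the following sketch simulates only the bootstrap step of a random forest: each tree draws a bootstrap sample of the camels, and a camel is "out of bag" for every tree that did not draw it. It is a hedged illustration rather than the study's implementation; the function name `oob_fraction` is hypothetical, no trees are actually fitted, and the 55 samples and 5000 trees are taken from the figures quoted in this article.

```python
import random


def oob_fraction(n_samples, n_trees, seed=0):
    """For each sample, the fraction of trees for which it is out of bag.

    Each simulated tree draws n_samples indices with replacement; any
    index not drawn is out of bag for that tree and could be used to
    evaluate it without having influenced its training.
    """
    rng = random.Random(seed)
    oob_counts = [0] * n_samples
    for _ in range(n_trees):
        in_bag = {rng.randrange(n_samples) for _ in range(n_samples)}
        for i in range(n_samples):
            if i not in in_bag:
                oob_counts[i] += 1
    return [count / n_trees for count in oob_counts]


if __name__ == "__main__":
    fractions = oob_fraction(n_samples=55, n_trees=5000)
    mean_fraction = sum(fractions) / len(fractions)
    # Theoretical expectation: (1 - 1/n)^n, which approaches 1/e for large n.
    print(round(mean_fraction, 3), round((1 - 1 / 55) ** 55, 3))
```

With thousands of trees, every sample is out of bag for roughly a third of the ensemble, so each out-of-bag prediction is itself an average over many trees. That is what makes the estimate usable at small sample sizes, although it remains an internal estimate rather than a test on independent data.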

Some of the strongest associations reported in this analysis relate to age and diet, two intertwined aspects of development that have been shown to be correlated in humans as well [33, 34]. The ability of the microbiome to strongly predict (or strongly associate with) dietary components in our study, most notably wheat, speaks to the ecological ramifications of microbe-food associations writ large. Is the associated microbe a commensal microorganism that aids digestion of wheat fiber specifically? Or a common plant symbiont merely passing through? Such open questions would benefit from longitudinal analysis, interventional studies using dietary swaps, or thorough environmental sequencing of the foodstuff (e.g. plant material).

Just as notable, but perhaps not unexpected, is the lack of meaningful predictive ability for disease status given microbiome signatures (although there was a weak statistical correlation). Although the microbiome has been implicated in numerous diseases in humans and livestock, we remain underpowered to detect disease trends in this study, due largely to the small number of diseased camels in the populations visited at the time of sampling and to the lack of detailed information on disease etiology for the few diseased camels that were sampled. Without finer-grained control of the diseases sampled, detection of disease signals may be limited to general dysbiosis signatures, which may require significantly larger numbers of matched diseased and healthy camels for sufficient discriminatory power.

Future directions

In terms of project scope and expansion, our near-term goal is to expand the sample number to 200-500 camels, as well as add targeted sequencing of camel milk, which is of vital social and economic significance to the region. We are also planning to sample a larger (or second) geographical area and explicitly address the question of whether Arabian camel microbiomes have strong geography-specific taxonomic signatures.

Analytically, we are planning to perform cross-domain (viral, fungal, eukaryotic microbial) identification of microorganisms, and attempt to tag consumed plant matter in the fecal DNA. With expanded geographical sampling, we hope to train models to better generalize geographically, and assess that generalization performance explicitly via a leave-one-region-out training approach. Gene calling, protein clustering, and deep functional annotation of these deep metagenomic data would make attractive targets for future analysis (and indeed, we have already completed much of the underlying computational work, as it involves steps that were necessary for other steps in our Methods above; e.g. gene-calling and KEGG ortholog functional annotations are required inputs for CheckM2). Use of functional annotations may allow us to mechanistically describe the relationship between microbes and the dietary or environmental features with which they were found to be associated by taxonomy; this is also left to future work.

However, due to the high-p low-N dimensionality issues discussed above in the context of the small sample sizes available to us in this initial effort, further increasing the feature space (with genes, proteins, viruses, etc) was deemed disadvantageous without a concomitant increase in sample number, which is planned for the next stage of work for the FCMP.
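
The leave-one-region-out evaluation mentioned above can be sketched as a simple grouped split: all samples from one region are held out for testing while the model is trained on the remaining regions, and this is repeated once per region. The sketch below is illustrative only; the function name `leave_one_region_out` and the region labels are hypothetical, since the present pilot covers a limited geographical range and no such split was performed here.

```python
def leave_one_region_out(regions):
    """Yield (held_out_region, train_indices, test_indices) per region.

    regions: one region label per sample, in sample order.
    """
    for held_out in sorted(set(regions)):
        train = [i for i, r in enumerate(regions) if r != held_out]
        test = [i for i, r in enumerate(regions) if r == held_out]
        yield held_out, train, test


if __name__ == "__main__":
    # Hypothetical region labels for five samples.
    sample_regions = ["A", "A", "B", "C", "B"]
    for region, train_idx, test_idx in leave_one_region_out(sample_regions):
        print(region, train_idx, test_idx)
```

Because no sample from the held-out region is seen during training, performance on each test fold estimates how well a model transfers to an unsampled geography rather than how well it fits camels from already-visited sites.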

Conclusions

Overall, our study provides an intensive investigation into the gut microbiome ecology of the Arabian camel. Utilizing deep metagenomic shotgun sequencing, we reveal a remarkable amount of novel microbial diversity within 55 sampled camel gut microbiomes. We develop a comprehensive, publicly available database and genomic resource for use in prokaryotic species identification using metagenome-assembled microorganisms. We hope that our microbiome reference database will prove to be a valuable resource for data analysis in camels, as well as in expanding the catalog of global microbial diversity discovered to date.

Our investigation of the correlations between the Arabian camel microbiome and metadata covariates exposed notable microbial trends with respect to physical features of these Arabian camels, as well as captured camel gut ecological diversity in broad strokes. Our study also provided a noteworthy examination into prospective predictive biomarkers for many of these variables, highlighting the potential to generalize and expand upon these with more samples, and potentially provide clues into the intimate relationship between camels, the microbes they host, the foods they consume, and the environment in which they live.

In conclusion, our study has made significant strides in deeply characterizing the species-level microbiome diversity in Arabian camels. It also holds the potential to further advance metagenomic studies in camels and beyond, and provides a valuable reference database with which to compare results across future studies. It uncovered thousands of novel prokaryotic microorganisms without species-level (or often even genus-level) representatives in existing databases, opening up new areas to explore in comparative genomics and systematics. Most importantly, our study’s early insights and exploratory framework may pave the way for future work to further illuminate the previously unexplored microbial communities within these iconic mammals, including potential future insights into the health and productivity of dromedary camels and beyond.

Acknowledgments

I sincerely thank the Deanship of Scientific Research at the University of Tabuk for their support and funding. I am especially grateful to Dr. Gabriel Al-Ghalith for his expertise and guidance. His help with high-performance computing played a key role in the success of this project. I also want to thank Abdulrahman Mubaraki for his valuable support throughout the project.

References

  1. Manor O, Dai CL, Kornilov SA, Smith B, Price ND, Lovejoy JC, et al. Health and disease markers correlate with gut microbiome composition across thousands of people. Nat Commun. 2020;11(1):5206. pmid:33060586
  2. Metwaly A, Reitmeier S, Haller D. Microbiome risk profiles as biomarkers for inflammatory and metabolic disorders. Nat Rev Gastroenterol Hepatol. 2022;19(6):383–97. pmid:35190727
  3. Zhu Y, Li H, Xu X, Li C, Zhou G. The gut microbiota in young and middle-aged rats showed different responses to chicken protein in their diet. BMC Microbiol. 2016;16(1):281. pmid:27887575
  4. Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature. 2012;486(7402):207–14. pmid:22699609
  5. Lozupone CA, Stombaugh JI, Gordon JI, Jansson JK, Knight R. Diversity, stability and resilience of the human gut microbiota. Nature. 2012;489(7415):220–30. pmid:22972295
  6. de Menezes AB, Lewis E, O’Donovan M, O’Neill BF, Clipson N, Doyle EM. Microbiome analysis of dairy cows fed pasture or total mixed ration diets. FEMS Microbiol Ecol. 2011;78(2):256–65. pmid:21671962
  7. Tapio I, Fischer D, Blasco L, Tapio M, Wallace RJ, Bayat AR, et al. Taxon abundance, diversity, co-occurrence and network analysis of the ruminal microbiota in response to dietary changes in dairy cows. PLoS One. 2017;12(7):e0180260. pmid:28704445
  8. von Engelhardt W, Dycker C, Lechner-Doll M. Absorption of short-chain fatty acids, sodium and water from the forestomach of camels. J Comp Physiol B. 2007;177(6):631–40. pmid:17429653
  9. Henderson G, Cox F, Ganesh S, Jonker A, Young W, Global Rumen Census Collaborators, et al. Rumen microbial community composition varies with diet and host, but a core microbiome is found across a wide geographical range. Sci Rep. 2015;5:14567. pmid:26449758
  10. Gharechahi J, Salekdeh GH. A metagenomic analysis of the camel rumen’s microbiome identifies the major microbes responsible for lignocellulose degradation and fermentation. Biotechnol Biofuels. 2018;11:216. pmid:30083229
  11. Hinsu AT, Tulsani NJ, Panchal KJ, Pandit RJ, Jyotsana B, Dafale NA, et al. Characterizing rumen microbiota and CAZyme profile of Indian dromedary camel (Camelus dromedarius) in response to different roughages. Sci Rep. 2021;11(1):9400. pmid:33931716
  12. Dande SS, Bhatt VD, Patil NV, Joshi CG. The camel faecal metagenome under different systems of management: phylogenetic and gene-centric approach. Livestock Sci. 2015;178:108–18.
  13. He J, Yi L, Hai L, Ming L, Gao W, Ji R. Characterizing the bacterial microbiota in different gastrointestinal tract segments of the Bactrian camel. Sci Rep. 2018;8(1):654. pmid:29330494
  14. Laudadio I, Fulci V, Palone F, Stronati L, Cucchiara S, Carissimi C. Quantitative assessment of shotgun metagenomics and 16S rDNA amplicon sequencing in the study of human gut microbiome. OMICS. 2018;22(4):248–54. pmid:29652573
  15. Marcos-Zambrano LJ, Karaduzovic-Hadziabdic K, Loncar Turukalo T, Przymus P, Trajkovik V, Aasmets O, et al. Applications of machine learning in human microbiome studies: a review on feature selection, biomarker identification, disease prediction and treatment. Front Microbiol. 2021;12:313.
  16. Topçuoğlu BD, Lesniak NA, Ruffin MT, Wiens J, Schloss PD. A framework for effective application of machine learning to microbiome-based classification problems. mBio. 2020;11(3):10–128.
  17. Ramon E, Obón-Santacana M, Khannous-Lleiffe O, Saus E, Gabaldón T, Guinó E, et al. Performance of a shotgun prediction model for colorectal cancer when using 16S rRNA sequencing data. Int J Mol Sci. 2024;25(2):1181. pmid:38256252
  18. Wirbel J, Pyl PT, Kartal E, Zych K, Kashani A, Milanese A, et al. Meta-analysis of fecal metagenomes reveals global microbial signatures that are specific for colorectal cancer. Nat Med. 2019;25(4):679–89. pmid:30936547
  19. Rahman SF, Olm MR, Morowitz MJ, Banfield JF. Machine learning leveraging genomes from metagenomes identifies influential antibiotic resistance genes in the infant gut microbiome. mSystems. 2018;3(1):10–128.
  20. Stein ED, White BP, Mazor RD, Miller PE, Pilgrim EM. Evaluating ethanol-based sample preservation to facilitate use of DNA barcoding in routine freshwater biomonitoring programs using benthic macroinvertebrates. PLoS One. 2013;8(1):e51273. pmid:23308097
  21. Al-Ghalith GA, Hillmann B, Ang K, Shields-Cutler R, Knights D. SHI7 is a self-learning pipeline for multipurpose short-read DNA quality control. mSystems. 2018;3(3):10–128.
  22. Li D, Liu C-M, Luo R, Sadakane K, Lam T-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. 2015;31(10):1674–6. pmid:25609793
  23. Chklovski A, Parks DH, Woodcroft BJ, Tyson GW. CheckM2: a rapid, scalable and accurate tool for assessing microbial genome quality using machine learning. Nat Methods. 2023;20(8):1203–12. pmid:37500759
  24. Al-Ghalith G. Knights-Lab/Akronymer: Akronymer V0. 95 Interim Release. Akronymer: Akronymer; 2018.
  25. Chaumeil PA, Mussig AJ, Hugenholtz P, Parks DH. GTDB-Tk v2: memory friendly classification with the genome taxonomy database. Bioinformatics. 2022;38(23):5315–6.
  26. Al-Ghalith GA, Knights D. Faster and lower-memory metagenomic profiling with UTree. 2017.
  27. Bowers RM, Kyrpides NC, Stepanauskas R, Harmon-Smith M, Doud D, Reddy TBK, et al. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nat Biotechnol. 2017;35(8):725–31. pmid:28787424
  28. Letunic I, Bork P. Interactive tree of life (iTOL) v6: recent updates to the phylogenetic tree display and annotation tool. Nucleic Acids Res. 2024:gkae268.
  29. Palmqvist H, Höglund K, Ringmark S, Lundh T, Dicksved J. Effects of whole-grain cereals on fecal microbiota and short-chain fatty acids in dogs: a comparison of rye, oats and wheat. Sci Rep. 2023;13(1):10920. pmid:37407634
  30. Gilroy R, Ravi A, Getino M, Pursley I, Horton DL, Alikhan N-F, et al. Extensive microbial diversity within the chicken gut microbiome revealed by metagenomics and culture. PeerJ. 2021;9:e10941. pmid:33868800
  31. Huda MN, Salvador AC, Barrington WT, Gacasan CA, D’Souza EM, Deus Ramirez L, et al. Gut microbiota and host genetics modulate the effect of diverse diet patterns on metabolic health. Front Nutr. 2022;9:896348. pmid:36061898
  32. Knights D, Costello EK, Knight R. Supervised classification of human microbiota. FEMS Microbiol Rev. 2011;35(2):343–59. pmid:21039646
  33. Ottman N, Smidt H, de Vos WM, Belzer C. The function of our microbiota: who is out there and what do they do? Front Cell Infect Microbiol. 2012;2:104. pmid:22919693
  34. Wu GD, Bushman FD, Lewis JD. Diet, the human gut microbiome, and IBD. Anaerobe. 2013;24.