Variation in Rural African Gut Microbiomes Is Strongly Shaped by Parasitism and Diet

The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license.

The human gut microbiome is influenced by its host's nutrition and health status, and represents an interesting adaptive phenotype under the influence of metabolic and immune constraints. Previous studies contrasting rural populations in developing countries with urban industrialized ones have shown that geography is an important factor associated with the gut microbiome; however, studies have yet to disentangle the effects of factors such as climate, diet, host genetics, hygiene and parasitism. Here, we focus on fine-scale comparisons of African rural populations in order to (i) contrast the gut microbiomes of populations that inhabit similar environments but have different traditional subsistence modes and (ii) evaluate the effect of parasitism on microbiome composition and structure. We sampled rural Pygmy hunter-gatherers as well as Bantu individuals from both farming and fishing populations in Southwest Cameroon and found that the presence of Entamoeba is strongly correlated with microbial composition and diversity. Using a random forest classifier model, we showed that an individual's infection status can be predicted with 79% accuracy based on his/her gut microbiome composition. We identified multiple taxa that differ significantly in frequency between infected and uninfected individuals, and found that alpha diversity is significantly higher in infected individuals, while beta diversity is reduced. Another factor associated with microbial composition in our data is subsistence mode: notably, some taxa previously shown to differ between Hadza hunter-gatherers from East Africa and Italians also discriminate Pygmy hunter-gatherers from Cameroon from neighboring farming or fishing populations.
In conclusion, our results stress the importance of taking into account an individual's parasitism status in studies of the microbiome, and highlight how sensitive the microbial ecosystem is to subtle changes in host nutrition. Our fine-scale analysis allowed us to identify microbial features that are specific to hunter-gatherers versus ones shared by all rural African populations, increasing our understanding of the influence of subsistence mode and lifestyle on gut microbiome composition.


The human gut microbiota, the community of microorganisms inhabiting our gastrointestinal tract, is involved in a number of metabolic and immune functions and is now considered to be […]

[…] their primary source of energy comes from cassava (Manihot esculenta), and fish or meat provides the main source of protein. Animal food production for these populations has been estimated to be high compared to elsewhere in Cameroon or Africa 27. To account for recent changes in diet, we evaluated current dietary regimes using dietary surveys. We also assessed parasitism status by direct observation of fecal samples under the microscope. The focus on populations living in the tropical rainforest complements the African populations sampled previously: the Hadza hunter-gatherers and a population from Burkina Faso, living in the East and West African tropical savanna, respectively 11,14, and a population from Malawi living in a relatively dry subtropical area of East Africa 12. To the best of our knowledge, this study […]

We chose these populations because previous work done in 1984-1985, based on nutritional questionnaires and isotope analyses, showed they had distinct diets 27,31. We performed new nutritional frequency surveys to assess how diet had changed over the past 30 years (see Supplementary Table 1). Interestingly, the amount of meat in the hunter-gatherers' diet has substantially decreased, reflecting the lower abundance of wild game in the forest reserve and the hunting ban applied to some species. In contrast, the consumption of fish has increased in […]

[…] technology. We obtained a total of 12.65 million high-quality reads, an average of 175,784 reads per sample (± 72,822). The average percentage of mapped reads per individual was 83% (SD = 7.5%) and did not vary significantly between populations (Welch's t-test, p > 0.2). The dataset was then rarefied to 50,000 reads per sample (see Supplementary Fig. 2), and reads were clustered into 5039 operational taxonomic units (OTUs) at 97% identity.
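Rarefaction means randomly subsampling every sample to the same sequencing depth, so that diversity estimates are not driven by unequal read counts. The sketch below assumes a simple per-sample dictionary mapping labels (reads or OTU ids) to counts; the function name `rarefy` and the toy sample are hypothetical and do not reproduce the authors' pipeline.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=None):
    """Subsample one sample's counts to `depth` reads, drawn without replacement.

    `counts` maps a label (e.g. an OTU id) to its read count in this sample.
    """
    total = sum(counts.values())
    if depth > total:
        # Samples shallower than the target depth are usually dropped, not padded.
        raise ValueError(f"depth {depth} exceeds sample total {total}")
    # Expand the counts into one entry per read, then draw `depth` reads at random.
    pool = [label for label, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return dict(Counter(rng.sample(pool, depth)))


# Hypothetical sample with roughly the average depth reported above.
sample = {"OTU_1": 120_000, "OTU_2": 50_000, "OTU_3": 5_784}
rarefied = rarefy(sample, 50_000, seed=1)
print(sum(rarefied.values()))  # 50000
```

Sampling without replacement keeps each rarefied count no larger than the original count, which is what distinguishes rarefaction from simply rescaling proportions.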

The five biological replicates (the same individual sampled a few days apart; see Supplementary Table 1) allowed us to compare microbial differences within individuals to those between individuals. We calculated the UniFrac distance, a phylogeny-based distance metric that, in its weighted form, accounts for the relative abundance of taxa 32. Because the weighted and unweighted metrics capture different aspects of microbial diversity 32, we included both […]
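To make the difference between the two UniFrac variants concrete, the sketch below computes both on a toy rooted tree. It assumes the tree is given as a list of branches, each with a length and the set of tips (taxa) below it; this representation and the function name `unifrac` are illustrative choices, and the weighted form shown is the unnormalized one. Real analyses would normally rely on an established implementation rather than code like this.

```python
def unifrac(branches, a, b, weighted=False):
    """UniFrac distance between samples `a` and `b`.

    branches: list of (length, tips_below) pairs, one per branch of a rooted tree.
    a, b: dicts mapping tip (taxon) name -> count in each sample.
    """
    total_a, total_b = sum(a.values()), sum(b.values())
    observed_length = 0.0  # unweighted: length of branches seen in either sample
    distance = 0.0
    for length, tips in branches:
        # Fraction of each sample's reads that descend from this branch.
        frac_a = sum(a.get(t, 0) for t in tips) / total_a
        frac_b = sum(b.get(t, 0) for t in tips) / total_b
        if weighted:
            # Weighted: branch length times the difference in abundance below it.
            distance += length * abs(frac_a - frac_b)
        elif frac_a > 0 or frac_b > 0:
            observed_length += length
            if (frac_a > 0) != (frac_b > 0):
                # Unweighted: this branch leads to taxa seen in only one sample.
                distance += length
    return distance if weighted else distance / observed_length


# Toy tree ((A:1,B:1):1,C:2); each branch lists the tips below it.
tree = [(1.0, {"A"}), (1.0, {"B"}), (1.0, {"A", "B"}), (2.0, {"C"})]
s1, s2 = {"A": 10}, {"A": 5, "B": 5}
print(unifrac(tree, s1, s2))                 # 1/3: only branch B is unique
print(unifrac(tree, s1, s2, weighted=True))  # 1.0
```

Unweighted UniFrac only asks whether a lineage is present, so rare taxa count as much as dominant ones; the weighted form is driven by abundant lineages, which is why the two metrics can tell different stories about the same samples.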

Influence of parasitism on the microbiome
Because of the significant relationship between Entamoeba infection status and patterns of variation in the gut microbial communities found in all populations, we further investigated the relationship between infection by this parasite and the composition of the microbiome (Fig. 2). As it is difficult to distinguish the opportunistic pathogenic species (E. histolytica) from the strict commensal (E. dispar) by microscopy alone, we were unable to characterize this parasite at the species level. However, fewer than 10% of infected individuals were suffering from diarrhea, suggesting that most were not experiencing symptomatic amebiasis 33.

At the phylum level, we found that 8 of the 13 phyla represented differ significantly between Entamoeba-infected (Ent+) and uninfected (Ent-) individuals, with most phyla (except Bacteroidetes and Lentisphaerae) occurring at a higher relative abundance in Ent+ individuals (see Table 1). When looking at individual taxa, based on a linear regression model, we also identified a number of notable differences between Ent+ and Ent- individuals (Fig. 2b-c, Supplementary Tables 3-4): eighteen of the 93 most abundant taxa (present at ≥ 0.1% in at least 4 individuals) differed significantly in their relative abundance between Ent+ […]
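The abundance filter quoted above (taxa present at ≥ 0.1% in at least 4 individuals) can be sketched as follows. The input layout, a dictionary of per-sample count tables, and the function name `abundant_taxa` are assumptions for illustration; both thresholds are exposed as parameters.

```python
def abundant_taxa(samples, min_rel=0.001, min_samples=4):
    """Return taxa whose relative abundance is >= min_rel in at least min_samples samples.

    samples: dict mapping sample id -> {taxon: read count}.
    """
    n_passing = {}
    for counts in samples.values():
        total = sum(counts.values())
        for taxon, n in counts.items():
            # Count this sample toward the taxon only if it clears the threshold here.
            if total > 0 and n / total >= min_rel:
                n_passing[taxon] = n_passing.get(taxon, 0) + 1
    return sorted(t for t, k in n_passing.items() if k >= min_samples)


# Five hypothetical samples of 10,000 reads each; "Z" reaches 0.1% in only three.
samples = {
    f"s{i}": {"X": 100, "Y": 5, "Z": 10 if i < 3 else 0, "W": 9885 if i < 3 else 9895}
    for i in range(5)
}
print(abundant_taxa(samples))  # ['W', 'X']
```

With `min_rel=0.0001`, the same function expresses the ≥ 0.01% cutoff applied to predicted KEGG pathways.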

These taxonomic signatures of Entamoeba infection are so strong that an individual's infection status can be predicted with 79% accuracy from gut microbiome composition using a Random Forests Classifier (RFC) model (p < 0.001; see Supplementary Fig. 5). Of the ten taxa identified as the most important for prediction, all but Prevotella stercorea were […]
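A classifier's accuracy is only meaningful when it is measured on samples not used for training and compared against what chance would give. The sketch below illustrates that logic only: it swaps the random forest for a much simpler nearest-centroid classifier (so it is not the authors' RFC model), scores it by leave-one-out cross-validation, and derives a permutation p-value by shuffling the infection labels `n_perm` times. All function names and the toy data are hypothetical.

```python
import random


def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in for a random forest)."""
    correct = 0
    for i, xi in enumerate(X):
        # Fit class centroids on every sample except the held-out one.
        centroids = {}
        for label in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            if rows:
                centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        # Predict the class whose centroid is closest (squared Euclidean distance).
        pred = min(centroids, key=lambda lab: sum((a - c) ** 2 for a, c in zip(xi, centroids[lab])))
        correct += pred == y[i]
    return correct / len(X)


def permutation_test(X, y, n_perm=199, seed=0):
    """Observed accuracy and the fraction of label-shuffled runs that do at least as well."""
    rng = random.Random(seed)
    observed = loo_accuracy(X, y)
    hits = sum(loo_accuracy(X, rng.sample(y, len(y))) >= observed for _ in range(n_perm))
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical two-taxon profiles: Ent+ samples are enriched in the first taxon.
X = [[0.30 + 0.01 * k, 0.10] for k in range(10)] + [[0.10, 0.30 + 0.01 * k] for k in range(10)]
y = ["Ent+"] * 10 + ["Ent-"] * 10
accuracy, p_value = permutation_test(X, y)
print(accuracy, p_value)
```

With 199 shuffles the smallest attainable p-value is 1/200; reporting p < 0.001 would require at least 999 permutations under this scheme.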

Furthermore, when comparing the microbial diversity of Ent+ and Ent- individuals, we found that the presence of Entamoeba is associated with a significant increase in alpha (intra-host) diversity as measured by the Phylogenetic Diversity (PD) whole tree metric (Welch's t-test: p < 0.0001, Fig. […]
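Faith's phylogenetic diversity (PD) is the total length of the tree branches connecting the taxa observed in a sample to the root, and Welch's t-test compares two group means without assuming equal variances. The sketch below computes both on toy inputs. It reuses an illustrative branch-list tree representation and returns only the Welch t statistic and its Welch-Satterthwaite degrees of freedom; the p-value would come from a statistics library (for example `scipy.stats.ttest_ind` with `equal_var=False`). Function names and data are hypothetical.

```python
import math
from statistics import mean, variance


def faith_pd(branches, counts):
    """Sum of branch lengths leading to at least one taxon observed in the sample."""
    observed = {taxon for taxon, n in counts.items() if n > 0}
    return sum(length for length, tips in branches if observed & tips)


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)  # squared standard errors
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Toy tree ((A:1,B:1):1,C:2) as (length, tips below) pairs.
tree = [(1.0, {"A"}), (1.0, {"B"}), (1.0, {"A", "B"}), (2.0, {"C"})]
print(faith_pd(tree, {"A": 3, "B": 0}))  # 2.0: branch A plus the internal branch
print(faith_pd(tree, {"A": 1, "C": 1}))  # 4.0

# Hypothetical PD values for two small groups.
t, df = welch_t([1, 2, 3, 4], [2, 4, 6, 8])
```

Because PD accumulates branch length, a sample gains more diversity from a taxon on a long, otherwise unrepresented lineage than from one closely related to taxa already present.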

Relationship between specific taxa and microbial community diversity
Because of the striking relationship between Entamoeba infection status and alpha diversity, we sought to identify any phyla whose abundance was significantly correlated with community diversity. To account for the effect of Entamoeba infection, we added infection status as a binary covariate to our linear model and identified 11 phyla that are significantly correlated with alpha diversity (q < 0.05; see Supplementary Fig. 8). Although, as expected, the majority of these taxa increase in abundance with higher diversity, Bacteroidetes and Proteobacteria decrease in relative abundance as alpha diversity increases. This negative relationship suggests that these taxa might be more competitive than others and drive down diversity.

[…] and Ent- individuals (linear regression: q < 0.05, see Supplementary Table 6 and Fig. 2d). Of these 19, of particular interest are increases in the amoebiasis (q = 0.001), tetracycline biosynthesis (q = 0.03), and yeast MAPK signaling (q = 0.01) pathways in Ent+ individuals. These changes are largely attributed to Clostridiales and Ruminococcaceae, which occur at significantly greater abundance in Ent+ individuals (6.53% vs. 4.53%, q = 0.044; and 29.58% vs. 16.34%, q < 0.0001, respectively; Fig. 2d). Interestingly, the Cellular Antigens pathway, potentially involved in host-microbe and microbe-microbe interactions, is more represented in the predicted metagenomes of Ent- individuals (linear regression: q = 0.01). This pathway is predominantly attributed to members of the Enterobacteriaceae family, which was found to be twice as abundant in individuals lacking the parasite.

[…] 14 individuals of the hunter-gatherer, farmers from the South and fishing populations were distributed evenly across all other subsistence groups, with the exception of farmers from the North, to which no individual was predicted to belong. Only five of the top ten taxa identified in the RFC model were determined to be significant in the linear regression (see Supplementary Fig. 14b). This suggests that, rather than an individual signature taxon, it is the pattern of abundances across multiple taxa that is important for predicting subsistence. In agreement with our linear regression, the taxon identified as the most important in distinguishing subsistence groups was Bifidobacterium uncl (see Supplementary Fig. 14b and Fig. 4b), occurring at significantly higher frequency in the fishing population (q = 0.0003, Supplementary Table 5).
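Significance in these comparisons is reported as q-values, i.e. p-values adjusted for testing many taxa or pathways at once. The correction procedure is not named here; assuming the common Benjamini-Hochberg false discovery rate method (the same adjustment as R's `p.adjust(method = "BH")`), it can be sketched as below. The function name and example p-values are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices from smallest p
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values never decrease with p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical p-values from four per-taxon regressions.
print(bh_qvalues([0.01, 0.04, 0.03, 0.005]))
```

Each p-value is scaled by m/rank, so with hundreds of taxa or pathways only strong raw signals survive a q < 0.05 threshold.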

Ruminococcus bromii, important for the degradation of resistant starch 38, was the second most important taxon, occurring at 0.01%, 0.01%, 0.15%, and 0.12% in the fishing population, farmers from the North, farmers from the South, and hunter-gatherers, respectively (q < 0.0001) (see Supplementary Fig. 14c). The third, fourth, fifth and eighth most important taxa include members of the Lachnospiraceae family, two of which were found to be significant in the linear model (see above). When grouped together, taxa in this family are less abundant in the hunter-gatherers than in the other subsistence groups (11.3% vs. 15.6-19.6%, respectively), a difference that is significant only when comparing hunter-gatherers to both farmer populations. Finally, two species of the family Succinivibrionaceae, Succinivibrio sp. and Ruminobacter sp., were also identified as important taxa in the model; both were more abundant in the hunter-gatherers (9.7% and 3.7%, respectively) than in the other three subsistence groups (less than 5.7% and less than 0.1%, respectively; q = 0.068 and 0.057; see Supplementary Fig. 14c). Both of these taxa, associated with the bovine rumen, were also found at higher frequency in the Hadza hunter-gatherers 14.
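Grouping taxa by family, as done above for Lachnospiraceae, amounts to summing the relative abundances of member taxa within each sample and then summarizing per subsistence group. A minimal sketch follows; the taxon-to-family mapping, group labels and abundance values are hypothetical, and summarizing groups by the arithmetic mean is an assumption.

```python
from collections import defaultdict
from statistics import mean


def family_totals(rel_abund, taxon_family):
    """Sum one sample's relative abundances by family (unmapped taxa go to 'unassigned')."""
    totals = defaultdict(float)
    for taxon, value in rel_abund.items():
        totals[taxon_family.get(taxon, "unassigned")] += value
    return totals


def family_mean_by_group(samples, groups, taxon_family, family):
    """Mean summed abundance of `family` within each group of samples (0.0 if absent)."""
    per_group = defaultdict(list)
    for sample_id, rel_abund in samples.items():
        per_group[groups[sample_id]].append(family_totals(rel_abund, taxon_family)[family])
    return {group: mean(values) for group, values in per_group.items()}


# Hypothetical taxa, samples and subsistence labels.
taxon_family = {"Lach_a": "Lachnospiraceae", "Lach_b": "Lachnospiraceae", "Rumi_a": "Ruminococcaceae"}
samples = {
    "s1": {"Lach_a": 0.05, "Lach_b": 0.06, "Rumi_a": 0.20},
    "s2": {"Lach_a": 0.04, "Lach_b": 0.08},
    "s3": {"Lach_a": 0.10, "Lach_b": 0.06, "Rumi_a": 0.15},
}
groups = {"s1": "HG", "s2": "HG", "s3": "Far(S)"}
print(family_mean_by_group(samples, groups, taxon_family, "Lachnospiraceae"))
```

Summing before averaging lets a family-level difference emerge even when no single member taxon differs strongly on its own.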
[…] fishing population, has considerable influence on their gut microbiomes, but this requires further investigation.

Using the taxonomy-based predicted metagenome of each subject's gut microbiota, we found that only one pathway, bacterial invasion of epithelial cells, differed significantly across all subsistence types, being represented at the highest relative abundance in the hunter-gatherers and the lowest in the farmers (linear regression: q = 0.03, Supplementary Fig. 18 and Supplementary Table 6). This pathway includes proteins expressed by pathogenic bacteria that are important for entry into host cells. The significance of this difference is unclear, but it could indicate an increased abundance of pathogens in the microbiomes of hunter-gatherers.
[…] represents the first comparison of the gut microbiomes of human populations with limited geographic separation and contrasting subsistence modes, as well as the first characterization of the relationship between microbial communities and intestinal parasites.
[…] individuals in seven different villages in Southwest Cameroon (average age 50 years, ranging from 26 to 78 years), corresponding to 20 hunter-gatherers, 24 farmers and 20 […]

Diet

[…] inland populations (especially among farmers), owing to the construction of new roads connecting the coastal and inland populations. Similar to the results from 1984-1985, the farmers eat less starchy food (cassava) than the hunter-gatherers and the fishing population (Wilcoxon rank-sum test: p = 0.005 and 0.017, respectively). A principal component analysis of all dietary components revealed roughly three clusters corresponding to the three dietary regimes, with the first axis distinguishing hunter-gatherers from the others and the second axis separating the farming and fishing populations (see Fig. 1b). The one exception to this pattern concerns farmers from the North (living along the same road as the hunter-gatherers), who cluster with the hunter-gatherers. In the following analyses, we therefore consider this population separately from the farmers living in the South.

[…] dietary questionnaires, we assessed the nutritional status of individuals by measuring their BMI (Body Mass Index) (see Supplementary […]
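The Wilcoxon rank-sum test compares two groups using ranks rather than raw values, which suits ordinal food-frequency scores. The implementation used for the dietary comparisons is not stated; the sketch below assumes the large-sample normal approximation with tie and continuity corrections, which is what R's `wilcox.test` falls back to when ties are present. The function name and example scores are hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie- and continuity-corrected).

    Returns (U, p), where U is the rank sum of `x` minus its minimum possible value.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0  # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # mid-rank of 1-based positions i+1..j+1
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    diff = u - n1 * n2 / 2  # deviation from the expected U under the null
    if diff == 0:
        return u, 1.0
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (diff - math.copysign(0.5, diff)) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical cassava-consumption scores for two groups.
u, p = rank_sum_test([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
```

For small samples without ties an exact p-value is usually preferred; the normal approximation here is mainly useful once tied scores are common.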

The fecal microbiota of 69 samples (including 5 biological replicates) were characterized by sequencing the V5-V6 region of the bacterial 16S ribosomal RNA gene with the Illumina MiSeq […]

[…] significant in our linear regression model (all of which are in higher abundance in Ent+ individuals except Prevotella copri). The reason for the association between Entamoeba and these microbes has yet to be identified, but it is noteworthy that the two most important taxa identified in the RFC model, Elusimicrobiaceae unclassified (uncl) and Ruminococcaceae uncl, include established endosymbionts of protists and common inhabitants of the termite gut 34. Spirochaetaceae Treponema, the third most important taxon, includes species that are established human pathogens and others that have been reported to inhabit the cow rumen, the pig gastrointestinal tract, and the guts of termites 35. Christensenellaceae, the fourth most important taxon, was recently identified as the most heritable taxon in an analysis of twins from the UK 3. Two taxa in the order Bacteroidales, Prevotella stercorea and Prevotella copri, the seventh and eighth most important taxa, are the only ones occurring at significantly reduced abundance in infected individuals; Prevotella is an important genus of gut bacteria and is underrepresented in Western relative to African microbiomes 11. While members of the Clostridia and Gammaproteobacteria are more abundant in infected individuals, the pattern for Bacteroidales is the opposite (see Fig. 2b). Oscillospira uncl and Parabacteroides uncl, the ninth and tenth most important taxa, are associated with the rumen and the human intestine, respectively.

bioRxiv preprint doi: http://dx.doi.org/10.1101/016949; first posted online Mar. 24, 2015.

[…] diversity increases, there are fewer potential stable states for individual gut communities, or that infection by Entamoeba drives changes in the microbiome that are dominant over other factors.
[…] KEGG (Kyoto Encyclopedia of Genes and Genomes) database 36 and the PICRUSt (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States) pipeline 37 to predict the abundances of pathways across individuals (see Supplementary Fig. 9). Considering the 220 most abundant KEGG pathways (comprising ≥ 0.01% of all assigned reads in at least 4 individuals), we identified 19 pathways with significant differences in abundance between Ent+ […]

[…] the first investigation of the relationship between intestinal parasitism and the human gut microbiome, and found that infection by the amoeboid parasite Entamoeba is a stronger predictor of the composition and structure of the gut microbiome than diet, geographic location or ancestry. Furthermore, we have conducted the first analysis assessing the role of subsistence, location and genetic ancestry in shaping the gut microbiota at a local scale. We showed that there is striking variation among different rural African populations, indicating that there are multiple signatures of rural, unindustrialized microbiomes.
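The PICRUSt step mentioned above infers a metagenome from 16S data: each OTU's count is divided by its predicted 16S rRNA gene copy number and multiplied by the gene content predicted for that OTU's genome, and gene (KO) abundances are then summed into KEGG pathways. The sketch below illustrates that arithmetic only, assuming the per-OTU copy numbers and gene contents are already available; it is not the PICRUSt code, and every identifier (OTU ids, `K_A`-style gene ids, pathway names) is a hypothetical placeholder.

```python
from collections import defaultdict


def predict_metagenome(otu_counts, copy_number, gene_content):
    """Predicted gene abundances: copy-number-normalized OTU counts times per-genome gene counts."""
    predicted = defaultdict(float)
    for otu, count in otu_counts.items():
        cells = count / copy_number[otu]  # correct for multiple 16S copies per genome
        for gene, n_copies in gene_content.get(otu, {}).items():
            predicted[gene] += cells * n_copies
    return dict(predicted)


def pathway_abundance(gene_abund, pathway_genes):
    """Sum predicted gene abundances over the genes assigned to each pathway."""
    return {pw: sum(gene_abund.get(g, 0.0) for g in genes) for pw, genes in pathway_genes.items()}


# Hypothetical inputs for one individual.
otu_counts = {"otu1": 40, "otu2": 30}
copy_number = {"otu1": 4, "otu2": 1}
gene_content = {"otu1": {"K_A": 2, "K_B": 1}, "otu2": {"K_B": 1}}
genes = predict_metagenome(otu_counts, copy_number, gene_content)
pathways = pathway_abundance(genes, {"P1": ["K_A", "K_B"], "P2": ["K_B"], "P3": ["K_C"]})
print(genes, pathways)
```

Because the prediction is a weighted sum over OTUs, a pathway difference between groups can usually be traced back to the taxa contributing most to it, as with the Enterobacteriaceae and the Cellular Antigens pathway.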

The importance of gastrointestinal parasites in human disease is well established, both as infectious agents and in shaping immunity 22,41, and infection by helminths has notably been found to be a major force underlying the evolution of interleukin genes in humans 23. It has also been demonstrated that loss of helminth exposure removes the enhanced T helper 2 (Th2) and regulatory immune responses imparted by these organisms, which is correlated with the […]

Table 1. (a) Frequency (in %) of phyla in Entamoeba-negative (Ent-) and Entamoeba-positive (Ent+) individuals and in the four subsistence groups (Fis = Fishing population; Far(S) = Farmers from the South; Far(N) = Farmers from the North; HG = Hunter-gatherers). (b) Frequency (in %) of specific taxa of interest previously associated with geography in the four subsistence groups. P-values are based on a linear regression model. The last column indicates whether previous studies 11,14 found an enrichment of each phylum in Hadza adults (H) versus Italian adults (I), or in Burkina Faso children (BF) versus Italian children (I). The abbreviations in parentheses indicate the phylum to which each taxon belongs (Act. = Actinobacteria, Bact. = Bacteroidetes, Firm. = Firmicutes, Prot. = Proteobacteria, and Spir. = Spirochaetes).