Metagenome analysis from the sediment of river Ganga and Yamuna: In search of beneficial microbiome

Beneficial microbes are all around us, and it remains to be seen whether all diseases and disorders can be prevented or treated with them. In this study, various beneficial bacteria were identified in the sediments of the major Indian rivers Ganga and Yamuna, sampled at nine different sites, using a metagenomic approach. Metagenome sequence analysis using the Kaiju web server revealed the presence of 69 beneficial bacterial species. Phylogenetic analysis revealed that these species were highly diverse. The relative abundance of these species was highly correlated with the different pollution levels among the sampling sites. Principal component analysis (PCA) revealed that the Lactobacillus spp. group of beneficial bacteria was more associated with the sediment sampling sites KAN-2 and ND-3, whereas Bacillus spp. were more associated with sites FAR-2 and ND-2. This is the first report revealing the richness of beneficial bacteria in the Indian rivers Ganga and Yamuna. The study might be useful for isolating important beneficial microorganisms from these river sediments for possible industrial applications.


Introduction
Rivers are known to be important for the development of human civilization, culture, and welfare. They are one of the crucial components of freshwater ecosystems, maintaining large biodiversity that is vital for the sustenance of the terrestrial biome. Since rivers are significant reservoirs of the microbiome, they are continually being explored in the search for novel microbiota. Beneficial bacteria are of great importance because of their benefits to humans as well as to organisms at all other levels of the trophic pyramid [1]. They exert their beneficial effects generally through four main mechanisms: enhancement of barrier function, interference with host pathogens, immunomodulation, and production of neurotransmitters [2]. These organisms are gaining increasing importance as functional foods as well as prophylactic, therapeutic, and growth supplements for humans [3][4][5]. Some of the most common human gut probiotics, viz. Lactobacillus and Enterococcus, are reported to counteract diabetes, obesity, autoimmune disorders, and cancer through the production of metabolites such as short-chain fatty acids [6]. Beyond human use, beneficial microbes are now also applied in agriculture, including the veterinary and fisheries sectors, to benefit animal physiology by improving the internal and external environment [5,7,8]. In fisheries in particular, the scope for microbial treatment is enormous and its use is growing steadily. A recent study on Labeo rohita established that dietary administration of a probiotic bacterium, Bacillus aerophilus KADR3, improves disease resistance and enhances immunity against Aeromonas hydrophila infection [9]. Similarly, dietary application of B. amyloliquefaciens CCF7 in L. rohita challenged with a fish-pathogenic bacterium, A. hydrophila MTCC 1739, showed beneficial effects [10].
Although many reports describe the discovery of microbiomes in natural streams of other countries, the literature is very limited for the Indian subcontinent, especially for large riverine ecosystems such as the Ganga and Yamuna. Therefore, in the present study, the abundance of different beneficial microbiota in selected stretches of the rivers Ganga and Yamuna was assessed through a metagenomic approach. Metagenomics has overcome the limitations of culture-dependent microbiological studies of environmental samples and has emerged as a powerful search tool for detailed screening of the beneficial microbial species present in an ecosystem [11]. Since the total DNA extracted from an environmental sample is a snapshot of the entire microbial community, metagenomic analysis makes a comprehensive evaluation of the native microbial ecology easier [12]. Recent computational advances and the evolution of next-generation sequencing, which can generate millions of sequences at improved cost and speed, make it possible to detect microbial biodiversity and its abundance directly from environmental samples [13][14]. To our knowledge, this is the first report presenting an analysis of a large sediment metagenome dataset from these rivers in search of beneficial bacteria.

Sample collection
A total of nine sediment samples were collected from the rivers Ganga and Yamuna (Fig 1).

DNA extraction
The samples obtained from different locations along the rivers Ganga and Yamuna were kept in sterile plastic bags, sealed, transported on ice (4˚C), and afterward stored at -80˚C until further processing. Metagenomic DNA was extracted from these sediment samples using a soil gDNA isolation kit (NucleoSpin Soil). After isolation, the quality of the metagenomic DNA was checked with a NanoDrop 2000 and a Qubit 3.0 Fluorometer. The metagenomic library was prepared using a sufficient amount of good-quality extracted DNA.

Metagenomic library preparation
The paired-end sequencing libraries were prepared using the Illumina TruSeq Nano DNA Library Prep Kit. Approximately 200 ng of eDNA was fragmented with a Covaris M220 to produce a mean fragment size distribution of 350 bp. Covaris shearing produced dsDNA fragments with 3' or 5' overhangs. The fragments were then subjected to end-repair. Following the kit protocol, the products were PCR-amplified with the index primer. The D1000 ScreenTape was used to analyze the PCR-enriched libraries on the 4200 TapeStation system (Agilent Technologies).

Whole metagenome sequencing and quality assessment
After obtaining the mean peak size from the Agilent TapeStation profile and the Qubit concentration for the libraries, the paired-end (PE) Illumina libraries were loaded onto a NextSeq 500 for cluster generation and sequencing. After trimming, a minimum read length of 100 nt was applied. CLC Genomics Workbench version 8.5.1 (CLC bio; https://www.qiagenbioinformatics.com/products/clc-genomics-workbench) was used to assemble the filtered high-quality reads of each sample into scaffolds.
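The length filter applied after trimming can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function name, the example reads, and the choice to keep reads of exactly 100 nt are assumptions.

```python
from typing import Iterable, Iterator, Tuple

MIN_LENGTH_NT = 100  # minimum read length stated in the text


def filter_by_length(
    reads: Iterable[Tuple[str, str]], min_length: int = MIN_LENGTH_NT
) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) pairs whose sequence has at least min_length nt.

    Assumption: the threshold is inclusive, so a 100 nt read is kept.
    """
    for read_id, sequence in reads:
        if len(sequence) >= min_length:
            yield read_id, sequence


# Hypothetical trimmed reads, used only to exercise the filter.
example_reads = [
    ("read_1", "A" * 150),
    ("read_2", "C" * 99),
    ("read_3", "G" * 100),
]
kept_ids = [read_id for read_id, _ in filter_by_length(example_reads)]
print(kept_ids)
```

For paired-end data, a real pipeline would normally drop both mates when either one fails the filter; that step is omitted here for brevity.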

Sequence annotation and bioinformatics analysis
For the detection of the beneficial microbiome in the sediment metagenome, filtered metagenomic reads were used for taxonomic binning through the Kaiju web interface. The Kaiju classifier uses the Burrows-Wheeler transform algorithm for taxonomic classification at the protein level [15]. To highlight the phylogenetic relationships among the 69 beneficial microbial species identified in the sediments of the rivers Ganga and Yamuna, a multiple sequence alignment (MSA) was carried out using MEGA 6 software [18]. The Neighbor-Joining method was used to infer evolutionary history [16], and the Maximum Composite Likelihood method [17] was used to compute evolutionary distances. The relative abundance of beneficial bacteria was calculated using the Kaiju web server. Comparisons were made using the standard Student's t-test [19]. Heat maps were prepared using MultiExperiment Viewer (MeV), a standalone tool for visualizing the clustering of multivariate data [20]. The principal component analysis (PCA) biplot and the scatterplot matrix, along with correlation values between sampling sites and the relative abundance of beneficial bacteria, were developed in JMP Pro 10 after standardization of the estimated data.
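To make the Neighbor-Joining step concrete, the sketch below computes the Q-criterion that selects the first pair of taxa to join. It is an illustration only: the five taxa and their distances are a toy matrix, not data from this study, and MEGA 6 carries out the full iterative procedure on Maximum Composite Likelihood distances.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def nj_q_values(
    taxa: List[str], dist: List[List[float]]
) -> Dict[Tuple[str, str], float]:
    """Return the Neighbor-Joining Q value for every pair of taxa.

    Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
    """
    n = len(taxa)
    row_sums = [sum(row) for row in dist]
    q: Dict[Tuple[str, str], float] = {}
    for i, j in combinations(range(n), 2):
        q[(taxa[i], taxa[j])] = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
    return q


def first_join(taxa: List[str], dist: List[List[float]]) -> Tuple[str, str]:
    """Return the pair with the smallest Q value, which is joined first."""
    q = nj_q_values(taxa, dist)
    return min(q, key=q.get)


# Toy symmetric distance matrix (hypothetical, for illustration only).
toy_taxa = ["a", "b", "c", "d", "e"]
toy_dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
q_values = nj_q_values(toy_taxa, toy_dist)
print(first_join(toy_taxa, toy_dist), q_values[("a", "b")])
```

Note that the pair with the smallest raw distance (d and e) is not the first to be joined here; the Q-criterion also accounts for how far each taxon is from all the others.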

Sequence generation
Sediment samples from nine sites (Fig 1) of the river Ganga (KAN-1, KAN-2, and KAN-3; FAR-1, FAR-2, and FAR-3) and the river Yamuna (ND-1, ND-2, and ND-3) were analyzed using high-throughput next-generation sequencing to identify the microbial biodiversity. The total number of high-quality reads and the corresponding data volume for each sediment sample are presented in Table 1. All the high quality reads obtained from the sediments of different sites

Taxonomical classification of sediment metagenome
Based on the taxonomical classification, a large number of beneficial bacterial species (

Phylogenetic analysis
The MSA revealed that the majority of the species were diverse. Phylogenetic tree analysis showed that the species grouped into five different clusters (Fig 2). In CLUSTER-1, S. thermophilus and L. brevis, derived from Yamuna and Farakka sediment samples respectively, were found to be phylogenetically very close to each other, with a bootstrap value of 34. In CLUSTER-2, E. faecium and L. johnsonii, derived from Yamuna and Farakka sediment samples respectively, were found to be very close to each other, with a bootstrap value of 14. Similarly, in CLUSTER-3, L. fermentum and L. helveticus, derived from Kanpur and Yamuna sediment samples respectively, were found to be phylogenetically related, with a high bootstrap value of 71. In CLUSTER-4, P. pentosaceus and B. adolescentis, both derived from Yamuna sediment samples, were found to be close to each other, with a bootstrap value of 19. The highest number of evolutionarily close beneficial microbial species was found in CLUSTER-5, in which L. gasseri and B. mycoides, derived from Kanpur and Yamuna sediment samples, were found to be close to each other, with a bootstrap value of 54.

Relative abundance at different sites
In the classified metagenomic data, a total of 69 bacterial species from 18 different genera were considered for analysis. Heat map analysis showed a clear distinction in the relative abundance of different bacteria between the Kanpur and Farakka sediment samples of the river Ganga. Similarly, the prevalence of beneficial bacterial species in the sediment samples of the river Yamuna differed from that in the Kanpur and Farakka stretches of the river Ganga (Fig 3). Relative abundance analysis revealed that L. curvatus and L. brevis were present in similar proportions in the sediment samples of all nine sampling sites of the two rivers; however, L. casei was present in a relatively high proportion at the Farakka stretch of the river Ganga, with statistical significance (p-value of 0.02). B. clausii was found in a high proportion (p ≤ 0.05) at the Farakka stretch, whereas B. mycoides was found in a high proportion (p ≤ 0.05) at the Kanpur stretch of the river Ganga. Our metagenomic data showed that one species of Vibrio (V. harveyi) had differential relative abundance among the three locations (Kanpur, Farakka, and New Delhi) and was found in a relatively lower proportion (p ≤ 0.05) at the New Delhi stretch of the river Yamuna than at the Kanpur stretch of the river Ganga. Similarly, S. colwelliana was found in a higher proportion (p ≤ 0.05) at the Kanpur stretch of the river Ganga. E. faecium was found in a high proportion at the New Delhi stretch of the river Yamuna compared with the other locations (p ≤ 0.05) (Table 2).
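The relative abundance values discussed here were produced by the Kaiju web server. As a rough sketch of the underlying arithmetic, each species' classified read count can be expressed as a percentage of the total for a sample. The function name and the read counts below are hypothetical, and the choice of denominator (only the species supplied, rather than all reads including unclassified ones) is an assumption that may differ from Kaiju's own normalization.

```python
from typing import Dict


def relative_abundance(read_counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-species read counts for one sample into percentages.

    Assumption: the denominator is the sum of the counts supplied.
    """
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no classified reads in this sample")
    return {species: 100.0 * n / total for species, n in read_counts.items()}


# Hypothetical read counts for a single sediment sample.
hypothetical_counts = {
    "Lactobacillus casei": 50,
    "Bacillus clausii": 30,
    "Enterococcus faecium": 20,
}
abundance = relative_abundance(hypothetical_counts)
for species, percent in abundance.items():
    print(f"{species}: {percent:.1f}%")
```

Expressing counts as percentages makes samples with different sequencing depths comparable, which is what allows the site-to-site comparisons reported in this section.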
Based on the taxonomic hierarchy, L. curvatus had similar relative abundances in all three locations (Kanpur, Farakka, and New Delhi). L. brevis showed a similar trend; however, its relative abundance was comparatively higher in the sediment samples of the Farakka stretch of the river Ganga. L. casei showed lower abundance in the sediment samples at the New Delhi stretch of the river Yamuna than at the other two sites of the river Ganga (Fig 4A). Among the Pediococcus population, it is interesting to note that, in the sediment metagenome of the Kanpur site of the river Ganga, P. acidilactici was dominant (Student's t-test, p ≤ 0.05) across the taxonomic profile; however, P. pentosaceus and P. ethanolidurans showed equal distribution among the sediment metagenomes at Farakka on the river Ganga and New Delhi on the river Yamuna (Fig 4B). Likewise, the Pseudomonas population showed an equal distribution of relative abundance across all nine sites. However, P. fluorescens and P. chlororaphis showed significantly different relative abundance (Student's t-test, p ≤ 0.05) at Kanpur (Fig 4C). Among the Enterococcus spp., E. durans, E. malodoratus, E. raffinosus, E. hirae, and E. mundtii showed non-significant differences among the nine sampling sites. E. faecium and E. faecalis showed higher abundance (Student's t-test, p ≤ 0.05) in the sediment metagenomes of the river Yamuna than in those of the Kanpur and Farakka stretches of the river Ganga (Fig 4D).
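The Student's t-test comparisons reported above can be illustrated with a short sketch that computes the pooled-variance t statistic for two groups of sites. The grouping of three sites per stretch and the abundance values are hypothetical, and only the statistic and degrees of freedom are computed; the p-value would normally come from a statistics package.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def student_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return (t statistic, degrees of freedom) for a two-sample Student's t-test.

    Uses the pooled (equal-variance) estimate of the standard error.
    """
    n_a, n_b = len(a), len(b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("both groups have zero variance")
    return (mean(a) - mean(b)) / std_err, df


# Hypothetical relative abundances (%) of one species at three sites per stretch.
stretch_one = [1.0, 2.0, 3.0]
stretch_two = [4.0, 5.0, 6.0]
t_stat, dof = student_t(stretch_one, stretch_two)
print(f"t = {t_stat:.3f}, df = {dof}")
```

With three sites per stretch there are four degrees of freedom, for which the two-sided 5% critical value of t is about 2.776, so an absolute t statistic above that value corresponds to p < 0.05.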
In the principal component analysis (PCA) biplot, PC1 and PC2 together explained 64% of the variability in the data, showing that the sites at Farakka are closely associated, whereas the sites at Kanpur and New Delhi are diverse with respect to the relative abundance of beneficial bacteria (Fig 5A). The relative abundance of beneficial bacteria was found to be closely associated at sites FAR-1, KAN-2, KAN-3, ND-1, ND-2, and ND-3. Further, PCA showed that the Lactobacillus spp. group of beneficial bacteria was more associated with sites KAN-2 and ND-3, whereas Bacillus spp. were more associated with FAR-2 and ND-2. The scatterplot matrix showed the correlation between the sites with respect to the relative abundance of beneficial bacteria (Fig 5B). The highest positive correlation was found between ND-2 and ND-3 (r = 0.48), followed by FAR-1 and FAR-2 (r = 0.36) and KAN-1 and KAN-3 (r = 0.33).
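The pairwise correlations in the scatterplot matrix can be sketched as Pearson's r computed on standardized abundance profiles. The analysis itself was run in JMP Pro 10; the function names and the two short site profiles below are hypothetical and serve only to show the calculation. The PCA step is not reproduced here.

```python
from statistics import mean, stdev
from typing import List, Sequence


def standardize(values: Sequence[float]) -> List[float]:
    """Return z-scores: subtract the mean and divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    if s == 0:
        raise ValueError("cannot standardize a constant profile")
    return [(v - m) / s for v in values]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long abundance profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    zx, zy = standardize(x), standardize(y)
    # With sample standard deviations, r is the sum of z-score products over n - 1.
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


# Hypothetical relative abundances of three species at two sites.
site_a = [1.0, 2.0, 3.0]
site_b = [1.0, 3.0, 2.0]
r_value = pearson_r(site_a, site_b)
print(f"r = {r_value:.2f}")
```

Standardizing first puts abundant and rare species on the same scale, which is also why the data were standardized before the PCA.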

Discussion
The study found that the rivers Ganga and Yamuna host several beneficial bacterial genera with enormous taxonomic diversity. Altogether, the study identified 69 beneficial species belonging to 18 genera (Table 2). All the identified beneficial bacteria and their proposed mechanisms of action are presented in S1 and S2 Tables. In a comparable study, the bacterial communities and their functional genomics in the sediments and water of the Apies River, South Africa, were analyzed using metagenomic data, which revealed higher diversity in the microbial species associated with the different land uses [21]. Taxonomic classification has also been used previously to categorize microbial strains consistently at the species level, supporting appropriate safety evaluation, quality assurance, and non-fraudulent labeling [15,22-26]. In the present study, beneficial bacterial species of the genus Lactobacillus (L. curvatus, L. brevis, L. helveticus, L. gasseri, L. crispatus, L. casei, etc.) were identified. These Lactobacillus species were reported to exert their beneficial effects by reducing inflammation in inflammatory bowel disease (IBD) through the production of anti-inflammatory cytokines [27], by antibiotic and bacteriostatic activity through the production of bacteriocins [28], and by anti-stress activity through the production of β-galactosidase enzymes [29]. L. curvatus was reported to lower cholesterol levels through enhancement of esterase, lipase, cysteine arylamidase, and β-galactosidase activities in the host organism [30]. Vibrio spp. were reported to confer health benefits on the host organism by improving disease resistance through the production of a bacteriocin-like substance [31] and through alteration of the hepatosomatic index and haemocyte number [32]. The Bacillus spp. found in the present study were reported to enhance the growth, survivability, and disease resistance of Labeo rohita, Macrobrachium rosenbergii, etc. through increased alkaline phosphatase activity, globulin content, and lysozyme level [33], enhancement of serum lysozyme activity and serum IgM level [34], increased LYZ gene expression [35], etc.
The identified Bifidobacterium spp. (B. animalis, B. bifidum, B. longum, B. breve, B. adolescentis, etc.) were reported to attenuate autoimmune encephalomyelitis by inhibiting mononuclear infiltration into the central nervous system [36], to diminish gastrointestinal distress by stimulating the production of gastric mucin and other gastrointestinal or neuropeptide hormones [37], to exert anti-obesity activity by inhibiting lipid deposition in the liver and adipose tissues [38], and to alleviate high-fat diet-induced colitis by inhibiting NF-κB activation and lipopolysaccharide production by the gut microbiota [39]. Similarly, Pediococcus spp. were reported to confer many health benefits: P. acidilactici was reported to improve reproductive performance [40], P. pentosaceus has anti-inflammatory and anti-cancer effects through mitigation of azoxymethane-induced toxicity [41], and P. ethanolidurans enhances health through the production of high levels of cellular antioxidants and amplified bile salt hydrolase activity [42]. The identified Enterococcus faecalis was reported to enhance anti-oxidative activity and anti-tumor activity through NK cells and TNF-α [42]. E. raffinosus was reported to prevent bacterial infection of Labeo rohita and Labeo catla by E. coli, A. hydrophila, S. aureus, and S. typhimurium [43]. The identified E. hirae was reported to produce lipase and bile salt hydrolase enzymes with antioxidant properties, and E. mundtii was reported to have antimicrobial activity [44]. Four Roseobacter spp. identified from the sediment metagenomes have reported therapeutic value for commercial aquaculture. Earlier, several members of the Roseobacter clade were also reported to reduce the fish-pathogenic bacterium V. anguillarum [45].
The phylogenetic tree analysis showed that the majority of the species are evolutionarily diverse. The phylogenetic tree of all the identified beneficial bacterial species formed five different clusters. In CLUSTER-3, L. fermentum and L. helveticus, derived from Kanpur and Yamuna sediment samples respectively, were found to be phylogenetically related, with a high bootstrap value of 71. A similar observation was reported for Lactobacillus spp. isolated from animal faeces, in which the L. salivarius phylogenetic group was found to be closely related to L. animalis, L. apodemi, and L. murinus [46]. The present finding is corroborated by a previous report in which Lactococcus and Streptococcus appeared to be closely related and Lactobacillus was found to be phylogenetically diverse [27]. The intermixing of phylogenetic distribution observed in our study was also reported previously, where Lactobacillus and Pediococcus were phylogenetically intermixed with 5 species of Pediococcus [47]. Lactobacillus chromosomes also show high heterogeneity at the phylogenetic, phenotypic, and ecological levels among the different members of this genus [48]. The present study likewise found heterogeneity of clustering in Lactobacillus species and other beneficial bacteria.
The relative abundance study showed that beneficial bacterial species of different genera were unevenly distributed among the three locations; a few species were highly dominant in one location over the others, e.g. Pediococcus acidilactici was highly abundant at the Kanpur location of the river Ganga compared with the other locations. The PCA also showed that the sites at Farakka are closely associated, whereas the sites at Kanpur and New Delhi are diverse with respect to the relative abundance of beneficial bacteria. This location-specific change in microbial diversity in the river sediments might be due to differences in the physicochemical properties and pollution levels of the collected sediments. The primary reason for this difference might be the release of heavy organic loads and toxic substances (heavy metals, hazardous chemicals, etc.) at some of the selected locations (Kanpur and New Delhi) of these riverine ecosystems through the discharge of untreated sewage and industrial wastes. It was reported that the Kanpur stretch of the river Ganga is highly polluted by untreated effluents from hundreds of tannery industries located on the river bank [49][50]. Very high quantities of diverse heavy metals such as Cr, Cu, Pb, Ni, and Zn were found extensively in the water and sediments of the river Ganga at Kanpur, where pesticide residues such as α-HCH, γ-HCH, Dieldrin, and Malathion were also reported at concentrations ranging from 0.190±0.02 to 2.61±0.05 μg/L [51]. However, the Farakka stretch of the river Ganga was reported to be less polluted [52]. Like the Kanpur stretch of the river Ganga, the New Delhi stretch of the river Yamuna was also reported to be severely polluted by heavy metals due to the release of untreated metropolitan sewage, factory effluents, etc. [53,54]. Therefore, we presume that this might be the reason for the differences in the relative abundance of beneficial bacterial species among the different locations in the rivers Ganga and Yamuna.
Our results are supported by a previous finding in which the proportion of beneficial microbes in the gastrointestinal microbiota of Bufo raddei was altered by heavy-metal pollution [55]. This is the first report on the identification of beneficial bacteria in the sediments of the rivers Ganga and Yamuna using a metagenomic approach. This study provides extensive insights into the abundance of important native beneficial microorganisms in these rivers and their functional properties.

Conclusion
Our research indicates that the sediment metagenomes of the rivers Ganga and Yamuna harbor a rich distribution of beneficial bacteria. The phylogenetic study of the identified beneficial microbial species revealed that the majority of the species are evolutionarily diverse. This study also points to a clear distinction in the relative abundance of different beneficial bacteria across the sampling sites. Isolation of different beneficial bacteria from these riverine ecosystems would be highly useful for industrial applications in the future.

Supporting information
S1 Table. Health benefit of identified bacteria and their proposed mechanism of action. (DOCX)
S2 Table. Relative abundance of beneficial bacteria species identified from the nine sediment metagenomes of river Ganga and Yamuna. (DOCX)