Metagenomic analysis for taxonomic and functional potential of Polyaromatic hydrocarbons (PAHs) and Polychlorinated biphenyl (PCB) degrading bacterial communities in steel industrial soil

Iron and steel industries are major contributors of persistent organic pollutants (POPs). The microbial community present at such sites has the potential to remediate these contaminants. The present study highlights the metabolic potential of the resident bacterial community of PAH- and PCB-contaminated soil near the Bhilai steel plant, Chhattisgarh (India). GC-MS/MS analysis of soil samples MGB-2 (sludge) and MGB-3 (dry soil) identified different classes of POPs, including PAHs {benzo[a]anthracene (nd; 17.69%), fluorene (15.89%; nd), pyrene (nd; 18.7%), benzo(b)fluoranthene (3.03%; nd), benzo(k)fluoranthene (11.29%; nd), perylene (5.23%; nd)} and PCBs (PCB-15, PCB-95, and PCB-136). Whole-genome metagenomic analysis by Oxford Nanopore GridION technology revealed a predominance of the domain Bacteria (97.4%; 97.5%), followed by Eukaryota (1.4%; 1.5%), Archaea (1.2%; 0.9%) and viruses (0.02%; 0.04%) in MGB-2 and MGB-3, respectively. Proteobacteria (44.3%; 50.0%) was the most prominent phylum, followed by Actinobacteria (22.1%; 19.5%), in MGB-2 and MGB-3, respectively. Among eukaryotes, Ascomycota (20.5%; 23.6%), Streptophyta (18.5%; 17.0%) and unclassified sequences (derived from Eukaryota) (12.1%; 12.2%) predominated in MGB-2 and MGB-3. The sample MGB-3 was richer in macronutrients (C, N, P), supporting higher microbial diversity than MGB-2. The presence of reads for the biphenyl, dioxin, and PAH degradation pathways can be correlated with the presence of PCBs and PAHs detected in the MGB-2 and MGB-3 samples. Further, taxonomic vis-à-vis functional analysis identified Burkholderia, Bradyrhizobium, Mycobacterium, and Rhodopseudomonas as the keystone degraders of PAHs and PCBs.
Overall, our results revealed the importance of metagenomic and physicochemical analysis of the contaminated site, which improves the understanding of metabolic potential and adaptation of bacteria growing under POP contaminated environments.

Introduction
Persistent organic pollutants (POPs) are anthropogenic chemicals that are listed as priority environmental pollutants because of their toxicity and their prolonged persistence in the environment [1]. PAHs and PCBs are strongly lipophilic and hence easily enter food chains. These characteristics are important because they underlie the detrimental effects of these compounds on the environment and the health threats they pose to plants, animals, and humans [2]. Increased industrialization has led to the extensive production of such POPs, which are also emitted during steel production [3]. The rise in these pollutants has led to adverse health and environmental effects, prompting extensive studies on the remediation of contaminated soil. Various physical and chemical technologies, including chemical oxidation, electrokinetic remediation, solvent extraction, photocatalytic degradation, and thermal treatment, are widely applied in remediation [4]. However, most of these treatment methods are unsustainable and disruptive, and they can carry PCBs and PAHs back into the environment. Therefore, using the bacterial communities already present in contaminated soil can be an alternative strategy for effective and viable degradation of POPs [5], as it faces comparatively fewer technical hindrances than other remediation technologies.
A series of studies have used culture-dependent approaches to isolate the most efficient biodegraders from such polluted sites [6]. The contaminated soil environment harbours genetic, species, and metabolic diversity of microbial biodegraders, but only a minor fraction of POP-degrading bacteria can be obtained using culture-dependent methods. Furthermore, it has been reported that cultures enriched under laboratory conditions are less efficient in biodegradation than the indigenous bacteria present in the contaminated soil [7]. To date, information on the taxonomic and functional interactions among microbial communities during biodegradation within the contaminated environment remains limited. The recent development of powerful culture-independent metagenomic approaches and advances in next-generation sequencing (NGS) technology provide comprehensive insight into the total microbial community inhabiting contaminated sites and its metabolic capabilities. Several metagenomic studies conducted on PAH/PCB-contaminated soil samples [8] have highlighted that microbial interactions play a pivotal part in the bioremediation of these POPs. However, most of these studies are based on the 16S rRNA gene sequence, which does not reveal the metabolic potential of the resident microorganisms. Therefore, the present study aimed to investigate taxonomic diversity and the metabolic potential to degrade POPs using Oxford Nanopore Technology (ONT). ONT sequencing is sensitive enough to detect low-abundance microbial members that might otherwise be missed in the metagenome.
PCB congeners and PAHs have been reported in the waste sites of the industrialized area of this steel plant in India [9]. The major pollution sources of steel industries include sinter, coke, and the blast furnace [10]. Therefore, the present study aimed to provide insight into the diversity and metabolic potential of the dominant bacterial community in the contaminated soil collected from regions near the Bhilai steel plant (one of Asia's biggest steel plants) in Chhattisgarh, India, and to correlate its functional characteristics with the biodegradation pathways. The results provide potential bacterial candidates that could be exploited for the bioremediation of PAHs and PCBs.

Study site and sampling
The soil samples were collected from two different sites, i.e., a sludge site (Metagenomics Bhilai; MGB-2) and a dry soil waste site (Metagenomics Bhilai; MGB-3), in the polluted area near the Bhilai steel plant, Chhattisgarh (21.1915˚N, 81.4041˚E), India. Soil samples were collected in sterile containers from a depth of about 0 to 10 cm at the two sampling sites. At each site, soil was collected randomly from three points and pooled for further analysis. The pooled samples were transported on ice packs and stored at 4˚C in the lab (to be used immediately) for analysis. Physicochemical parameters such as pH, electrical conductivity, organic C, N, P, Mg, K, Na, Cl, Ca, S, Zn, Fe, Cu, and Mn of MGB-2 and MGB-3 were estimated using the standard protocol at the National Horticultural Research and Development Foundation, Nasik, India.

Extraction and determination of PAH and PCB in sediments
PCBs and PAHs were extracted following He et al. [11] with minor modifications. Briefly, 5 g of collected sample (dry weight) was added to 50 ml Milli Q (MQ) water and homogenized by vortexing for 15 min. After the mixture had stood for 30 min, 10 ml of acetone and hexane (1:1; v/v) was added to the Falcon tube and vortexed for 3 min. Then 2 g NaCl was added and the tube was shaken vigorously for a few minutes. It was then centrifuged at 4000 × g for 5 min. The supernatant was subjected to solid-phase extraction (SPE) of PCBs and PAHs using a Bond Elut cartridge as per the manufacturer's instructions (Agilent Technologies, USA). Further, the sample was eluted with methanol and hexane 1:1 (v/v) in 5 ml MQ by centrifuging for 2 min at 1000 × g. The final eluate was then collected through a polytetrafluoroethylene (PTFE) filter (0.22 μm) in a separate vial and adjusted to 1 ml under nitrogen [12].
A GCMS-TQ8040 (Shimadzu, Japan) operated in Scan/SIM mode was used to qualitatively analyze the PAHs and PCBs potentially present in the MGB-2 and MGB-3 samples. The GC-MS/MS was fitted with a flame ionization detector (FID) and an RTX-5 column (30 m × 0.32 mm × 0.25 μm). The oven program started at 40˚C with a 2 min hold, increased at 10˚C/min to 80˚C, and then at 6˚C/min to 225˚C with a 10 min hold. The presence of PCBs was detected in the SIM mode of the GC-MS/MS.
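As a quick sanity check on the oven program described above, the total run time can be worked out from the hold times and ramp rates. The sketch below is only an illustration of that arithmetic; the function and variable names are ours, and it assumes no equilibration or post-run time beyond what the text states.

```python
# Oven program taken from the text; names below are illustrative only.
INITIAL_HOLD_MIN = 2.0  # hold at 40 C

# Each ramp: (start temp C, end temp C, rate in C/min, hold at end in min)
RAMPS = [
    (40.0, 80.0, 10.0, 0.0),    # 10 C/min up to 80 C, no hold stated
    (80.0, 225.0, 6.0, 10.0),   # 6 C/min up to 225 C, then 10 min hold
]


def run_time_minutes(initial_hold, ramps):
    """Sum the initial hold, each ramp duration, and each final hold."""
    total = initial_hold
    for start, end, rate, hold in ramps:
        total += (end - start) / rate + hold
    return total


total_run_time = run_time_minutes(INITIAL_HOLD_MIN, RAMPS)
print(f"Total oven program time: {total_run_time:.1f} min")
```

Under these assumptions the program lasts a little over 40 min per injection (2 + 4 + about 24.2 + 10 min).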

Metagenome sequencing and analysis
DNA extraction and processing for metagenome. Each of the two soil samples was collected in triplicate, and the triplicates were pooled. DNA was extracted using the PowerSoil DNA Isolation Kit (Qiagen, USA) following the manufacturer's instructions. The integrity of the metagenomic DNA was checked on an agarose gel (1%) using a BioRad gel documentation system, and the DNA was quantified with a Qubit 3.0 Fluorometer (Invitrogen, USA).
Preparation of library and whole metagenome sequencing. Metagenomic DNA extracted from the collected soil (MGB-2 and MGB-3) was end-repaired using the NEBNext Ultra II kit (New England Biolabs, USA) and cleaned up with 1x AMPure beads (Beckman Coulter, USA). Native barcode ligation was performed with NEB blunt/TA ligase using NBD103, followed by clean-up with 1x AMPure beads. The barcode-ligated DNA samples were quantified by Qubit and pooled at an equimolar concentration to attain a 1 μg pooled sample. Adapter ligation (BAM), cleaning of the library mix, and elution of the sequencing library were done as per Kumar et al., 2021 [13], and the library was then used for whole-genome sequencing. The whole-genome library was prepared using a Native Barcoding kit (EXP-NBD103). Barcode sequences are detailed in S1 Table. Sequencing was performed using a SpotON flow cell (R9.4) on MinKNOW 2.1 v18.05.5 with a 48 h sequencing protocol [14] on a GridION X5 (Oxford Nanopore Technology (ONT), UK).
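The equimolar pooling step can be illustrated with a small calculation. Because the number of molecules is proportional to mass divided by fragment length, equal moles means each sample contributes mass in proportion to its mean fragment length. The sketch below is hypothetical: the mean fragment lengths and Qubit concentrations are invented for illustration and are not reported in the study.

```python
def equimolar_masses_ng(total_mass_ng, mean_lengths_bp):
    """Split a total DNA mass so every sample contributes equal moles.

    moles ~ mass / length, so equal moles means mass proportional to length.
    """
    total_length = sum(mean_lengths_bp.values())
    return {
        name: total_mass_ng * length / total_length
        for name, length in mean_lengths_bp.items()
    }


def volumes_ul(masses_ng, conc_ng_per_ul):
    """Volume of each sample needed to deliver the requested mass."""
    return {name: masses_ng[name] / conc_ng_per_ul[name] for name in masses_ng}


# Hypothetical inputs (NOT from the paper): mean fragment length and Qubit reading.
mean_lengths_bp = {"MGB-2": 2000, "MGB-3": 3000}
conc_ng_per_ul = {"MGB-2": 50.0, "MGB-3": 40.0}

masses = equimolar_masses_ng(1000.0, mean_lengths_bp)  # 1 ug pooled sample
volumes = volumes_ul(masses, conc_ng_per_ul)
for name in masses:
    print(f"{name}: {masses[name]:.0f} ng in {volumes[name]:.1f} uL")
```

If the two libraries had similar fragment length distributions, this reduces to pooling equal masses of each sample.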

Data processing and analysis
The Nanopore raw reads ('fast5' format) were base-called ('fastq' format) and demultiplexed using Albacore v2.3.1 and were uploaded to the MG-RAST server (version 4.0.3) for taxonomic and functional analysis. Functional annotation by SEED subsystems helps in predicting the abundance of genes assigned to metabolic pathways in soil. For taxonomic diversity, the sequenced reads were interpreted using a multisource non-redundant ribosomal RNA database. Taxonomic assignments were determined using the contigLCA algorithm against the M5NR database for samples analyzed via whole-genome sequencing (WGS) (MG-RAST identification numbers for MGB-2 and MGB-3: mgm4822000.3 and mgm4822001.3). The interpretation was based on an E-value cut-off of 1e-5 and a sequence identity of 60% [15,16]. Raw reads of the whole-genome shotgun metagenome sequences of the two samples, MGB-2 and MGB-3, were deposited in the NCBI Sequence Read Archive under BioProject PRJNA765179 with the accession numbers SRR16004303 and SRR16004304, respectively.
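The annotation cut-offs stated above can be sketched as a simple filter. This is not MG-RAST code; the `Hit` record, its field names, and the example hits are hypothetical, and the sketch assumes the cut-offs mean a maximum E-value of 1e-5 and a minimum identity of 60%.

```python
from dataclasses import dataclass

MAX_EVALUE = 1e-5        # E-value cut-off stated in the text
MIN_IDENTITY_PCT = 60.0  # sequence identity cut-off stated in the text


@dataclass
class Hit:
    """Hypothetical similarity-search hit; field names are illustrative."""
    read_id: str
    annotation: str
    evalue: float
    identity_pct: float


def passes_cutoffs(hit):
    """Keep a hit only if it meets both thresholds."""
    return hit.evalue <= MAX_EVALUE and hit.identity_pct >= MIN_IDENTITY_PCT


# Invented example hits, for illustration only.
hits = [
    Hit("read_1", "bphA", 1e-30, 85.0),  # passes both cut-offs
    Hit("read_2", "nidA", 1e-3, 90.0),   # fails the E-value cut-off
    Hit("read_3", "nahG", 1e-12, 45.0),  # fails the identity cut-off
]

kept = [hit for hit in hits if passes_cutoffs(hit)]
print([hit.read_id for hit in kept])
```

Only hits passing both thresholds would then contribute to the taxonomic and functional abundance tables.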

Statistical analysis
Various alpha diversity indices were calculated to study the species richness and evenness of MGB-2 and MGB-3 using PAST v4.03 software. The principal component analysis (PCA) plot was constructed using Bray-Curtis matrices with R studio v3.1.2. MGB-2 and MGB-3 were compared using the Statistical Analysis of Metagenomic Profiles (STAMP) software [17] with a two-sided G-test (w/ Yates' + Fisher's). Comparative metagenome analysis was done mainly with RefSeq and the SEED subsystem to obtain genus and functional abundances, respectively. Cytoscape software v3.7.1 was used to generate network plots to study the interaction of the microbial communities of MGB-2 and MGB-3 involved in xenobiotic biodegradation pathways.
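To make the diversity measures named above concrete, the following sketch computes the Shannon index, a bias-corrected Chao-1 estimate, and Bray-Curtis dissimilarity from count vectors. It is not the PAST or R implementation used in the study (their exact formula variants are not stated here), and the two count vectors are toy data.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao-1 richness: S_obs + f1*(f1-1) / (2*(f2+1))."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two equally ordered count vectors."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator


# Toy taxon counts (NOT data from the study), same taxon order in both lists.
sample_a = [10, 5, 1, 1, 2, 0]
sample_b = [8, 6, 0, 2, 2, 1]

print(f"Shannon A: {shannon(sample_a):.3f}")
print(f"Chao-1 A: {chao1(sample_a):.1f}")
print(f"Bray-Curtis A vs B: {bray_curtis(sample_a, sample_b):.3f}")
```

A Bray-Curtis value near 0 indicates near-identical community composition, and a value near 1 indicates no shared abundance.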

Physico-chemical analysis of the MGB-2 and MGB-3
Microbial community structure and function are determined by various environmental factors, including nutritional status and other parameters, such as pH, salinity, presence of metals, and various physicochemical parameters. Therefore, the physicochemical properties of MGB-2 and MGB-3 were determined and are summarized in Table 1. The sample MGB-3 was richer in macronutrients such as carbon (C), nitrogen (N), and phosphorus (P), which greatly influenced the composition of the microbial community. The organic carbon content, representing the energy flow in the carbon cycle, was 0.85% (slightly high) in MGB-2 and 1.39% (high) in MGB-3 compared to reference values. Our results indicated high carbon content in the given samples because of aromatic organic hydrocarbons present in the contaminated soil. Industrial soil and effluent are considered sources of organic contaminants, including POPs like PAHs [18] and PCBs. Because of the hydrophobic nature of these POPs, they tend to bind to the soil and hence add to its organic carbon content. The sample MGB-3 had very high nitrogen and phosphorus content (N, 734 kg ha⁻¹; P, 56.9 kg ha⁻¹), whereas MGB-2 had moderate nitrogen (430 kg ha⁻¹) and low phosphorus (17.66 kg ha⁻¹) content. MGB-3 exhibited high P and a low C/P ratio, indicating the possibility of higher microbial diversity than in MGB-2. Several studies have confirmed that soil with high microbial diversity has high P and a low C/P ratio, while environments with less P and a high C/P ratio show low microbial diversity [19]. A higher level of these micronutrients in MGB-2 can be due to the high water content in the sample. Further, these results indicate that both MGB-2 and MGB-3 can support diverse microbial communities.
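The C/P comparison above can be illustrated with the reported values. Note the strong assumption: organic C is given in percent and P in kg ha⁻¹, so the quotient below is only a crude relative index for ranking the two samples, not a true stoichiometric C/P ratio. The dictionary keys and function name are ours.

```python
# Values reported in the text; units differ (C in %, P in kg/ha).
SAMPLES = {
    "MGB-2": {"organic_c_pct": 0.85, "p_kg_ha": 17.66},
    "MGB-3": {"organic_c_pct": 1.39, "p_kg_ha": 56.9},
}


def cp_index(sample):
    """Crude relative C/P index (mixed units, for ranking only)."""
    return sample["organic_c_pct"] / sample["p_kg_ha"]


for name, values in SAMPLES.items():
    print(f"{name}: relative C/P index = {cp_index(values):.4f}")

lower = min(SAMPLES, key=lambda name: cp_index(SAMPLES[name]))
print(f"Lower relative C/P: {lower}")
```

On this relative scale MGB-3 has roughly half the C/P index of MGB-2, in line with the statement that MGB-3 combines high P with a low C/P ratio.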
Overall, micronutrients, including K, Mg, Na, and Mn, were higher in MGB-2 than in MGB-3. The levels of K (1232 kg ha⁻¹) and Mg (768 kg ha⁻¹) in MGB-2 were higher than the reference values. Inorganic nutrients serve structural as well as catalytic functions. Therefore, the total taxonomic and functional profile of the bacterial communities is predominantly driven by the availability of C and N and, to some extent, by the presence of inorganic nutrients, i.e., Ca, K and Mg [20]. The metal analysis indicated that MGB-3 had a comparatively higher metal content (Zn, Cu, and Fe) than MGB-2 (Table 1). The results also showed a high level of Mn (44.12 mg kg⁻¹) in MGB-2 and of Fe (14.98 mg kg⁻¹) in MGB-3, providing a suitable environment for microbes to undergo anaerobic biodegradation. The high metal content in both samples could be due to the additives used in the steel factory. The presence of metals in a soil sample can limit many microbial species, whereas it can support the survival and growth of metal-tolerant species. The presence of metals is in accordance with several reports highlighting heavy metals and organic contaminants such as PAHs/PCBs in soil from iron and steel industrial sites [21]. In addition, Mn (IV) and Fe (III) are known to act as terminal electron acceptors that efficiently remove aromatic compounds from the soil. Fe is the most widely found cofactor involved in deoxygenation reactions in biodegradation studies. It has been reported that, in the dioxygenases involved in PAH and PCB biodegradation, Fe is incorporated into the active site as an iron centre, a Rieske [2Fe-2S] cluster, or a heme prosthetic group [22].

Determination of PAH and PCB residues in MGB-2 and MGB-3
Cities with a long industrial history contribute to the addition of organic pollutants to soil [23,24]. Hence it was deemed fit to estimate the level of POPs, mainly PAHs and PCBs, in the given soil samples by GC-MS/MS using Scan/SIM mode. The GC-MS/MS triple quadrupole allows detection at very low (femtogram) limits in the matrix through the greater selectivity of selected ion monitoring (SIM) mode. Based on the data obtained from GC-MS/MS analysis, various PAHs and PCBs were identified in the industrial soil samples. The structure and the relative percentage abundance of the PAHs identified in MGB-2 and MGB-3 are given in Fig 1A. F (15.89%) and BkF (11.29%) were the dominant PAH species in MGB-2, while Pyr (18.7%) and BaA (17.69%) dominated in MGB-3. High-molecular-weight (HMW; 4-6 rings) PAHs, namely BcPhe (0.51%), BkF (11.29%), F (15.89%), IcdP (6.63%), Per (5.23%), and Tpl (4.70%), predominated in the MGB-2 sediments and were not detected in MGB-3 (Fig 1B). Similar to our results, PAHs such as indeno(1,2,3-cd)pyrene, benzo(k)fluoranthene, dibenzo(a,h)anthracene, chrysene, fluoranthene, acenaphthene, and fluorene have also been reported to be abundant in different operational units of the steel industry [25].
In addition to PAHs, MGB-2 and MGB-3 were also found to be contaminated with PCBs: biphenyl and 1,1'-biphenyl, 4,4'-dichloro (PCB-15) in MGB-2, and biphenyl, 1,1'-biphenyl 2,2',3,5',6 pentachloro (PCB-95) and 1,1'-biphenyl 2,2',3,3',6,6' hexachloro (PCB-136) in MGB-3 (S1 and S2 Figs) [26]. It is well established that the carcinogenic risk increases with the molecular weight or the number of aromatic rings of PAHs [27] and with the number of chlorine atoms in the case of PCBs. However, data on the sources of PCBs in industrial regions are scarce. The present results corroborate previous studies conducted on Bhilai steel plant (Raipur, Chhattisgarh) soil, which reported PCBs ranging from di-chlorinated to hexachlorinated biphenyls in the sludge [28]. From the data on organic contaminants, it appears that the high carbon content in MGB-2 and MGB-3 mentioned in the previous section could be correlated with the abundance of PAHs and PCBs in the soil samples.

Metagenomic analysis
Whole genome sequencing and assembly summary. The metagenomic approach provides a complete picture of biodegradation vis-a-vis the microbes present within the environment and the functional genes involved in the bioremediation of contaminants. Whole-genome metagenomic studies are used not only to study taxonomic diversity but also to elucidate the metabolic pathways required for understanding pollutant degradation [29]. In the present study, the ONT platform was used for community analysis to enable unbiased assembly of complete genome sequences [30]. The Nanopore GridION X5 generated real-time, long-read, high-fidelity DNA sequence data. MG-RAST statistical analysis of the datasets reported 275,844 sequences (totalling 539,360,072 bp; average length 1,955 bp) for MGB-2 and 193,221 sequences (totalling 532,031,111 bp; average length 2,753 bp) for MGB-3. The downstream analyses of the total number of reads are detailed in S2 Table. The datasets were used for the taxonomic, ecological, and functional analyses described in the previous section.
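The reported average read lengths follow directly from the sequence counts and total base pairs quoted above. The short check below recomputes them; the variable names are ours, and truncation to whole base pairs is an assumption about how the averages were reported.

```python
# (number of sequences, total base pairs) as reported in the text
DATASETS = {
    "MGB-2": (275_844, 539_360_072),
    "MGB-3": (193_221, 532_031_111),
}

mean_lengths = {}
for name, (n_sequences, total_bp) in DATASETS.items():
    mean_lengths[name] = total_bp / n_sequences
    print(f"{name}: mean read length = {mean_lengths[name]:.2f} bp")
```

Both values agree with the averages given in the text once truncated to whole base pairs, and they show that MGB-3 yielded fewer but longer reads for a similar total yield.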
Analysis of sequence data for the extent of microbial diversity. The horizontal rarefaction curve indicated adequate sampling depth, representing sufficient sample coverage for diversity analysis (S3 Fig). MGB-2 and MGB-3 comprised 1839 and 1884 species, respectively, indicating higher species richness in MGB-3, which is also evident from the Chao-1 and Shannon diversity data. The various diversity indices presented in Table 2 suggest equivalent overall diversity when both richness and abundance are considered. Beta diversity between MGB-2 and MGB-3 based on Bray-Curtis dissimilarity (Fig 2A) defined the overall distribution pattern of the bacterial communities in the two samples, with the principal coordinates showing 98.7% similarity between the bacterial communities of the two sites (MGB-2 and MGB-3).

PLOS ONE
Cenarchaeum (p = 5.18e-3) in MGB-2 while Methanobrevibacterium (p = 0.012) in MGB-3 to predominate, indicating more methanogenesis occurring in MGB-3 [39]. The contaminated soil also harbours many members of Methanosarcina, Halobacterium, Euryarchaeota, and Crenarchaeota of uncultured genera. The diversity of Archaea is found to be higher within hydrocarbon-degrading communities in contaminated environments than in their non-contaminated counterparts [40]. The relative abundance of archaeal communities was higher in MGB-2, which was associated with its higher soil moisture and POP content.
Functional diversity and metabolic potential of MGB-2 and MGB-3. The SEED subsystem analysis in MG-RAST assigned reads based on various functions that identified 60% of the total function constituting metabolisms of amino acid, carbohydrate, energy, lipid, cofactors, vitamins, biosynthesis of glycan, polyketides, terpenoids, xenobiotic biodegradation, and biosynthesis of secondary metabolites. The remaining 40% functions include cellular processes, organismal systems, genetic & environmental processing, and human diseases.
The comparative analysis of gene sequences revealed an abundance of P transporters in MGB-2, which can be correlated with its lower P content compared with MGB-3. Various genes encoding proteins involved in phosphate-recycling mechanisms, such as phnA, phnE, phnW and phnX (phosphonate transporters), pstA, pstB, pstC, and pstS (high-affinity phosphate transporters), and phoR, phoA, phoP and phoD (two-component systems), were detected across both metagenomes (S5 Fig). The number and abundance of reads for phosphate-related mechanisms in low-phosphorus environments indicate that these mechanisms help the bacteria cope with such environments. Moreover, bacteria can survive owing to genes conferring resistance to toxic metals such as copper, lead, and nickel as part of their defense mechanism, which can be recruited further for cleaning contaminated environments. The high Mn content in MGB-2 can be correlated with the significantly higher number of reads for Mn transporters in MGB-2 than in MGB-3. Also, a comparison of functional gene annotation between MGB-2 and MGB-3 using the SEED subsystem for membrane transport revealed a significant difference (p < 0.05) in the abundance of Na+/H+ antiporters, the Mn transporter MntH, the Mn ABC transporter SitD, TadA, and the Zn ABC transporter ZnuA (S5 Fig). MGB-3 showed a higher abundance of Na+/H+ transporters, which indicates the exchange of ions across the membrane to maintain homeostasis.

linked with xenobiotic biodegradation and metabolism were identified, though at varying levels of abundance (S6 Fig), in the two samples. The annotated pathways for degradation and metabolism of xenobiotic compounds, including POPs listed by the US Environmental Protection Agency, accounted for 2% of the 60% metabolic functions (Fig 4A). The presence of reads for the biphenyl, dioxin, and PAH degradation pathways can be further correlated with the presence of PCBs and PAHs as detected in the MGB-2 and MGB-3 samples (Fig 4B and 4C) Ko00624] degradation pathways were also observed in MGB-2 (3.27%; 2.54%) and MGB-3 (4.29%; 2.80%), respectively. Consistent with our observations, chlorobenzene, PAHs, PCBs, and benzoate are known to be prominent contaminants of dye and steel industries [9,10]. The annotated genes encoding enzymes, namely chqB, pcpB, pheA, clcD, hadL, bedC1/todC1, bphC, catA, and catB, highlighted the degradation of chlorocyclohexane and chlorobenzene compounds within the two communities (Table 4A). The presence of the above-mentioned enzymes suggests that the degradation of chlorobenzene is catalyzed via both the ortho-cleavage pathway [42] and the meta-cleavage pathway [43] in the two communities.

The degradation pathways of PAHs and PCBs are the main focus of the present study, as the collected soil samples MGB-2 and MGB-3 were found to be highly contaminated with these organic pollutants. The annotated genes encoding enzymes required in the PAH degradation pathway in MGB-2 and MGB-3 are listed in Table 4B. The presence of genes encoding hydroxychromene-2-carboxylate isomerase (nahD), naphthalene dioxygenase (nahAc), salicylate hydroxylase (nahG), and other genes including nidA, nidB, nidD, phdF, phdG, phdI, and phdJ indicated the presence of a complete pathway of PAH degradation. The presence of reads for the nidA gene could be correlated with the degradation of pyrene [5], which was found to be abundant in MGB-3 as detected by GC-MS/MS. Several reports indicate that the nidA gene encodes the large subunit of the PAH dioxygenase involved in the degradation of PAHs such as phenanthrene, pyrene, and benzo[a]pyrene [47]. The pathway analysis demonstrated that the enzymes involved in the PAH biodegradation pathway were affiliated with members of the genera Pseudoalteromonas, Aromatoleum, Dechloromonas, Agrobacterium, and Mesorhizobium in MGB-2, and Mycobacterium, Parvibaculum, Ruegeria, Burkholderia, Aromatoleum, and Bradyrhizobium in MGB-3. The contribution of degradation enzymes by different genera suggests synergistic degradation of these PAHs.
The annotated reads for various genes, including bphA, bphC, bphD, bphE, and bphF, corresponding to the biphenyl degradation pathway, were identified in both samples (Table 4B). The presence of bphA in both metagenomes, MGB-2 and MGB-3, indicated biphenyl degradation assigned to the genus Mycobacterium in both samples and to Polaromonas in MGB-3. Biphenyl    Fig 6B). The greater abundance of these genes and genera in sample MGB-2 compared to sample MGB-3 suggests a higher degrading capacity in sample MGB-2. However, the key biodegraders were Bradyrhizobium, Burkholderia, Mycobacterium, and Rhodopseudomonas in both metagenomes.

Conclusions
The present metagenomic study highlighted microbial functional annotation and extensive degradation capabilities in terms of xenobiotic degradation pathways, and correlated them with the presence of PAHs and PCBs in the contaminated soil of a steel plant. In addition, physicochemical profiling of the soil samples provided valuable information on the organic nutrients (C/N/P), inorganic nutrients (Ca, K, Mg, Na, Mn), and metals (Fe, Mn, Cu, Zn) present, which can be essential parameters for designing biodegradation strategies. The higher proportions of Proteobacteria and Actinobacteria indicated that the two samples possess good biodegradation potential. Moreover, the coordination among different biodegraders and the presence of functional genes involved in biodegradation pathways and energy metabolism have provided an in-depth understanding of their survival under the stress of persistent organic pollutants. The potential biodegraders identified in the present study, such as Bradyrhizobium, Burkholderia, Mycobacterium, Rhodopseudomonas, and Pseudomonas, can be selected and exploited further for enhancing bioremediation. Therefore, it can be concluded that investigating the microbial community and exploring its potential for biodegradation is a critical factor in maximizing the efficacy of the bioremediation process.
Supporting information
S1 Table. Barcodes used for sequencing. (JPG)
S2 Table. The total number of unassembled and assembled reads used for downstream analyses.