Diversity and functional prediction of microbial communities involved in the first aerobic bioreactor of coking wastewater treatment system

The pre-aerobic process of coking wastewater treatment has strong capacity of decarbonization and detoxification, which contribute to the subsequent dinitrogen of non-carbon source/heterotrophic denitrification. The COD removal rate can reach > 90% in the first aerobic bioreactor of the novel O/H/O coking wastewater treatment system during long-term operation. The physico-chemical characteristics of influent and effluent coking wastewater in the first aerobic bioreactor were analyzed to examine how they correlated with bacterial communities. The diversity of the activated sludge microbial community was investigated using a culture-independent molecular approach. The microbial community functional profiling and detailed pathways were predicted from the 16S rRNA gene-sequencing data by the PICRUSt software and the KEGG database. High-throughput MiSeq sequencing results revealed a distinct microbial composition in the activated sludge of the first aerobic bioreactor of the O/H/O system. Proteobacteria, Bacteroidetes, and Chlorobi were the decarbonization and detoxification dominant phyla with the relative abundance of 84.07 ± 5.45, 10.89 ± 6.31, and 2.96 ± 1.12%, respectively. Thiobacillus, Rhodoplanes, Lysobacter, and Leucobacter were the potential major genera involved in the crucial functional pathways related to the degradation of phenols, cyanide, benzoate, and naphthalene. These results indicated that the comprehensive understanding of the structure and function diversity of the microbial community in the bioreactor will be conducive to the optimal coking wastewater treatment.


Introduction
Coking wastewater is generated from the high-temperature carbonization of raw coal, coal gas purification, and refinement of the products generated during coke production. In coking wastewater, the pollutants are typically recalcitrant, highly toxic, and carcinogenic, including inorganic compounds such as ammonia, cyanide (CN − ) and thiocyanate (SCN − ), and organic compounds, such as phenolic compounds, polycyclic aromatic hydrocarbons (PAHs), polycyclic nitrogen-containing aromatics, as well as oxygen-and sulfur-containing heterocyclics [1,2]. Coking wastewater is regarded as one of the most toxic industrial effluents, which can cause severely adverse effects on the environment [3]. As the most economical and environmentally friendly treatment process, the activated sludge technique has been widely used to remove pollutants from coking wastewater. Previously, a novel coking wastewater biological treatment process O (first aerobic bioreactor)-H (hydrolytic bioreactor)-O (second aerobic bioreactor) could achieve simultaneous removal of contaminants, especially, the first aerobic bioreactor has a high capacity of decarbonization and detoxification [4]. However, the microbial diversity and function involved in the first aerobic bioreactor of the O/H/O treatment system have not yet been precisely characterized. Compared to the use of anaerobic activated sludge at the start of the coking wastewater biological treatment process, aerobic activated sludge is more resistant to various toxic pollutants and conducive to significant organic pollutants degradation [5]. Based on our previous research, the first aerobic bioreactor of the O/H/O system can remove more than 90% of the chemical oxidation demand (COD) and the toxic compounds [6]. Aerobic activated sludge can provide a suitable environment for the microorganisms to grow that can improve the efficiency of the removal of pollutants [7,8]. Therefore, the prefixed aerobic process is the key step for high-loading decarbonization, partial degradation of refractory and inhibitory organic compounds to reduce the loading of subsequent processes. However, the bioreactor has not been optimized because of our limited knowledge of diversity and function of the microbial communities involved in this bioreactor.
In view of the powerful pollutant removal capacity of the first aerobic bioreactor, and the significance of the sequent dinitrogen process design, in this study, we tried to illustrate the diversity and function of the microbial communities involved in the first aerobic bioreactor of a full-scale O/H/O coking wastewater treatment system. Pollutant removal efficiency was measured, and the operating parameters were monitored during long-term stable operation. Illumina MiSeq high throughput sequencing was performed to reveal the relative abundance and diversity of the microbial community. Phylogenetic investigation of communities by reconstruction of unobserved states (PICRUSt) was used to predict the function based on 16S rRNA gene information. The aims of this work were to illustrate the diversity of the microbial communities, identify the dominant members of the bacterial community responsible for the removal of major pollutants, and predict the function of the dominant bacterial taxa. Our results provided a scientific basis for the development of new control strategies on nitrogen removal during coking wastewater treatment.

The first aerobic bioreactor of the O/H/O system and sample collection
Coking wastewater activated samples were collected from the plant with permission of the administrator in the Tianjin Iron and Steel Corporation, Shexian Hebei Province of China. This article does not contain any studies with human, animals or endangered or protected species.
The first aerobic bioreactor of a full-scale three-phase fluidized bed biological O/H/O coking wastewater treatment system in the Tianjin Iron and Steel Corporation, Shexian Hebei Province of China, was studied. To relieve the accidental event shock, this system contains two separate, independent parallel subsystems, named south and north subsystem, respectively. This O/H/O coking wastewater treatment system has been in operation for 7 years and can maintain a stable performance during operation. This system has the ability to process about 2,400 ± 160 m 3 daily.
Activated sludge samples were collected in triplicate on different days from the corresponding south and north subsystems of the first aerobic bioreactor in May 2018. All samples were placed in ice boxes during the sampling and transportation to the laboratory. Aliquots (2-3 mL) of each sample were centrifuged at 12,000 g for 10 min at 4˚C. The cell pellets were washed twice with sodium phosphate buffer (120 mM, pH 8.0) and stored at −20˚C before DNA was extracted.

Physico-chemical analysis
The collected coking wastewater samples were centrifuged at 3,500 g for 3 min. Subsequently, the supernatants were used for analysis of COD, biochemical oxygen demand (BOD), total suspended solids (TSS) by the standard spectrophotometric methods (APHA 2005). Mixed liquor suspended solids (MLSS) was measured according to standard methods (APHA 2006). The contents of ammonium, phenolic compounds, sulfides (S 2− ), total phosphorus (TP), and total nitrogen (TN) were analyzed by the colorimetric methods with a spectrophotometer (Genesys TM-5, Spectronic Inc., USA) [9]. The concentration of CN − was measured by the pyridine-pyrazolone method after distillation [9]. SCN − was measured using a UV-vis spectrophotometer (UV-1800, Shimadzu, Japan) by the colorimetric method with ferric nitrate [10]. The total organic carbon (TOC) was measured by TOC analyzer (Shimadzu TOC-VCPH, Japan). The dissolved oxygen (DO) concentration was measured using a DO meter (CellOX 3310i, WTW, Germany), and the pH was monitored with a pH meter (PHS-3D, Shanghai Precision & Scientific Instrument Co. Ltd., China). Concentrations of PAHs were determined using the Agilent 7890A GC equipped with 5975C mass selective detector with a 30 m × 0.25 mm i.d. × 0.25 μm film fused silica capillary column (HP-5 MS, Agilent Technology) under the selected ion mode.
Reaction efficiency and microbial population were directly affected by the design of bioreactor. In a previous study, an internal-loop multiphase airlift fluidized-bed reactor has been developed to deal with high COD load rates (> 2.00 kg COD/m 3 d) [11], which was applied in this study. The internal-loop design ensured efficient mixing and mass transfer with low energy consumption [12]. The average pH of the first aerobic bioreactor was kept at 7.35 ± 0.11. Long-term operating temperatures were relatively constant between 21 to 25˚C ( Table 1) and the DO level was maintained in the range of 2.95-3.57 mg/L. Sludge retention time (SRT) was 8 ± 2 days. The hydraulic retention time (HRT) was 60 ± 8 h, and the concentrations of activated sludge ranged from 3,700 to 4,700 mg/L. The operational parameters of the first aerobic bioreactor are listed in Table 1.

MiSeq sequencing of 16S rRNA genes
Microbial DNA was extracted using the PowerSoil TM DNA isolation kit (Mobio, USA) according to the manufacturer's instructions. The DNA quality was assessed by the ratios of 260/280 nm and 260/230 nm absorption measured by the ND-2000 spectrophotometer (Thermo Fisher Scientific, USA), and agarose gel electrophoresis. The V4 hypervariable region of the 16S rRNA gene was PCR-amplified (triplicate reactions for each sample) using primers F515 (5'-GTGCCAGCMGCCGCGGTAA-3') and R806 (5'-GGACTACVSGGGTATCTAAT-3'), following the procedures described by Zhu et al. (2016) [13]. These primers are universal for almost all bacterial and archaeal taxa [14,15]. The obtained sequences were deposited in the NCBI Sequence Read Archive under the accession number PRJNA615645.

Sequence analysis
The composition of the PCR products of the V4 region of 16S rRNA genes was determined by Illumina MiSeq PE300 sequencing platform. Samples were individually barcoded to enable multiplex sequencing. After MiSeq sequencing, the raw data was processed and analyzed following the pipelines of Mothur (v1.35.1) [16] and QIIME v 1.7 (quantitative insights into microbial ecology) [17]. Operational taxonomic units (OTUs, defined at the 97% sequence similarity level) were picked using the average neighbor method after Needleman alignment and a single-linkage pre-cluster procedure. Taxonomic classification was performed using the RDP Classifier at 80% confidence threshold, by default. Alpha-diversity (number of OTUs, Chao 1, and Simpson and Shannon indices) was statistically analyzed via rarefaction, with all data sets normalized to 17,668 reads per sample. Good's coverage estimators representing subsample coverage were calculated using the Mothur software. The functional microbiota content was predicted based on the 16S rRNA gene sequencing data using the PICRUSt (v 1.0.0.6) software [18]. This analysis was derived from the observation of an association between phylogeny and gene content. For the analysis, OTUs were closed-reference picked against the GreenGenes (released 13.5) database using QIIME v 1.7 according to the online protocol [19], removing the OTUs not matching the GreenGenes 13−5 reference sequences at 97% similarity. The OTUs were normalized by dividing their relative abundance values by known or predicted 16S rRNA gene copy numbers prior to final metagenomic predictions. The resulting data set was rarefied at 17,668 16S rRNA sequences per sample. Predicted functional counts were rarefied to the same depth, and relative abundance analyses were calculated by comparing Kyoto Encyclopedia of Genes and Genomes (KEGG, http://www.genome.jp/kegg/) orthology values (levels 1, 2, and 3) in activated sludge samples from the first aerobic bioreactor. The potential metabolic functions for pollutant removal were established by the comparison of all sequences from the coking wastewater activated sludge to the KEGG using functional assignments from PICRUSt. The accuracy of metagenome predictions was measured by the Nearest Sequenced Taxon Index (NSTI), whose lower values indicate a closer mean relationship [18]. For each OTU in a sample, the sum of the branch lengths between that OTU in the GreenGenes tree to the closest tip in the sequenced genomic tree was weighted by the relative abundance of the OTU. Subsequently, all OTU scores were added up to obtain a single NSTI value for each microbiome sample. The NSTI values for each sample were calculated by PICRUSt, and NSTI scores and PICRUSt accuracies for all metagenome validation datasets were compared.

Data analysis
The statistical significances of the differences in the concentrations of physico-chemical pollutants, the microbial diversity indices, and the abundances of phyla, genera, OTUs, and gene functions were analyzed by independent-samples student t tests with the SPSS Statistics Software version 25 (IBM), considering statistically significant differences those with p value < 0.05. The stacked bar plots and interleaved bar plots were constructed by GraphPad Prism (v 8.0.2). The heat map was constructed based on OTUs with relative abundance > 0.5% in at least one sample using the pheatmap packages of R software (v 3.6.1). Redundancy analysis (RDA) based on genera and pollutants was conducted using the vegan package and the ggplot2 package of R software (v 3.6.2).

Physico-chemical characteristics of the first aerobic bioreactor
The characteristics of the coking wastewater were defined and quantified with 13 coking wastewater indices, including COD, BOD, phenols, CN − , SCN − , ammonia, TN, TP, TSS, TOC, sulfides, oil, and PAHs. Physico-chemical characteristics of influent and effluent in the first aerobic bioreactor were measured as the efficiency indicator ( Table 2). Based on the results of the independent-samples t tests, the removal efficiencies for all contaminants did not significantly differ between the south and the north subsystems (p > 0.05). The influent and effluent COD concentrations in the bioreactor were 5,167 ± 270 mg/L and 492 ± 50 mg/L, respectively. It was observed that the COD removal rate could be maintained at above 90% with high organic loading in the bioreactor. The influent BOD was 1,993 ± 185 mg/L, and its removal rate reached 97.19 ± 0.38%. It should be noted that the BOD/COD ratio of the influent was 0.38 ± 0.02, and the corresponding value in the effluent was 0.11 ± 0.01, indicating most of the compounds were degraded. Phenolics, including phenol, o-, m-, and p-cresol, are the main organic constituents of coking wastewaters and account for most of the total COD [20]. The phenols removal efficiency reached 99.59% . SCN − is a CN − derivative with lower toxicity [21], and SCN − removal efficiency reached 92.49 ± 0.50%. Compared with the removal of phenol and CN − , SCN − degradation is the slowest and most sensitive process. Therefore, it determines the HRT required for the treatment of coking wastewater [22]. The concentration of CN − decreased from 39.0 ± 7.0 mg/L to 0.9 ± 0.6 mg/L, and the removal rate was up to 97.95 ± 1.21%. In previous studies, CN − not only significantly inhibited degradation of SCN − [22], but was also one of the most toxic chemicals for living organisms [23,24].
The removal efficiency for PAHs, a typical group of toxic pollutants, was 52.74 ± 2.03%. In addition, sulfide concentration decreased from 149.0 ± 9.0 mg/L to 12.4 ± 4.6 mg/L after the bioreactor, and the removal rate reached 91.9 ± 2.8%. The pH values ranged between 7.24 and 7.46. On the contrary, previous study showed that the prefix anaerobic bioreactor of traditional treatment processes had limited pollutant removal efficiency. One such example is the A/A/O treatment process that had low pollutant removal efficiency and the effluent quality did not meet the wastewater discharge standard in China [25]. Some literatures also reported below 35.00% COD removal by mesophilic anaerobic treatment [13,26,27]. In A/O/O system, the CN − removal rate was below 64.71% [13,28]. These results confirmed that the prefix aerobic bioreactor can remove > 90% of the COD, including toxic compounds of SCN − and CN − , with a higher efficiency than the prefix anaerobic bioreactor of traditional treatment processes, such as

Phylogenetic diversity analysis of sequences
MiSeq sequencing was used to analyze the bacterial and archaeal 16S rRNA genes across the six samples from the coking wastewater activated sludge of the first aerobic bioreactor. In total, 72,034 and 74,132 effective sequences were retrieved from the north and the south subsystem, respectively. The effective sequences and OTUs of each sample are presented in Table 3. We assigned these sequences into different OTUs, using RDP Classifier with 3% of nucleotide cutoff. The OTUs of activated sludge samples from N1, N2, N3, S1, S2, and S3 were identified based on a subset of 17,668 randomly selected sequences, with 539, 471, 543, 464, 510, and 530 OTUs, respectively. The effective sequences and OTUs were not significantly different between the north and the south subsystems of the first aerobic bioreactor (t test, p = 0.300 and 0.300). Results of good's coverage in each sample ranged from 99.06 to 99.25%, indicating that sampling depth was sufficient to accurately characterize the bacterial community ( Table 3). The rarefaction curves of OTUs from six activated sludge samples were relatively flat, demonstrating that the sequencing data were reasonable (S1  Table 3). Based on these results, the microbial diversity in the activated sludge was moderate, although more complex than that in some extreme environments, such as hydrothermal springs [30] and acid mine drainage [31], but simpler than that of activated sludge from WWTPs [32,33] and soil [34]. Compared with the first anaerobic bioreactor of the traditional coking wastewater treatment A/O/O system, the bioreactor evaluated here had higher values of the Shannon-Weaver index (H), Simpson, and Chao1 [13]. The microbial community in the bioreactor should acclimatize to the operating conditions. Because the first treatment process can switch from anaerobic to aerobic conditions with different DO concentrations, the microbial community richness and diversity in the sludge samples changed dramatically, and the microbial function would make the adaptive to the environment [35].

Taxonomic complexity of the microbial community
The effective sequences were assigned into different phylogenetic taxa by RDP Classifier. All classifiable MiSeq sequencing reads across the samples in the north and the south subsystems were assigned to the domain of bacteria. To reveal the microbial community structure and function, the classified sequences were analyzed at phylum, order, family, genus, and OTU levels. There were 23 bacterial phyla in the activated sludge samples taken from the bioreactor, including five classes in the Proteobacteria (S1 Table). Phyla with relative abundance > 0.5% in at least one samples were defined as major phyla, and phyla that accounted for a proportion of < 0.5% were defined as minor phyla (Fig 1A). The classified phylum results showed that the major phyla were represented by Proteobacteria, Bacteroidetes, and Chlorobi, without significant difference between the two subsystems (p > 0.05). The β-Proteobacteria related sequences were the predominant class, with 64.49 to 75.98% relative abundance. According to previous studies, β-Proteobacteria is mainly responsible for organic matter and nutrient removal [36]. The genera Thauera and Thiobacillus, belonging to β-Proteobacteria can break down aromatic compounds and biotransform SCN − , respectively [37,38]. The relative abundance of α-Proteobacteria and γ-Proteobacteria classes were 11.04 ± 1.32% and 3.49 ± 0.63%, respectively, which played important roles in carbon mineralization in coking wastewater [7,39]. The subdominant phyla were Bacteroidetes (10.89 ± 6.31%) and Chlorobi (2.96 ± 1.12%). Members of Bacteroidetes can degrade residual recalcitrant substances and high-molecular-weight organic compounds [40,41].
To deepen our understanding of the microbial community composition, the 23 identified phyla were further analyzed, and 79 genera were identified (S3 Table). Relative abundance of genus > 0.1% in at least one sample was defined as major genus. Taxa that account for a proportion of < 0.1% were defined as minor genus (Fig 1B). It is possible that these rare taxa may gain predominance during environmental perturbations or toxic shock events and thus are important for the stable performance of coking wastewater treatment systems [13]. The most prevalent genus was Thiobacillus, with relative abundance of 5.44 ± 1.28%, which is known for its sulfur metabolism and related to the biodegradation of SCN − in coking wastewater treatment systems [42]. In addition, in another study, SCN − was aerobically degraded to a combination of ammonia, carbonate, and sulfate by several species of the genus Thiobacillus, using SCN − as a carbon and nitrogen source [10]. The occurrence and role of Rhodoplanes have not frequently been reported in coking wastewater treatment processes [43], but it was a subdominant genus with relative abundance of 4.28 ± 1.08% in this study. Some strains of this genus in municipal wastewater treatment plants can grow on organic carbon sources, such as acetate and pyruvate [44]. The genus Leucobacter ranked third, with relative abundance values of 0.58 ± 0.41%. Previous metagenomics and RT-qPCR studies have indicated that the Leucobacter strain GP is responsible for the initial ipso-hydroxylation of the parent molecule through a sulfonamide monooxygenase encoded by a gene homolog [45]. Nitrospira were the major nitrite-oxidizing bacteria (NOB), with relative abundance of 0.34 ± 0.04%, involved in nitrogen removal and carbon consumption [46][47][48]. The relative abundance of NOB is typically below 1% in activated sludge from coking wastewater treatment plants [27,43,49]. Unclassified genera belonging to the order Burkholderiales represented the predominant sequences, accounting for 62.97 ± 6.40% at the genus level (S3 Table). At this level, nearly three quarters of the sequences could not be assigned, suggesting the existence of numerous novel genera in the first aerobic bioreactor of the O/H/O coking wastewater treatment system. Most abundant genera showed no significant difference between the two subsystems (p > 0.05). The minor differences may be attributed to the elasticity of the biological diversity, but will not affect their function [50].
Examination of the identified 117 orders in the first aerobic bioreactor further captured the structure of microbial communities at the order level (S2A Fig and S2 Table). The predominant order was Burkholderiales, with relative abundance of 64.53 ± 6.53%. This order plays a key role in the removal of aromatic compounds, such as phenol, benzoate, toluene, naphthalene, and phenanthrene [51][52][53][54]. The subdominant order Flavobacteriales occupied 9.84 ± 5.74% relative abundance, while that of the order Rhizobiales was 6.38 ± 1.81%. Most aromatic pollutants, including benzene, toluene, ethylbenzene, xylene, and xenobiotic related compounds could be degraded mainly by Rhizobiales, Burkholderiales, Actinomycetales, as reported in a previous study [7].
We determined 192 families in the first aerobic bioreactor (S2B Fig and S2 Table). The distribution of the 21 major OTUs (with relative abundance > 0.5% in at least one sample) is shown in Fig 2 and S4 Table. Comamonadaceae was the predominant bacteria at level of family and OTU, with relative abundance of both more than 60.00%. Four major OTUs, including OTU_846710, OTU_940737, OTU_838837, and OTU_654788, were classified to this family. Our previous study showed that Comamonadeceae played an important role in the biodegradation of phenols, SCN − , CN − , PAHs, and heterocyclic aromatics in coking wastewater [42]. In addition, the family Comamonadeceae is characterized for the endurance of the toxic compounds of SCN − , CN − , phenols, PAHs, etc. [55]. Members of Comamonadeceae can also promote PAHs degradation in other organisms, such as species of the strain Comamonas testosteroni, which could grow on phenol and its derivatives as carbon and energy sources Color intensity represented the relative abundance of OTUs, with the darker roseate the higher the relative abundance and the darker royal blue the lower the relative abundance. The OTUs of relative abundance > 1% are marked by the star key. Refer to Table 1 for N and S. https://doi.org/10.1371/journal.pone.0243748.g002 [56,57]. Two major OTUs (OTU_4300908 and OTU_4424932) belonged to the family Weeksellaceae, which was the subdominant family with 8.43 ± 5.54%. As an aerobic chemoorganotrophic pathogenic family, Weeksellaceae may pose health risks for workers [58]. Nitrosomonadaceae were the dominant ammonium oxidizing bacteria (AOB) with 0.18% relative abundance in the bioreactor of coking wastewater treatment [43,59]. Relative abundance of family Ignavibacteriaceae was more than 2.59%. Interestingly, the Ignavibacterium of Ignavibacteriaceae played an important role in the nitrogen removal as a newly identified AOB, which increased the abundance and activities of AOB [60,61]. The long SRT can be conducive to the reproduction and evolution of specific functional microbial populations, such as anammox bacteria, AOB, and NOB [24,62,63]. In a previous study, family Cryomorphaceae played an indispensable role in the carbon and energy cycles of the marine environment [64]. Cryomorphaceae may also play an important role in COD removal in coking wastewater, although this remains to be proved.

Predictive functional profiling
Function prediction was done to investigate the presence of function pathways related to the biological treatment performance in the first aerobic bioreactor. The north and south subsystems showed satisfactory NSTI values of 0.110 ± 0.002 and 0.114 ± 0.006, respectively. These NSTI of the activated sludge were lower than those recorded for diverse communities such as soil inhabitants 0.170 ± 0.020 [18]. Therefore, the analyzed activated sludge samples from the first aerobic bioreactor provided a data set suitable for examination of PICRUSt predictions.
KEGG pathway database was used to analyze the biodegradation pathways of contaminants. The complex biological behaviour and function of genes can also be intensively and accurately investigated [65]. There were overall 39 of 43 second-level KEGG Orthology categories (KOs) in the predicted metagenomes (Fig 3). Predicted ammonia oxidation gene (amoA) just possessed the relative abundance of 0.0002% in nitrogen metabolism pathway. Moreover, xenobiotics biodegradation and metabolism pathway had a relatively high relative abundance of 4.74 ± 0.09%, and were involved in the removal of organic exogenous contaminants from coking wastewater. Therefore we further focused on the xenobiotics biodegradation and metabolism pathway. There was relative abundance of the 20 specific metabolic sub-pathways involved in xenobiotics degradation and metabolism pathway (Fig 4).
Considering the massive KOs number of xenobiotics degradation and metabolism pathways in KEGG, the top 10 KOs were picked for the analysis, including K01692, K00799, K00257, K00626, K00128, K00100, K01951, K01426, K01061, and K00121, which related to major xenobiotics degradation and metabolism pathways, such as benzoate, aminobenzoate, caprolactam, naphthalene and styrene degradation pathway. The predominant sub-pathway of benzoate degradation (20.92 ± 0.25%) was known to play a role in the degradation of exogenous phenolic compounds, which contribute to the majority of COD in coking wastewater (Fig 4). The predictive results of this pathway showed that at least 21 enzymes participated in benzoate degradation (S3 Fig). Benzoate degradation is known to play a role in the degradation of a variety of aromatic compounds [66]. The aminobenzoate degradation pathway was an important xenobiotics biodegradation and metabolism pathway that was related to the degradation of phenol and phenyl cyanide (S4 Fig). Phenyl cyanide belongs to a type of cyanides that contributes to the high toxicity of coking wastewater. Relative abundance of caprolactam degradation pathway ranked third, and four enzymes were predicted in this pathway related to decarbonization (S5 Fig). Naphthalene degradation pathway is related to PAHs, such as naphthalene and its derivatives (1-methylnaphthalene and 2-methylnaphthalene, 1-hydroxymethylnaphthalene, and 2-naphthalenemethanol) (S6 Fig). Naphthalene 1,2-dioxygenase (E 1.14.12.12) of this pathway, possessing a wide substrate specificity, permits the cis-hydroxilation of a variety of aromatic compounds including HMW-PAHs [67]. Two major KOs contributed to the styrene degradation pathway, which can degrade the toxic pollutant CN − , such as in benzyl cyanide and vinyl cyanide (S7 Fig). In the bioreactor, predicted pathways of benzoate degradation, caprolactam degradation, and naphthalene degradation were essential to carbon degradation, such as phenolic compound. Predicted pathways of aminobenzoate degradation and styrene degradation were essential to toxic substance degradation, such as cyanide. The above results indicated that the first aerobic bioreactor could provide function pathways involved in COD removal and detoxification.

Relationships between major genera and enzymes
Based on literature research, several microbial genera, including Thiobacillus, Comamonas, Burkholderiales, Pseudomonas, Ottowia, and Corynebacterium, are related to the degradation  Table 1  of typical pollutants in coking wastewater, such as phenol, SCN − , NHCs, and PAHs, while the biodegradation pathway has not been clearly described [68]. In this study, we attempted to illustrate the relationships of major taxa and enzyme-coding genes in the first aerobic bioreactor of the O/H/O coking wastewater treatment system. Our predicted results are presented in S5  (S3-S7 Figs). Combined with the representative functional pathways mentioned, we inferred that these genera may be responsible for the removal of phenols, CN − , and degradable aliphatic hydrocarbon (S8 Fig). Because of the high removal rates of phenols, CN − , and COD in the bioreactor, the identified major genera may contribute in the degradation of these pollutants and in reducing the toxicity of coking wastewater.

Conclusions
We characterized the diversity and function of the microbial communities involved in the first aerobic bioreactor of an O/H/O coking wastewater treatment system. Proteobacteria, Bacteroidetes, and Chlorobi constituted the main decarbonization and detoxification phyla in the prefix aerobic bioreactor, with β-Proteobacteria and α-Proteobacteria existing, unexpectedly, as  Table 1  the most abundant classes. Compared to the first anaerobic bioreactor of the traditional A/O/ O process, γ-Proteobacteria was the predominant class. Our results showed that the community structure was significantly different between the first aerobic bioreactor and the anaerobic bioreactor. Thiobacillus, Rhodoplanes, Lysobacter, and Leucobacter were the major genera and might be responsible for decarbonization. Functional genes involved in the degradation of benzoate, phenyl cyanide, naphthalene, and vinyl cyanide were enriched. The dominant bacterial communities may facilitate pollutant degradation in the first aerobic bioreactor. Our results provide a speculated evidence for the refinement of the O/H/O systems and for the development of efficient strategies for coking wastewater treatment. In the future study, we would focus on controlling the threshold of complete ammonification/partial nitrification/ complete nitrification, performed by microbes in the first aerobic bioreactor of coking wastewater treatment. Orders/ families with relative abundance > 1% at least one of samples were defined as major orders/ families, and orders/families that accounted for a proportion of < 1% were defined as minor order/family. The average values of the three samples were calculated to represent the value of the corresponding sample. Refer to Table 1