Characterization of Wastewater Treatment Plant Microbial Communities and the Effects of Carbon Sources on Diversity in Laboratory Models

We are developing a laboratory-scale model to improve our understanding and capacity to assess the biological risks of genetically engineered bacteria and their genetic elements in the natural environment. Our hypothetical scenario concerns an industrial bioreactor failure resulting in the introduction of genetically engineered bacteria to a downstream municipal wastewater treatment plant (MWWTP). As the first step towards developing a model for this scenario, we sampled microbial communities from the aeration basin of a MWWTP at three seasonal time points. Having established a baseline for community composition, we investigated how the community changed when propagated in the laboratory, including cell culture media conditions that could provide selective pressure in future studies. Specifically, using PhyloChip 16S-rRNA-gene targeting microarrays, we compared the compositions of sampled communities to those of inocula propagated in the laboratory in simulated wastewater conditionally amended with various carbon sources (glucose, chloroacetate, D-threonine) or the ionic liquid 1-ethyl-3-methylimidazolium chloride ([C2mim]Cl). Proteobacteria, Bacteroidetes, and Actinobacteria were predominant in both aeration basin and laboratory-cultured communities. Laboratory-cultured communities were enriched in γ-Proteobacteria. Enterobacteriaceae and Aeromonadaceae were enriched by glucose, Pseudomonadaceae by chloroacetate and D-threonine, and Burkholderiaceae by high (50 mM) concentrations of chloroacetate. Microbial communities cultured with chloroacetate and D-threonine were more similar to sampled field communities than those cultured with glucose or [C2mim]Cl. Although observed relative richness in operational taxonomic units (OTUs) was lower for laboratory cultures than for field communities, both flask and reactor systems supported phylogenetically diverse communities. These results provide a foundation for laboratory models of industrial bioreactor failure scenarios.


Introduction
In recent years, there have been increasing numbers of genetically engineered microorganisms (GEMs) constructed for biotechnology, medicine, bioenergy, bio-based chemicals and materials, and bioremediation applications [1][2][3][4]. While containment strategies including engineered auxotrophy, induced lethality, and gene-flow barriers have been developed to mitigate the environmental risks posed by GEMs, naturally occurring processes may negate these genetic safeguards [5]. Currently, we cannot confidently predict how well a given GEM will survive and adapt if released into a given environment, nor how it will impact ecological biodiversity or interact with indigenous microbes (e.g., horizontal gene transfer and subsequent phenotype alteration [6,7]).
A potential scenario to consider is an industrial bioreactor failure that results in the exposure of GEMs to the microbial communities indigenous to a downstream municipal wastewater treatment system. Such a wastewater treatment system typically performs primary and secondary biological treatment (activated sludge) of municipal wastewater before release, and possibly tertiary treatment (microfiltration) for applications including landscape and agricultural irrigation, industrial processes, and wetlands restoration. Domestic sewage and other wastewater received by the treatment plant first undergo primary sedimentation. Effluent from the primary sedimentation basins flows to aeration basins where diverse microorganisms (originating from the received wastewater) consume the soluble organic matter. Effluent from the aeration basins flows to clarifiers, which sediment the microorganisms from the water. The clarifier effluent is either discharged or further processed for re-use. The clarifier sediment, or "activated sludge", is recycled back to the aeration basins to maintain a high cell density of microorganisms and high removal rates of soluble organic matter, while the net growth of microorganisms is sent to anaerobic digesters, which also receive primary sedimentation sludge. The material of the aeration basin is referred to as "mixed liquor", reflecting the mixture of the recycled activated sludge and the inflow to the aeration basin. The aeration basin contains a complex and high-density microbial community, which could serve as a hot spot for horizontal gene transfer.
While a given laboratory GEM itself may be unlikely to survive in an aeration basin, the portion, if any, of the GEM's genetic material that confers advantage to others in the microbial community (following horizontal gene transfer) may have a greater chance of survival. Our long-term primary questions of interest (only partially resolved by the work reported herein), then, are which selectable capabilities found within GEM genetic material might provide sufficient advantage to be effectively transferred to other members of the community, how the composition of the community would be affected by gene transfer, and how this transfer is limited by environmental constraints.
Efforts to characterize microbial structure and gene expression within a Hong Kong MWWTP activated sludge community [8], to investigate metagenomic determinants of reduced antimicrobial drug susceptibility within MWWTP bacteria [9,10], and to propagate MWWTP microbial communities in replicate membrane bioreactors [11] in the presence of the model toxin 3-chloroaniline [12], have all been recently reported. In order to further advance our understanding of potential gene transfer from GEMs to a diverse microbial community representative of a municipal wastewater treatment plant (MWWTP), it is important to establish suitable selection conditions (media and genetic markers) for the gene transfer, and to characterize the baseline microbial communities resulting from each condition. To this end, we inventoried and compared aeration-basin bacterial communities and laboratory-grown communities, which were cultured with various carbon sources using preserved and maintained mixed-liquor communities as inoculum. For the laboratory model system, we selected carbon sources as selective media components that promote the activity of carbon catabolic genes. These selective markers, which could be overexpressed in GEMs in future experiments, were hypothesized to be rare in MWWTP microbial communities. For example, we selected the ionic liquid 1-ethyl-3-methylimidazolium chloride ([C2mim]Cl), a microbial growth inhibitor and deconstruction agent for lignocellulosic feedstocks for biofuels-producing GEMs [13], as a selective medium that promotes the activity of rare efflux-pump tolerance genes. An efflux pump for [C2mim]Cl was recently identified in Enterobacter lignolyticus SCF1 and cross-validated in E. coli [14], which is evidence that horizontal gene transfer could confer the [C2mim]Cl tolerance phenotype.
We also screened 192 carbon sources employing the phenotype microarray, OmniLog PM MicroPlate (BioLog, Hayward, CA) and identified carbon sources, including D-threonine, that neither MWWTP microbial communities nor E. coli (a potential parental lineage for GEMs) appeared to metabolize well, showing very poor growth compared to other carbon sources (data not shown). Note that the selected carbon sources, including D-threonine, chloroacetate and [C2mim]Cl, were not chosen because they are likely to be found in (or especially relevant to) MWWTPs, but rather because they could potentially serve well as selective media components for future laboratory experiments. We hypothesize that the selected carbon sources will prove more effective than antimicrobial drugs as selective agents for future laboratory experiments, since resistance genes for various antimicrobial drugs already appear to be prevalent in activated sludge microbial communities [9,10].
In this study, we have used PhyloChip G3, a third generation high-density phylogenetic DNA microarray that has been successfully employed in microbial ecology [15][16][17][18] to investigate microbial community structure.

Overall experimental design
An overview of the experiments conducted herein is schematically depicted in Fig. 1.

Aeration basin samples
Samples of mixed liquor were obtained from the secondary treatment unit of a local San Francisco Bay Area wastewater treatment plant that uses typical secondary processes and discharges into San Francisco Bay. The samples were taken from the channel that flows from the aeration basins to the clarifiers. (John R. Cloak, who is authorized to give all required permission and should be contacted for future permission, facilitated sample acquisition on private land and provided plant operation data. Sample acquisition did not involve endangered or protected species.) On three sampling dates (July 20, 2011, October 19, 2011, and April 25, 2012), four sterile 1-L bottles were each filled with 500 mL of sample and transported at ambient temperature to the laboratory for cryo-preservation as inoculum for subsequent culturing and PhyloChip characterization (described below). Inoculum was preserved by adding 1.5 mL of sterile 45% (w/v) glycerol solution to 3 mL of sample, and then storing it at −80 °C. Larger 75 mL aliquots (50 mL sample with 25 mL 45% (w/v) glycerol solution for bioreactor inoculation) were also preserved from the April 25, 2012 samples. Glycerol stock aliquots were prepared under constant stirring to mitigate sludge microbe flocculation and maintain homogeneous samples.

Laboratory cultures
Batch cultures were grown at 26 °C in baffled Erlenmeyer flasks (50 mL culture in 250-mL flask shaken at 250 rpm on an orbital shaker) and bioreactors (500 mL custom-manufactured vessels used with controllers manufactured by Fairmentec GmbH, Göttingen, Germany, Bioreactor RK01-40). Each liter of the defined basal medium contained 0.87 g K2HPO4, 0.88 g KH2PO4, 5.5 mg CaCl2, 0.11 g NH 4 3 (nitrilotriacetic acid trisodium salt). The bioreactors were aerated with sterile filtered (0.2 μm) air through glass frits, pH controlled to 6.8 with 4% HCl and 20 g/L Na2CO3, and agitated by means of a magnetic stir bar at 400 rpm. Dissolved oxygen monitoring of the bioreactors indicated that oxygen levels remained at saturation throughout growth. The reactors and flasks were inoculated to an initial OD 600 nm of 0.05, using the preserved April 25, 2012 samples, which were washed three times with the basal medium to remove residual glycerol before inoculation.
Carbon sources were added to the basal medium to provide an equivalent carbon concentration of 0.26 g C/L, to approximate the range of carbonaceous biochemical oxygen demand values of the primary sedimentation effluent at the wastewater treatment plant. Glucose was selected as a carbon source that could potentially be metabolized by a broad range of microorganisms. D-threonine (5.5 mM), chloroacetate (11 mM), as well as the ionic liquid [C2mim]Cl (270 mM) were used as candidates for selective carbon sources. Chloroacetate was also tested at higher concentrations (23 mM and 50 mM) that would potentially be growth inhibitory to the aeration-basin microorganisms. As controls, cultures were grown in basal medium without any additional carbon source (no-carbon). Negative controls in glucose-amended basal medium were also performed in the flasks.
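The carbon-equivalent dosing described above can be checked with a short calculation. The following is a minimal sketch, not part of the original protocol: the function names are illustrative, and the carbon counts per molecule (4 for D-threonine, 2 for chloroacetate, 6 for glucose) are taken from the standard molecular formulas rather than from the text.

```python
# Assumption: carbon atomic mass in g/mol; carbon counts per molecule are
# supplied by the caller from the standard molecular formula.
CARBON_ATOMIC_MASS = 12.011


def carbon_g_per_l(conc_mm: float, carbons_per_molecule: int) -> float:
    """Grams of carbon per liter supplied by a substrate at conc_mm (mmol/L)."""
    return conc_mm / 1000.0 * carbons_per_molecule * CARBON_ATOMIC_MASS


def conc_mm_for_target(target_g_c_per_l: float, carbons_per_molecule: int) -> float:
    """Substrate concentration (mmol/L) needed to reach a target g C/L."""
    return target_g_c_per_l / (carbons_per_molecule * CARBON_ATOMIC_MASS) * 1000.0


if __name__ == "__main__":
    for name, conc_mm, n_carbons in [("D-threonine", 5.5, 4), ("chloroacetate", 11.0, 2)]:
        print(f"{name}: {carbon_g_per_l(conc_mm, n_carbons):.3f} g C/L")
```

Under these assumptions, 5.5 mM D-threonine and 11 mM chloroacetate each correspond to about 0.26 g C/L, and 50 mM chloroacetate carries roughly 4.5 times that amount of carbon.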
Growth of the batch cultures was monitored by taking samples for OD 600 nm and substrate concentration measurement. In the reactors, OD 600 nm was continuously monitored by pumping a recycle line through a cuvette in a small spectrophotometer (USB400, Ocean Optics, FL). Samples for PhyloChip were taken once cultures reached stationary phase, or immediately following the peak OD 600 nm . All conditions were tested in triplicate, with the following exceptions: one negative-control was run per batch of flask tests, and two no-carbon reactor tests were performed.

DNA extraction and PCR for microarray analyses
Microbial community genomic DNA (gDNA) was extracted from three biological replicates per sample whenever available. For each aeration-basin sample, a 4.5 mL stock vial containing 3 mL sample and 1.5 mL of sterile 45% (w/v) glycerol was considered a biological replicate, and gDNA was extracted from three stock vials per sample date. For laboratory-cultured samples, three flask cultures or three reactor cultures per treatment were used for gDNA extraction. When the microbial cell mass was very low ([C2mim]Cl and no-carbon flask control cultures), the three biological replicates were pooled to obtain a sufficient amount of gDNA.
Biomass for gDNA extraction was concentrated either by centrifugation for the aeration-basin samples, or by filtration through Sterivex GP filter units with 0.22 μm pore size (Millipore; Billerica, MA) for the laboratory cultures. Sample was pushed through the filter with a syringe until no more filtrate would pass. The amount of sample filtered ranged from 5 to 60 mL, depending upon the cell density of the sample. The filters were aseptically removed from the filter units, placed into separate sterile 1.5 mL microcentrifuge tubes, and stored overnight at 4 °C in RNAlater solution (Ambion; Austin, TX). On the following day, the RNAlater solution was removed and the filters were stored at −80 °C until further processing.
A DNA-EZ Kit (GeneRite; North Brunswick, NJ) was used for gDNA extraction following the manufacturer's instructions, enhanced by bead beating. Briefly, filters were cut aseptically and each filter piece, or one third of the pellet, was placed in Lysing Matrix E tubes (MP Biomedicals; Solon, OH) with 500 μL of the kit's lysing buffer and treated in a FastPrep bead beater (Savant; Carlsbad, CA) at 5.0 m/s for 45 s twice. The lysed solution was applied to a DNA binding column. After several washes, gDNA was eluted with 100 μL of elution buffer. The eluted sub-samples were pooled and the gDNA quantified using a Qubit dsDNA HS Assay Kit (Invitrogen, Grand Island, NY).

Microarray processing and data treatment
The PhyloChip assay was conducted as previously reported [19] with the following modifications: (1) TDT enzyme (P/N: PM1875; Promega; Madison, WI) and GeneChip Labeling Reagent (P/N: 90042; Affymetrix; Santa Clara, CA) were used for the labeling reaction, and (2) 2.2 μL of 3 nM Control Oligo B2 (Affymetrix) was used for the hybridization reaction. In brief, 500 ng of purified PCR product was fragmented to 50-200 bp, labeled with biotin, hybridized onto the microarray, stained, and washed according to the manufacturer's instructions. Fluorescent images were scanned with an Affymetrix GeneChip Scanner 3000 7G. Probe intensities were background-subtracted and scaled to the quantitative standards. The hybridization score for an OTU was calculated as the mean intensity of the perfectly matching probes excluding the maximum and minimum intensity values. Technical replication of the PhyloChip was not analyzed in this study, since high reproducibility of the PhyloChip has been reported [19].
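The hybridization score described above is a trimmed mean. The sketch below illustrates it under the assumption that exactly one maximum and one minimum value are dropped per OTU (if several probes tie for the extreme value, only one copy is removed); the function name is illustrative and is not taken from the PhyloChip software.

```python
def hybridization_score(pm_intensities):
    """Mean PM probe intensity after dropping one maximum and one minimum value.

    Assumption: intensities are already background-subtracted and scaled.
    """
    if len(pm_intensities) < 3:
        raise ValueError("need at least three probe intensities to trim both ends")
    ordered = sorted(pm_intensities)
    trimmed = ordered[1:-1]  # drop the single lowest and single highest value
    return sum(trimmed) / len(trimmed)
```

Trimming the two extremes keeps a single saturated or failed probe from dominating the score for an OTU.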
On the PhyloChip, the average number of probe pairs assigned to each OTU was 37 (s.d. 9.6) [19]. Probe pairs comprised one perfectly matching (PM) 25-mer probe and one mismatching (MM) 25-mer probe containing a substitution at the central base. Scoring an OTU as present or absent occurs in two stages of analysis. Stage 1 of the analysis had the following requirements. (1) A minimum of seven probe pairs was scored. (2) At least seven probe pairs that were designed for a given OTU were scored positive. Probe pairs were positive if the PM probes had higher intensities than the corresponding MM probes. (3) Ranked response scores (r scores) of probe pairs were ≥0.8 in Quartile_1, ≥0.93 in Quartile_2, and ≥0.98 in Quartile_3. The r score was calculated based on the A and T count in the probe sequences, and it measures the potential that a probe pair was responding to a target and not the background (see reference [19]). (4) The OTU probe set had a positive fraction ≥0.92 (positive fraction is the fraction of probe pairs that scored positive in a given OTU). An OTU meeting all these criteria passed for consideration at Stage 2. For Stage 2 of the analysis, the cross-hybridization potential of the remaining OTUs was determined. Cross-hybridization adjusted response scores (rx) were calculated in order to account for potential false positive OTUs, which can occur when a sequence in a sample matches some probes used in multiple, closely related OTUs (for the equations, see reference [19]). The cutoffs for adjusted rx scores for calling species present were: Q_1 ≥ 0.22, Q_2 ≥ 0.40, and Q_3 ≥ 0.42. The presence/absence (binary) data of species passing these criteria, the OTU binary data (for OTUs belonging to the species called present as long as they passed Stage 1 criteria), and the fluorescence intensities for those OTUs were used for analyzing community structures. An OTU was considered present if it was called present in at least two of the three sample replicates. Taxa at higher levels (phylum, class, order, family, and genus) were considered present if at least one species was called present in a given lineage.
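The Stage 1 criteria can be expressed as a simple filter. The following is a hedged sketch only: the r scores are taken as precomputed inputs because their equation is given in reference [19] and not reproduced here, and Quartile_1 to Quartile_3 are interpreted as the three quartile cut points of the r-score distribution across an OTU's probe pairs, which is our reading rather than a confirmed detail of the PhyloChip pipeline. All names are illustrative.

```python
from statistics import quantiles

MIN_PAIRS = 7                            # criteria (1) and (2)
R_QUARTILE_CUTOFFS = (0.80, 0.93, 0.98)  # criterion (3)
MIN_POSITIVE_FRACTION = 0.92             # criterion (4)


def passes_stage1(pm, mm, r_scores):
    """Return True if an OTU's probe set satisfies the Stage 1 criteria.

    pm, mm   : paired PM and MM probe intensities for one OTU
    r_scores : precomputed response score per probe pair (assumed input)
    """
    if not (len(pm) == len(mm) == len(r_scores)):
        raise ValueError("pm, mm and r_scores must have the same length")
    n_pairs = len(pm)
    if n_pairs < MIN_PAIRS:
        return False
    # A probe pair is positive when the PM intensity exceeds the MM intensity.
    n_positive = sum(1 for p, m in zip(pm, mm) if p > m)
    if n_positive < MIN_PAIRS:
        return False
    if n_positive / n_pairs < MIN_POSITIVE_FRACTION:
        return False
    # Assumption: quartile cut points of the r scores are compared to the cutoffs.
    cut_points = quantiles(r_scores, n=4, method="inclusive")
    return all(q >= c for q, c in zip(cut_points, R_QUARTILE_CUTOFFS))
```

An OTU that passes this filter would still need to clear the Stage 2 cross-hybridization cutoffs, which are not modeled in this sketch.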

Community comparison using hierarchical clustering, MRPP and indicator analyses of the relative abundance of OTUs
To investigate similarities between the samples, hierarchical clustering analysis was performed based on the relative abundance of OTUs using PC-ORD (v.6.12; MjM Software, Gleneden Beach, OR). In the analysis, distance was measured by Bray-Curtis and group average was used as a group linkage method. All relative intensity data across the entire sample set was included in the analysis as long as an OTU was called as present in at least one sample. Raw microarray intensity data were imported into PC-ORD, standardized by general relativization by columns (samples), and subsequently transformed by arcsine square root to convert fluorescence intensity data with an arbitrary unit to the numbers that are closer approximations of amplicon amounts used for microarray hybridization.
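The data treatment that precedes clustering can be sketched as follows. This is a minimal illustration and not PC-ORD's implementation: relativization is assumed to mean division by the sample total, and the arcsine square-root transform is assumed to carry the 2/π scaling that maps proportions onto the range 0 to 1. Function names are illustrative.

```python
import math


def relativize_by_sample(intensities):
    """Divide each OTU intensity by the sample total (assumed relativization)."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("sample total must be positive")
    return [x / total for x in intensities]


def arcsine_sqrt(proportions):
    """Arcsine square-root transform; the 2/pi scaling is an assumption."""
    return [(2.0 / math.pi) * math.asin(math.sqrt(p)) for p in proportions]


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples of equal length."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator if denominator else 0.0
```

A pairwise matrix of `bray_curtis` values computed on the transformed samples is the kind of input that group-average (UPGMA) clustering operates on.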
Multi-Response Permutation Procedure (MRPP) in PC-ORD was used to examine the differences among particular a priori groups, which were resolved at a particular cluster level by hierarchical clustering analysis employing the Bray-Curtis distance measure. MRPP is a non-parametric procedure that does not require distributional assumptions, so it is suitable for analyzing ecological community data. MRPP tests the null hypothesis of no difference between within-group distances and across-group distances [20][21][22]. It calculates δ, the weighted mean pairwise distance within groups, and determines the probability (p) of obtaining a δ equal to or smaller than the observed value. It also generates the A statistic, which indicates chance-corrected within-group agreement. If all items are identical within groups (δ = 0), then A = 1; if heterogeneity within groups equals expectation by chance, then A = 0; and if heterogeneity within groups is higher than expected by chance, then A < 0. As A approaches 1, items within the pre-defined groups are more similar to each other.
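The relationship between δ and A can be illustrated with a small permutation version of MRPP. This is a sketch under stated assumptions, not the PC-ORD routine: groups are weighted by n_i/N, the expected δ is estimated as the mean over random permutations, and the p value is a plain permutation estimate, which may differ from the way PC-ORD evaluates significance. All names are illustrative.

```python
import random
from itertools import combinations


def mrpp_delta(dist, groups):
    """Weighted mean within-group distance (weights n_i / N, an assumption).

    dist   : symmetric matrix (list of lists) of pairwise distances
    groups : list of lists of item indices; each group needs >= 2 items
    """
    n_total = sum(len(g) for g in groups)
    delta = 0.0
    for g in groups:
        pairs = list(combinations(g, 2))
        mean_distance = sum(dist[i][j] for i, j in pairs) / len(pairs)
        delta += len(g) / n_total * mean_distance
    return delta


def mrpp(dist, groups, n_perm=999, seed=0):
    """Return (observed delta, A statistic, permutation p value)."""
    rng = random.Random(seed)
    observed = mrpp_delta(dist, groups)
    items = [i for g in groups for i in g]
    sizes = [len(g) for g in groups]
    permuted_deltas = []
    for _ in range(n_perm):
        rng.shuffle(items)
        shuffled, start = [], 0
        for size in sizes:  # reassign items to groups of the original sizes
            shuffled.append(items[start:start + size])
            start += size
        permuted_deltas.append(mrpp_delta(dist, shuffled))
    expected = sum(permuted_deltas) / n_perm
    a_statistic = 1.0 - observed / expected
    p_value = (1 + sum(d <= observed for d in permuted_deltas)) / (n_perm + 1)
    return observed, a_statistic, p_value
```

With only a handful of samples per group the permutation p value is coarse, so this sketch is better suited to checking the logic of A = 1 − δ(observed)/δ(expected) than to reproducing reported p values.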
To determine which OTUs were responsible for the differences among the seasonal field samples, or among the laboratory-grown communities cultured with various carbon sources, indicator species analysis was conducted following the method of Dufrêne and Legendre [23] using PC-ORD. The analysis calculates indicator values of each species (or OTU) for a particular group by considering both the abundance of the species in a specific group and the faithfulness of occurrence of the species in members of the group. The equation for the indicator value of species i in group j is as follows: INDVAL_ij = A_ij × B_ij × 100. A_ij is a measure of specificity and is calculated by dividing the mean abundance of species i across the members of group j by the sum of the mean abundances of species i over all groups. B_ij is a measure of fidelity and is obtained by dividing the number of members in group j where species i is present by the total number of members in that group. B_ij is maximized when species i is present in all members of group j. When we compared the MWWTP seasonal field communities in this study, "group" corresponds to community (for example, the April 25, 2012 sample) and "members" corresponds to the three replicates of that community. The statistical significance of indicator values was tested by Monte Carlo randomization with 1,000 permutations. The indicator analysis was performed at a particular cluster level resolved by hierarchical clustering analysis. The relative abundances of indicator OTUs were visualized using R (version 2.15.1; Institute for Statistics and Mathematics, Wirtschaftsuniversität Wien [http://www.r-project.org]) [24]. For the R heat-plot, microarray intensities of individual OTUs were centered and scaled by subtracting the mean of the row (OTU) across all samples and then dividing the resulting values by the standard deviation of the row. The Z scores in the heat-plots therefore express how far the intensity of an OTU in a given sample lies from the mean intensity of that OTU across samples, in units of standard deviation. Note that the heat-maps generated by this method are applicable only for comparing OTUs across samples, but not for comparing among OTUs within samples. Average linkage methods were used for hierarchical clustering in R.
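The indicator value defined above can be computed directly from per-replicate abundances. The sketch below is illustrative rather than a reproduction of the PC-ORD routine: presence is assumed to mean an abundance greater than zero (in this study presence came from the PhyloChip calls), the Monte Carlo significance test is omitted, and the function and group names are hypothetical.

```python
def indicator_values(abundance_by_group):
    """Indicator value (percent) of one species for each group.

    abundance_by_group : {group name: [abundance in each member of the group]}
    Assumption: a species is 'present' in a member when its abundance is > 0.
    """
    means = {g: sum(v) / len(v) for g, v in abundance_by_group.items()}
    total_of_means = sum(means.values())
    result = {}
    for group, values in abundance_by_group.items():
        # Specificity: group mean relative to the sum of all group means.
        specificity = means[group] / total_of_means if total_of_means else 0.0
        # Fidelity: fraction of group members in which the species is present.
        fidelity = sum(1 for x in values if x > 0) / len(values)
        result[group] = specificity * fidelity * 100.0
    return result
```

For example, a species found only in all three replicates of one group has specificity 1 and fidelity 1 for that group, giving the maximum indicator value of 100.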

Examination of phylogenetic dissimilarities among the field-sampled and laboratory-grown communities using Principal Coordinate Analysis (PCoA)
A representative 16S rDNA sequence for each OTU was obtained from the GreenGenes database and aligned by the NAST aligner (Lawrence Berkeley National Laboratory [http://greengenes.lbl.gov/cgi-bin/nph-export_records.cgi]) [25,26]. The filtering criteria for the sequences were: a minimum base number of 1,250 and a maximum non-ACGT character count in the sequences of 50. The alignment comprised the sequences of 7,799 OTUs, which were detected in at least one sample. Using FastTree (v.2.1.7; Lawrence Berkeley National Laboratory [http://www.microbesonline.org/fasttree]) [27], a phylogenetic tree was constructed from the multiple sequence alignment employing the generalized time reversible (GTR) model of nucleotide evolution for distance measurements. FastTree infers approximately-maximum-likelihood phylogenetic trees. FastTree uses a heuristic variant of neighbor joining to get a rough tree topology first, corrects branch lengths using a mix of nearest-neighbor interchanges and subtree-prune-regraft moves, and then improves topology and branch lengths by maximum likelihood rearrangements. Principal Coordinate Analysis (PCoA) was performed using Fast UniFrac (Department of Chemistry and Biochemistry, University of Colorado at Boulder [http://unifrac.colorado.edu]) [28] and the PCoA result was plotted using SigmaPlot (v.10; Systat Software Inc., San Jose, CA). A UniFrac distance matrix was calculated from the tree considering within-sample normalized abundance weights. In previous PhyloChip validation experiments, the concentrations of PCR amplicons were shown to correlate with the log2 values of the fluorescence intensities with a correlation coefficient (r) of 0.917 [29]; therefore, we used the log2 values of the fluorescence intensity scores as the abundance weight.
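The sequence filtering criteria stated above can be written as a small predicate. This is a sketch under assumptions that the text does not spell out: alignment gap characters ("-" and ".") are stripped before counting, the base count includes ambiguous characters, and the function name is illustrative.

```python
MIN_BASES = 1250    # minimum base number stated in the text
MAX_NON_ACGT = 50   # maximum non-ACGT character count stated in the text


def passes_sequence_filter(sequence: str) -> bool:
    """Return True if a 16S rDNA sequence meets both filtering criteria.

    Assumption: gap characters are removed before lengths are counted.
    """
    residues = sequence.upper().replace("-", "").replace(".", "")
    non_acgt = sum(1 for base in residues if base not in "ACGT")
    return len(residues) >= MIN_BASES and non_acgt <= MAX_NON_ACGT
```

Applying such a predicate before alignment would keep short or highly ambiguous sequences out of the tree-building step.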

Construction of a phylogenetic tree and visualization of the microbial composition of the April 25, 2012 field-sampled community and the laboratory-cultured communities
A single OTU per family was chosen from the GreenGenes database to represent each family in a phylogenetic tree (Table S1). Non-chimeric 16S rDNA sequences of 120 OTUs were obtained from the Hugenholtz 7,682 character data set at GreenGenes and aligned by the NAST aligner of GreenGenes as described above. An approximate-maximum-likelihood phylogenetic tree was constructed from the multiple sequence alignment using FastTree as described above, and the tree was visualized by iTOL (Interactive Tree Of Life) [30]. An archaeal sequence (DQ300318.1) was included in the tree for rooting.

Microbial communities detected in the aeration-basin samples
A species was considered present if it was called present in at least two of the three sample replicates. A phylum was considered present if at least one species was called present in a given phylum (see Materials and methods for details). The July 20, 2011 sample was the most diverse community, consisting of 24 phyla, 234 species, and 2,200 OTUs (Fig. 2). The October 19, 2011 sample contained 17 phyla, 192 species, and 1,704 OTUs, and the April 25, 2012 sample had 16 phyla, 182 species, and 1,886 OTUs. The numbers of OTUs at phylum level (or class level for the major phyla) detected in the seasonal samples are listed in Table S2. The majority (96%-98%) of OTUs in each sample belong to four phyla: Proteobacteria, Bacteroidetes, Firmicutes, and Actinobacteria. While probes that target these four phyla are overrepresented on the PhyloChip G3, other phyla similarly overrepresented on the PhyloChip G3 were not detected at comparable levels in the aeration-basin samples (Table S2).
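The replicate-consensus and roll-up rules stated above can be sketched in a few lines. The species labels and the species-to-phylum mapping in the usage example are hypothetical placeholders, and the function names are illustrative.

```python
def present_in_replicates(replicate_calls, min_replicates=2):
    """True if a taxon is called present in at least min_replicates replicates."""
    return sum(bool(call) for call in replicate_calls) >= min_replicates


def phyla_present(species_calls, species_to_phylum):
    """Phyla with at least one species passing the replicate-consensus rule.

    species_calls     : {species: [presence call per replicate]}
    species_to_phylum : {species: phylum name}
    """
    present = set()
    for species, calls in species_calls.items():
        if present_in_replicates(calls):
            present.add(species_to_phylum[species])
    return present


if __name__ == "__main__":
    # Hypothetical species labels, used only to show the call pattern.
    calls = {"species_A": [True, True, False], "species_B": [True, False, False]}
    lineage = {"species_A": "Proteobacteria", "species_B": "Firmicutes"}
    print(phyla_present(calls, lineage))
```

The same roll-up applies at class, order, family, and genus level by swapping the mapping.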
The common Proteobacteria consisted of the classes β-Proteobacteria (61%), γ-Proteobacteria (28%), α-Proteobacteria (10%), and δ-Proteobacteria (0.7%). In the Bacteroidetes phylum, the major classes were Flavobacteria, Sphingobacteria, and Bacteroidia. In the Firmicutes phylum, 67% was of the class Clostridia, and the rest was Bacilli. All of the common Actinobacteria OTUs belong to the class Actinobacteria.

OTU variations by aeration-basin seasonal sampling dates
The bacterial communities collected at various sampling dates were distinguished from each other at hierarchical cluster level 3 based on the relative abundance of OTUs (Fig. S1). The differences among the bacterial communities by sampling dates based on the relative abundance of OTUs were moderate (MRPP: A = 0.47, p = 0.0004).
Indicator analysis was used to identify OTUs whose relative abundances significantly differed between sampling dates. The  community belong to the classes β-Proteobacteria, γ-Proteobacteria, Bacilli, Sphingobacteria, α-Proteobacteria, Clostridia, and ε-Proteobacteria.
Taken together, the above results importantly establish a baseline for the seasonal variation of aeration-basin community composition.

Laboratory culturing from aeration-basin inoculum
Flask-culture growth curves are presented in Figs. S3A and B. Growth with glucose had approximately a 10 h lag phase, and a peak OD 600 nm value of 0.5 was reached 21 h after inoculation. Growth with D-threonine and chloroacetate commenced following a 45 to 75-h lag phase. The standard deviations for all the replicates are small, except for the 50 mM chloroacetate condition, which attained the highest OD 600 nm (0.8) and contained 4.5 times more initial carbon than all other carbon sources. For both the [C2mim]Cl and no-carbon conditions, the OD 600 nm values did not change significantly, suggesting that [C2mim]Cl was tolerated but likely not metabolized. These results are important because they demonstrate that in addition to glucose, D-threonine and chloroacetate (up to 50 mM) can be effectively utilized as carbon sources by some of the aeration-basin microbes, although D-threonine and chloroacetate were associated with significantly longer lag phases than that observed for glucose. These results also show that no significant growth was observed in the [C2mim]Cl flask cultures despite monitoring three replicates through 140 h post inoculation.
Growth curves for the reactor cultures are shown in Fig. S3C for glucose, D-threonine, and 11 and 23 mM chloroacetate. Despite irregularities in the curves due to air-bubble entrainment in the recycle line, and on some occasions due to adhesion of biomass, trends are consistent among replicates, and these curves show more detail than the flask growth curves derived from intermittent sampling. For glucose, a 10 h lag period was followed by growth, which peaked at 20 h, similar to the flask cultures. In the case of D-threonine, OD 600 nm values declined over the first 20 h, then increased beginning at 40 to 50 h, peaking at around 80 h, followed by a sharp decline. The flask culture did not show the sharp decline, although the peak occurred at around 70 h. Growth with 11 mM chloroacetate showed two plateaus in succession, although the timing varied between the replicates; PhyloChip samples were taken after the second plateau. The replicates of the 23 mM chloroacetate runs were much more consistent than those for 11 mM chloroacetate, each showing a 30 h lag, followed by growth to a plateau at 40 h. The flask culture with 23 mM chloroacetate had much longer lag and growth phases compared to the reactor cultures. In the no-carbon reactors (data not shown), OD 600 nm values dropped by 50% from their initial values in the first 20 to 50 h, and remained constant thereafter.

Composition of the laboratory-cultured bacterial communities
The number of OTUs in each laboratory culture was reduced compared to its inoculum (the preserved and maintained April 25, 2012 aeration-basin sample; Fig. 3A, Table S2). Interestingly, large numbers of OTUs were detected in the [C2mim]Cl and no-carbon cultures despite their very low cell mass. This is possibly a consequence of the less stringent positive OTU criteria used for these low cell mass bacterial communities (see Materials and methods). Briefly, for the low cell mass cultures, three replicates were pooled together and analyzed using a single PhyloChip, with any detected OTUs considered positives. In contrast, for the higher cell mass cultures, the three replicates were separately analyzed and positive OTUs were more stringently constrained to those detected in at least two replicates (Fig. 3A). The smallest number of OTUs (458) was detected in the 50 mM chloroacetate-cultured community, for which the largest OD 600 nm was measured.
The most abundant phyla in the laboratory-cultured communities were Proteobacteria, Bacteroidetes, and Actinobacteria. The γ-Proteobacteria were greatly enriched in almost all laboratory-cultured communities (Fig. 3B). While the γ-Proteobacteria made up only 15% of the total OTUs in the April 25, 2012 field community, they represented 45% of the total OTUs in the 11 mM chloroacetate, 59% in the 50 mM chloroacetate, 59% in the glucose, 46% in the D-threonine, and 53% in the [C2mim]Cl flask-cultured communities. In contrast, β-Proteobacteria were reduced in most laboratory-cultured communities. The Clostridia and Bacilli of the Firmicutes phylum were significantly reduced in all laboratory-cultured communities. The phylum Bacteroidetes, especially the Flavobacteria class, was greatly reduced in the communities cultured with chloroacetate at high concentrations, or with [C2mim]Cl. The Actinobacteria class was detected in all laboratory-cultured communities.
The result shown in Fig. 3 is important because it demonstrates that laboratory flasks and reactors supported diverse microbial communities (although the observed OTU richness was lower than that of the sampled aeration-basin community) and that specific classes were differentially affected by both the culturing method and the selective media components.

Similarities among the aeration-basin field communities and the laboratory-cultured communities
Microbial composition similarities among the communities were inferred by PCoA, which employed UniFrac distance measurement (Fig. 4). The aeration-basin communities were differentiated from the laboratory-cultured communities along the P1 axis, which explained the majority of total variation (60.8%). The P2 (21.0% of total variation) and P3 (6.8% of total variation; Fig. S4) axes further differentiated the laboratory-cultured communities.
Hierarchical cluster analysis using the Bray-Curtis distance matrix and the group average linkage method supported the PCoA result (Fig. 5). The April 25, 2012 field communities and no-carbon-cultured communities were grouped together (cluster I) and were distinct from the laboratory-cultured communities. Cluster II, which consisted of the 11 mM and 23 mM chloroacetate and the D-threonine-cultured communities, was more closely located to cluster I containing the April 25, 2012 communities compared to cluster III, which contained the glucose,  cultured communities are shown together in Fig. 7. The majority of enriched microbes were γ-Proteobacteria.
Ninety-three percent of the top 100 enriched OTUs in the glucose and [C2mim]Cl-cultured communities belong to the Enterobacteriaceae (the genera Cronobacter, Erwinia, Escherichia, Klebsiella, Pantoea, Raoultella, Salmonella, Serratia, Sodalis, and unclassified) and Aeromonadaceae families (cluster 5 and cluster 4, Fig. 7). Interestingly, these Enterobacteriaceae OTUs appeared to be enriched in the no-carbon-added flask culture as well, but not in the no-carbon-added reactor cultures. In contrast, the Aeromonadaceae OTUs were not enriched in any of the no-carbon-added cultures.
The Pseudomonadaceae family was also enriched in the 23 mM chloroacetate reactor culture. However, unlike in all other chloroacetate-cultured communities, no indicator OTU in this culture belonged to the Alcaligenaceae family.
The high concentration (50 mM) of chloroacetate enriched various β-Proteobacteria (the families Burkholderiaceae, Alcaligenaceae, Oxalobacteraceae, and Methylibiaceae), as well as Pseudomonadaceae. The enrichment of Burkholderiaceae was characteristic of this community. The Burkholderiaceae indicator OTUs consisted of the genera Burkholderia (18 OTUs) and Ralstonia (2 OTUs). Some of the Burkholderia OTUs in one of the replicate cultures were extremely enriched, showing approximately 10- to 60-fold stronger microarray intensities (cluster 1-a, Fig. 7).
The results shown in Figs. 7 and S5 demonstrate that glucose, D-threonine, chloroacetate, and [C2mim]Cl differentially affect community composition, a key characteristic of selective media. Furthermore, these results show specifically which subsets of the community are respectively enriched in the presence of glucose, D-threonine, chloroacetate, or [C2mim]Cl, establishing baselines for aeration-basin community composition changes following propagation in laboratory flask or reactor culture.

Discussion
We report here our foundational efforts towards the development of a laboratory model of an industrial bioreactor failure scenario that introduces genetically engineered microorganisms (GEMs) to a downstream municipal wastewater treatment plant (MWWTP). Our first step was to sample microbial communities from the MWWTP's aeration basin (a potential hot spot for horizontal gene transfer) at three seasonal time points to establish a baseline for MWWTP community composition (Fig. 2). Since microbial community composition in activated sludge treatment systems is dynamic and can be affected by various environmental factors, including temperature and sources of wastewater influx, it is not surprising that we observed moderate variation in composition between our July 20, 2011, October 19, 2011, and April 25, 2012 aeration-basin samples. Although the number of OTUs detected in common across all samples was relatively low, the microbial composition at higher taxonomic levels was very similar (Fig. 2). It is noteworthy, though, that the OTUs of the PhyloChip G3 are defined at a very fine taxonomic level (sub-species) and have very low sequence divergence among them (an average of 0.5%) [19]. The major phyla of the aeration-basin community that we found (Proteobacteria, Bacteroidetes, Firmicutes, and Actinobacteria) are similar to those found in a previous study of the microbial community of a Hong Kong wastewater plant that employed DNA and cDNA sequence analysis [8]. In contrast to this previous study, which examined only one seasonal community at the phylum level (apart from selectively examining the microbes involved in nitrification at the genus level), our work investigated microbial communities at various taxonomic levels in depth and also compared the communities between seasons. Our observations place weight on the importance of pursuing multi-seasonal assessment of GEM survival and gene transfer rates within MWWTP microbial communities in future research efforts.
At the same time, the moderate extent of seasonal variation in microbial community composition provides a degree of confidence that an assessment made at a particular seasonal time point will be representative of other time points.
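The contrast drawn above, between low OTU-level overlap and high similarity at higher taxonomic levels, can be made concrete with a small sketch. The OTU identifiers and phylum assignments below are hypothetical and serve only to show how the same two samples can share few fine-grained OTUs while sharing all of their phyla; the Jaccard index is used here as a simple overlap measure and is not a statistic reported in this study.

```python
def jaccard(a, b):
    """Jaccard index: shared items divided by all distinct items."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Hypothetical detected OTUs (OTU id -> phylum) for two seasonal samples.
july = {
    "otu_1": "Proteobacteria",
    "otu_2": "Bacteroidetes",
    "otu_3": "Firmicutes",
    "otu_4": "Actinobacteria",
}
april = {
    "otu_5": "Proteobacteria",
    "otu_2": "Bacteroidetes",
    "otu_6": "Firmicutes",
    "otu_7": "Actinobacteria",
}

# Overlap at the OTU level (dictionary keys) and at the phylum level (values).
otu_overlap = jaccard(july, april)
phylum_overlap = jaccard(july.values(), april.values())

print(round(otu_overlap, 3), phylum_overlap)
```

Because closely related OTUs differ by very little sequence, a drop in shared OTUs does not by itself imply a shift in the broader taxonomic makeup of the community.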
Having established a baseline for seasonal variation in aeration-basin community composition, we then investigated how the community changed when propagated in the laboratory.
Laboratory flask and reactor systems cultured fairly similar yet distinct microbial communities (Figs. 3-5, 7, S3-5). While a proper bench-scale simulation of an activated sludge treatment system requires the use of a bioreactor, the observed similarity supports the utility of screening carbon sources and growth conditions in flasks before progressing to reactors. While laboratory-cultured communities generally contained fewer taxa, the major phyla and classes present in the aeration-basin communities were also detected in the laboratory-cultured communities, which retained reasonably high phylogenetic diversity. Some OTUs, especially those belonging to Firmicutes, one of the major phyla in the MWWTP communities, were much less abundant in laboratory cultures than in aeration-basin communities. We speculate that differences in oxygen concentrations and distributions (and the dynamics thereof) between small-scale laboratory flask and reactor cultures and the large-scale wastewater treatment plant could explain some of the observed composition differences, such as the decreased abundance of anaerobic Clostridia in laboratory cultures (Fig. 3B). While the mixed liquor is intensively aerated and mixed in the wastewater treatment plant aeration basin, the passage of the mixed liquor through a non-aerated clarifier could transiently support the growth of Clostridia. There is no equivalent passage through a non-aerated compartment in the laboratory culture system, which is constantly aerated to saturation. In contrast, some OTUs were more abundant (or detected only) in laboratory-cultured communities. The various laboratory culture conditions may have enriched for microbes otherwise present at low abundance in the aeration basin.
Beyond assessing the extent to which our laboratory growth conditions simulated those of the aeration-basin, we tested the effectiveness of chloroacetate and D-threonine carbon sources and the microbial growth inhibitor [C2mim]Cl as selective media component candidates for future GEM survival and gene transfer experiments. We anticipated that these media amendments would promote the activity of carbon catabolic or efflux-pump tolerance genes, rare in MWWTP microbial communities, which could serve as selective markers in these future studies. Such a future study would introduce a GEM (containing a carbon catabolic or efflux-pump tolerance selectable marker) to our aeration-basin laboratory model under the appropriate selective growth condition, and then monitor the survival of the GEM and the horizontal transfer and propagation of the portion of its DNA encoding the marker.
Although the 50 mM chloroacetate flask cultures attained the highest OD 600 nm (Fig. S3), they contained the smallest number of OTUs (485) of all the conditions. This suggests that the diversity of microbes that can survive in a high concentration of chloroacetate is relatively low. The genus Burkholderia, in particular, was enriched in the 50 mM chloroacetate flask cultures (Figs. 7, S5). The haloacetate dehalogenase (EC 3.8.1.3) from a Burkholderia species has been characterized [32] and catabolizes chloroacetate to glycolate. Glycolate, in turn, can be oxidized to glyoxylate by glycolate oxidase and used as a carbon and energy source. [...] [33] indicates that these three species have genes encoding haloacid dehalogenase (EC 3.8.1.2). Accordingly, the unclassified Burkholderia species detected in this study may be similarly able to metabolize chloroacetate. The Pseudomonas genus was enriched by chloroacetate at all concentrations tested (11, 23, and 50 mM; Fig. 7). The haloacid dehalogenases of Pseudomonas sp. YL and P. putida have been characterized, and E. coli overexpressing haloacid dehalogenase can metabolize chloroacetate as a sole energy source [34][35][36][37][38][39]. While Pseudomonas sp. YL and P. putida were not detected in this study, the unclassified Pseudomonas species found in this work may have a similar ability to catabolize chloroacetate. The Alcaligenaceae family (Achromobacter genus, unclassified species) was also enriched by chloroacetate (Fig. 7). Although not detected in this work, A. xylosoxidans, a species belonging to Achromobacter, has a gene encoding haloacetate dehalogenase. Taken together, these three observations support chloroacetate as a selective media component candidate, and haloacid dehalogenase as a selective marker, for our future GEM survival and gene transfer experiments: 1) Burkholderia, Pseudomonas, and Achromobacter were all significantly enriched in the chloroacetate-amended laboratory cultures; 2) in these same genera, haloacid dehalogenases have previously been identified, heterologously expressed and characterized; and 3) haloacid dehalogenase can not only detoxify chloroacetate to glycolate, but also enable the metabolism of chloroacetate as a sole energy source.
FastTree was used to generate an approximate-maximum-likelihood phylogenetic tree, which was rooted with archaea. The leaves of the tree list taxa at the family level or, for unclassified (unc.) families, the lowest ancestor. Families belonging to the same phylum (or class, for the phylum Proteobacteria) are shaded in the same color. Band intensity graphs outside each leaf (family) indicate the number of species in the family detected in each sample. Pie charts at the nodes of major phyla (or classes) show the number of species in each sample. Pie chart area correlates with the total number of species of the given phylum or class. The pie and band intensity graphs do not share the same scale (the minimum number of species in each band intensity graph is 1, and the maximum is 15; in each pie chart, the minimum number of species is 1, and the maximum is 38).
Beyond chloroacetate, many of the Pseudomonas and some of the Alcaligenaceae were also enriched by D-threonine. D-threonine aldolase catabolizes D-threonine into glycine and acetaldehyde, and acetaldehyde feeds into the TCA cycle. The D-threonine aldolase of Pseudomonas has not been studied in depth, but the enzyme was characterized in Xanthomonadaceae (Xanthomonas oryzae) and Alcaligenaceae (Achromobacter xylosoxidans), which are closely related to the OTUs detected in the D-threonine cultures [40,41]. These observations support D-threonine as an additional selective media component candidate, and D-threonine aldolase as an additional selective marker, for our future experiments.
Few microbes have been shown to grow in the presence of the ionic liquid [C2mim]Cl [14]. Although our OD 600 nm measurements did not detect any net growth in the presence of [C2mim]Cl (Fig. S3B) [...] live/dead staining could be performed. Given the relatively high abundance in the [C2mim]Cl sample of the Xanthomonadaceae, Rhizobiaceae, Micrococcaceae, Enterobacteriaceae, Aeromonadaceae, and Alcaligenaceae families (Figs. S5, 7), it appears that microbes belonging to these families may have persisted or propagated (below the OD 600 detection limit) better than others in the presence of [C2mim]Cl. Since an efflux pump for [C2mim]Cl tolerance was identified in Enterobacter lignolyticus SCF1 and cross-validated in E. coli [14], [C2mim]Cl may also prove to be a valuable selective media component candidate in our future experiments.
Here, we demonstrated the capacity to culture a microbial community using preserved inocula from a municipal wastewater treatment plant, and to maintain adequate diversity and repeatability to test GEM survival and gene transfer rates in a laboratory-scale model system. Our results show specifically which subsets of the community are enriched in the presence of glucose, D-threonine, chloroacetate, or [C2mim]Cl, establishing baselines for aeration-basin community composition changes following propagation in laboratory flask or reactor culture. The observation that microbial communities cultured with 11 or 23 mM chloroacetate and D-threonine are more similar to the sampled aeration-basin communities than those cultured with glucose, [C2mim]Cl, or 50 mM chloroacetate (Fig. 5) may prove useful when choosing one selective medium over another. Our analysis supports haloacid dehalogenase, D-threonine aldolase, and the [C2mim]Cl efflux pump from Enterobacter lignolyticus SCF1 as candidate selective markers, and chloroacetate, D-threonine, and [C2mim]Cl as selective media components, respectively, for our future GEM survival and gene transfer experiments. This work establishes a methodological foundation for assessing and mitigating the risks of future large-scale microbial metabolic engineering projects, including those extending beyond the bioreactor.

Figure S1 Hierarchical clustering of the seasonal aeration-basin samples based on the relative OTU abundances. The Bray-Curtis distance measurement and group average linkage methods were employed for the clustering.
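For readers unfamiliar with the clustering approach named above, the following is a minimal sketch of agglomerative clustering on Bray-Curtis dissimilarities with group average (UPGMA) linkage. It is not the software used in this study, and the sample names and OTU abundance profiles are hypothetical values chosen only to make the merge order easy to follow.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance profiles."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0

def average_linkage(profiles):
    """Agglomerative clustering with group average linkage.

    Returns the merges in order as (members_a, members_b, distance).
    """
    names = list(profiles)
    dist = {
        frozenset((a, b)): bray_curtis(profiles[a], profiles[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

    def cluster_distance(c1, c2):
        # Mean of all pairwise distances between members of the two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        pairs = [
            (cluster_distance(c1, c2), c1, c2)
            for i, c1 in enumerate(clusters)
            for c2 in clusters[i + 1:]
        ]
        d, c1, c2 = min(pairs, key=lambda p: p[0])
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
        merges.append((c1, c2, d))
    return merges

# Hypothetical OTU abundance profiles (NOT study data).
profiles = {
    "field": [10, 8, 6, 4],
    "no_carbon": [9, 8, 5, 4],
    "chloroacetate": [12, 4, 6, 1],
    "glucose": [20, 1, 1, 0],
}

merges = average_linkage(profiles)
for members_a, members_b, distance in merges:
    print(members_a, members_b, round(distance, 3))
```

With these illustrative profiles, the field and no-carbon samples merge first, the chloroacetate sample joins them next, and the glucose sample joins last. For real datasets an established library implementation would normally be preferred over this quadratic-memory sketch.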