Metagenomic Profiling of Microbial Composition and Antibiotic Resistance Determinants in Puget Sound

Human-health relevant impacts on marine ecosystems are increasing on both spatial and temporal scales. Traditional indicators for environmental health monitoring and microbial risk assessment have relied primarily on single species analyses and have provided only limited spatial and temporal information. More high-throughput, broad-scale approaches to evaluate these impacts are therefore needed to provide a platform for informing public health. This study uses shotgun metagenomics to survey the taxonomic composition and antibiotic resistance determinant content of surface water bacterial communities in the Puget Sound estuary. Metagenomic DNA was collected at six sites in Puget Sound in addition to one wastewater treatment plant (WWTP) that discharges into the Sound and pyrosequenced. A total of ∼550 Mbp (1.4 million reads) were obtained, 22 Mbp of which could be assembled into contigs. While the taxonomic and resistance determinant profiles across the open Sound samples were similar, unique signatures were identified when comparing these profiles across the open Sound, a nearshore marina and WWTP effluent. The open Sound was dominated by α-Proteobacteria (in particular Rhodobacterales sp.), γ-Proteobacteria and Bacteroidetes while the marina and effluent had increased abundances of Actinobacteria, β-Proteobacteria and Firmicutes. There was a significant increase in the antibiotic resistance gene signal from the open Sound to marina to WWTP effluent, suggestive of a potential link to human impacts. Mobile genetic elements associated with environmental and pathogenic bacteria were also differentially abundant across the samples. This study is the first comparative metagenomic survey of Puget Sound and provides baseline data for further assessments of community composition and antibiotic resistance determinants in the environment using next generation sequencing technologies. In addition, these genomic signals of potential human impact can be used to guide initial public health monitoring as well as more targeted and functionally-based investigations.


Introduction
Coastal ecosystems continue to be subjected to increasing human pressures. Over 40% of the global population currently lives within 100 km of coastlines, and this percentage continues to increase [1]. Human impacts have led to significant degradation in marine environments in the form of habitat loss, eutrophication, hypoxia, organic and inorganic pollution, invasive species, pathogen spread and ocean acidification, among others [2]. Based on a spatial analysis of 17 anthropogenic drivers, it has been estimated that over 40% of the world's oceans have medium to high impacts associated with human activities [3]. These impacts subsequently now pose risks to human health in the form of bacterial and viral pathogens, harmful algal blooms, contaminated seafood and decreased well-being [4,5,6,7]. There is thus considerable interest in investigating the distribution and magnitude of these impacts in marine environments, including nearshore, coastal and open ocean locations.
Assessing human impacts on such a global scale is a challenge, and current approaches to environmental monitoring are not well suited for large-scale spatial and temporal analyses. This is changing with advancements in the field of environmental genomics. Metagenomics, in tandem with next generation sequencing, now provides a technical means by which to monitor environmental microbial communities in a high-throughput manner. Metagenomics refers to the sequencing of DNA directly from environmental samples (i.e. metagenomes), and as such simultaneously provides access to the genetic information from mixed environmental microbial communities [8]. With this approach, taxonomic and functional microbial diversity can be profiled and community change subsequently monitored over space and time in response to environmental or anthropogenic impacts relevant to human health [9]. Microbial community composition has been shown to change in differentially impacted coastal environments [10,11,12] and other anthropogenically impacted environments [8,13,14,15] and thus provides a potential indicator of community stressors and stress response.
Despite the potential role aquatic ecosystems may play in harboring and disseminating antibiotic resistance, little is known regarding the distribution of antibiotic resistance determinants within marine microbial communities. Antibiotic resistance continues to be a serious public health concern, as the overuse and misuse of antibiotics has led to the selection of antibiotic resistance genes in bacterial populations and has thus compromised our ability to treat bacterial infections. Coastal environments are subject to a multitude of anthropogenic impacts such as sewage, hospital waste, agricultural runoff and aquaculture that increase the prevalence of antibiotic resistance determinants in microbial communities, and thus these ecosystems have the potential to be important sources and sinks for antibiotic resistant bacteria and genes and may contribute to the dissemination of resistance determinants across organisms [16,17]. In fact, many resistance genes found in pathogenic bacteria have evolved or are sourced from resistance genes found in environmental microbial communities [18]. Clinical resistance in many cases is therefore the result of horizontal transfer of resistance genes via mobile genetic elements from ecologically and taxonomically distant bacteria [19], and this gene transfer has been shown to occur across a wide range of bacterial species, including pathogens to non-pathogens and vice versa [20,21,22]. Furthermore, the marine environment in particular has been shown to have a high rate of horizontal gene transfer [23].
The majority of studies assessing the prevalence and distribution of antibiotic resistance genes in natural environments to this point have focused on specific known resistance genes using PCR-based techniques [24,25,26]. While these techniques provide resistance profiles for specific organisms or targeted genes, they are limited for ecosystem level applications because of these gene-by-gene and single species approaches. Thus there is limited knowledge regarding the global prevalence and distribution of antibiotic resistance genes and other resistance determinants including mobile genetic elements in natural microbial communities, and this data gap has been exacerbated by our inability to access the genomic information for unculturable bacteria, which represent .99% of bacteria in the environment [27]. Metagenomics now offers a culture-independent and high-throughput approach to investigate antibiotic resistance determinants in the environment at the microbial community-level, and a number of studies have already successfully employed this approach [28,29,30,31,32,33,34].
This study uses a metagenomic approach in combination with next generation sequencing to evaluate potential differences in community composition and antibiotic resistance gene signals across a natural ecosystem using the case study of Puget Sound, Washington. Puget Sound is a partially mixed fjord-like estuary that is connected to the Pacific Ocean via the Strait of Juan de Fuca and that receives the majority of its freshwater input from rivers emptying into the Whidbey Basin region (Figure 1) [35]. There are 96 publicly owned wastewater treatment plants (WWTP) in the Puget Sound Basin that serve over 3.5 million people from urban population centers such as Seattle, Tacoma, Olympia and Everett and process over 124 million gallons of sewage per day [36]. Antibiotics, heavy metals and other pollutants also enter the Sound through agricultural and urban runoff. At the same time, Puget Sound supports one of the largest shellfish industries in the country, with over 100 commercial growing areas covering 900 miles of shoreline [37]. The potential for these anthropogenic impacts to alter microbial community and resistance determinant composition in Puget Sound is uncertain, and baseline data is needed in order to spatially and temporally monitor these potentially health relevant human impacts. This is the first baseline survey of the taxonomic and resistance potential of microbial communities in Puget Sound. Using highthroughput and sequence-based comparative metagenomics, we identified common taxonomic and antibiotic resistance determinant signatures for the open Puget Sound locations. Comparison of these metagenomic profiles between the open Sound, a nearshore marina and effluent from a proximal WWTP revealed unique profiles across the different environments that follow a gradient of human impact. This investigation provides an initial framework by which to monitor the marine environment for genomic determinants relevant to public health.

Ethics Statement
All necessary permits were obtained for the described field studies. This statement applies to the collection of the wastewater treatment plant (WWTP) effluent sample. Permission for the WWTP sample was granted by the West Point Treatment Plant (King County, WA), specifically Betsy Cooper (NPDES Administrator, Wastewater Treatment Division, King County Department of Natural Resources and Parks) and Rick Hammond (Chief Process Analyst, West Point Treatment Plant). No specific permits or permissions were required for the water samples collected in Puget Sound. These cruise sample locations are routinely monitored by the University of Washington Puget Sound Regional Synthesis Model (PRISM) program, and are not privately owned or protected in any way. The marina sample was taken from a public boat ramp run by the City of Seattle. These field studies did not involve endangered or protected species.

Sample Collection
Surface water samples (,5 m depth) were collected aboard the R/V Thomas Thompson from October 29-30, 2010 at five stations in Puget Sound, WA ( Figure 1). These stations have been monitored since 1998 by the University of Washington Puget Sound Regional Synthesis Synthesis Model (PRISM) program. A nearshore surface water sample was also collected at Shilshole Bay Marina in Seattle, WA on December 20, 2010 and an effluent sample from the King County West Point Treatment Plant (WWTP) in Seattle, WA was collected on January 31, 2011. The Marina is located approximately 8 miles north of central Seattle and 2 miles north of the WWTP, and includes 1,400 moorage slips (300 slips for liveaboards), public water access and a public promenade. The West Point Treatment Plant has an average daily inflow of 133 million gallons, and the wastewater is sourced from stormwater/groundwater (53%), residential (29%), commercial (17%) and industrial (1%) processes [38]. At each station and sampling location, 80 liters of water were pumped through a peristaltic pump system (Cole-Palmer, U.S.A.) and filtered sequentially (i.e. size fractionated) through 3-mm (147 mm) polycarbonate membranes (Sterlitech, WA) and 0.2-mm (147 mm) Supor membranes (Pall Corporation, U.S.A). Filters were covered with sucrose lysis buffer (50 mMTrisNHCl, 40 mM EDTA, and 0.75 M sucrose) and stored at 280uC on board and then transferred to 280uC in the laboratory. For the cruise samples, physicochemical conditions (e.g. temperature, salinity, oxygen, fluorescence) were measured using a conductivity-temperature-depth (CTD) sensor array mounted on a Niskin bottle rosette (Table 1).

DNA Extraction and Sequencing
DNA was extracted from the 0.2-mm filters as these filters contain the bacterial fraction of the marine community. The filters were thawed and cut into eighths. Total metagenomic DNA was extracted using the Powerwater DNA Isolation Kit (Mo Bio Laboratories, CA) following the manufacturer's protocol with modifications. This kit is specifically designed for isolating DNA from filtered environmental water samples and includes inhibitor removal technology aimed at removing humic acid and other organic matter commonly found in environmental samples that can interfere with downstream analyses. The protocol modifications included an incubation at 65uC for 10 minutes following addition of Solution P1 and a lysozyme digestion (final concentration = 10 mg/ml) at 37uC for 1 hour and an RNase digestion at 37uC for 20 minutes immediately following the bead-beating step. DNA quantity and quality were estimated using the Quant-iTPicogreen dsDNA Assay Kit (Invitrogen, U.S.A.) and the Nanodrop-1000 Spectrophotometer (Nanodrop Technologies, DE). DNA concentrations ranged from 75 to 130 ng/ml per sample. Gel electrophoresis showed high molecular weight DNA fragments 3-12 kb in size. Approximately 3 mg of total DNA for each sample was used for a multiplexed pyrosequencing run on the Roche/454 GS FLX Titanium platform in the Department of Microbiology at the University of Washington.

Bioinformatic Analyses
The 454 sequence reads were initially trimmed to remove barcoded sequences and any secondary adapter sequences. Reads containing ,5% of any one nucleotide were also removed. For each sample, raw reads were assembled into contigs using Newbler v. 2.5.3 (Roche Diagnostics-454 Life Sciences) with default settings except for a more stringent 95% minimum overlap identity. For taxonomic annotation, the unassembled reads were run through the Meta Genome Rapid Annotation using Subsystems Technology (MG-RAST) pipeline [39]. This pipeline includes QC steps to eliminate low quality sequences. Reads meeting any of the following three criteria were omitted: read length .2 standard deviations from the mean sample read length, read containing .5 ambiguous bases or reads identical to another sequence over the first 50 bp. Unassembled reads were taxonomically annotated using the lowest common ancestor (LCA) algorithm and only sequence matches with $50% identity over $50 amino acids were retained. This algorithm assigns each read to the LCA within the set of matching taxa from the BLASTX search [40]. For example, if a given read had sequence similarity to 3 different families within the same order, the read is assigned at the order level. The LCA algorithm is thus predicted to have a lower rate of false positive assignments than the best hit BLAST approach but with a higher number of unspecific assignments or no hits. Genus level annotation was performed using 16S rRNA analysis. An Escherichia   coli 16S rRNA reference sequence was searched against the 454 reads using BLASTN (default settings) and matching 454 reads were run through the Ribosomal Database Project (RDP) Classifier [41] to classify the 16S rRNA sequences. Only those classifications with at least a 50% confidence estimate were included in the analysis. Additional environmental metagenomes used for taxonomic comparison to the Puget Sound dataset included: Sargasso Sea (GS001a, GS001b and GS001c), Chesapeake Bay (GS012) and the Gulf of Mexico (GS016) from the Global Ocean Sampling Expedition [42], Lake Lanier [43] and farm soil [44]. All the water samples were collected in 1-5 m depth.
Read recruitment to alpha proteobacterium HTCC2255 was performed using BLASTN (E-value ,10 25 ). The 2.23 Mbp genomic scaffold (GI number: 211594581) was downloaded from GenBank. The recruitment plot was generated using in-house custom visualization software.
To identify putative antibiotic resistance genes, an expanded version of the Antibiotic Resistance Gene Database (ARDB) [45] was generated that incorporates all sequences deposited in GenBank to date that are $80% identical to sequences already in the ARDB. The ARDB is a commonly used database consisting of over 23,000 resistance genes from nearly 1,700 species. A nonredundant version of the ARDB was first compiled and consisted of 3,185 sequences. An expanded dataset, termed the ARDB+, was generated by searching the sequences from this nonredundant list against GenBank using the 80% identity cutoff [45]. The ARDB+ increases the number of nonredundant ARDB sequences to a total of 11,500 sequences. A list of 103 antibiotic resistance gene sequences from metagenomic samples that were functionally verified to confer resistance [28,29,30,33] was also compiled and included in the ARDB+. MetaGeneMark [46] (default settings) was used to predict potential protein coding sequences for the unassembled Puget Sound and WWTP reads, and these predicted peptides were then BLASTP searched [47] against the ARDB+ (E-value ,10 25 ). Reads with $80% identity over an alignment length $50 amino acids to a resistance gene within the ARDB+ were annotated according to the best hit. Plasmid sequences were annotated by aligning the unassembled reads and contigs against the plasmid sequences available in the NCBI RefSeq database (1843 sequences) using BLASTN (E-value ,10 25 ). Only those sequences with a nucleotide sequence identity $95% over an alignment length of at least 100 bp were retained [32,48].
Two approaches were used to identify transposable elements in the dataset. First, 167 unique genes annotated as transposable elements were downloaded from GenBank. Unassembled reads were then compared against these sequences using BLAST, and reads with a sequence identity $80% [32] over an alignment length of at least 150 bp were annotated as transposable elements. Second, unassembled reads were compared to 41 protein motifs annotated as transposable elements within the Pfam 26.0 database [49] using an E-value cutoff of 10 210 . Pfam is a database of conserved protein families and domains across species, where the domains represent structural and functional motifs of the protein.
A significance threshold of E,10 210 represents a more stringent cutoff than the manually curated Pfam gathering threshold or trusted cutoff, and thus minimizes the number of false positive assignments. The abundance of transposable elements was normalized to total sequence reads per sample for the BLAST approach and to total motif counts per sample for the Pfam approach.

Statistical Analyses
Overrepresentation or underrepresentation of taxonomic units (domain, phylum or class) was determined using the Statistical Analysis of Metagenomic Profiles (STAMP) software package [50]. Only taxonomic units that had an effect size .1 between two metagenomes were included in the statistical analysis. P-values were calculated using the two-sided Fisher's Exact test, and 95% confidence intervals were calculated with the Newcombe-Wilson method. The Benjamini-Hochberg FDR method [51] was used to correct for the false discovery rate. Hierarchical clustering of metagenomes by taxonomic units using the Euclidian distance metric was performed with the TIGR Multiexperiment Viewer (TMeV) (http://www.tm4.org/mev/). Counts were normalized to total taxonomic counts to give a measure of relative abundance within a metagenome. Significance tests were run to determine whether there were statistical differences in the abundance counts across samples.

Results and Discussion
Shotgun metagenomics in combination with pyrosequencing were used to explore the taxonomic and antibiotic resistance determinant composition at six locations from north to south Puget Sound in addition to an effluent sample from a WWTP that deposits into the main basin of Puget Sound. The environmental data and summary statistics for the pyrosequencing run are shown in Table 1. Approximately 1.4 million reads were generated that passed the QC filter, comprising nearly 530 million base pairs of sequence data. There were 22 Mbp of assembled contigs greater than 500 bp and 47 contigs greater than 10,000 bp. The N50 contig size was 1.1 kb.

A Taxonomic Signature of Puget Sound
Approximately 42% of the unassembled sequence reads were assigned by phylum to protein sequences within GenBank using our sequence similarity criteria of $50% identity and an alignment length $50 aa. Bacteria accounted for 93% of all annotated sequence reads, with Archaea (4%), Eukaryota (2%) and viruses (1%) comprising the remaining proportion. All communities were dominated by the phylum Proteobacteria, which accounted for 49-67% of all sequence reads ( Figure 2  the Rhodobacterales bacterium HTCC2255 reference genome assembly with a minimum 95% identity per read (Figure 4). This genome was also previously assembled from a metagenomic sample taken from Puget Sound [55]. The high abundance of Rhodobacterales is in contrast to the other marine metagenomes where Rickettsiales sp., another group of a-Proteobacteria that includes the ubiquitous SAR11 clade, was the dominant organism ( Figure S1). Recent metatranscriptomic samples collected from Monterey Bay, CA also contained a high abundance of Rhodobacterales sp. HTCC2255 and a large Rhodobacterales: Rickettsiales ratio [56], suggesting Rhodobacterales may be a dominant organism in surface waters on the U.S. West coast. Members of the phylum Bacteroidetes have also been shown to be widespread in marine environments, and the increased proportion of this taxon in the open Sound relative to the other metagenomes is attributable to an overrepresentation of Flavobacteriales sp. (Figure 3). The proportion of Actinobacteria significantly increased from ,2% of reads in the open Sound to 13% in the Marina and 25% in the WWTP sample (p-value ,10 215 for all comparisons). Actinobacteria was also overrepresented in the other freshwater and freshwater-impacted environments, and further analysis revealed a strong negative correlation between Actinobacteria abundance and salinity (r = 20.952; p,10 26 ) ( Figure S2). Actinobacteria in coastal areas has previously been suggested to be a potential signal of terrestrial or freshwater runoff [11,57], and our results support this notion. a-Proteobacteria (r = 0.904; p,10 24 ), b-Proteobacteria (r = 20.913; p,10 25 ), c-Proteobacteria (r = 0.860; p,10 24 ), Bacteroidetes (r = 0.612; p,0.05) and Firmicutes (r = 20.644; p,0.05) were also correlated with salinity, while cyanobacteria abundance was correlated with temperature (r = 0.804; p,0.005). The GC-content distribution also reveals the taxonomic similarity between the open Sound samples and the differences in community composition between the open Sound, Marina and WWTP effluent ( Table 1). The open Sound had a mean %GC of 43.62%60.40, with a slight increase for the Marina sample (46.8%). The GC-content of the WWTP sample was significantly higher than for all other samples (56.4%).
Approximately 3.3% of sequence reads were classified at the genus level by the RDP. Open Sound samples were characterized by a high abundance of Candidatus Pelagibacter (SAR11 clade) and Thalassobacter (family Rhodobacteraceae) from the phylum a-Proteobacteria and Polaribacter from Bacteroidetes. The Marina was dominated by the c-proteobacterium Glaciecola, accounting for 28.3% of genera diversity. Glaciecola was absent in all other samples except P5 where it comprised 0.7% of genera diversity. The abundance of bacteria from the genus Acinetobacter, which has been linked to hydrocarbon degradation in marine environments and has been shown to be seasonally prevalent in a coastal marina [11], was not significantly different between the Marina (0.1%) and open Sound (,0.03-0.07%) but was considerably elevated in the WWTP (1.07%). The WWTP effluent also contained more specific taxonomic markers of potential human impact. Firmicutes, which is one of the dominant phyla of the human gut [58], accounted for a significantly greater proportion (8.4%) of all WWTP phyla than in the other samples except farm soil, and was also associated with a number of the WWTP putative antibiotic resistance gene sequences. Clostridiales sp. was the most abundant group within Firmicutes, members of which are commonly found in human or animal feces [59,60]. Blautia is a core human gut microbe [61], and Hespellia is a member of the family Lachnospiraceae within which phylotypes have been identified as genetic markers of human fecal contamination [62]. Other taxonomic surveys of WWTP environments have also found an increased abundance of Firmicutes [63,64], and one recent study used the ratio of Firmicutes and Bacteroidetes to a-Proteobacteria abundance as an indicator of fecal pollution in watersheds [12]. Thus the overabundance of Firmicutes within the WWTP distinguishes this environment from the other marine and freshwater metagenomes, and may suggest the use of this phylum as a potential indicator of human impact.
Despite the spatial variation between the open Sound samples, the taxonomic composition across these surface water samples was strikingly similar and can likely be attributed to similar salinity and temperature gradients. Salinity has been shown to be one of, if not the most, important environmental parameters determining the level of similarity between isolated microbial communities [65]. This is evidenced by the fact that the samples cluster by salinity (Figure 2). The open Sound samples cluster with other more saline environments including the Sargasso Sea (open ocean) and the Gulf of Mexico (coastal ocean), while the Marina and WWTP effluent cluster within a larger clade that contains other freshwater and freshwater-impacted metagenomes (Lake Lanier, Chesapeake Bay and farm soil). The freshwater clade can be described by a high proportion of Actinobacteria, an increased abundance of b-Proteobacteria and an overall increased level of diversity. The metagenomes composing this freshwater clade can also be defined as being human impacted environments, and while it is not possible to further tease apart the potential influence freshwater input versus human impact has on the taxonomic profiles, it is important to note that the two are related in that environmental pollutants can enter marine ecosystems via freshwater runoff and thus a freshwater signal may in some cases be a potential indicator of human impact.
It is also important to note the potential impact temporal differences may have had on community composition. Seasonality has been shown to be a strong driver of marine microbial composition [66]. While the open Sound samples were collected within a span of two days at the end of October, the Marina and effluent samples were collected approximately 7 and 12 weeks later respectively. By nature of the fact that the freshwater effluent sample is already taxonomically distinct from the marine samples due to salinity and source, temporal variability is not likely a primary explanatory variable for describing the differences in community composition. Seasonal influences are more relevant when comparing the open Sound to the Marina. Freshwater discharges from rivers into Puget Sound are at a maximum in late autumn due to rainfall runoff and this flow continues at elevated levels through the winter [35]. The lower salinity of the December Marina sample (,24 ppt) may therefore correspond to increased freshwater inputs. Subsequently the community composition of the Marina sample, which includes a higher proportion of the potentially freshwater or terrestrial-sourced Actinobacteria, may reflect these seasonal differences. The temperatures of the Marina and open Sound samples were similar despite the two month lag ( Table 1). More temporal monitoring data is needed to better explain the role seasonality has on bacterial composition at these locations.

Proportion of Putative Antibiotic Resistance Gene Sequences Differs Across Metagenomes
In order to determine the abundance of putative antibiotic resistance genes in the Puget Sound and WWTP datasets, the sequence data was searched against an expanded and nonredundant ARDB (termed ARDB+) which also included 103 functional antibiotic resistance genes sequences isolated from metagenomic projects. Approximately 0.0013% (n = 18) of all sequence reads had $80% identity and an alignment length $50 aa to a sequence within the ARDB+. To validate this annotation, the 18 sequences were also searched against GenBank, and results indicated that the best protein hits within GenBank were also to antibiotic resistance genes. Of these 18 sequences, the WWTP effluent had the highest representation (n = 14), followed by the Marina (n = 4) and the open Sound (n = 0) ( Figure 5; Table S1). For comparison, the proportion of putative resistance gene sequences in the WWTP effluent (0.012%) was lower than that for other pyrosequenced metagenomes from different environmental matrices including a river sediment sample taken downstream from an Indian WWTP that processes large quantities of drugs (1.4%) [32] and the plasmid fraction of a WWTP activated sludge sample (1.1%) [31], but slightly greater than that of an activated sludge metagenome from another municipal WWTP (0.008%) [15]. Tetracycline resistance genes had the highest representation (33%) within the 18 putative antibiotic resistance gene sequences found in our samples. The WWTP putative antibiotic resistance gene sequences had similarity to those from clinically relevant bacteria species including Clostridium, Enterococcus and Streptococcus, while resistance gene sequences from the Marina were similar to those from environmental bacteria including Desulfococcus, Polaromonas and Pseudomonas. The average percent protein coverage based on the sequence alignment of the 454 reads to ARDB+ sequences was 33.8%618.9 (mean6SD).
There was a significant increase in the proportion of putative resistance gene sequences from the open Sound to Marina to WWTP ( Figure 5), suggestive of a relationship between resistance gene abundance and human impact across these environments. The fact that a differential resistance gene signal could be detected across the different environments supports using a pyrosequencing approach to identify trends in resistance gene abundance across diverse environments. Challenges still remain though in using this approach for quantification of resistance genes in metagenomes. First, the low sequencing depth per sample likely limits our ability to detect the full resistance gene repertoire of complex microbial community samples. Depth of sequencing is a significant limitation in metagenomic investigations and thus targeting specific genomic elements in whole-genome sequence data has been a considerable challenge [67]. This is especially relevant for aquatic samples where dilution reduces the likelihood of identifying specific genes. That only 18 sequences in the Puget Sound dataset matched to known resistance genes potentially suggests a considerable undersampling of the resistome when using pyrosequencing. Other marine and freshwater pyrosequenced metagenomes also have a relatively rare representation of resistance genes ( Figure S3), suggesting the need for higher sequencing depths than achievable by 454 technology to fully characterize the resistomes of surface water microbial communities. For example, Illumina sequencing technology allows for greater sequencing depth than does pyrosequencing, but with shorter read lengths and therefore a greater reliance on sequence assembly, which still remains a challenge for mixed microbial communities. Secondly, the lack of sequence data for the unculturable prokaryotic majority in combination with evidence from functional metagenomic studies indicating that many environmental resistance genes appear to be distantly related and share low sequence similarity to cultured resistance genes may result in an underestimation of the actual number of resistance genes. This challenge is being addressed through further sequencing of bacterial reference genomes and functional metagenomic projects, however more research is needed. Lastly, sequence-based identification of resistance genes does not necessarily imply these genes are functionally expressed. The resistance potential can be estimated using sequence-based metagenomics, but functional metagenomics or transcriptional analysis is needed to detect functional resistance genes and more clearly define public health risks.
The fate of antibiotic resistance determinants during the wastewater treatment process is relatively unknown at this time. As no influent from the WWTP was collected in this study, it is unclear what impact the wastewater treatment process had on the prevalence of antibiotic resistance genes in the effluent. Previous studies have shown mixed results regarding resistance gene levels in pre-versus post-treated sewage. While concentrations of resistance genes in wastewater may significantly decrease through the wastewater treatment process [68], other studies have found that resistance gene levels are higher in the effluent and that therefore the treatment process may be selecting for resistant bacteria, genes or mobile genetic elements [69,70,71]. Either way, the effluent had a significantly higher abundance of antibiotic resistance genes than Puget Sound, although further effluent sampling is needed to support this individual finding.

Differential Abundance of Mobile Genetic Elements Across Metagenomes
Mobile genetic elements including plasmids and transposable elements are important vectors for the transfer of antibiotic resistance genes [72,73]. Although in-depth profiling of the plasmid fraction of metagenomes is currently not possible for whole genome sequencing projects due to low depth of sequence coverage, insights into the plasmid composition of the Puget Sound and WWTP metagenomic datasets could still be made from the ,0.04% of sequences that matched known plasmids. A total of 124 sequence reads (0.01%) and 3 contigs from the open Sound and Marina metagenomes and 265 reads (0.22%) and 4 contigs from the WWTP matched plasmids within the NCBI RefSeq database using a similarity criteria of $95% sequence identity over 100 bp ( Figure 6A; Table S2 and Table S3). There were 13 plasmids in common between the open Sound, Marina and WWTP metagenomes. The open Sound and Marina plasmid sequences were dominated by aand c-Proteobacteria, while the WWTP plasmids were associated with a-, band c-Proteobacteria, Firmicutes and Actinobacteria. At the species level, nearly half of the plasmid matches to the open Sound and Marina reads were sourced from Silicibacter sp. Silicibacter is a member of the marine Roseobacter clade (a-Proteobacteria) and has been shown to form symbioses with phytoplankton [74]. The remainder of the open Sound and Marina plasmid sequences had taxonomic assignments to other environmental bacteria in addition to a number of human pathogens including Klebsiella pneumonia, Salmonella enterica, Vibrio sp., Pseudomonas aeruginosa and Serratia marcescens. Many of these plasmids contain genes associated with virulence factors, heavy metal resistance, beta-lactamase, tetracycline and aminoglycoside resistance and antibiotic resistance determinants including transposases and integrons. For example, plasmid TC68 from Vibrio sp. is a known chloramphenicol resistance gene that has been previously isolated from a fish farm microbial community [75]. The WWTP plasmid sequences were characterized by both pathogenic and non-pathogenic host-specific bacteria. Human pathogens included Bacteroides fragilis, Campylobacter, Enterococcus, Klebsiella pneumonia, Listeria monocytogenes and Salmonella. Also present were bacterial indicators of fecal contamination, including Bifidobacterium and Escherichia coli. Bacteria associated with food products and used in food production were also prevalent, including Lactococcus lactis, Lactobacillus rhamnosus and Streptococcus thermophiles. The two largest contigs from the WWTP that matched plasmid sequences had 100% identity to plasmids from uncultured bacteria previously identified in activated sludge from other WWTPs [76,77] (Table S2).
Both approaches to identify transposable elements resulted in an increasing abundance of transposable elements from open Sound to Marina to WWTP, with a highly elevated proportion in the WWTP ( Figure 6B and 6C). The Pfam approach did detect a higher proportion of transposable elements in the Marina and WWTP than the BLAST approach, as well as a significant difference in transposable element abundance between the Marina and open Sound. No significant differences in the abundance of transposable elements were seen within the open Sound samples using either approach. The most abundant transposases in the effluent included Tn3, IS4, TnpA (gene product of IS608), the mutator transposases and IS3 (specifically IS911), in addition to a large percentage of unclassified transposases ( Figure 6D). While the most abundant transposable elements in the Puget Sound samples were the same as that for the effluent, they were for the most part present in much lower numbers. Transposable elements from the effluent were more likely to be found on plasmids than the Puget Sound samples. Approximately 20.7% of transposable elements from the effluent were annotated as being sourced from plasmids, compared to 0.039%60.008 for the open Sound and 13.56% for the Marina. Plasmids that were unique to and represented at least 3% of the total plasmids annotated as carrying transposable elements in the WWTP sample included pA81 (biodegradation/heavy metal resistance) [78], pMOL28 (heavy metal resistance) [79] and pSKYE1 (aromatic compound degradation) [77]. Plasmid pAA1 (xenobiotic compound metabolism) [80] was also abundant in both the WWTP and Marina samples. Plasmids pDSHI01 from Dinoroseobacter and pMAQU02 from Marinobacter were commonly found within the Puget Sound samples.
This comparative metagenomic survey of Puget Sound has provided baseline data describing the community composition and antibiotic resistance determinant abundance across this marine environment in addition to an effluent sample from a proximal WWTP. Our results support the use of whole-genome pyrosequencing for comparing community composition across differentially impacted environments and for profiling antibiotic resistance determinants in highly impacted environments. To more completely capture the resistomes of natural environments with low human impact, sequencing technologies allowing for greater depth of sequencing may be needed. This study has highlighted the similarity of these metagenomic components in the open estuarine environment and the differences that emerge when analyzing more nearshore and human impacted ecosystems. Taken together, these results warrant further investigation into the potential for WWTP effluent to disseminate resistance determinants into the marine environment. As only one Marina and one WWTP sample were included in this study, further sampling of these environments is needed in order to support the observed trends. Furthermore, this study only analyzed surface water communities, and community composition has been show to change significantly with depth [81]. Increasing the spatial and temporal extents of metagenomic sampling and analysis will thus be important for longitudinal monitoring and further assessment of human impacts in marine environments. Figure S1 Relative abundance (order level) of major taxonomic groups in the Puget Sound samples and other selected metagenomes. Sequences were assigned to the NCBI taxonomy using MG-Rast [39] and the lowest common ancestor algorithm ($50% identity and alignment length $50 amino acids). Taxa representing .1% of assignable sequences in one or more samples are shown, while taxa present in ,1% of sequences in all samples are grouped in the 'other' category.  Figure S3 Abundance of antibiotic resistance gene sequences in environmental metagenomes with varying sequencing depth. These metagenomes represent a diverse mix of environments, including river sediment samples taken downstream from a wastewater treatment plant (WWTP) processing high volumes of antibiotics [32], coastal surface water samples taken as part of the Global Ocean Sampling Expedition [82], the activated sludge fraction of a WWTP [15] and an urban freshwater lake [43]. Reads that aligned to a sequence within the ARDB+ with $80% sequence identity over at least 50 amino acids were classified as putative resistance genes. (TIF)

Supporting Information
Table S1 Puget Sound 454 sequence reads with $80% similarity and an alignment length $50 amino acids to sequences within the expanded Antibiotic Resistance Genes Database (ARDB+). (PDF)