Distribution and Functions of TonB-Dependent Transporters in Marine Bacteria and Environments: Implications for Dissolved Organic Matter Utilization

Background Bacteria play critical roles in marine nutrient cycles by incorporating and redistributing dissolved organic matter (DOM) and inorganic nutrients in the ocean. TonB-dependent transporter (TBDT) proteins allow Gram-negative bacteria to take up scarce resources from nutrient-limited environments, including siderophores, heme, vitamin B12, and, as recently identified, carbohydrates. Thus, the characterization of TBDT distribution and functions is essential to better understand the contribution of TBDTs to DOM assimilation and its consequences for nutrient cycling in the environment. Methodology/Principal Findings This study presents the distribution of known and putative TBDT proteins encoded in the genomes of microorganisms and in the Global Ocean Survey data. Using a Lek clustering algorithm and substrate specificities, the TBDT sequences were mainly classified into the following three groups: (1) DOM transporters; (2) Siderophores/Vitamins transporters; and (3) Heme/Hemophores/Iron(heme)-binding protein transporters. Diverse TBDTs were found in the genomes of the oligotroph Citromicrobium bathyomarinum JL354 and Citromicrobium sp. JLT1363 and were highly expressed in the stationary phase of bacterial growth. The results show that the Gammaproteobacteria and the Cytophaga-Flavobacterium-Bacteroides (CFB) group bacteria accounted for the majority of the TBDT gene pool in marine surface waters. Conclusions/Significance The results of this study confirm the ecological importance of TBDTs in DOM assimilation for bacteria in marine environments owing to a wide range of substrate utilization potential in the ubiquitous Gammaproteobacteria and CFB group bacteria.


Introduction
Bacteria play important roles in ocean carbon and nutrient cycling by incorporating and redistributing dissolved organic matter (DOM) and inorganic nutrients [1], [2]. Transport proteins are a primary mechanism in inorganic nutrient uptake and DOM assimilation in microbial cells. Therefore, the identification and characterization of transport proteins may be important towards understanding a broad range of DOM molecules available for assimilation and utilization by microbes.
TBDT sequences exhibited low sequence similarity with one another, indicating an unexpectedly high diversity within TBDT sequences. The accounts of the TBDT system in different bacteria were summarized based on bioinformatic analyses [14], [26], [27], and previous research identified the functions of few TBDTs in cyanobacteria and nitrogen-fixing nodulating bacteria by using CLANS [28]. The siderophores-related TBDTs in marine bacteria and metagenomics have recently been analyzed using OrthoMCL [29]. However, previous attempts on the genomic analysis of TBDT were mainly focused on a rather narrow group of species [26], [27] or on a particular subfamily of TBDTs, such as siderophore transporters [14]. A complete functional characterization of TBDT has not been performed to date.
This study performs a comprehensive survey of TBDTs in all available bacterial genomes and in the Global Ocean Survey (GOS) metagenome [30]. The predicted TBDTs were classified into groups according to their pair-wise sequence similarities and substrate type. The prediction of substrates for the selected TBDTs was conducted using gene context analysis. The majority of the TBDT sequences in the metagenomic data sets originated from the Gammaproteobacteria and Cytophaga-Flavobacterium-Bacteroides (CFB) group bacteria. TBDTs for siderophore and DOM transport were present in the Sphingomonadales organisms as well. The genomes of two marine bacteria, namely, the oligotroph Citromicrobium bathyomarinum JL354 [31] and Citromicrobium sp. JLT1363 [32] from the order Sphingomonadales, were sequenced and annotated, and TBDTs were found to be abundant in their genomes. Moreover, the protein expression profiles of cells show that TBDTs were involved in coping with the low levels of nutrients during the stationary phase. The overall findings advance the understanding of the role of TBDTs in oceanic organic matter and nutrient cycles.

Distribution of Transporters in Bacterial Genomes
TBDT genes are distributed among a variety of divergent bacterial taxa, including Proteobacteria, Cyanobacteria, Verrucomicrobia, and Spirochetes. Overall, approximately 36% of the draft and completed genomes from NCBI (3,274 genomes) contain TBDTs. The number of TBDT genes in various bacteria is presented in Table S1. This study primarily focuses on TBDTs in marine bacteria, and thus, species were included such that all the major groups of bacteria are covered. As described in Table  S2, 68% of the 174 analyzed genomes contained 1-15 TBDTs and 14% contained more than 30 TBDTs. The other transporters of the representatives of the major bacterial groups, including members of the Sphingomonadales, SAR11, the Roseobacter clade, Gammaproteobacteria, CFB and cyanobacteria, which are distributed in the surface marine waters [33], were investigated further (Table 1).
TBDT sequences appeared to be highly prevalent in the orders Sphingomonadales, Alteromonadales, in the CFB group bacteria, and in the oligotrophic marine Gammaproteobacteria (OMG) group, which includes members of the BD1-7, SAR92, and OM60/NOR5 clades (Table 1). Citromicrobium bathyomarinum JL354 and Citromicrobium sp. JLT1363 contain 27 and 31 TBDT sequences in their genomes. Except for strain Ruegeria pomeroyi DSS-3, which lacked TBDT sequences in the genome, all the other analyzed Roseobacter strains carried one to five TBDT sequences (Table S1). The genes for TBDT were also present in several cyanobacterial strains, and not all marine bacteria contained TBDT. For example, Candidatus Pelagibacter ubique, which is the most abundant prokaryote in the SAR11 clade, did not contain TBDT.
However, ABC transporters were recognized in all the analyzed genomes, indicating that they are essential for substrate transport in bacteria. The number of ABC transporter genes ranged from 22 to 117 genes per genome (Table 1), and the number of ABC transporters was higher than that of TBDTs in most bacteria. No clear correlation between the number of transporters and bacterial trophic strategy [34] or genome size was found because many copiotrophic or oligotrophic bacteria had an unusually high number of genes that encode TBDTs or ABC transporters in their genomes. However, the number of ABC transporters per Mbp of each genome was distinctly higher in some bacteria with few TBDTs. For example, a specific enrichment in the number of ABC transporters was present in the Roseobacter clade, and SAR11 and cyanobacteria had abundant ABC transporters (Table 1). A significant negative correlation between the number of ABC transporters and TBDTs per Mbp of each genome was found (Spearman rank correlation: r_s = -0.677; N = 39; P < 0.0001). The Roseobacter clade and Vibrio had an unusually high number of genes that encode TRAP transporters and PTS systems in their genomes, respectively. The distribution of transporter genes in bacterial genomes suggests that the use of transporter systems differed considerably among bacteria. However, the presence of a reasonable number of transporters enabled the bacteria to use a broad range of substrates for growth.
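The rank correlation reported above can be illustrated with a minimal sketch. The per-Mbp densities below are hypothetical values chosen for illustration, not data from Table 1, and the helper names (average_ranks, pearson, spearman_rs) are assumptions of this sketch rather than the tools used in the study. Spearman's coefficient is computed here as the Pearson correlation of the average ranks of the two variables.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rs(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical transporter densities (genes per Mbp) for six genomes.
abc_per_mbp = [30.1, 25.4, 18.2, 12.0, 9.5, 7.3]
tbdt_per_mbp = [0.0, 0.4, 1.1, 3.0, 6.2, 4.8]

r_s = spearman_rs(abc_per_mbp, tbdt_per_mbp)
print(f"r_s = {r_s:.3f}")
```

A negative r_s in such a comparison indicates that genomes rich in ABC transporters per Mbp tend to carry fewer TBDTs per Mbp; the significance test (P value) is not reproduced in this sketch.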

Distribution of Transporters in Aquatic Systems
The GOS dataset used in this study contained genomic sequences derived from 57 samples from the open ocean to the coast across temperate and tropical regions as well as a few nonmarine aquatic samples [30], [35]. A total of 44,073 putative TBDT homologs, varying in number from 65 to 3,946 per sample, were found (Table S3). The frequency of the TBDTs in the total open reading frames (ORFs) provided a rough estimate of the prevalence of TBDTs in ocean waters, ranging from a low of approximately 0.1% (GS4) in the North American East Coast to a high of approximately 2% (GS18) in the Caribbean Sea Coastal (Table S3). To compare the distribution of TBDTs in diverse environmental samples, the frequency of TBDTs was determined relative to the average number of control single-copy gene hits for each site, which served as an estimate of the total genome equivalents (see Materials and Methods). The Caribbean Sea Coastal GS18, which had an extremely high frequency, was the station with the most abundant TBDTs. The Sargasso Sea data sets (GSb, GSc, and GS1a) and the data sets from the Galapagos Islands (warm seep GS30 and GS31) also exhibited high hits, but with moderate frequency. Moreover, the estimations of frequency and number of genes suggested that TBDTs were less common in the Indian Ocean, with the exception of GS117, GS110b, and GS122b, as well as in the coastal and estuarine stations of the North American East Coast (GS4, GS11 and GS12) and reef stations (GS49 and GS51) (Table S3). No significant relationships between the environmental factors (chlorophyll a, temperature, and salinity) and TBDT distributions were found. Moreover, a greater number of ABC transporters with extremely high frequency were found at hypersaline site GS33 than at the other sites (Table S3). A relatively high number and frequency of ABC transporter gene hits were identified in the warm seeps GS30 and GS31.
The open ocean sites (Sargasso Sea GS1a, Indian Ocean GS110b, and GS112b) as well as reef station GS108b appeared to show relatively low frequency. No clear trend was observed in the type of environment in which the TBDT genes or ABC transporter genes were observed frequently. A significant positive correlation between the amounts of TBDT and ABC transporter homologs was found (Spearman rank correlation: r_s = 0.791; N = 57; P < 0.0001). Most of the observed frequencies of the ABC transporter homologs in the total ORFs along the GOS transect were approximately 1% (Table S3).

Functional Characterization of TBDTs in Marine Bacterial Genomes
In this study, the Lek clustering algorithm was used to cluster the GOS sequences and NCBI sequences that contain TBDT sequences of known functions and investigate any sequence homology between TBDTs with known functions and the various putative ones (Table 2). Notably, the clustering results were not influenced by the removal of the GOS sequences.
The analysis found 3,343 clusters of TBDTs in the bacterial and environmental sequences, of which 17 Lek clusters contained experimentally characterized TBDT sequences (Table 2). The Lek clusters were subsequently classified into several groups (I to IV) according to the known substrate types ("DOM," "siderophores/vitamins," "heme/hemophores/iron(heme)-binding proteins," and "metals"). Each group contained sequences from two or more Lek clusters (Table 2). Group I, which was recognized in the current analysis, consisted of novel transporters for various types of DOM, including carbohydrates, amino acids, lipids, organic acid, and protein degradation products. Members of Group II include transporters for siderophores, vitamin B1, and vitamin B12. Group III consisted of TBDTs that transport iron from heme or iron proteins with high affinity. Nickel and cobalt were the predicted substrates in Group IV. The amino acid sequences of clustered siderophores-related transporters were not significantly similar to the TBDTs involved in DOM uptake. However, some of the siderophores-related transporters shared some sequence homology with the vitamin B12/vitamin B1 transporter proteins and were thus clustered together (Table 2).
TBDTs in selected marine bacterial genomes mainly functioned as transporters with diverse substrates, as summarized in Table S2. Members of clusters 427 and 3090 (DOM transporters) were mainly found in Gammaproteobacteria, including Alteromonadales (such as Alteromonadales bacterium TW-7) and the OMG group of bacteria (such as gamma proteobacterium HTCC2207), as well as in Sphingomonadales (such as Citromicrobium bathyomarinum JL354 and Citromicrobium sp. JLT1363). Cluster 720 (DOM transporters) was specific in its distribution to species of the CFB group of bacteria. One or two corresponding genes were identified in most of the analyzed CFB species. As many as 46 and 49 genes were found in Pedobacter heparinus DSM 2366 and Pedobacter sp. BAL39, respectively (Table S2).
Members of cluster 410 (siderophore transporters) were found in the genomes of most of the species analyzed for TBDTs. The majority of the siderophore transporters came from the gamma- and alpha-proteobacteria (such as the Alteromonadales, the OMG group of bacteria, and the Sphingomonadales). As many as 117 genes that encode siderophore transporters were found in Sphingomonas wittichii RW1 (Table S2). Clusters 1609 and 1856 contained TBDTs that transport iron from heme or iron-binding protein as an alternative iron source for the bacteria. The corresponding genes were identified in most of the analyzed species, and their number ranged from one to four genes per genome (Table S2). TBDTs in the Roseobacter clade and cyanobacteria were mainly distributed in clusters 410 and 1856, suggesting that they were responsible for iron acquisition in the bacteria. The last group (clusters 767 and 987) contained only a few sequences, indicating that nickel- or cobalt-specific TBDTs were not common among the bacteria (Table S2).

Functional Characterization of TBDTs in Metagenomes
Similarly, the TBDT genes, which occurred at varying frequencies in all metagenomic datasets, mainly encoded siderophores/vitamin transporters and DOM-related transporters, followed by heme/hemophores/iron(heme)-binding protein transporters (Figure 1A). Nickel and cobalt TBDTs appeared to be uncommon in the surface ocean. Overall, DOM and siderophore transporters were particularly prevalent across the 57 GOS sampling sites. High-frequency DOM-related TBDTs from Gammaproteobacteria (clusters 427 and 3090) and siderophores-related TBDTs (cluster 410) can be found in the Sargasso Sea (GSb and GSc), as shown in Figure 2. The relatively high-frequency DOM-related TBDTs from the CFB group (cluster 720) were distributed in GS1a, GS3, GS31, and GS35 (Figure 2). In addition, GS31 and GS35 contained heme transporters (cluster 1609) at relatively high frequency and siderophore transporters at relatively moderate frequency, suggesting complementary pathways for acquiring iron (Figure 2).
Predicted substrates for the most abundant ABC transporters in marine environments included branched-chain amino acids, oligopeptides, alkylphosphonates, dipeptides, tungstates, glutamates, and aspartates (Figure 1B). An overlap between the TonB-dependent substrates and the ABC transporter substrates, such as nickel, cobalt, and iron complex, was observed. ABC transporters specific for various amino acids and peptides were notably prevalent across marine environments, suggesting that the ABC transporters performed essential roles in the bacterial uptake of amino acids or peptides as nitrogen sources. The Fe3+ ABC transporter genes for ferrichrome and iron dicitrate were more abundant than the Fe2+ transporter genes in the metagenomes (Figure 1B), as also indicated in previous studies [14]. However, the overall frequency of the identified siderophores-related TBDT genes (estimated average frequency of approximately 0.8) was close to that of the Fe3+ ABC transporter genes (estimated average frequency of approximately 1.1). Thus, the Fe3+ TonB-dependent transportation strategy may be more prevalent than originally believed [14].
The retrieved GOS sequences were dominated by TBDT genes that were closely related to Gammaproteobacteria (primarily members of the Alteromonadales and the OMG group of bacteria) with moderate contributions from species in the CFB phyla (Figure 3). Sequences related to the order Sphingomonadales were generally rare, but were also contributors to siderophores- and DOM-related transporters in the metagenomic data (Figure 3). The taxonomic distribution of the retrieved GOS sequences in the major protein families was similar to the TBDT distribution pattern in the bacterial genomes.
The stringent E-value cutoff values for the BLASTP search [36] and the Lek clustering [37], [38] were chosen to ensure that the functionally relevant TBDTs are present in the same cluster when the predicted clusters were too high. Approximately 11% of the TBDTs from the non-redundant (NR) database and 40% of the TBDTs from the GOS were categorized in the clusters without function-known genes ( Table 2). For example, many sequences from GS18 were in the clusters that only contained GOS sequences, although the highest number of TBDTs was observed in GS18 (Table S3). The GOS sequences possibly came from uncultured marine bacteria, which showed relatively low similarity to TBDT sequences from the bacterial genome. On the other hand, TBDT subfamilies that only included GOS sequences can be considered likely to be as-yet undiscovered TBDT subfamilies. The current data supported the notion that the number and variety of TonB-dependent substrates were underestimated [19] and especially that the experimental studies for identifying TBDT substrates in marine samples were extremely limited. The large diversity and structural complexity of DOM in the marine environment make it a potential substrate for TBDTs.

Genomics-based Prediction of Substrates for TBDTs in the Genomes of Two Citromicrobium Strains
The genomic localizations of the genes of the identified TBDTs were analyzed to predict their substrate preferences. Citromicrobium bathyomarinum JL354 and Citromicrobium sp. JLT1363 are closely related with each other phylogenetically, and both strains have similar functional distribution of TBDT. However, the genomic organization and gene context surrounding the TBDT gene in the bacteria were often not conserved, suggesting the diversity of the substrates for TBDT (Figures S1 and S2). Most of the coding genes for TBDT were not randomly distributed across the genome, but were located in a physiologically meaningful genomic context.
Based on the clustering analysis (Figures S1 and S2), the majority of TBDTs were found to transport siderophores and DOM. Interestingly, some TBDT genes resided within operons predicted to code for enzymes that transfer fatty acyl substituents; two examples are shown in Figures 4a and 4b. These genes may have essential functions in the production and attachment of the siderophore fatty acyl chain [39]. Siderophores that contain a fatty acyl chain or an α-hydroxy carboxylic acid moiety dominate the marine siderophores [40]. The present findings suggest that Citromicrobium bacteria are capable of transporting siderophores with a fatty acyl side chain. Figures 4c and 4d show the other iron transporters. Aside from putative TBDTs for heme and siderophore uptake, both strains carried TBDT sequences that may be used for DOM and vitamin B12 import (Table S2). For example, a TBDT gene in Citromicrobium bathyomarinum JL354 was adjacent to a phytase gene, suggesting that phytic acid is a substrate for the TBDT (Figure 4e). A TBDT in Citromicrobium sp. JLT1363 was located in close proximity to a gene encoding cyanophycinase, which degrades the amino acid polymer cyanophycin, an important intracellular nitrogen storage compound of cyanobacteria (Figure 4f). Citromicrobium bathyomarinum JL354 had a TBDT gene in proximity to genes associated with vitamin B12 biosynthesis, allowing it to be annotated as a vitamin B12 transporter (Figure 4g).

Genomics-based Prediction of Substrates for Selected TBDTs from Metagenomes
Overall, the most abundant environmental TBDT sequences in the major clusters had their best matches in the Gammaproteobacteria or CFB group bacteria. Among the species, gamma proteobacterium HTCC2207, gamma proteobacterium HTCC2143, gamma proteobacterium NOR5-3, Pedobacter heparinus DSM 2366, and Flavobacteria bacterium MS024-3C recruited the greatest number of GOS sequences in the major clusters, and at the greatest identity. Figure 5 shows the TBDT genes that recruited large numbers of similar environmental sequences. Many environmental TBDT sequences in cluster 3090 showed greatest similarity to a phytic acid transporter from gamma proteobacterium HTCC2207 (Figure 5a) or a TBDT gene in gamma proteobacterium HTCC2143 located upstream of the galactose utilization genes (Figure 5b). Environmental TBDT sequences in cluster 427 showed the greatest similarity to a TBDT gene in gamma proteobacterium NOR5-3 located adjacent to a D-aminoacylase gene whose product catalyzes the hydrolysis of N-acyl-D-amino acid derivatives (Figure 5c). Homologs of a peptidoglycan transporter in Pedobacter heparinus DSM 2366 were detected frequently in the environment (Figure 5d). In cluster 180, numerous sequences were homologous to a TBDT in gamma proteobacterium NOR5-3 located near a predicted gene encoding cyanide hydratase (Figure 5e). A previous study indicated that a putative siderophore component was involved in cyanide utilization [41]. Gamma proteobacterium HTCC2143 had a siderophore transporter gene residing within an operon that was predicted to code for enzymes involved in fatty acid metabolism, similar to the siderophore transporter gene in Citromicrobium bathyomarinum JL354 (Figure 5f).
In this study, the results of the genome context analysis suggest that the substrates for TBDTs included siderophores, ferric citrate, vitamin B12, heme, phytic acid, galactose, cyanophycin, N-acyl-D-amino acid derivatives, peptidoglycan, and cyanide (Figures 4 and 5). To detail the DOM utilization patterns further, more substrates for TBDTs must be identified, especially from a representative set of bacteria (Gammaproteobacteria and the CFB group).

Proteomic Analysis of Citromicrobium Strains
Proteomic views of Citromicrobium bathyomarinum JL354 and Citromicrobium sp. JLT1363 during the stationary phase of culture are presented in Table S4 and Table S5, respectively. Many known abundant bacterial core proteins [42], such as ATP synthase, elongation factors, chaperonin, transporter systems, and ribosomal proteins, were among the abundant proteins detected. The functional categories of the identified proteins are shown in Figures S3 and S4. In terms of protein numbers, transporter activity was one of the most represented biological process or molecular function gene ontology (GO) categories. Both strains were shown to produce a large number of proteins involved in the TonB-dependent transport system, which comprised a considerable part of the total periplasmic proteome (Table 3). Eight TBDTs and three ABC transporter subunits were detected among a total of 18 periplasmic proteins from Citromicrobium bathyomarinum JL354 (Table 3). Fifteen TBDTs and six ABC transporter subunits were found among a total of 31 periplasmic proteins from Citromicrobium sp. JLT1363 (Table 3). These TBDTs might function as DOM, siderophore, or thiamin transporters. Moreover, remarkably high expression of various TBDTs was observed for Sphingomonas [43] and Pseudoalteromonas haloplanktis TAC125 [44], which contained a high number of TBDT genes in their genomes. ABC transporters were required for the growth of all bacteria. For instance, the ABC transporters were among the most highly detected proteins from the stationary phases of Ruegeria pomeroyi DSS-3 [45] and Candidatus Pelagibacter ubique [46]. The enrichment of TBDTs in the proteome indicated that TBDT proteins might indeed contribute to bacterial growth. More importantly, TonB-dependent transporter systems may partially compensate for a low number of ABC transporters in marine microorganisms.

Discussion
The results of the analysis of the genetic contexts near the TBDT show that some TBDT genes were closely associated with the carbohydrate, amino acid, or other substrate metabolism enzymes in the operons (Figures 4 and 5), suggesting that these genes are functionally linked and may therefore play important roles in diverse DOM uptake for marine bacteria, but are not limited to iron and vitamin B12 uptake only.
Marine organic matter was thought to originate primarily from phytoplankton production [47], [48]. Carbohydrates produced by phytoplankton form an important fraction of the DOM in the ocean [48]. For the Gammaproteobacteria and the CFB group of bacteria, TBDTs located in the gene cluster related to carbohydrate metabolism might play a major role in utilizing labile substrates, such as xylose, arabinose, and galactose derived from phytoplankton. In the surface ocean, the ability of monosaccharide incorporation by bacteria depends on ABC transporters because of the abundance and omnipresence of the bacterioplankton population (such as SAR11 and the Roseobacter clade) in the marine ecosystem [10]. Carbohydrate utilization of bacterial population is further enhanced in the presence of TBDTs from Gammaproteobacteria and CFB bacteria.
Polysaccharides released by phytoplankton were identified as the major constituents of naturally occurring marine high-molecular-weight DOM [49]. Complex molecules that contain N-acetylglucosamine (GlcNA), such as chito-oligosaccharides, are substrates for TBDTs [21], [23]. For instance, some CFB can transport and consume chitin and GlcNA within the DOM pool, and thus should be a relevant part of chitin-like DOM degradation in marine systems [50]. The N-acyl-D-amino acid derivatives may also be putative substrates for TBDTs from Gammaproteobacteria. Other polysaccharides, such as pectin and alginates, which were the dominant components in diatom and brown algae cell walls, respectively, were also identified as potential substrates for TBDTs [26]. These data suggest that members of the CFB and Gammaproteobacteria contained not only polysaccharide hydrolases but also hydrolysate-specific TBDTs, giving them a selective advantage over other heterotrophic bacteria in a marine environment enriched with polysaccharides, such as in biofilm. The sea surface microlayer has commonly been thought to be a gelatinous biofilm and has been shown to be dominated by Gammaproteobacteria and Bacteroidetes [51], [52]. A similar result was observed in the biofilm from the Irish Sea [53]. On the other hand, major storage products in phytoplankton, such as cyanophycin in cyanobacteria and starch, were also found to be potential substrates for TBDTs in some bacteria. This observation explains in part the ability of Gammaproteobacteria and CFB to consume a wide range of phytoplankton-derived DOM.
DOM also originates in part from the degradation of terrestrial plant materials, such as phytic acid, which is transported to the open sea through rivers [54]. Phytic acid can be an abundant source of phosphorus and carbon for bacteria in coastal waters, and it is also used for chelating Fe [54]. Phytic acid-specific TBDT sequences were abundant in the metagenomic data.
Aside from the prevalence of TBDT sequences in (meta-)genomic data, the other 'omic' datasets, including (meta-)proteomics and (meta-)transcriptomics, suggested that TBDTs are physiologically and ecologically significant. The metaproteome sampled from South Atlantic surface waters showed that TBDTs were predominant in membrane proteins, wherein the Gammaproteobacteria and CFB group bacteria were the most abundant from the open ocean to the coastal seawater [6]. These TBDT proteins were closely related to Gammaproteobacteria and members of the CFB, Alphaproteobacteria, and Deltaproteobacteria. TBDTs were detected in abundance in the metaproteomic samples from the upwelling in the South China Sea off Vietnam, where the Alteromonadales represented the abundant taxa (unpublished data). Transcripts associated with TBDTs were significantly overrepresented in a high-molecular-weight DOM treatment and derived from the newly dominant Alteromonadales bacteria, whereas Pelagibacter and Prochlorococcus decreased in relative abundance [7]. TBDTs also comprise the most abundant group of transport-related transcripts within the CFB group bacteria sequences in natural environments [8].
However, ABC transporter transcripts dominated the acquisition of DOM monomers, with minor contributions from the TRAP in the southeastern U.S. coastal seawater wherein the pelagic microorganism Roseobacter and SAR11 are dominant [4]. Carbohydrate-related ABC transporter transcripts from Roseobacter bacteria accounted for more than 30% of all the DOM-related transporter sequences in the coastal ecosystem [4], suggesting that they outcompeted other marine bacteria for carbohydrates. SAR11 bacteria typically contributed to a significant fraction of amino acid assimilation in surface waters [55]. In the Sargasso Sea, the periplasmic compounds of ABC transporter systems for amino acids were found to be the most abundant peptides from SAR11 [3], suggesting that they were able to take up amino acid. Not surprisingly, based on their known abundance in the wild and the high proportion of ABC transporter in Roseobacter and SAR11 bacteria genomic data, the ABC transport proteins were the most frequently detected proteins at such high abundance in marine environments. Hence, the difference in the expressed transporter profiles in marine environments is primarily caused by the geographic distribution of the dominant bacteria population. However, the sampling or analysis method influenced the results of membrane proteomics [5]. The peptide search in the metaproteomic study in the Sargasso Sea revealed only the environmental protein-coding sequences from SAR11 and cyanobacteria [3].
A previous study indicated that relatively abundant bacteria cannot dominate the consumption of all DOMs and that a diverse assemblage of bacteria is essential for the complete degradation of complex DOM in the oceans [50]. The abundant operational taxonomic units across the GOS sites belong to members of SAR11, Roseobacter, cyanobacteria, the CFB group, and Gammaproteobacteria (such as OMG and Alteromonadales) [56]. The major types of high-affinity carbohydrate and amino acid transporters known in the Roseobacter and SAR11 bacteria include ABC systems [3], [4]. In contrast, CFB group bacteria, such as Polaribacter sp. MED152, possess relatively few transporters for free amino acids and lack carbohydrate-specific ABC transporters [57]. The Shewanella species (order Alteromonadales) carried TBDTs for GlcNA or chito-oligosaccharide transport across the outer membrane and a specific permease for GlcNA transport, but lacked GlcNA-specific PTS or ABC systems in the cytoplasmic membrane [58]. Thus, a broad array of different substrate utilization patterns for Gammaproteobacteria and CFB bacteria in the marine environment could be assumed to occur ubiquitously because of the expression of TBDTs, although these bacteria possibly represented a lower percentage of overall microbial community nutrient acquisition compared with the high abundance of ABC transporters in pelagic microorganisms, such as SAR11, Roseobacter, and cyanobacteria.
In conclusion, transporter sequence information, such as diversity and substrate specificity, provide useful and relevant clues for the DOM utilization of a bacterium, bacterial group, or even a microbial community. In this study, TBDTs for DOM transportation were found to be abundant across marine environments. Based on the prevalence of TBDT and ABC transporter sequence data, the major bacterial groups were found to have distinct DOM uptake patterns. Such a metagenomic analysis can potentially contribute to the understanding of the molecular bases of bacterial activities in the ocean, particularly by highlighting the contributions of Gammaproteobacteria and CFB group bacteria with TBDTs to the overall ecosystem function.

Data Preparation
The NR database (3.8 GB, 12,061,831 sequences; 4,118,133,053 total letters) of protein sequences was downloaded from the National Center for Biotechnology Information (NCBI), and the predicted proteomes of Citromicrobium bathyomarinum JL354 and Citromicrobium sp. JLT1363 from our laboratory were compared with it. The amino-acid sequences and the site metadata (sample location and environmental conditions are listed in Table S3) derived from 57 samples in the GOS [30] expedition were retrieved from MG-RAST (version 2) [59]. The GSa sample in the GOS data sets was excluded from analysis because of suspected contamination in samples from that site [35]. Comparison of the functional annotation of the metagenomes from the GOS was performed using the tools provided in MG-RAST with a maximal e-value of 1e-5 and a minimal alignment length > 100. DNA sequences matching "Ton and Tol transport systems" and "ABC transporter" were exported. The "Ton and Tol transport systems" sequences were translated to create a "GOS Ton and Tol protein" database, using BLASTX [36] against the GOS AA sequence database (E-value threshold 1e-10). Sequence searches for transporters in the bacterial genomes were carried out using BLASTP [36] (E-value threshold 1e-3) and a custom-made database comprising the members of various transporter families based on the Transporter Classification system [60][61][62]. NCBI annotations of the resulting proteins were scanned manually to remove non-transporter proteins.

Identification of TBDTs
Searches for TBDTs in the NCBI-NR protein database were carried out using HMMER hmmsearch [63] with Pfam hidden Markov models (PF00715 for the N-terminal plug domain and PF00593 for the C-terminal membrane-spanning β-barrel domain). Sequences that matched both HMM profiles were additionally verified by manual inspection and kept for further analysis. Furthermore, NCBI annotations of the resulting proteins were scanned manually to remove sequences that were labeled as spurious. The sequences of the "GOS Ton and Tol protein" database were searched by BLASTP against a database containing all TBDT proteins at NCBI with a conservative E-value cutoff of 1e-5. All recruited sequences were subjected to verification using reciprocal BLASTP. In total, 15,905 and 44,073 TBDT homologs were identified in the NCBI-NR protein database and the GOS metagenomes, respectively. The genomic organization of the bacterial TBDT genes was visualized on a linear genome map using the Genome2D program [64].
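The two-domain criterion described above can be illustrated with a minimal Python sketch. It assumes the hmmsearch results have already been parsed into (protein ID, Pfam accession) pairs; the function name `candidate_tbdts` and the input layout are hypothetical and are not part of HMMER or of the original pipeline. The manual inspection steps are not modeled.

```python
# Pfam models named in the text: plug domain and beta-barrel domain.
REQUIRED_DOMAINS = frozenset({"PF00715", "PF00593"})


def candidate_tbdts(domain_hits, required=REQUIRED_DOMAINS):
    """Return protein IDs that matched every required Pfam model.

    domain_hits: iterable of (protein_id, pfam_accession) pairs, assumed
    to have been parsed from hmmsearch output upstream (hypothetical layout).
    """
    found = {}
    for protein_id, accession in domain_hits:
        # Drop an optional version suffix such as "PF00715.20".
        found.setdefault(protein_id, set()).add(accession.split(".")[0])
    return sorted(p for p, domains in found.items() if required <= domains)
```

Proteins carrying only one of the two domains are dropped, which mirrors the requirement that both profiles match before a sequence is passed on to manual verification.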

TBDT Clustering
TBDTs with known substrates were combined with the predicted TBDT homologs. All-vs-all BLASTP searches were performed for the TBDTs, and a Lek clustering algorithm [37], [38] was applied to cluster the proteins. An E-value cutoff of 1e-40 for the BLASTP results and a Lek similarity cutoff of 0.6 were used to build gene family clusters. The predicted TBDT sequences retrieved from GOS and from NR were grouped into 783 clusters containing more than two members.
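The role of the two cutoffs can be sketched in Python as follows. This is only an illustration under stated assumptions: the similarity used here is a Jaccard-style overlap of BLAST neighbor sets, chosen as a stand-in for the Lek score, whose exact definition is given in [37], [38]; single-linkage grouping and the function name `lek_like_clusters` are likewise assumptions, not the published implementation.

```python
from collections import defaultdict


def lek_like_clusters(blast_hits, evalue_cutoff=1e-40, similarity_cutoff=0.6):
    """Group proteins by the overlap of their BLAST neighbor sets.

    blast_hits: iterable of (query, subject, evalue) from all-vs-all BLASTP.
    The Jaccard-style overlap below is an assumed stand-in for the Lek score.
    """
    # Keep only hits that pass the E-value cutoff; each protein is its own neighbor.
    neighbors = defaultdict(set)
    for query, subject, evalue in blast_hits:
        if evalue <= evalue_cutoff:
            neighbors[query].update((query, subject))
            neighbors[subject].update((query, subject))

    # Union-find structure for single-linkage grouping.
    parent = {p: p for p in neighbors}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    proteins = sorted(neighbors)
    for i, a in enumerate(proteins):
        for b in proteins[i + 1:]:
            shared = len(neighbors[a] & neighbors[b])
            total = len(neighbors[a] | neighbors[b])
            if shared / total >= similarity_cutoff:
                parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for p in proteins:
        clusters[find(p)].add(p)
    # Largest clusters first, ties broken alphabetically.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: (-len(c), c))
```

The pairwise loop is quadratic in the number of proteins, which is acceptable for an illustration but would need an indexed approach for tens of thousands of sequences.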

Hit Normalization
To reconcile the effects of gene size on hit retrieval, the number of hits (Ng) at each GOS sampling site was size-normalized to the average length of the gene (Lx) relative to the length of recA from E. coli K12 (LrecA: 1,062 bp) using the equation Nn = (LrecA/Lx) × Ng, where Nn represents normalized hits and Lx is the average length of the TBDT genes in each cluster or the average length of all genes in an ABC transporter system (data are available at http://www.tcdb.org/ [62]). To remove the bias of average genome size on the sampling of genes from a given metagenomic community, the number of recA-normalized hits for six single-copy genes (recA, atpD, gyrB, dnaK, rpoB and tufA) was averaged per site as described previously [65]. The frequency of TBDT and ABC transporter genes relative to the number of single-copy gene hits for each site was then calculated as: number of size-normalized gene hits/average number of size-normalized hits of the six single-copy genes.
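The two normalization steps, Nn = (LrecA/Lx) × Ng followed by division by the mean size-normalized hit count of the single-copy genes, can be written as a short Python sketch. The function names and the input layout are hypothetical, and the numbers in the usage note are invented for illustration only; they are not data from this study.

```python
from statistics import mean

L_RECA = 1062  # length of E. coli K12 recA in bp, as stated in the text


def size_normalize(n_hits, avg_gene_length, l_reca=L_RECA):
    """Size-normalized hits: Nn = (LrecA / Lx) * Ng."""
    return l_reca / avg_gene_length * n_hits


def relative_frequency(n_hits, avg_gene_length, single_copy_hits):
    """Gene frequency relative to single-copy genes at one site.

    single_copy_hits: mapping of gene name -> (raw hits, gene length in bp)
    for the single-copy marker genes at that site (hypothetical layout).
    """
    marker_mean = mean(
        size_normalize(hits, length) for hits, length in single_copy_hits.values()
    )
    return size_normalize(n_hits, avg_gene_length) / marker_mean
```

For example, with invented values, 10 hits to a cluster whose genes average 2,124 bp normalize to 5 recA-equivalent hits; if the single-copy genes at that site average 20 size-normalized hits, the relative frequency is 0.25.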

Bacterial Culture
Citromicrobium bathyomarinum JL354 and Citromicrobium sp. JLT1363 were maintained and precultured aerobically at 28°C in RO medium as described previously [66], with stirring at 160 rpm in the dark. To establish the starvation-induced stationary phase, the feed of medium to the chemostat was switched off when the culture reached half-maximum optical density (1/2 ODmax) and subsequently changed (inoculum size: 2% v/v) to a nutrient mix with trace elements (1 mL per L medium) and glucose (final concentration 0.5%). The trace element stock solution contained per L: 3.15 g

Protein Extraction and OFFGEL Digestion
Whole cell lysates were prepared as described previously [67]. Cells were washed with 10 mM Tris-HCl, pH 8.0 and lysed in SDT-lysis buffer using a 1:10 sample to buffer ratio for 5 min at 95°C. Brief sonication was performed to reduce the viscosity of the lysate, which was centrifuged for 5 min at 16,000×g to remove debris. OFFGEL digest was processed with Endoproteinase Lys-C (Roche, Indianapolis, IN, USA) and Trypsin (Promega, Madison, WI, USA). Filter Aided Sample Preparation was used when in-solution digest was carried out as described previously [67] to desalt large amounts of peptide mixtures for OFFGEL separation.

Protein Characterization by LC-MS/MS
A Finnigan™ LTQ™ linear ion trap MS (Thermo Electron) equipped with an electrospray interface was connected to the LC setup for eluted peptide detection. Data-dependent MS/MS spectra were obtained simultaneously. Each scan cycle consisted of one full MS scan in profile mode followed by five MS/MS scans in centroid mode with the following Dynamic Exclusion™ settings: repeat count 2, repeat duration 30 s, exclusion duration 90 s. Each sample was analyzed in triplicate.
MS/MS spectra were automatically searched against the NR protein database using the BioworksBrowser rev. 3.1 (Thermo Electron, San Jose, CA). Protein identification results were extracted from SEQUEST out files with BuildSummary [68]. The peptides were constrained to be tryptic and up to two missed cleavages were allowed.
Carbamidomethylation of cysteines was treated as a fixed modification, whereas oxidation of methionine residues was considered as a variable modification. The mass tolerance allowed for the precursor ions was 2.0 Da and for fragment ions was 0.8 Da. The protein identification criteria were based on Delta CN (≥0.1) and cross-correlation scores (Xcorr; one charge ≥1.9, two charges ≥2.2, three charges ≥3.75). Gene ontology information was assigned for the identified proteins using InterProScan searches implemented in Blast2GO [69].
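The Delta CN and charge-dependent Xcorr criteria above amount to a simple per-match filter, sketched here in Python. The function name `passes_filter` is hypothetical, and rejecting charge states other than 1 to 3 is an assumption made for this sketch, since the text gives thresholds only for those three charges; this is not the BuildSummary implementation.

```python
# Minimum Xcorr per precursor charge state, as listed in the text.
XCORR_MIN = {1: 1.9, 2: 2.2, 3: 3.75}
DELTA_CN_MIN = 0.1


def passes_filter(charge, xcorr, delta_cn):
    """Return True if a peptide-spectrum match meets the stated criteria."""
    threshold = XCORR_MIN.get(charge)
    if threshold is None:
        # Assumption: charge states without a stated threshold are rejected.
        return False
    return delta_cn >= DELTA_CN_MIN and xcorr >= threshold
```

A doubly charged match with Xcorr 2.5 and Delta CN 0.15 passes, whereas a triply charged match with Xcorr 3.5 fails regardless of its Delta CN.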