Functional Environmental Screening of a Metagenomic Library Identifies stlA; A Unique Salt Tolerance Locus from the Human Gut Microbiome

Functional environmental screening of metagenomic libraries is a powerful means to identify and assign function to novel genes and their encoded proteins without any prior sequence knowledge. In the current study we describe the identification and subsequent analysis of a salt-tolerant clone from a human gut metagenomic library. Following transposon mutagenesis we identified an unknown gene (stlA, for “salt tolerance locus A”) with no current known homologues in the databases. Subsequent cloning and expression in Escherichia coli MKH13 revealed that stlA confers a salt tolerance phenotype in its surrogate host. Furthermore, a detailed in silico analysis was also conducted to gain additional information on the properties of the encoded StlA protein. The stlA gene is rare when searched against human metagenome datasets such as MetaHit and the Human Microbiome Project and represents a novel and unique salt tolerance determinant which appears to be found exclusively in the human gut environment.


Introduction
The human gastrointestinal (GI) tract is home to hundreds of bacterial species [1] which play an important and complex role in host health, metabolism and physiology [2]. This relatively diverse community is dominated by two bacterial phyla, the Bacteroidetes and Firmicutes, with most of the remaining microbes represented by members of the Actinobacteria, Proteobacteria, Verrucomicrobia and Fusobacteria [3]. A significant proportion (estimates range from approximately 50-80%) of this bacterial community has thus far proved recalcitrant to traditional laboratory culture [3,4], although that number is constantly decreasing [5][6][7]. The emergence of culture-independent techniques such as metagenomics in the past 10-15 years has enabled researchers to study these "unculturable organisms" (although "as-yet uncultured" would be more accurate) through direct sequencing of metagenomic DNA or through cloning and functional expression in a heterologous host, an approach referred to as functional metagenomics [8,9].
The human GI tract imposes numerous stresses on its resident and transient microbiota [10]. The ability to adapt to and resist conditions such as low pH, bile acids, elevated osmolarity, nutrient limitation, host immune factors and competing microorganisms is a determining factor in niche colonisation and proliferation [11]. Our research focuses specifically on the osmotic stress response. Bacteria generally elicit a phased response when challenged in such a manner, firstly by the rapid accumulation of potassium (K+) ions (primary response), followed by the synthesis or accumulation of osmoprotectant compounds (secondary response) [12][13][14][15]. A third mechanism is also employed which can involve a broad range of genes that are seemingly unrelated to the primary and secondary responses [16][17][18][19]. These atypical, ancillary systems are arguably more interesting: they provide a more complete view of the cellular response to osmotic stress in different bacteria and may also identify specific strategies employed by specific bacteria in distinct environments.
Our aim was to identify novel genes encoding proteins that could confer a salt tolerance phenotype. It is hoped that the identification of atypical genes, which have not previously been linked to salt tolerance, will help to broaden our understanding and possibly lead to the identification of novel and unusual systems that play as yet undefined roles in salt tolerance. While sequence-based metagenomics can define the abundance and diversity of different bacteria within a given microbiome, it cannot enable researchers to assign novel functions to new or known genes. This task can only be achieved through functional screening of metagenomic libraries using activity-based assays. Approximately 30-40% of genes in a given genome will be annotated as hypothetical, conserved hypothetical or function unknown [20], while ~75% of functions important for life in the gut consist of uncharacterized orthologous groups and/or completely novel gene families [1], emphasising the significant degree of novelty that exists in these (meta)genomes.
A previous study from our group identified five genes (which were previously annotated) from the human gut microbiome, to which a novel function of salt tolerance could be assigned [21].
In the current study, we describe the identification of a gene with no currently known homologues. Bioinformatic analysis suggests that the gene encodes a putative membrane protein, while transposon mutagenesis and subsequent cloning and heterologous expression of the gene revealed that it conferred a salt tolerance phenotype in Escherichia coli. This study illustrates the power of functional environmental screening of metagenomic libraries as a means to identify and assign a function to as yet unknown genes and their encoded proteins.

Bacterial strains and growth conditions
Bacterial strains and plasmids used in this study are listed in Table S1, while primers (Eurofins, MWG Operon, Germany) used are listed in Table S2. E. coli EPI300::pCC1FOS (Epicentre Biotechnologies, Madison, WI, USA) was grown in Luria-Bertani (LB) medium containing 12.5μg/ml chloramphenicol (Cm), and in 12.5μg/ml chloramphenicol plus 50μg/ml kanamycin (Kan) following EZ-Tn5 transposon mutagenesis reactions. E. coli MKH13 [22] and Lactococcus lactis MG1363 [23] were grown in LB and M17 (+0.5% glucose; GM17) media, respectively. Media were supplemented with 20µg/ml Cm for strains transformed with the plasmid pCI372 [24] and with 1.5% (w/v) agar when required. Overnight cultures of E. coli were grown at 37°C with shaking, while L. lactis cultures were grown statically at 30°C.

Construction and screening of metagenomic library
A previously constructed fosmid clone library [25,26], created from metagenomic DNA isolated from a faecal sample from a healthy 26-year-old Caucasian male, was used to screen for salt-tolerant clones. The library was screened as outlined previously [21]. Briefly, a total of 23,040 clones from the library were screened on LB agar supplemented with 6.5% (w/v) NaCl and 12.5μg/ml chloramphenicol using a Genetix QPix 2 XT™ colony picking/gridding robotics platform. Plates were incubated at 37°C for 2-3 days and checked periodically for growth of likely salt-tolerant clones.

Transposon mutagenesis
Transposon mutagenesis was carried out in accordance with the manufacturer's instructions, using the EZ-Tn5 <oriV/KAN-2> in vitro transposition kit (Epicentre Biotechnologies). E. coli EPI300 cells were transformed with the transposon reaction mixture and selected on plates containing Cm and Kan (12.5 and 50µg/ml, respectively). The transposon insertion clones were subsequently replica plated onto LB with and without 6.5% added NaCl. Growth on LB but not on LB + 6.5% NaCl suggested a likely insertion event in a gene involved in salt tolerance. Presumptive salt tolerance knock-outs were grown overnight and a fosmid DNA extraction was performed. The extracted fosmid containing metagenomic DNA was subjected to sequencing from the ends of the transposon using the primers EZ-Tn FP-1 and EZ-Tn RP-1 (Table S2). All sequencing was performed by GATC Biotech (Konstanz, Germany).
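The replica-plating comparison described above amounts to a set difference: clones recorded as growing on LB minus those also growing on LB + 6.5% NaCl. The following is a minimal sketch of that bookkeeping only; the function name and the clone identifiers are hypothetical and are not taken from the study.

```python
def presumptive_knockouts(grew_on_lb, grew_on_lb_nacl):
    """Return clone IDs that grew on LB but failed to grow on LB + NaCl."""
    return sorted(set(grew_on_lb) - set(grew_on_lb_nacl))


# Hypothetical clone identifiers, for illustration only.
lb_plate = ["Tn-01", "Tn-02", "Tn-03", "Tn-04"]
lb_nacl_plate = ["Tn-01", "Tn-03"]

candidates = presumptive_knockouts(lb_plate, lb_nacl_plate)
print(candidates)
```

Clones returned by such a comparison are only candidates; in the workflow above they were confirmed by sequencing outwards from the transposon ends.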

DNA manipulations
Induction of fosmids from low to high copy number for downstream applications such as transposon mutagenesis and sequencing was performed as per manufacturer's instructions and as described previously [21]. The Qiagen QIAprep® Spin mini-prep kit was used to extract fosmids as per manufacturer's instructions. PCR products were purified with a Qiagen PCR purification kit and digested with restriction enzymes XbaI and PstI (Roche Applied Science), followed by ligation using the Fast-Link DNA ligase kit (Epicentre Biotechnologies) to similarly digested plasmid pCI372. Electro-competent E. coli MKH13 and L. lactis MG1363 were transformed with the ligation mixture and plated on LB and GM17 agar respectively, containing 20µg/ml Cm for selection. Colony PCR was performed on resistant transformants using a primer on the stlA gene (stlA FP) and a primer on the plasmid (pCI372 RP) to confirm the presence and size of the insert.
Detection of the stlA gene in metagenomic DNA isolated from human stool samples was attempted using PCR. Twenty samples from the ELDERMET study [27], which consisted of five community (healthy) and five long-stay (frail) old subjects, five healthy young subjects and five young subjects with irritable bowel syndrome (IBS), were used as template DNA. Furthermore, five samples from healthy adults from a separate study [28] were also tested. The following primer pairs were used: stlA FP and stlA RP, stlA-J FP and stlA-J RP, stlA-OUT FP and stlA-OUT RP, stlA-IN FP and stlA-IN RP (see Table S2).

Growth experiments
Cultures were grown overnight in appropriate media. Cells were subsequently harvested, washed in one quarter strength sterile Ringer's solution and re-suspended in fresh broth. A 2% inoculum was sub-cultured in fresh broth containing the appropriate stress (i.e. sodium chloride (NaCl), potassium chloride (KCl), sucrose, glycerol, low pH or bile as required) and 200µl was transferred to individual wells of a sterile 96-well micro-titre plate (Sarstedt Inc., Newton, USA). Plates were incubated at 37°C (or 30°C for L. lactis strains) for 24-48 hours in an automated spectrophotometer (Tecan Genios) which recorded the optical density at 595 nanometres (OD595nm) every hour. For experiments using bile, uninoculated media containing bile were dispensed as blanks in the 96-well plate and their OD595nm values were subtracted from the corresponding inoculated wells to give the OD595nm for the microbial fraction. The data were subsequently retrieved and analysed using the Magellan 3 software program. Representative graphs were created using the Sigma Plot 10.0 software programme (Systat Software Inc, London, UK). Results are presented as the average of triplicate experiments, with error bars being representative of the standard error of the mean (SEM).
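The blank subtraction and the mean and SEM summary described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the Magellan or Sigma Plot workflow used in the study; the function names and all OD values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def blank_corrected(od_inoculated, od_blank):
    """Subtract the uninoculated blank reading from each inoculated reading."""
    return [sample - blank for sample, blank in zip(od_inoculated, od_blank)]


def mean_and_sem(replicates):
    """Return (mean, standard error of the mean) for one time point."""
    n = len(replicates)
    if n < 2:
        raise ValueError("SEM needs at least two replicates")
    # SEM = sample standard deviation / sqrt(number of replicates)
    return mean(replicates), stdev(replicates) / sqrt(n)


# Hypothetical OD595nm readings for one time point (triplicate wells).
inoculated = [0.52, 0.61, 0.70]
blanks = [0.12, 0.11, 0.10]

corrected = blank_corrected(inoculated, blanks)
average, sem = mean_and_sem(corrected)
print(f"mean OD = {average:.3f}, SEM = {sem:.3f}")
```

Applying `mean_and_sem` to each hourly time point gives the values and error bars for one growth curve.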

BIOLOG Phenotype Microarray (PM) Assay
The phenotype microarray (PM) osmolytes microplate (PM9) was used to compare the cellular phenotypes [29] of E. coli MKH13::pCI372 and MKH13::pCI372-stlA under 96 different conditions. The BIOLOG PM protocol for E. coli and other Gram-negative bacteria was followed for preparation of the different inoculating fluid (IF) solutions (IF-0 and IF-10; supplied by BIOLOG) and inoculation of the PM plates. Briefly, isolated colonies were added to IF-0 fluid until a cell suspension of 42% T (transmittance) was achieved. This was subsequently diluted in IF-0 + dye mix A to achieve 85% T. Finally, this was diluted in IF-10 + dye mix A and 100µl was inoculated to each well of the PM9 microplates. Plates were incubated at 37°C and readings were taken over a 24 hour period using an automated plate reader (BIOTEK Synergy 2) which measured the absorbance at 590nm.

Sequencing and bioinformatic analysis
The fosmid insert from clone SMG 25 was fully sequenced and assembled by GATC Biotech (Konstanz, Germany) using the GS FLX (Roche) pyrosequencing platform on a titanium mini-run. Putative open reading frames were predicted using Softberry FGENESB bacterial operon and gene prediction software (available at www.softberry.com). Retrieved nucleotide and translated amino acid sequences were functionally annotated by homology searches using the Basic Local Alignment Search Tool (BLAST), with a maximum e-value cut-off of 1e-03, to identify homologous sequences from the National Center for Biotechnology Information (NCBI) website: http://www.ncbi.nlm.nih.gov/blast/Blast.cgi. A list of proteins encoded on the SMG 25 fosmid insert is presented in Table 1.
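To make the e-value cut-off concrete, the sketch below filters BLAST hits at 1e-03. It assumes the hits are available in the standard 12-column tab-separated BLAST output (e-value in the eleventh column); the searches in this study were run through the NCBI website, so the file format, the function name and the example rows are all assumptions made for illustration.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-3  # maximum e-value cut-off stated in the text


def significant_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Keep hits whose e-value is at or below the cut-off.

    Expects 12 tab-separated columns per line:
    query, subject, % identity, length, mismatches, gap opens,
    q.start, q.end, s.start, s.end, e-value, bit score.
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(row) < 12 or row[0].startswith("#"):
            continue  # skip blank, comment or malformed lines
        evalue = float(row[10])
        if evalue <= cutoff:
            hits.append((row[0], row[1], float(row[2]), evalue))
    return hits


# Hypothetical rows, for illustration only.
example = (
    "gene1\tsubjectA\t98.0\t250\t5\t0\t1\t250\t1\t250\t1e-150\t500\n"
    "gene6\tsubjectB\t30.0\t60\t40\t2\t10\t70\t5\t64\t0.5\t25.0\n"
)
for hit in significant_hits(example):
    print(hit)
```

A query with no rows passing the filter would be reported as having no significant similarity at this threshold.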
The Fold and Functional Assignment System (FFAS03) is a profile-profile and fold recognition algorithm that can detect remote homology between proteins [42]. FFAS03 searches numerous databases including the non-redundant NCBI protein sequence database (NCBI nr), Global Ocean Sampling (GOS) from JCVI (J. Craig Venter Institute), PDB (Protein Data Bank), SCOP (Structural Classification of Proteins), and COG (Clusters of Orthologous Groups), as well as numerous metagenome datasets (microbial metagenome samples from the Joint Genome Institute, human gut metagenome samples from the Hattori lab, human oral microbiome database from the Forsyth Institute and GOS data from JCVI and CAMERA). Furthermore, FFAS03 searches against the MetaHit (Metagenomics of the Human Intestinal Tract) dataset [1], which contains over 3 million unique gene sequences from the human gut microbiome. The StlA protein sequence was submitted to the server to identify proteins with distant homology based on FFAS profiling or homologues by BLAST and PSI-BLAST (Position-Specific Iterated BLAST) against the databases and metagenome datasets. The FFAS03 server can be found at: http://ffas.burnham.org/ffas-cgi/cgi/document.pl.

Taxonomic assignment of scaffolds
The scaffolds on which an stlA homologue was identified were subjected to BLASTX analysis (maximum e-value cut-off of 1e-50). The BLASTX results were downloaded and imported to the MEGAN 4 (Metagenome Analyser 4) software program [47] for taxonomic assignment.

Screening the metagenomic library
Screening approximately 23,000 clones from a human gut metagenomic library led to the identification of 53 clones which  [48,49], is a mucin degrading member of the phylum Verrucomicrobia which is commonly found in the human gut microbiome [50]. Growth was monitored spectrophotometrically by measuring the optical density at 595 nm (OD595nm). SMG 25 was shown to have a significant (unpaired Student's t-test, P <0.0001) growth advantage in the presence of NaCl compared to the EPI300 host strain carrying an empty fosmid vector (pCC1FOS) (Figure 1 A).

Fosmid sequencing and analysis
The fosmid insert from SMG 25 was fully sequenced and assembled by GATC Biotech (Konstanz, Germany) and was predicted to contain 45 putative open reading frames that encode proteins (see Table 1). Translated nucleotide sequences were subjected to BLASTP analysis to identify homologous sequences in the database. Twenty-six of the genes encoded proteins corresponding to different species of Akkermansia (ranging from 34-98% amino acid identity), but a sizeable proportion of the encoded proteins, approximately 27% (12/45) (highlighted in bold in Table 1), had no significant similarity to sequences in the database, indicating the presence of novel sequences. Overall, only 13 proteins could be assigned a putative function based on BLASTP searches, whilst the remaining genes encoded hypothetical or uncharacterised proteins. The full fosmid insert sequence of SMG 25 can be found in GenBank (accession number JQ269600.1; gi 375342965). The G+C (guanine and cytosine) skew of the entire fosmid insert, as well as the G+C content of each individual gene, are presented in Figure 2 B and C respectively.
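For readers unfamiliar with these two sequence statistics, the sketch below computes G+C content as a percentage and GC skew using the conventional definition (G - C)/(G + C), per sequence or per fixed-size window. The tool and window size used to produce Figure 2 are not stated, so the skew definition, the function names and the toy sequence here are assumptions.

```python
def gc_content(seq):
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq):
    """Conventional GC skew, (G - C) / (G + C); 0.0 when no G or C is present."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return 0.0 if g + c == 0 else (g - c) / (g + c)


def windowed(seq, size, func):
    """Apply func to consecutive non-overlapping windows of the given size."""
    return [func(seq[i:i + size]) for i in range(0, len(seq) - size + 1, size)]


# Toy sequence, for illustration only.
toy = "GGCCAATTGGGCATAT"
print(gc_content(toy))
print(windowed(toy, 4, gc_content))
print(windowed(toy, 4, gc_skew))
```

Applying `gc_content` to each predicted gene in turn gives a per-gene profile, and a sustained shift in that profile along the insert is the kind of signal that can point to horizontally acquired DNA.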

Transposon (EZ-Tn5) mutagenesis and cloning of the stlA gene
Transposon mutagenesis was performed on clone SMG 25 and a transposon insertion in gene 6 was identified which eliminated the growth advantage under osmotic stress; this locus (designated stlA) is predicted to encode a protein of 257 amino acids which, at the time of writing, has no homologues in the database. The transposon insertion was found to be between amino acid position 136 (alanine) and 137 (glutamine) of the protein. The stlA gene was cloned, along with some flanking DNA that was predicted to contain the native promoter region (predicted with the BProm program; see Table 2 for details), into the shuttle plasmid pCI372 and transformed into E. coli MKH13 and L. lactis MG1363.
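As a small worked example, the amino acid coordinates above can be converted to nucleotide coordinates within the stlA coding sequence. The sketch assumes 1-based positions counted from the first base of the start codon and an insertion lying exactly at the boundary between codons 136 and 137; the precise nucleotide of insertion is not given in the text, and the function names are hypothetical.

```python
def insertion_site_in_cds(last_intact_codon):
    """1-based nucleotide positions flanking an insertion after the given codon."""
    left = 3 * last_intact_codon  # last base of the last intact codon
    return left, left + 1


def cds_length(n_amino_acids, include_stop=True):
    """Coding sequence length in nucleotides for a protein of the given size."""
    return 3 * n_amino_acids + (3 if include_stop else 0)


print(insertion_site_in_cds(136))  # (408, 409)
print(cds_length(257))             # 774
```

Under these assumptions the insertion falls roughly halfway along a coding sequence of 774 nucleotides, including the stop codon.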

Growth experiments and BIOLOG phenotypic microarray
E. coli MKH13 cells transformed with a plasmid bearing a copy of the stlA gene were grown in LB broth containing various concentrations of NaCl (from 0-5% w/v added NaCl). A statistically significant (unpaired Student's t-test) growth advantage was conferred upon the stlA + cells compared to wild-type MKH13 carrying an empty plasmid in LB broth supplemented with both 3% (P =0.0019) and 4% NaCl (P <0.0001) (Figure 1 B and C, respectively), while growth was similar in LB alone (data not shown).
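The unpaired Student's t-test referred to above compares two independent groups using a pooled variance. The sketch below computes the test statistic and degrees of freedom with the Python standard library; it does not reproduce the authors' analysis, and the OD values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def students_t(group_a, group_b):
    """Unpaired, equal-variance Student's t statistic and degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Hypothetical final OD595nm values for triplicate cultures.
stla_plus = [0.62, 0.65, 0.60]
empty_vector = [0.31, 0.35, 0.30]

t_stat, dof = students_t(stla_plus, empty_vector)
print(f"t = {t_stat:.2f}, df = {dof}")
```

Converting the statistic to a P value requires the t distribution, which the standard library does not provide; `scipy.stats.ttest_ind(a, b, equal_var=True)` returns both the statistic and a two-sided P value if SciPy is available.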
Due to the uncharacterised and non-homologous nature of the stlA gene and its encoded protein, growth of wild-type MKH13 and stlA + strains was compared under 96 different conditions using BIOLOG's phenotypic microarray (PM) technology [29] to identify possible further phenotypic changes. Strains were tested on BIOLOG plate PM9, which contains different osmotic stress conditions and osmolytes for analysis. In addition to NaCl, the results indicated stlA + had an increased growth phenotype in the presence of potassium chloride (KCl). Confirmatory growth curves were performed in LB broth supplemented with 4% KCl. A statistically significant (unpaired Student's t-test) growth advantage was observed for stlA + in LB supplemented with 4% KCl (P <0.0001) compared to wild-type MKH13 (Figure 1 D). Growth of both strains was also assessed under conditions of non-ionic osmotic stress (in the form of glycerol and sucrose), low pH and in both porcine and human bile, as all three stress conditions are commonly encountered in the GI tract. Growth of both strains was inhibited at pH 2.5 and pH 3.5, while no significant difference in growth was observed at pH 4.5 or pH 5.5, in the presence of sucrose or glycerol, or in the presence of either porcine or human bile (Figure S1).
The ability of the stlA gene to increase salt tolerance was also tested in a Gram-positive host, L. lactis MG1363. There was no observable increase in salt tolerance in L. lactis MG1363 carrying a plasmid-encoded copy of stlA compared to L. lactis carrying the empty plasmid, while a similar growth rate and final OD value was observed for both strains in GM17 broth alone (Figure S2 A and B).

Bioinformatic analysis of StlA
The databases and tools used to identify features of StlA are presented in Table 2 below, along with the results of the analyses. An illustration of the stlA gene and its associated features is presented in Figure 2 D.

IMG/M-HMP analysis
The StlA protein sequence was screened against all available metagenomes from the Human Microbiome Project (HMP), which were sampled from 17 body sites giving a total of 748 samples, using BLASTP on the IMG/M-HMP website (http://img.jgi.doe.gov/cgi-bin/imgm_hmp/main.cgi) [45]. In addition, all available finished, permanent draft and draft genome sequences for Bacteria, Archaea, Eukarya and viruses/phages, as well as all available sequenced plasmids, were searched using BLASTP for sequences homologous to StlA. There was no significant similarity for the StlA protein to any of the bacterial, archaeal, eukaryotic, viral or plasmid genomes/sequences, nor to any non-human associated metagenomes (over 1,300 samples from more than 200 metagenomes, see File S1). The only similarity to StlA among the sampled microbiomes was to the stool microbiome samples, where 10 similar proteins from 8 different subjects (out of 100) (Table 3) were identified on different scaffolds. The last search was performed on 19/09/13. The taxonomic assignment of the scaffolds can be seen in Figure S3. The gene neighbourhoods around the genes homologous to stlA on each scaffold were investigated in an attempt to gain information on possible functions and conserved gene arrangements (Figure S4). The genes most commonly flanking the stlA homologues were on the same strand of DNA and encoded an ankyrin repeat protein (COG0666), a DnaJ class molecular chaperone with C-terminal zinc-finger domain (COG0484) and a predicted membrane protein (COG2314; Pfam05154 - TM2 domain). There are also a number of hypothetical proteins, for which no additional information is currently known. On two of the larger scaffolds, genes for a restriction modification system are present, as well as an integrase/site specific recombinase protein (COG4974; Pfam00589), indicating some of this region may have been acquired by lateral gene transfer (LGT) and may represent prophage DNA.
A phage-associated protein is predicted to be encoded by gene 20 (designated "P" in Figure 2 C), indicating the presence of a prophage on SMG 25 also. The fosmid insert of SMG 25 was analysed with PhiSpy [46] to identify possible prophage genes and the boundaries of the prophage region. PhiSpy predicted the prophage region to run from the start of gene 3 (nucleotide position 2024) to the end of gene 42 (nucleotide position 39972).

FFAS03 analysis
The FFAS03 server [42] was used to detect distant homology and fold recognition to StlA. FFAS analysis was also carried out on the translated protein sequences of the neighbouring genes to stlA on SMG 25, which also lacked any homologues in the databases (i.e. genes 3, 4, 5 and 7; gene 6 is stlA). The results of FFAS03 analysis are summarised in Figure 2 A.
In addition to a profile-profile and a fold and functional assignment, the FFAS03 server also carries out a BLAST and PSI-BLAST search of the user sequence against numerous databases and metagenome datasets. The StlA protein was found to share significant similarity to a protein from two individuals from the MetaHit dataset [1]. These sequences corresponded to samples MH0011 (a healthy Danish female) and V1.CD-14 (a Spanish female with Crohn's disease) which shared 60% identity (over 210 amino acids) and 82% identity (over 224 amino acids) respectively to StlA.

Detection of stlA in metagenomic DNA from human stool samples using PCR
The primer pair (stlA FP and stlA RP) initially used to amplify the stlA gene for cloning was unable to amplify PCR products from any of the metagenomic DNA samples (isolated from human stool microbiota), so a second primer pair (stlA-J FP and stlA-J RP) was designed to amplify an internal fragment of the gene. This primer pair amplified numerous products of the correct size, but these were found to be false positives following sequencing. An alignment was generated for StlA and homologous sequences from the stool microbiome from the HMP and MetaHit datasets to identify the most highly conserved regions (Figure S5) and different primer pairs were designed (stlA-OUT FP and RP; stlA-IN FP and RP). Two of the 25 metagenomic DNA samples tested (isolated from human faecal samples from ELDERMET [27] and another study [28]) generated PCR products of the correct size with the stlA-IN FP and RP primer pair, and these were confirmed to be stlA homologues following sequencing. One positive PCR product shared 72% nucleotide identity over approximately 300 base pairs (BLASTN versus the stlA gene) and 64% identity (over 100 amino acids) using BLASTX, while the other, from an ELDERMET [27] sample (community care/healthy old), shared 87% identity over 339 nucleotides and 85% identity over 112 amino acids.

Discussion
Functional screening of metagenomic libraries has the power to reveal novel functions for known genes or to identify completely novel genes and proteins. In the present study we describe the identification of an unknown protein (annotated StlA) from the human gut microbiome, which lacks any current homologues in the databases. The encoding gene (stlA), when expressed in E. coli, conferred a salt tolerance phenotype and may represent a novel stress resistance gene found exclusively among the human gut microbiota. This builds on previous work by our group, where we identified a novel function (i.e. increased salt tolerance) for five previously annotated genes (galE, mazG and murB) when expressed in E. coli [21].
Sequencing of the full fosmid insert from SMG 25 revealed an interesting gene landscape (Table 1), with approximately 58% of the predicted genes encoding proteins which shared highest genetic identity to different species of Akkermansia and 27% having no homologues in the databases. The Akkermansia-associated proteins and the "unknown" proteins are interspersed with proteins associated with different phyla such as Bacteroidetes/Chlorobi group, Synergistetes, Proteobacteria, Chlamydiae/Verrucomicrobia group and Firmicutes, as well as Archaea. The percentage identity at the amino acid level ranges from 36-69%, revealing a diverse range of proteins encoded within approximately 44kb of fosmid insert DNA (Table 1).
The G+C content of the entire fosmid insert is 52.97%, which is close to the average G+C content (55.8%) of the A. muciniphila genome [49]. The region from position 2024 (gene 3) to position 20148 (gene 26), which mainly consists of unknown genes or non-Akkermansia-associated genes, has a lower G+C content of 47.97%. The region of the fosmid containing mainly Akkermansia-associated genes (from gene 27 at position 20120 to the end of the fosmid) has a G+C content of 56.73%, in line with the A. muciniphila genome (55.8%). A putative prophage region was predicted (using PhiSpy) to be present on SMG 25, running from gene 3 to 42 inclusive. It is difficult to say how reliable this prediction is because the criteria used by PhiSpy to predict prophage genes are strongly assisted by the degree of relatedness of the PhiSpy training genome sets and the genome/DNA of the query organism [46]. Unfortunately PhiSpy does not contain an Akkermansia or Verrucomicrobial training genome, which would increase the predictive value of the result. However, by looking at the G+C skew of SMG 25 and the G+C content of each individual gene on SMG 25 (Figure 2 B and C, respectively), it seems the prophage could indeed begin at gene 3, but it is possible that it ends somewhere between gene 23 and 26, as there is a clear difference in G+C content visible between this region and from gene 27 to 45 at the 3'-end of the fosmid (Figure 2, B and C). Taken together these data suggest that much of this region may have been acquired through LGT.
StlA is predicted to be a 257 amino acid, 28.62kDa membrane protein with four transmembrane regions. No conserved domains or motifs were detected, indicating the novelty of the protein. A signal peptide and a C-terminal outer membrane insertion signal are predicted to be present, suggesting that StlA may be exported to and inserted in the outer membrane. Furthermore, StlA possesses C-terminal phenylalanine residues, which are characteristic and highly conserved in outer membrane proteins [51]. A detailed illustration of these features is presented in Figure 2 D, along with putative promoter and transcription binding sites. The outer membrane itself is an important mediator of the response to external stresses, serving as a permeability barrier and protecting the cell from compounds in the environment, while outer membrane proteins, specifically porins, play a significant role in the cellular responses to salt and osmotic stress [15,52,53]. It is noteworthy, given the likely location in the outer membrane, that StlA did not confer a salt tolerance phenotype on a Gram-positive host (L. lactis) (Figure S2).
Predictive 3D modelling was carried out with SWISS-MODEL [30] and I-TASSER [35,40]. However, the results were not statistically significant, most likely due to the lack of any suitable template structure in the databases to build an appropriate model. Ab initio structure prediction was attempted using QUARK [39], as it requires no template information and is thus suitable for proteins with no homologues. Again the results were not significant, but this is most likely due to the inherent difficulty and current limited ability of ab initio prediction. Successful cases of ab initio prediction have been limited to proteins of 100 residues or less and the fact remains that there are really no methods to predictively fold proteins of >200 amino acids without template modelling at present [39].
As no sequence-based homology for StlA could be determined with BLAST analysis, a more sensitive profile-profile comparison with FFAS03 [42] was used to detect remote homology through fold and structure recognition, as proteins with a similar structure or fold can have a common function in the absence of any sequence similarity. The highest score for StlA corresponded to a hypothetical protein from C. crescentus, which has a TspO/MBR domain. Members of this group are involved in transmembrane signalling and are located in the outer membrane [54,55]. They are associated with the major outer membrane porins (in prokaryotes) and with the voltage-dependent anion channel (in mitochondria), which links with the earlier observation that StlA may be inserted in the outer membrane. Such proteins have also been linked to desiccation stress in the bacterium Bradyrhizobium japonicum [56].
FFAS analysis of the encoded proteins in the gene neighbourhood of stlA on SMG 25 revealed some structural similarities to DnaJ and another type of molecular chaperone for the encoded proteins of genes 3, 4 and 7, while gene 5 encodes a protein with some structural similarity to a human voltage-gated calcium channel to which TspO has been linked, and it also shares a structural homology to Herpes virus latent membrane protein 1 (LMP 1). In addition to DnaJ, the predicted product of gene 7 also exhibited structural similarity to an antitermination protein from the Qin prophage. This could indicate some of this region was acquired via integration of a phage into the host chromosome. The novelty of the sequences may point to an uncharacterized phage. It is also noteworthy that gene 20 on SMG 25 is predicted to encode a phage-associated protein, while on two of the larger scaffolds from the stool microbiome samples a gene encoding a phage integrase protein is present, revealing a commonality of such genes in this region. An elegant study by Wang and co-workers has demonstrated that prophage DNA plays a significant role in host resistance to numerous stresses, including osmotic stress [57]. The phage-associated protein on SMG 25 shares 52% identity with a similar protein from Rhizobium lupini HPC(L) (100% coverage over 159 amino acids). Interestingly, this organism was recently sequenced following isolation from a saline desert soil [58]. Rhizobium species belong to the phylum Proteobacteria and, based on taxonomic assignment with MEGAN 4, proteobacterial sequences were found on all the larger scaffolds with an stlA homologue (Figure S3), which may indicate the origin of the phage. Furthermore, a number of genes on SMG 25 are predicted to encode proteins that share a high level of similarity to halophilic and halotolerant microorganisms (Table 1). For example, genes 8 and 9 are predicted to encode hypothetical proteins with similarity to Pontibacter sp. BAB1700 and a halophilic archaeon, respectively. Pontibacter species are halotolerant members of the phylum Bacteroidetes and have been isolated from saline and marine environments, while gene 26 is predicted to encode a protein with similarity to Halomonas, a genus of halophilic Proteobacteria with biotechnological and medical relevance [59][60][61]. It seems possible the phage originated in a "salty" environment such as saline soil, a salt lake, a solar saltern or marine ecosystem.
When compared against all the available samples from the HMP, homologues of the stlA sequence were found to be present only in stool microbiome samples. Furthermore, no homologous sequences were found in any bacterial, archaeal, eukaryotic or viral genome sequences, or in any sequenced plasmids. This indicates that the stlA gene is extremely rare in the sequences tested and may be a gut-specific gene present only in species of low abundance, as no homologues were found in any of the common or dominant members of the human gut microbiome. In addition, we could only detect stlA homologues by PCR in two of 25 metagenomic DNA samples isolated from stool. Gene neighbourhood analysis around the stlA homologues revealed they were most often found in combination with genes encoding DnaJ-type molecular chaperones (COG0484), an ankyrin repeat protein (COG0666) or a predicted membrane protein containing a TM2 domain (COG2314), which is also similar to the gene organisation on SMG 25.
DnaJ-domain proteins are molecular chaperones that aid protein folding, prevent aggregation and repair damaged proteins following cellular stress [62]. They are members of the heat shock protein (Hsp) family, which has been shown to play important roles in the response to numerous stress conditions, including osmotic stress, and they can also act as co-chaperones by stimulating the activity of other chaperones such as DnaK [63][64][65]. TM2 domain proteins are composed of a pair of alpha helices connected by a short linker. The function of this domain is unknown; however, it occurs in a wide range of protein contexts. It occurs most often on its own or in tandem with another TM2 domain but, interestingly, the third most frequent association is with a DnaJ domain.
Ankyrin-repeat proteins are found across all three domains of life and modulate a number of diverse functions through protein-protein interactions [66]. The repeat has been found in proteins of diverse function [67,68] and these proteins have also been linked to cellular stress responses, including osmotic stress [69][70][71].
With information gained from gene neighbourhood analysis and distant structural homology, we can speculate as to the mechanisms of salt tolerance conferred by stlA. Overall, stlA and its neighbouring genes share common features that categorise them as stress responsive and may therefore constitute a stress operon. Three of the five encoded unknown proteins share a distant structural homology to chaperones. These chaperones could play a role in protein disaggregation and folding following stress as outlined above, or they could guide StlA through the periplasm and assist in inserting it in the membrane, although the latter situation would require E. coli chaperones to function in a similar capacity when stlA is cloned in isolation. StlA itself, being a predicted membrane protein, could act as a sensor of external stresses or indeed stabilise the outer membrane during stress. It is noteworthy that the most significant homology predicted by FFAS for StlA was to a TspO/MBR protein, which is involved in membrane signalling and is associated with voltage-dependent anion channels in mitochondria.
In conclusion, we have identified a novel salt tolerance gene, stlA, from the human gut microbiome through functional screening of a metagenomic library. The gene is rare among the HMP and MetaHit datasets and has no bacterial, archaeal, viral, plasmid or eukaryotic homologues in the current databases. Furthermore, no homologues were found in any non-human metagenome datasets, nor in any of the human microbiome datasets (HMP and MetaHit) other than stool, suggesting that it is gut-specific and present in a novel species of low abundance. The stlA gene appears to be on a prophage, indicating that it may have been acquired (along with some of its neighbouring genes) through a lateral gene transfer (LGT) event. It may confer a competitive advantage on its particular host species under stressful conditions in the gut, or where some of the classical osmotolerance systems are absent or deficient, as in C. jejuni [72].
Overall, this study illustrates the utility of functionally screening metagenomic libraries to assign a function to a completely novel gene and its encoded protein, and suggests that novel mechanisms of osmotolerance may exist in different environmental niches. Mining (gut) microbiomes and the development of more sensitive and innovative screening assays will facilitate the discovery of novel stress resistance genes, antibiotics, biopharmaceuticals and biotherapeutics for use in biotechnology, medicine and health [73][74][75][76][77].

Figure 2. Bioinformatic analysis of SMG 25 fosmid insert. (A) FFAS03 analysis of the StlA protein and the encoded proteins of flanking genes was performed to identify putative distant structural homologues. A score of -9.50 or lower is considered significant. (B) Representation of the G+C skew of the entire fosmid insert of SMG 25. (C) Representation of the gene arrangement on SMG 25. Gene lengths are approximately to scale and colour coding represents the G+C content of each individual gene, which can be determined from the G+C content gradient bar. The presence of a phage-associated gene and clear separation in G+C content over the length of the fosmid insert indicates that much of this region may have been acquired via lateral gene transfer (LGT). The phage-associated gene is marked "P", while the stlA gene is indicated with an asterisk (*) symbol. Genes are numbered as indicated in Table 1 and as mentioned in the text. Numbering of some shorter genes has been excluded for clarity. Selected nucleotide positions (in base pairs) are displayed in bold italic font above genes. (D) A detailed view is presented of the nucleotide and amino acid sequence of the stlA gene and StlA protein, respectively. The putative start codon is in green, while a 250 base-pair region upstream of this is shown to include putative -35 and -10 promoter regions (underlined) and a predicted rpoD transcription factor binding site (in bold). Amino acids surrounded by a grey box indicate the predicted signal sequence of StlA and those highlighted in blue represent four transmembrane regions. The location of the EZTn5 transposon insertion is indicated with a red triangle. doi: 10.1371/journal.pone.0082985.g002
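The G+C content and G+C skew profiles summarised in panels B and C can be illustrated with a minimal Python sketch. It assumes the conventional definition of skew, (G - C)/(G + C), computed over sliding windows. The function names, window size and toy sequence are hypothetical and chosen for illustration only; they do not reproduce the software or parameters used to generate the figure.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq: str) -> float:
    """Conventional GC skew, (G - C) / (G + C); 0.0 if no G or C is present."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def sliding_windows(seq: str, window: int, step: int):
    """Yield (start, gc_content, gc_skew) for each full window along seq."""
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        yield start, gc_content(chunk), gc_skew(chunk)


if __name__ == "__main__":
    # Toy sequence (hypothetical): an AT-rich block followed by a GC-rich block.
    toy = "ATATATATATGGGGGCGCGC"
    for start, gc, skew in sliding_windows(toy, window=10, step=10):
        print(start, round(gc, 2), round(skew, 2))
```

An abrupt change in G+C content between adjacent windows, as between the two halves of the toy sequence, is the kind of signal that may be consistent with a laterally acquired region.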

Table 1. List of putative proteins encoded on SMG 25 fosmid insert.

Abbreviations and symbols: aa (amino acids); n/a (not applicable); %ID (% identity at amino acid level); DUF (Domain of Unknown Function); OM (outer membrane); Asterisk (*) indicates stlA gene product. Text in bold indicates that no homologues for these gene products were found following BLAST searches of the NCBI database. doi: 10.1371/journal.pone.0082985.t001

Table 2. Bioinformatic analysis of StlA protein sequence.

Table 3. Gene, scaffold and subject information from which stlA homologues were found in Human Microbiome Project (HMP) dataset.
Information for stlA homologues found in HMP dataset, including scaffold and subject of origin. Symbols (* and ǂ) indicate detection of stlA homologue more than once from same subject. The StlA amino acid sequence was used to search against all 748 metagenome datasets from different body sites from the Human Microbiome Project (HMP) (1e-50 maximum e-value cut-off). Ten StlA homologues were identified in 8 different subjects. No StlA homologues were found in any other body site metagenome. doi: 10.1371/journal.pone.0082985.t003
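The filtering step described in this legend (retaining hits at or below a maximum e-value of 1e-50 and grouping them by subject of origin) can be sketched in Python as follows. This is an illustrative sketch only: the record layout, the function names and all example values are hypothetical and are not data from this study or the output format of any particular search tool.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-50  # maximum e-value stated in the Table 3 legend


def significant_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Keep only hits whose e-value is at or below the cut-off."""
    return [h for h in hits if h["evalue"] <= cutoff]


def hits_per_subject(hits):
    """Group scaffold identifiers by the subject they originate from."""
    grouped = defaultdict(list)
    for h in hits:
        grouped[h["subject"]].append(h["scaffold"])
    return dict(grouped)


# Hypothetical records, invented purely for illustration.
example_hits = [
    {"subject": "subject_A", "scaffold": "scaffold_1", "evalue": 1e-80},
    {"subject": "subject_A", "scaffold": "scaffold_2", "evalue": 1e-62},
    {"subject": "subject_B", "scaffold": "scaffold_3", "evalue": 1e-55},
    {"subject": "subject_C", "scaffold": "scaffold_4", "evalue": 1e-12},
]

if __name__ == "__main__":
    kept = significant_hits(example_hits)
    by_subject = hits_per_subject(kept)
    print(len(kept), "hits in", len(by_subject), "subjects")
```

Grouping by subject makes it straightforward to flag cases where a homologue is detected more than once in the same subject, as marked by the symbols in Table 3.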