Pediatric Fecal Microbiota Harbor Diverse and Novel Antibiotic Resistance Genes

Emerging antibiotic resistance threatens human health. Gut microbes are an epidemiologically important reservoir of resistance genes (resistome), yet prior studies indicate that the true diversity of gut-associated resistomes has been underestimated. To deeply characterize the pediatric gut-associated resistome, we created metagenomic recombinant libraries in an Escherichia coli host using fecal DNA from 22 healthy infants and children (most without recent antibiotic exposure), and performed functional selections for resistance to 18 antibiotics from eight drug classes. Resistance-conferring DNA fragments were sequenced (Illumina HiSeq 2000), and reads assembled and annotated with the PARFuMS computational pipeline. Resistance to 14 of the 18 antibiotics was found in stools of infants and children. Recovered genes included chloramphenicol acetyltransferases, drug-resistant dihydrofolate reductases, rRNA methyltransferases, transcriptional regulators, multidrug efflux pumps, and every major class of beta-lactamase, aminoglycoside-modifying enzyme, and tetracycline resistance protein. Many resistance-conferring sequences were mobilizable; some had low identity to any known organism, emphasizing cryptic organisms as potentially important resistance reservoirs. We functionally confirmed three novel resistance genes, including a 16S rRNA methylase conferring aminoglycoside resistance, and two tetracycline-resistance proteins nearly identical to a bifidobacterial MFS transporter (B. longum s. longum JDM301). We provide the first report to our knowledge of resistance to folate-synthesis inhibitors conferred by a predicted Nudix hydrolase (part of the folate synthesis pathway). This functional metagenomic survey of gut-associated resistomes, the largest of its kind to date, demonstrates that fecal resistomes of healthy children are far more diverse than previously suspected, that clinically relevant resistance genes are present even without recent selective antibiotic pressure in the human host, and that cryptic gut microbes are an important resistance reservoir. The observed transferability of gut-associated resistance genes to a gram-negative (E. coli) host also suggests that the potential for gut-associated resistomes to threaten human health by mediating antibiotic resistance in pathogens warrants further investigation.


I. MATERIALS AND METHODS A. Construction and Functional Screening of Metagenomic Libraries i. Selection of fecal samples
22 fecal samples were selected from an archive of 455 pediatric fecal samples collected for a previous study. To ensure that a broad distribution of ages was represented and that the first year of live was particularly well-represented, one to two samples were chosen randomly within defined age categories (e.g. 0-3mo, 4-5years, >14 years). The age distribution of the 22 sample donors is listed in Table S1.
Fecal DNA was extracted and metagenomic libraries were constructed and screened as previously described. (1,2) The experimental protocols are briefly described below.
ii. Metagenomic DNA extraction Metagenomic DNA was isolated from 50-100mg of fecal material from each donor using 250uL of 0·1mm zirconium beads (BioSpec Products, cat#1107910), 210uL of 20% sodium-dodecyl-sulfate (SDS), 500uL of a custom buffer (100mM NaCl, 100mM Tris, 10mM EDTA), and 500uL phenol:chloroform:isoamyl alcohol (24:24:1; pH 7·9). Fecal cells were then lysed in this suspension using a Mini-Beadbeater (BioSpec Products) on the "homogenize" setting for a total of four minutes (bead-beating for two minutes, samples cooled on ice for two minutes, bead-beating for two minutes). The samples were then centrifuged at 13,000rpm and the aqueous phase was added to a phase-lock gel tube (5Prime, cat#230280) along with another 600uL of phenol:chloroform:isoamyl alcohol (24:24:1; pH 7·9). The aqueous phase was placed in an Eppendorf tube and the DNA was recovered from the solution using isopropanol precipitation. A final purification step using the QIAquick® PCR Purification kit (Qiagen) was performed using the manufacturer's protocols. DNA was quantitated using the NanoVue TM Spectrophotometer (Fisher Scientific).

iii. pZE21 Expression Vector Preparation
The pZE21 expression vector(3) was linearized for library creation at the HincII cut site using PCR with Platinum® PFX DNA Polymerase (Invitrogen). The reaction mixture, which differed from the manufacturer's recommendations, is detailed below; primer sequences are listed in Table S2. The PCR product was subjected to gel electrophoresis (1% agarose, 0·5x Tris-Borate-EDTA (TBE), GelGreen TM (Biotium, Inc.)) and the 2kb product was excised from the gel. Gel purification was performed using the QIAquick® Gel Extraction kit (Qiagen) following their recommended protocols. The purified product was then treated with calf intestinal alkaline phosphatase (CIP, New England BioLabs Inc.). The reaction mixture, which differed from the manufacturer's standard protocol, is listed below. Blunt ligation of the metagenomic DNA and linearized pZE21 was performed with the FAST-LINK TM DNA Ligation kit (Epicentre®) following the manufacturer's recommended protocol for blunt ligation. A 1:5 molar ratio of pZE21 to metagenomic DNA was maintained in all reactions. Ligated plasmids were desalted using a nitrocellulose membrane (Millipore) and the entire reaction was added to 25uL of MegaXDH10B TM T1R TM Electrocompetent Cells (Invitrogen). Electroporation was performed using standard procedure for a 1mm cuvette, and the new metagenomic library was shaken at 37ºC for 1 hour before a 2uL aliquot was removed and used to plate titers on Luria-Bertani agar with 50ug/mL of kanamycin (the pZE21 expression vector has a kanamycin resistance cassette). The remainder of the library was transferred to 50mL of LB broth with 50ug/mL kanamycin (LB-Kan) and amplified at 30°C until the solution reached an optical density of 0·7-0·9 (typically 16-20 hours). The amplified library was gently centrifuged (3000rpm for 8 minutes), reconstituted in LB-Kan with 15% glycerol, and stored at -80ºC. Amplified library titers were determined by plating on LB-Kan agar prior to freezing.
Colony PCR was done using the titer plates obtained immediately after electroporation. 12 colonies were picked at random from the titer plates and the cells were lysed in 50uL nuclease-free water. For each colony, the following PCR reaction was performed using the Taq  The colony PCR product was visualized on a 1% agarose gel containing ethidium bromide. If the pZE21 had self-ligated with no insert, a 300bp band was seen; otherwise the bands seen reflected the size of the metagenomic fragment taken up by that clone + 300bp. The fraction of colony PCR reactions with bands >800bp in size seen on the gel, reflecting the approximate fraction of metagenomic fragments >500bp in size present in the library, was used to generate a correction factor used in the estimation of library size (e.g. if nine of the twelve clones had bands >800bp in size, the resultant correction factor would be 0·75). The size of each library, in gigabases (GB) was estimated using the following formula: Library Size (GB) = Total Clones * correction factor * approximate insert size (Total clones = cfu/mL of the library 1 hour after electroporation, as determined from the titer plates. Correction factor is determined by visualization of colony PCR product, and the average insert size is estimated at 2000bp.

v. Functional Screening on Antibiotic Media
Prior to plating on antibiotic-containing growth media, the concentration of the metagenomic libraries was adjusted so that each 100uL of plating solution contained 10x the number of total unique clones estimated in the library at 1 hour after electroporation (for a library with 1 x 10 6 clones following electroporation, the stock made for plating would contain 1 x 10 7 cfu/100uL, or 1 X 10 8 cfu/mL). If the concentration of the frozen library stock was higher than the desired concentration, the library was diluted with LB-Kan; if it was lower than the desired concentration, the cells were pelleted by gentle centrifugation at 3000rpm and reconstituted in an appropriate volume of LB-Kan. Libraries were plated on Mueller-Hinton agar with 50ug/mL of kanamycin (MH-Kan) and additional antibiotics at concentrations listed in Table S3. One plate was used for the selection of each library for resistance against a particular antibiotic. A control sample of the MegaX DH10B T1R strain with unmodified pZE21 was plated on each antibiotic at the time of screening to ensure the concentration of antibiotic used entirely inhibited the growth of clones with only pZE21. Plates were incubated for 24 hours at 37°C before counting the resistant colonies. Trimethoprim and Trimethoprim-sulfamethoxazole plates and controls were incubated for 48 hours because resistant clones were too small to count accurately at 24 hours. After the colonies were counted, 1.5mL of LB-Kan with 15% glycerol was added in two 750µl aliquots to the plates and the colonies were gently removed from the agar with a sterile L-shaped cell-spreader. The slurry of functionally selected clones was then removed from the surface of the plate by pipette aspiration and was stored at -80°C.

B. Sequencing and annotation
Functionally selected metagenomic fragments were barcoded for sequencing, then reassembled and annotated with the PARFuMS computational pipeline. The protocol is briefly outlined below.
i. Indexing samples for sequencing DNA was extracted from 300uL of functionally selected clones by pelleting the cells with high-speed centrifugation (13,000 rpm), removing the supernatant with a pipet, rinsing the pellet with 1mL of nuclease-free water, and then lysing the cells by resuspending them in 30uL of nuclease-free water and freezing at -20ºC. The cellular debris was then pelleted by high-speed centrifugation and the supernatant was used as a template for PCR amplification of functionally-selected metagenomic fragments. For each selection condition (one subject vs. one antibiotic comprises a single selection condition), the following PCR reaction was set up: PCR cycling conditions were identical to the colony PCR protocol described above. The PCR product was purified using the QIAquick® PCR Purification kit (Qiagen) using the manufacturer's protocols and quantitated with the NanoVue TM Spectrophotometer (Fisher Scientific). The DNA was then diluted to a concentration of 50ng/uL and 100uL of each sample was placed into a 600uL Eppendorf tube. These 100uL aliquots of DNA were sheared to ~160bp fragments using the Bioruptor® XL (Diagenode) on high power for a total of 45 minutes (90 cycles of 30 seconds on and 30 seconds off). The sheared DNA was then concentrated using the QIAquick® PCR Purification kit (Qiagen) with MinElute columns. After measuring the concentration with the NanoVue TM Spectrophotometer (Fisher Scientific), the sheared DNA was diluted to a concentration of 40ng/uL and the ends were blunted using the following reaction: After the end-repair reaction had been incubated at 25ºC for 30 minutes then at 75ºC for 20 minutes, 1uL of shrimp alkaline phosphatase 1U/uL (Promega) was added and the reaction incubated at 37ºC for 30 minutes then at 75ºC for 30 minutes to dephosphorylate any remaining dNTPs. Each blunted DNA fragment then had an adenosine added to the 3' end by adding 6·4uL of 1X T4 DNA ligase buffer with 10mM ATP (diluted from 10X stock; New England Biolabs, Inc.), 0.6uL of dATP 5mM (Promega), and 2uL of Klenow fragment (3'  5' exo-) 5U/uL (New England Biolabs, Inc.): 2uL and incubating at 37ºC for 30 minutes then at 75ºC for 20 minutes. In a separate tube, 1·4uL of a 25uM solution comprised of (i) a universal forward adapter and (ii) reverse adapters indexed with unique DNA tags specific to a given selection (adapters mixed in equal proportions) was added to 5uL 10X T4 DNA ligase buffer with 10mM ATP (New England Biolabs, Inc.) and 12uL of nuclease-free water and heated to 94ºC for 2 minutes and slowly cooled to room temperature to anneal the adapters. The annealed adapters were then mixed with 1uL of T4 DNA ligase 2,000U/uL (New England Biolabs, Inc.), and added to the end-repaired DNA solution (a unique indexed adapter was assigned to DNA fragments from each selection condition). The adapter ligation reaction was incubated for 60 minutes at 16ºC, then for 10 minutes at 65ºC, and then the indexed fragments were purified using the QIAquick® PCR Purification kit (Qiagen) with MinElute columns.
Size selection of the barcoded DNA fragments was performed by loading the samples pooled in groups of 4 on a gel (2% agarose, 1x TBE, GelGreen TM (Biotium, Inc.) and electrophoresing at 120V for 2 hours. The size range that corresponded to sheared metagenomic DNA fragments of length ~160bp in length was excised and the DNA was recovered using the QIAquick® Gel Extraction kit (Qiagen) with the MinElute columns. Size-selected indexed DNA fragments were PCR-enriched using the following reaction: The PCR product was then size-selected on a 2% agarose gel 1X TBE gel with GelGreen (Biotium) to remove primer-dimers. DNA was recovered from the gel slices using the QIAquick® PCR Purification kit (Qiagen) with the MinElute columns, and the concentration of the pooled, size-selected, barcoded DNA was measured using the Qubit® dsDNA HS Assay Kit (Invitrogen) and the Qubit® 2·0 Fluorometer (Invitrogen). 10 nanomoles of each PCR product were then aliquoted into a pooled sample that was submitted to the Genome Technology Access Center (GTAC, Washington University in St Louis) for Illumina Hi-Seq Paired-End 101bp sequencing.

ii. Overview of the Parallel Annotation and Reassembly of Functional Metagenomic Selections (PARFuMS) computational pipeline
To simultaneously assemble metagenomic fragments from multiple selections in parallel, we implemented an iterative assembly approach. Intermediate-length contigs were generated using multiple rounds of assembly with the short-read assemble Velvet,(4) and then further assembled into full-length contigs using the long-read assembler Phrap.(5) Assembly with Velvet cycles through three iterations: in the first iteration, assembly occurs using all reads while in the second and third, only reads that were not present in any previously assembled contig are used. Each round of assembly is divided into jobs of a defined number of reads, with each set of reads assembled in parallel, and all assembled contigs joined at the end of each round. As PARFuMS progresses through assemblies, the number of reads per job changes, decreasing in later assemblies as dataset complexity is reduced. This approach regulates the coverage across contigs, ensuring the number of reads with sequencing error in any given assembly remains low, avoiding the inappropriate division of contigs. Following each iteration of assembly, redundant contigs are collapsed to one sequence and chimeric assemblies split using a window-based coverage approach. This contig set represents the final Velvet-assembled contigs, which are then passed to Phrap for two iterations of further assembly. The first iteration assembles raw Velvet output into more complete contigs which are subsequently linked together using raw reads that bridge the ends of two sequences. The final assembly of Phrap uses the linked contigs as input, outputting contigs that are subsequently annotated via similarity to the COG functional database.(6) Finally, contigs are joined based on both sequence similarity and common annotation to generate the final sequence set, which is re-annotated through BlastX-similarity to the COG functional database. PARFuMS performance has been benchmarked on previously Sanger-sequenced human fecal metagenomic libraries,(2) and we observe not only recapitulation of the Sanger-sequenced results, but also importantly up to 20% more novel sequence recovery from libraries of similar size to ones screened in this paper. (7) iii. Determining the reduction in sequencing cost afforded by PARFuMS A single lane of sequence data from the Illumina Hi-Seq was estimated to cost $3250 (current price at GTAC). Assuming PARFuMS requires 700,000 reads per selection (in reality, PARFuMS requires fewer reads) and 140 million sequencing reads are generated per lane, this scheme can process 200 selections for $3250. To calculate how many selections could be processed for the same price using traditional Sangerbased sequencing methods, several assumptions regarding reagent-cost were made. First, it was assumed that six Sanger sequencing reactions would be needed to entirely sequence any given metagenomic insert from a resistant clone (four custom primers and two universal primers complementary to the ends of the expression vector). At an average cost of $3 per sequencing reaction (current price at Beckman Coulter) and $2 per custom primer (current price at Integrated DNA Technologies), sequencing the metagenomic insert from any given clone would cost $26. If the average selection yields 100 resistant clones (an underestimate compared to the mean colony counts observed in most antibiotic selections), sequencing a single experiment would cost $2600. Therefore, 1.25 selections could be sequenced for $3250, for which price 200 functional selections can be processed with PARFuMS, making the per-selection cost of PARFuMS <1% of the cost incurred using Sanger-based methods.

C. Data Analysis
The PARFuMS output was parsed as described below.

i. Categorization of metagenomic fragments
The PARFuMS output, which contains a set of functional annotations from the COG database for each ORF on every assembled contig, was queried for a set of keywords (Table S4) and each ORF was categorized as either a known antibiotic resistance gene, a gene encoding a transporter/efflux protein, a mobile genetic element, or "other" (the category "other" includes all annotations besides the three listed). Based on the assignment of the ORFs on a given contig, each contig was classified (Table S5) as either bearing a single resistance gene, bearing a multidrug resistance element (defined as either bearing two unique resistance genes or a resistance gene and a transporter, to best highlight contigs encoding multiple resistance mechanisms), bearing a resistance gene with a mobile element, bearing a transporter/efflux protein only, or unknown (not bearing any ORFs assigned as a resistance gene, transporter, or mobile element). The total number of contigs bearing a known resistance gene, sorted by antibiotic, and the associated proportions of contigs with a single resistance gene, contigs with multidrug resistance elements, and contigs with resistance genes associated with mobile elements are depicted in Figure S1.

ii. Creation of Beta-Lactamase and Aminoglycoside-modifying Enzyme Approximate Maximum-Likelihood Trees
Two datasets comprised of the predicted amino acid sequences associated with all genes identified that encoded (1) beta-lactamases or (2) aminoglycoside acetyltransferases and phosphotransferases were selected for analysis. Partially-assembled sequences <200aa in length were excluded from these datasets. Each dataset was BLASTed against itself and any duplicate genes (100% amino acid identity with equal sequence lengths) were collapsed to one sequence. All remaining sequences were classified (Ambler classification for beta-lactamases; acetyltransferase, phosphotransferases, or bifunctional for aminoglycoside-modifying enzymes) using the best hit to the NCBI nr database, or the Antibiotic Resistance Genes Database (ARDB). PSI-BLAST was also performed on all aminoglycoside-modifying enzymes and on all beta-lactamases with less than 90% identity to any known beta-lactamase, to confirm categorizations. Multiple alignments on the two datasets were done using Muscle v3.7.(8) Subsequently, FastTree v.2.1(9) was used to create an approximate maximum-likelihood tree. Default parameters were used for both Muscle v3.7 and FastTree v2.1. Regions of the trees (Figs. 3, 4) corresponding to Ambler classes (beta-lactamases) or predicted enzyme activity (aminoglycoside-modifying enzymes) were overlaid with color.

D. Experimental Validation of novel genes
Selected genes with putatively novel resistance functions were amplified from the metagenomic libraries and then subjected to phenotypic assays to validate their function and Sanger-sequenced to confirm their identities.

i. RBS-pZE21 Vector Preparation
The HincII cut site is not appropriately spaced relative to the pZE21 ribosomal binding site to allow for optimal transcription; accordingly RBS-pZE21, a linearized pZE21 derivative with optimized spacing between the insertion site and the ribosomal binding site was created for the purpose of cloning individual ORFs. RBS-pZE21 was created using a different set of primer pairs than the pZE21 used for library creation (primers 2A/2B, Table S2); preparation of RBS-pZE21 was otherwise exactly as described for the standard pZE21 used for library creation.

ii. Cloning, Phenotypic Validation, and Sequence Confirmation of Novel Genes
Four genes were selected for phenotypic validation: two MFS transporters found in tetracycline selections with >98% identity to a Bifidobacterium MFS transporter not known to confer tetracycline resistance, a 16S rRNA methylase with only 42% identity to any known rRNA methylase, and a NUDIX hydrolase, which acts early in the pterin branch of the folate synthesis pathway, in a trimethoprim-sulfamethoxazole selection. Functionally selected clones that had been scraped from each of the selection conditions (named 21TE, 30TE, 19GE, and 30TRSX, respectively) and frozen, were lysed as described in the section labeled "Indexing samples for sequencing" and the metagenomic fragments were PCR-amplified as follows. The PCR product was purified with the QIAquick® PCR Purification kit (Qiagen) and 1uL of the PCR product was used as template for another PCR reaction using primers specific to the full-length contigs bearing the four genes of interest. As shown in Table S3, primer pair 5A/5B was used to amplify the 19Gent contig from the 19GE PCR product; primer pair 6A/6B was used to amplify the 21MFSBif contig from the 21TE PCR product; primer pair 7A/7B was used to amplify the 30MFSBif contig from the 30TE selection; and primer pair 8A/8B was used to amplify the 30Nudix contig from the 30TRSX PCR product. Each of these PCR reactions was done using the same reaction conditions as described above. The PCR product was purified with the QIAquick® PCR Purification kit (Qiagen) and visualized on a 1% agarose gel stained with ethidium bromide (EB) to verify that the product was of the correct size. The product was then used as a PCR template for amplification of the ORFs. PCR conditions were the same as for the fulllength fragments, except the extension time was shortened to 2 minutes. As shown in Table S3, primer pair 9A/9B was used to amplify 19GentORF from the 19Gent contig; primer pair 10A/10B was used to amplify 21MFSORF from the 21MFSBif contig; primer pair 11A/11B was used to amplify 30MFSORF contig from the 30MFSBif contig; and primer pair 12A/12B was used to amplify 30NudixORF from the 30Nudix contig. The ORF PCR product was then purified with the QIAquick® PCR Purification kit (Qiagen) and visualized on a 1% agarose gel with ethidium bromide with EB. ORF PCR products verified to be the correct size were then ligated into RBS-pZE21 using the FAST-LINK TM DNA Ligation kit (Epicentre®) exactly as described for metagenomic library preparation. Dialysis of the ligation reactions and electroporation were also done precisely as described earlier. Following electroporation, the transformants were allowed to recover at 37ºC in a shaking warmer for 1 hour, then they were amplified overnight in 50mL Mueller-Hinton broth with 50ug/mL of kanamycin and the antibiotic appropriate to the predicted ORF function at a concentration listed in Table S3 (gentamicin for 19GentORF, tetracycline for 21MFSORF and 30MFSORF, and trimethoprim-sulfamethoxazole for 30NudixORF). A 100uL aliquot of each sample was plated on Mueller-Hinton agar with 50ug/mL of kanamycin and the same ORF-specific antibiotic that the sample had been amplified in overnight. The remainder of the amplified stock was pelleted and reconstituted in LB-Kan with 15% glycerol and stored at -80ºC. Controls (MegaX DH10B T1R with unmodified pZE21) were plated at the same time on separate plates with equivalent antibiotic concentrations. After 24 hours (48 for 30Nudix), several colonies were picked off of each plate and the metagenomic fragment was amplified using the colony PCR protocol described in the section on preparing metagenomic libraries. The PCR product was purified using the QIAquick® PCR Purification kit (Qiagen) and visualized on a 1% agarose EB gel to confirm that the fragment was the correct size. Fragments were then sent for Sanger sequencing using the colony PCR primers (primer pair 3A/3B) and the identities of the functionally validated clones were identified.