Long-read sequencing to interrogate strain-level variation among adherent-invasive Escherichia coli isolated from human intestinal tissue

Adherent-invasive Escherichia coli (AIEC) is a pathovar linked to inflammatory bowel diseases (IBD), especially Crohn’s disease, and colorectal cancer. AIEC are genetically diverse, and in the absence of a universal molecular signature, are defined by in vitro functional attributes. The relative ability of difference AIEC strains to colonize, persist, and induce inflammation in an IBD-susceptible host is unresolved. To evaluate strain-level variation among tissue-associated E. coli in the intestines, we develop a long-read sequencing approach to identify AIEC by strain that excludes host DNA. We use this approach to distinguish genetically similar strains and assess their fitness in colonizing the intestine. Here we have assembled complete genomes using long-read nanopore sequencing for a model AIEC strain, NC101, and seven strains isolated from the intestinal mucosa of Crohn’s disease and non-Crohn’s tissues. We show these strains can colonize the intestine of IBD susceptible mice and induce inflammatory cytokines from cultured macrophages. We demonstrate that these strains can be quantified and distinguished in the presence of 99.5% mammalian DNA and from within a fecal population. Analysis of global genomic structure and specific sequence variation within the ribosomal RNA operon provides a framework for efficiently tracking strain-level variation of closely-related E. coli and likely other commensal/pathogenic bacteria impacting intestinal inflammation in experimental settings and IBD patients.


Introduction
Crohn's disease (CD) is a type of Inflammatory bowel disease (IBD), a chronic inflammatory condition that is the result of an inappropriate immune response towards elements of the intestinal microbiota [1]. Due to chronic inflammation and exposure to pro-carcinogenic microbes, CD patients are at a greater risk of developing colorectal cancer (CRC) [2][3][4]. CD patients harbor microbiomes that differ from non-CD individuals in both composition and function, with reduced diversity and an increase of adherent/invasive E. coli (AIEC) [5][6][7][8][9][10][11]. AIEC are closely associated with the intestinal mucosa, termed the mucosal niche, and induce inflammation in murine models of IBD, including the well-established, inflammation-susceptibe Il10 -/mouse model [3,[12][13][14]. AIEC strains lack genes associated with virulence in diarrheagenic E. coli and are defined through the in vitro ability to adhere to and invade epithelial cells and survive in macrophages [5,7,15]. However, as AIEC are genetically diverse, there is no universal genomic feature that distinguishes the pathotype [16], limiting our ability to detect these pro-inflammatory strains from among patient microbiota.
Escherichia coli make up a diverse species that is known to variously have beneficial, commensal, or pathogenic impact in the mammalian gut. However, most commonly used methods for characterizing the composition of gut microbiota from fecal or tissue samples, notably sequencing of selected variable regions of the 16S rRNA operon, do not differentiate among these closely-related but functionally divergent types. Emerging single-molecule (third-generation/long-read) sequencing technologies including Pacific Biosciences ("PacBio") and Oxford Nanopore Technologies ("nanopore") produce reads dozens to hundreds of kilobases in length, often without prior amplification. Recent work [17][18][19][20] has demonstrated the utility of long reads for sequencing the entire 16S gene or entire rRNA operon, with corresponding increase in taxonomic resolution by capturing more variable sequence, including all nine variable regions of 16S, the internal transcribed spacers (ITS), and 23S. Traditional metataxonomic analysis using next-generation sequencing of one or two adjacent hypervariable regions of the 16S operon are unable to discriminate among closely-related species and genera [21], much less capture strain-level variation. We propose a full-length rRNA sequencing approachenabled by longer reads produced by nanopore sequencing-that is capable of discriminating among closely-related strains of E. coli. To support the extension of these approaches to characterize strain-level variation of tissue-associated (mucosa-associated) microbiota contributing to IBD and in models of experimental colitis in mice, with a particular focus on mechanistic studies of AIEC, we produced complete genome assemblies for eight AIEC and non-AIEC E. coli, describe the genomic variation among these strains, particularly within the rRNA operon, and demonstrate the accurate identification of these strains in mixed in vitro and in vivo microbiota.

DNA extraction and nanopore sequencing
We extracted ultra-high-molecular-weight genomic DNA from liquid E. coli cultures using a modified phenol:chloroform protocol [22]. Briefly, 1-1.5 ml of stationary-phase liquid culture was pelleted and resuspended in TE/NaCl/Triton buffer. Cells were lysed by addition of SDS to a final concentration of 2% and Proteinase K to a final concentration of 300 mg/ml and incubated at 55˚C for 30 minutes. DNA was purified twice by addition of 1x volume phenol: chloroform and phase separation. DNA was precipitated by addition of 0.1x volume sodium acetate and 2x volume isopropanol. Ultra-high-molecular weight DNA was "hooked" out with a melted glass capillary and washed in 70% ethanol, dried, and resuspended in nuclease-free water (NFW) by incubation overnight at 4˚C. Multiplexed sequencing libraries were prepared for nanopore sequencing using Oxford Nanopore's Ligation Sequencing Kit (LSK109) and Barcoding Expansion (NBD104) per the manufacturer's recommended protocol with the following modifications [23]. Instead of SPRI bead cleanup after each ligation reaction, we added 1 volume of "clumping buffer" (9% PEG 8000, 1 M NaCl, 10 mM Tris), incubated at room temperature for 30 mins, pelleted by centrifugation at 13K rpm, washed twice with 70% ethanol, then allowed to dry and resuspended in NFW by incubating at 4˚C overnight. This "beadfree" cleanup both avoids shearing and loss of high-molecular-weight DNA and selects for larger fragments in the PEG precipitations. Multiplexed samples were sequenced on R9.4.1 flow cells on either MinION or GridION with real-time basecalling using Guppy v3.2.2 in high-accuracy mode. The distribution of read lengths is shown in Fig 1.

Genome assembly
Basecalled reads for each sample were assembled using Miniasm v0.2 [24], followed by four rounds of polishing with Racon v1.3.1 [25], and Medaka v0.10.0 [26]. Each assembly consisted of one contig representing the full-length genome. Assembled genomes were normalized to start at the origin of replication and re-polished across the previous breakpoint. Assemblies were further polished with Illumina sequencing data using Pilon v1.22-1 [27]. Annotation was performed using the NCBI Prokaryotic Genome Annotation Pipeline [PGAP release 2020-02-06.build4373 ; 28]. Predicted genes were subsequently assigned putative function by aligning to the RefSeq nonredundant protein database with Diamond v0.9.22 [29] allowing for frame-shift sensitive alignment (diamond blastp-more-sensitive-frameshift 15). Sequence data and assembled genomes have been made publicly available under NCBI BioProject ID PRJNA759208.

Serotype, MLST, virulence, and antibiotic resistance genes
Polished assemblies were aligned against EcOH database [30] for serotyping, multi-locus sequence typing (MLST), and VirulenceFinder databases [31], and NCBI antimicrobial resistance database using Minimap2 [32] with parameter '-cx asm5' to allow for indel-sensitive alignment appropriate for hybrid-assembled genomes [33]. For serotype and MLST, the single best-aligning type gene (minimum edit distance) is given as the predicted type. Predicted serotypes, MLSTs, AMR, and virulence genes were further validated using corresponding webbased tools provided by the Center for Genomic Epidemiology ( [31], http://www. genomicepidemiology.org/services/), which showed complete agreement.

rRNA operon analysis
All copies of the rRNA operon in each assembly were identified by aligning to the E. coli K12 reference rRNA using Minimap2 [32]. Variants relative to the K12 reference were identified for each assembled rRNA copy across all assemblies based on the alignments. Total polymorphism distance was computed between every pair of alignments across each region of the rRNA operon (16S, ITS, 23S).

Bacterial growth conditions
Bacteria were grown in Luria broth (LB) at 37˚C and 250 rpm unless otherwise indicated.

Animal care
Germ-free mice were reared in the National Gnotobiotic Rodent Resource Center at UNC Chapel Hill. All animal experiments and procedures were approved by UNC's Institutional Animal Care and Use Committee (IACUC).

Murine stool samples
To generate an in vivo gut microbial community containing the seven E. coli strains of interest and endogenous mouse microbiota, we inoculated mice as follows. Three (2m/1f) interleukin-10-deficient (Il10 -/-) mice (129Sv/Ev background) were reared germ-free to adulthood (8-10 weeks) and colonzied by oral gavage with an even mixture of a total of 10 7 CFU clinical E. coli strains isolated from the intestinal tissue of Crohn's disease and non-Crohn's disease patients [7]. All strains have been classified as AIEC and non-AIEC using standard in vitro assays to evaluate adhesion/invasion to Caco2 epithelial cells and uptake/survival in J774 macrophages [5,7]. To validate our results by tracking individual strains with PCR, each strain was marked with an antibiotic resistance cassette and molecular barcode inserted into a neutral chromosomal region using Tn7 transposon insertion [34,35]. During experimentation, these strains were referenced by their blinded laboratory designation and barcode: JA0018/A1, JA0019/A3, JA0022/B6, JA0036/C5, JA0044/D2, JA0048/D5, and JA0091/C2. The identity of these strains is described in Table 1. Five strains were originally isolated by KW Simpson at Cornell University as the strains CU39ES-1, CU532-9, CU568-3, CU37RT-2, CU42ET-1; HM670 was isolated previously [6] and gifted by Barry Campbell from Liverpool University; and LF82 is the prototypic AIEC strain gifted to KW Simpson by Arlette Darfeuille-Michaud [5].
After colonization, mice were maintained in specific pathogen free (SPF) housing, where they acquired a simplified mouse microbiota. After 2 weeks of colonization, we gave kanamycin water ad libetum for 2 weeks to suppress this microbiota and ensure that all strains could persist to some extent. Mice were sacrificed 6 weeks later (by CO 2 asphyxiation), for a total of 10 weeks colonization. A stool sample was removed from the lumen of the distal colon for our analysis. DNA was extracted and purified as described in [3,36].

In vivo colonization and persistence studies
To demonstrate that the seven human-derived strains and murine E. coli NC101 could colonize germ-free Il10 -/mice throughout the gastrointestinal tract, we singly housed 1 male and 1 female mouse per E. coli strain, for a total of 16 cages. Mice were gavaged with 10 8 CFU of a single strain and moved to SPF housing. After 1 week, each mouse received a fecal transplant (prepared anaerobically from a pool of 7 C57BL/6 WT mice) via gavage to provide niche competition. Stool samples were collected almost daily and CFUs were quantified on kanamycin plates to monitor fecal E. coli colonization. Mice were sacrificed by CO 2 asphyxiation after 5 weeks, and the following tissues were harvested and CFUs quantified on kanamycin plates to measure viable colonizing bacteria: ileal content, cecal content, colon content, colon tissue and colon mucus layer. E. coli CFUs were also quantified from stomach content, duodenal content, and jejunal content, but only 1-3 samples of each tissue had detectable bacterial growth of more than 10 2 CFUs/10 mg.
To demonstrate that consistent colonizing strain, CU42ET-1-D5, and inconsistent colonizing strain, HM670-C2, colonized germ-free Il10 -/mice in a similar manner across a larger cohort, we housed 7 (3M/4F) or 8 (4M/4F) mice, respectively, in two cages for each cohort. Mice were gavaged with 10 8 CFU of a single strain and moved to SPF housing. After 1 week, each mouse received the same fecal transplant as used previously (prepared anaerobically from a pool of 7 C57BL/6 WT mice that were Helicobacter spp. free) via gavage to provide niche competition. After 3 weeks of colonization (2 weeks post-FMT), we administered kanamycin water ad libetum for 2 weeks to suppress this microbiota and ensure that all strains could persist to some extent. Mice were sacrificed 6 weeks later (by CO 2 asphyxiation), for a total of 10 weeks colonization. The following tissues were harvested and CFUs quantified on kanamycin plates to measure viable colonizing bacteria: colon content, colon tissue and colon mucus layer.

In vitro co-culture assays to measure pro-inflammatory cytokine production
To assess the inflammatory capacity of the clinical strains, J774 macrophages grown in DMEM complete media (supplemented with 10% heat-inactivated FBS and 100 U/ml penicillin-100  mg/ml streptomycin) were transferred to 12-well plates at 5x10 5 cells/well. Cells were grown overnight at 5% CO 2 at 37˚C, and washed twice with PBS before adding fresh culture media without antibiotics. Cells were infected at an MOI of 1 for 4 hours before removing the bacteria, washing with PBS, and adding fresh media with 100 μg/mL gentamycin. Cells were grown another 20 hours at which point the supernatants were collected for ELISA and the cells were collected for qPCR analysis.

ELISA
Supernatants from stimulated J774 macrophages (above) were analyzed for cytokine IL12p40 production through ELISA following manufacturer's protocol (BD Bioscience Opt EIA, catalog #555165). They were assayed in 4 independent experiments with 2-3 technical replicate wells.

Quantification of cytokine expression by qPCR
Stimulated J774 macrophages were washed with PBS and transferred into 1 mL Trizol (Invitrogen) and RNA was extracted following the manufacturer's protocol. Isolated RNA was subjected to DnaseI treatment (Invitrogen) prior to cDNA synthesis. cDNA synthesis was completed using qScript cDNA SuperMix (Quantabio). qPCR amplification was performed in triplicate with SYBR green qPCR chemistry (Bioline) using primers for Tnfa ) on a Quant-Studio 6 Real-Time PCR System. C t values were normalized to Gapdh to generate ΔC t values, and fold changes were calculated by ΔΔC t to the ΔC t of unstimulated controls. These were assayed in 3 independent experiments in technical triplicates.

Quantification of strains by qPCR
Stool DNA (6 ng each) from three mice was subjected to qPCR with primer pairs targeting each barcode to quantify relative amounts of each barcoded strain from within a complex population. Amplification was performed in duplicate using SYBR green qPCR chemistry (Bioline) using a universal barcode forward primer (F-5'-GCTTGGTTAGAATGGGTAACTAGTTTGCAG-3') and barcode-specific reverse primers for A1

Mock tissue-associated microbiome
Aliquots of purified genomic DNA isolated from independent strain cultures were mixed at equal abundance by weight. HMW DNA was isolated from a frozen aliquot of 10 7 HEK293 cells using the Circulomics Nanobind kit (https://www.circulomics.com/nanobind) per the manufacturer's recommended protocol. Human DNA was mixed with microbial mixture at a ratio of 99.5: 0.5% to represent a realistic tissue-associated microbiota composition.

Full-length rRNA amplicon sequencing
For stool and in vitro host/microbiota samples, primers for proximal 16S (27F: AGRGTTTGA-TYHTGGCTCAG) and distal 23S (2241R: ACCRCCCCAGTHAAACT) were used to amplify the full-length rRNA operon (4,500bp). Starting with~5 ng expected microbial DNA, 25 ul PCR reactions were prepared with LongAmp Taq 2x Master Mix (New England Biolabs, Inc; M0287L) and 0.4 uM of each primer. We ran 20 cycles consisting of denaturation at 94˚C for 10s, annealing at 51˚C for 30s, and extension at 65˚C for 225s, followed by final extension at 65˚C for 10min. Typical yield is 700ng of full-length amplicons, which were verified by agarose gel electrophoresis and fluorescent quantification (Qubit). Amplicon libraries were prepared for nanopore sequencing using the Oxford Nanopore Ligation Sequencing Kit (LSK109) per the manufacturer's protocol and sequenced on R9.4.1 flow cell on GridION with real-time basecalling using Guppy v3.2.2 in high-accuracy mode.

Strain-level variation
As illustrated in Fig 2, we observe significant genome-wide structural variation while broadly conserving GC content and annotated gene density within homologous blocks, even among these closely-related strains with similar adherent-invasive (AI) or non-AI phenotype. We computed the pairwise hamming distance [37] between sequences across each strain and each copy of the rRNA operon for the 16S, ITS, 23S, and entire rRNA region (Fig 3). We evaluated several regions independently for informative variation among these strains in silico by aligning our whole-genome sequence to either 1) the V1-V2 hypervariable region of 16S, 2) V3-V4 hypervariable region, 3) the entire 16S sequence containing all nine hypervariable regions and conserved spacers, 4) the entire rRNA operon, including 16S, ITS, and 23S, and 5) the entire genome. Very long nanopore reads allow us to use this as a proxy for PCR amplification of the various rRNA amplicon analyses since these reads average 10-20 Kbp (see Table 4), and thereby most often cover the entire region of interest.
In general, and not unexpectedly, the larger the region used for classification, the greater the accuracy in detecting the correct strain (Table 5). Hypervariable regions V1-V2 (27F-338R) and V3-V4 (343F-806R) produces poor results (averaging 35% and 27% accuracy, respectively), little better than chance since there is little or no discriminating variation within that region. The entire 16S gene performs better (70%). Notably, there is a huge amount of variation in the accuracy across strains owing to their relative dissimilarity in a particular region. Full-length rRNA achieves an average 87.3% accuracy at the strain level, followed closely by all reads aligned against the entire genome (i.e. shotgun metagenomics) at 91%. Additional benefits of whole-metagenome sequencing include the observation of genes that may imply the functional capacity of the community without explicitly characterizing its taxonomic structure, and the ability to assemble the metagenome into an approximation of its component microbial genomes. These are important, tangible benefits, but are beyond the scope of this paper.
There is a relatively small improvement in classification accuracy using the entire genomes compared to the full-length rRNA operon. Like commonly used 16S primers that target subsets of the hypervariable regions, the primer pair used for the full operon are well-conserved across bacteria, but presents a much larger sequence with which to accurately classify sequences. This coupled with cost-effective multiplexing and long-read sequencing, despite much higher error rates than standard Illumina sequencing, should make this a viable approach for characterizing complex microbiome samples.

In vivo colonization studies
We performed in vivo colonization studies using a well-established IBD mouse model [3,12,13,36,38,39]. We first determined if these human-derived strains and murine-derived NC101 could colonize and persist in mice in the presence of a competing microbiota. We colonized two singly-housed, adult, germ-free, Il10 -/mice each with a single E. coli strain, then after one week provided niche competition by gavaging with a murine fecal transplant as outlined in S1 Fig. After five weeks total, gastrointestinal tissues, including ileal content, cecal content, colon content, colon tissue, and colon mucus layer, were harvested and viable E. coli were quantified by serial plating. S1 Table reveals that almost all E. coli strains could colonize the lumen and intestinal tissue with complex community competition, even without kanamycin to suppress the competing microbiota. Two strains of E. coli colonized inconsistently across both mice, with high levels of E. coli in one mouse and low or undetected levels of E. coli in the other. To verify if this would persist across a larger cohort, consistent colonizing strain, CU37RT-2-D5, and inconsistent colonizing strain, HM670-C2, were given by gavage to seven or eight germ-free, Il10 -/mice, respectively. After one week we provided niche competition by gavaging with a murine fecal transplant, and then after three weeks we provided kanamycin water ad libitum for two weeks to supress the competing microbiota as outlined in S2 Fig. After ten weeks total, gastrointestinal tissues, including colon content, colon tissue, and colon mucus layer, were harvested and viable E. coli were quantified by serial plating. S2 Table  reveals that consistent colonizing strain CU37RT-2-D5 colonized all seven mice to high levels, while inconsistent colonizing strain HM670-C2 colonized five of eight mice at the lumen and intestinal tissue. Future studies are needed to validate these results and draw strong conclusions, including colonization in inflamed and non-inflamed environments.
To validate the pro-inflammatory potential of these strains, each strain was co-cultured with J774 macrophages and inflammatory colitis-inducing cytokine transcription and secretion was measured [40][41][42][43]. The results in Fig 4 demonstrate that all strains induced inflammatory cytokine production. Inducing inflammatory cytokine production is an indicator of inflammatory potential, but it is not a known predictor of tissue inflammation or colonization in vivo. Further effort is needed to compare in vitro cytokine production with in vivo inflammation elicited by colonizing AIEC isolates.

Full-length rRNA metataxonomics
To further evaluate the utility of performing strain-level metataxonomics using full-length rRNA sequencing on an Oxford Nanopore device, we prepared and sequenced one stool sample from in vivo Il10 -/colonization studies depicted in Fig 5. Il10 -/mice, a well-established model of microbially driven intestinal inflammation, were colonized with an equally-proportioned pool of the seven E. coli clinical isolates, and then housed in an SPF facility where they become colonized over time with a natural mouse microbiome [3,36]. After ten weeks, the mice were sacrificed and stool samples were collected for nanopore sequencing and quantification of each strain by targeted qPCR. We extracted DNA from the resulting stool samples. Full-length rRNA amplification and sequencing using universal 16S 27F and 23S 2241R primers (see Methods) produced >600,000 reads with a median length of 4,149 bp ( Table 6, Fig 6).  Table 5. Classification accuracy using commonly amplified 16S hypervariable regions, V1-V2 and V3-V4, the entire 16S and rRNA operons, and shotgun sequencing of the entire genome. While short hypervariable regions perform poorly and inconsistently across these samples, classification based on the entire rRNA operon (~4,500bp) approaches classification accuracy of whole-genome sequencing.  Relative percent abundance of each strain determined by full-length rRNA sequencing. Off-target hits to NC101 (4%) are expected to happen by chance given the classification error rate for the whole rRNA sequence (~13%, see Table 5).

Strain
(C) Relative percent abundance of each strain determined by quantitative PCR of molecular barcode. Values normalized to E. coli 16S rRNA, and fold increase relative to pooled E. coli inoculum. Each bar represents the mean across 3 murine stool samples in duplicate and error bars show standard deviation.
We assigned these reads to individual rRNA copies by aligning to our database of known sequences in the assembled genomes (40 copies, five in each of eight strains). Fig 7 illustrates the distribution of alignment identity (number of matching nucleotides) over the full-length rRNA (4,245bp). Sequences derived from the stool sample show a bimodal distribution with peaks at 3.9Kbp (92% identity) and 2.9Kbp (68% identity), representing the expected divergence and sequencing error matching E. coli and non-Escherichia genera, respectively. The proportion of reads assigned to each strain is shown in Fig 5(B). There is a skewed representation of strains in the stool sample due to natural variation in their ability to colonize and thrive and due to interstrain competition between isolates in the Il10 -/mouse gut. We confirmed the relative abundance of each strain in the stool sample through qPCR of the molecular barcode inserted in the neutral Tn7 site of each E. coli genome [34]. The qPCR showed a similar result, as there was skewing among the E. coli isolates in the mouse gut as shown in Fig 5 with the CU568-3-C5 strain most abundant. Both consistent and inconsistent colonizing strains showed lower abundance in this stool sample as there were multiple E. coli isolates providing competition upon initial infection, and this data is a measure of relative abundance across E. coli strains and not total load with the stool. Additionally, the inconsistent colonizing strain still colonized in 5 of 8 mice (S2 Table). We additionally prepared a mock mixture consisting of all eight strains at even abundance mixed 0.5: 99.5% with human DNA (HEK293) to simulate a realistic tissue-associated microbiota sample. Full-length rRNA sequences were successfully amplified and sequenced as done with the stool sample. We observed relatively uniform representation of the eight strains in the mock mixture, demonstrating our ability to amplify and properly classify closely-related rRNA sequences in a host-tissue-like context, as illustrated in Fig 8.

Discussion
Alterations in the composition and function of gut microbial communities are implicated in both IBD and CRC, with CD patient microbiomes harboring an increase in mucosa-associated Enterobacteriaceae, including AIEC [7]. After excluding genes associated with diarrheagenic E. coli, AIEC are defined by their in vitro adherent-invasive behavior. There is a clear need to extend our knowledge of AIEC to pathoadaptive determinants of colonization and virulence in vivo. Recent work highlights mechanisms AIEC have evolved to more effectively colonize the host mucosa including hypermotility and metabolic changes [44] and T4SS-linked biofilm formation [45]. These efforts involved either long in vivo host adaptation experiments or genome-wide screens with a limited number of E. coli strains. Other molecular mechanisms including the type 1 pili-CEACAM6 interaction that promotes adhesion of AIEC to intestinal epithelial cells [46,47] do not offer a genetic marker to distinguish AIEC. Competitive colonization with pools of known AIEC and non-AIEC strains as a parallel strategy could help elucidate traits and genes important for colonization, but requires methods to discriminate between closely-related microbes. To relate specific traits of different AIEC to pre-clinical models and patients, we need to be able to distinguish individual strains from within a complex community. As AIEC are mucosa-associated, the ability to evaluate colonization of inflamed or cancerous lesions and normal mucosa would be optimal. The commonly used metataxonomic approach of sequencing one or two amplified hypervariable regions of the 16S gene is unable to distinguish among closely-related, but functionally diverse, species and strains [21]. Consistent with previous work [17][18][19][20][21], we demonstrate that sequencing of longer regions of the 16S, ITS, and 23S operons, enabled by modern long-read sequencing technologies, dramatically increases taxonomic specificity. We illustrated the increase in sensitivity and specificity of strain detection using full-length rRNA amplification and whole-metagenome shotgun sequencing in in silico models, mock mixtures, and in vivo colonized murine microbiome.
This focused study highlights specifically the challenges in resolving closely-related but functionally divergent AIEC and non-adherent/invasive E. coli strains within mixed microbial populations. We demonstrated accurate identification and discrimination of these strains in complex in vitro and in vivo models relevant to colitis. Our novel nanopore sequencing approach illustrates an efficient and effective method for discrimination of known pathogenic or specifically colonized microbiota, including tissue-associated bacteria, to enable investigation of spatiotemporal dynamics of microbial colonization without relying on genetic engineering, molecular barcoding [34], or in vivo fluorescence markers [48].
While these results suggest a method for sensitive identification of known strains, this approach does not suggest a general taxonomic analysis protocol for microbiome studies since the genomes of the strains of interest must be known a priori in order to properly identify them. Full-length rRNA sequencing following PCR amplification permits accurate strain identification even when the microbial abundance-for example in tissue-associated microbial samples-is very low. We demonstrated high concordance among relative abundance between sequenced full-length rRNA sequences and qPCR results targeting the inserted molecular barcodes, suggesting that abundances are reliably preserved through amplification and nanopore sequencing. However, even higher accuracy and explicit identification of microbial genic content can be captured by shotgun metagenome sequencing when there exists enough material to extract high-quality microbial DNA.
Future work includes optimization of the full-length rRNA protocol for tissue samples with adherent microbes. Our analysis of in vitro mixtures indicate this should be feasible with microbial:host DNA content as low as 0.5%-consistent with expected abundances [49,50] in many human mucosa, including the gut-but we have so far failed to amplify full-length rRNA from resected mouse gut. Additionally, given the extraordinary utility of shotgun metagenomic sequence, it is worthwhile to explore the several existing methods for selective extraction and sequencing of microbial DNA to expand the availability of non-PCR-based approaches in the context of tissue-associated microbial communities.