Comparative genomics reveals differences in mobile virulence genes of Escherichia coli O103 pathotypes of bovine fecal origin

Escherichia coli O103, harbored in the hindgut and shed in the feces of cattle, can be enterohemorrhagic (EHEC), enteropathogenic (EPEC), or putative non-pathotype. The genetic diversity, particularly that of virulence gene profiles, within the O103 serogroup is likely to be broad, considering the wide range in severity of illness. However, virulence descriptions of the E. coli O103 strains isolated from cattle feces have been primarily limited to major genes, such as Shiga toxin and intimin genes. Less is known about the frequency at which other virulence genes exist or about genes associated with the mobile genetic elements of E. coli O103 pathotypes. Our objective was to utilize whole genome sequencing (WGS) to identify and compare major and putative virulence genes of EHEC O103 (positive for Shiga toxin gene, stx1, and intimin gene, eae; n = 43), EPEC O103 (negative for stx1 and positive for eae; n = 13) and putative non-pathotype O103 strains (negative for stx and eae; n = 13) isolated from cattle feces. Six strains of EHEC O103 from human clinical cases were also included. All bovine EHEC strains (43/43) and a majority of EPEC (12/13) and putative non-pathotype strains (12/13) were O103:H2 serotype. Both bovine and human EHEC strains had significantly larger average genome sizes (P < 0.0001) and were positive for a higher number of adherence and toxin-based virulence genes and genes on mobile elements (prophages, transposable elements, and plasmids) than EPEC or putative non-pathotype strains. The genome size of the three pathotypes positively correlated (R² = 0.7) with the number of genes carried on mobile genetic elements. Bovine strains clustered phylogenetically by pathotypes, which differed in several key virulence genes. The diversity of E. coli O103 pathotypes shed in cattle feces is likely reflective of the acquisition or loss of virulence genes carried on mobile genetic elements.


Introduction
Enterohemorrhagic Escherichia coli (EHEC) carry one or both phage-encoded Shiga toxin genes (stx1 and stx2) and the attaching and effacing gene (eae), which is harbored in the chromosomal-encoded locus of enterocyte effacement (LEE) pathogenicity island. Among EHEC pathotypes, O157:H7 serotype is most frequently associated with human foodborne illness. However, the Centers for Disease Control and Prevention (CDC) ranks O103 as the second most common serogroup, next to O26, identified in laboratory-confirmed non-O157 EHEC infections in the U.S. [1]. In human EHEC infections, disease outcomes can range from mild to bloody diarrhea (hemorrhagic colitis) to more serious complications, such as hemolytic uremic syndrome (HUS), and even death [2]. Differences in disease-causing potential, particularly the ability to cause serious complications, are attributed to differences in virulence of EHEC strains [3]. In addition to the major virulence factors, which include Shiga toxins and LEE gene-encoded proteins, other virulence attributes, including known putative virulence factors, contribute to the development, progression, and outcome of the disease [4][5][6]. Enteropathogenic E. coli (EPEC), including EPEC O103, do not carry stx genes; however, they possess eae and other virulence genes to cause attaching and effacing lesions that can result in mild to severe diarrhea, or even death, particularly in children [7,8]. Strains within the EPEC pathotype are further characterized as typical or atypical, depending on presence or absence, respectively, of the EPEC adherence factor (EAF) plasmid [9]. The loss of the stx gene(s), a frequently reported event [10,11], can transform an EHEC into an EPEC pathotype. These major pathotype-defining mobile virulence genes have been well studied, but less is known about how other mobile elements contribute to the overall virulence diversity in O103 serogroup. Some strains of E. coli O103 carry neither Shiga toxin nor intimin genes and possibly represent a non-pathotype; even less is known about the virulence profiles of these strains. Cattle have been shown to harbor EHEC, EPEC and putative non-pathotype O103 in the hindgut and shed them in the feces [12]. We hypothesize that the diversity of O103 pathotypes harbored and shed in the feces of cattle is reflective of the loss or acquisition of genes carried on mobile genetic elements.
Whole genome sequencing (WGS) has been used to characterize the virulence gene profiles of EHEC O157 [13], identify phylogenetic relationships between EHEC O157 and non-O157 serotypes [14][15][16][17][18] as well as discover novel virulence determinants [19]. However, differences in virulence gene profiles and phylogenetic relationships of O103 pathotypes of bovine origin are less characterized [20]. Therefore, our objectives were to utilize WGS to identify and compare major and putative virulence genes, particularly genes located on mobile elements, of bovine and human clinical EHEC O103, bovine EPEC O103, and putative non-pathotype O103 strains and analyze phylogenetic relationships among the strains.

Strains
The Institutional Animal Care and Use Committee at Kansas State University approved the research that resulted in the strains that were used in the study. The bovine EHEC strains investigated in this study were isolated from cattle feces from several feedlots in the Midwest region of the US [12,21,22]. Sixty-nine bovine O103 strains, previously identified by endpoint PCR [23] as positive for stx1 (Shiga toxin 1) and eae (intimin) (bovine EHEC; n = 43), negative for stx1 and positive for eae (bovine EPEC; n = 13) and negative for both stx1 and eae (bovine putative non-pathotype; n = 13) were used in the study. Human clinical O103 strains positive for stx1 and eae (human EHEC; n = 6) were included in the study for comparison. The strains were cultured onto Tryptone soy agar (TSA; BD Difco, Sparks, MD) slants and shipped overnight in cold storage to the University of Maryland for whole genome sequencing.
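The subgroup definitions above amount to a two-marker decision rule. The following is a minimal sketch of that rule, not the authors' pipeline; the function name, the label strings, and the handling of stx-positive, eae-negative strains (a combination not included in this study) are assumptions made for illustration.

```python
from collections import Counter


def classify_pathotype(stx_positive: bool, eae_positive: bool) -> str:
    """Assign a subgroup label from stx1 and eae marker calls (illustrative)."""
    if stx_positive and eae_positive:
        return "EHEC"
    if eae_positive:
        return "EPEC"
    if not stx_positive:
        return "putative non-pathotype"
    # stx-positive, eae-negative strains were not part of this study's scheme;
    # this label is a hypothetical placeholder.
    return "stx-positive, eae-negative (outside this scheme)"


# Marker calls mirroring the bovine subgroup sizes reported in the text
# (43 stx1+/eae+, 13 stx1-/eae+, 13 stx1-/eae-).
bovine_calls = [(True, True)] * 43 + [(False, True)] * 13 + [(False, False)] * 13
subgroup_counts = Counter(classify_pathotype(stx, eae) for stx, eae in bovine_calls)
```

With these inputs, `subgroup_counts` tallies the 69 bovine strains by subgroup.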

DNA preparation and whole genome sequencing
The O103 strains from TSA slants were streaked onto blood agar (Remel, Lenexa, KS) and then subcultured in tryptone soy broth (BD Difco, Sparks, MD). Bacterial DNA from overnight culture was extracted from each strain using the DNeasy Blood and Tissue Kit with the QIAcube robotic workstation (Qiagen, Germantown, MD). The genomes were sequenced using an Illumina MiSeq platform (Illumina, San Diego, CA) at approximately 37x average coverage. Genomic libraries were constructed using Nextera XT DNA Library Preparation Kit and MiSeq Reagent Kits v2 (500 Cycles) (Illumina, Inc.). De novo genome assembly was performed using SPAdes 3.6.0 [24].

Genomic analysis
Draft genomes were annotated using Rapid Annotation using Subsystem Technology (RAST version 2.0; http://rast.nmpdr.org/) [25], a web-based service commonly used for annotation of draft bacterial genomes [26,27]. RAST applies the Fellowship for Interpretation of Genomes (FIG) subsystem approach to rapidly call and annotate genes, then uses high-throughput comparative analysis and a collection of expertly curated databases to categorize genes, based on the functional role they perform, into subsystems. The average number of genes located on mobile elements (prophages, transposable elements and plasmids) and of genes related to virulence, disease and defense was determined, using RAST, for each of the O103 subgroups (bovine EHEC, human EHEC, bovine EPEC and bovine putative non-pathotype). Genomic sequencing data in this study exceeded the minimum criteria for analysis that RAST requires of each genome: at least 10x coverage (using 454 pyrosequencing) and 70% of assembled sequences in contigs > 20,000 base pairs. Serotype identity, virulence and plasmid make-up of the 75 strains were determined using default parameters of the Center for Genomic Epidemiology SerotypeFinder 1.1 (https://cge.cbs.dtu.dk/services/SerotypeFinder/) [28], VirulenceFinder 1.4 (https://cge.cbs.dtu.dk/services/VirulenceFinder/) [29], and PlasmidFinder 1.3 [30] programs, respectively. Prophage sequences of the 75 strains were determined using Phage Search Tool Enhanced Release (PHASTER; http://phaster.ca/) [31,32]; intact and questionable prophage sequences, defined by PHASTER scores of >90 and 70-90, respectively, were included in analysis. The complete genome of EHEC O103:H2 strain 12009 (GenBank accession no. AP010958.1; https://www.ncbi.nlm.nih.gov/nuccore/AP010958.1) and 12009 plasmid pO103 DNA (GenBank accession no. NC_013354.1; https://www.ncbi.nlm.nih.gov/nuccore/NC_013354.1), a classical O103 reference strain of clinical origin used in many O103 genomic studies [14,33,34] and bootstrap values were reported for each branch. Representative strains, based on clustering patterns observed in the phylogenetic tree, were chosen as input for BLAST Ring Image Generator software (BRIG v0.95; https://sourceforge.net/projects/brig/) [37]. The BRIG plot displays similarities and differences between the draft genome nucleotide sequences of target strains, represented by concentric rings, and the genome of a chosen reference strain, identified in the center of the BRIG plot. The complete genome of EHEC O103:H2 strain 12009 was used as a BRIG plot reference. The nucleotide sequence (45,325 bp) of the LEE pathogenicity island (GenBank accession no. AF071034.1; https://www.ncbi.nlm.nih.gov/nuccore/AF071034.1) of human clinical EHEC O157:H7 EDL933 strain [37] was mapped to the BRIG plot for comparison of LEE between the target strains.
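The prophage inclusion rule stated above (scores of >90 as intact, 70-90 as questionable, both retained) can be written as a small filter. This is a sketch under stated assumptions: the record layout (dictionaries with `name` and `score` keys), the example region names and scores, and the "excluded" label for lower scores are hypothetical and are not taken from PHASTER output or from this study's data.

```python
def phaster_category(score: float) -> str:
    """Map a PHASTER completeness score to the categories used in the text."""
    if score > 90:
        return "intact"
    if 70 <= score <= 90:
        return "questionable"
    # The text only states which categories were included; this label for
    # lower-scoring regions is an assumption.
    return "excluded"


def regions_for_analysis(regions):
    """Keep only intact and questionable prophage regions."""
    return [r for r in regions if phaster_category(r["score"]) != "excluded"]


# Hypothetical example records, not data from this study.
example_regions = [
    {"name": "region_1", "score": 150},
    {"name": "region_2", "score": 80},
    {"name": "region_3", "score": 40},
]
kept = regions_for_analysis(example_regions)
```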

Statistical analysis
A single-factor analysis of variance (ANOVA) test was performed to determine whether average genome size, average number of extra-chromosomal genes, and average number of virulence, disease and defense genes were significantly different among the four subgroups (bovine EHEC, human EHEC, EPEC and putative non-pathotype). If means were significantly different (P < 0.01), Tukey adjustment for multiple comparisons was performed, using SAS 9.4 with Proc Glimmix, to test each pairwise comparison for significant differences (P < 0.01).
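For readers unfamiliar with the test, the F statistic of a single-factor ANOVA can be computed by hand as the ratio of between-group to within-group mean squares. The sketch below uses only the Python standard library and is not the SAS Proc Glimmix analysis used in the study; it omits the P value (which requires the F distribution) and the Tukey-adjusted pairwise comparisons. The genome sizes in `toy_groups` are made-up values for illustration only.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a single-factor ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of subgroups
    n = len(all_values)    # total number of observations

    # Between-group sum of squares: group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical genome sizes (Mb) for four subgroups; NOT data from this study.
toy_groups = [
    [5.50, 5.60, 5.70],   # "bovine EHEC"
    [5.45, 5.60, 5.75],   # "human EHEC"
    [5.10, 5.20, 5.30],   # "bovine EPEC"
    [5.00, 5.10, 5.15],   # "putative non-pathotype"
]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
```

A large F relative to the critical value for (`df_b`, `df_w`) degrees of freedom indicates that at least one subgroup mean differs, which is the condition under which the pairwise comparisons would then be run.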

Nucleotide sequence accession numbers
Draft genome sequences of the 75 E. coli O103 strains are available in GenBank, and their accession numbers are listed in S1, S2 and S3 Tables.

RAST subsystem summary
Genome size ranges of the bovine (5.32-5.79 Mb) and human EHEC (5.43-5.77 Mb) subgroups were similar (Table 1). However, both bovine and human EHEC subgroups had significantly larger average genome sizes (P < 0.0001) compared to EPEC or putative non-pathotype subgroups. Average genome size was similar between EPEC and putative non-pathotype subgroups. However, one of the bovine EPEC strains, an O103:H11 strain (2013-3-492A), had a genome size (5.67 Mb) similar to that of the EHEC strains.
Overall, the number of genes in the category of virulence, disease and defense was comparable for all 75 strains tested (Table 1), with no significant differences observed in the mean number of genes among the O103 subgroups. However, the number of genes on mobile elements (prophages, transposable elements, and plasmids) varied considerably among O103 subgroups and among serotypes within subgroups. Strains belonging to bovine and human EHEC subgroups had a significantly higher (P < 0.001) number of mobile genes compared to EPEC and putative non-pathotype subgroups. Average number of mobile genes was not significantly different between bovine and human EHEC subgroups or between EPEC and putative non-pathotype subgroups. The bovine EHEC strains possessed the widest range in the number of genes on mobile elements (221-351). Similarly, wide ranges were observed in bovine EPEC strains (137-289 genes) and bovine putative non-pathotype strains (100-157 genes), but not in human EHEC strains (256-292 genes). Mobile gene counts above 300 were only observed in a few bovine EHEC strains (4/43), and one bovine EHEC strain (2014-5-933A) had 351 mobile genes, nearly 60 more than the highest number in strains of the human EHEC subgroup. Furthermore, the one bovine EPEC O103:H11 strain (2013-3-492A) that had a genome size similar to that of the EHEC pathotype had 289 mobile genes, 76 more than the highest number in strains within the EPEC O103:H2 subgroup.
A strong correlation (R² = 0.70) was observed between genome size and number of genes on mobile elements for the 75 strains (Fig 1). The EHEC strains had larger genome size and higher number of genes on mobile elements compared to EPEC and putative non-pathotype strains. The EPEC O103:H11 strain (2013-3-492A) appeared to be an EPEC outlier, with genome size and number of genes on mobile elements closer to those of the EHEC O103 strains (Fig 1).
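As a worked illustration of the statistic reported here, the coefficient of determination for a simple linear fit of one variable on another equals the squared Pearson correlation. The sketch below assumes that this is how the reported value was obtained, which the text does not state; the paired values in the example are hypothetical and are not the 75 strains' data.

```python
def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of a simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares around the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical (genome size in Mb, mobile-element gene count) pairs.
genome_size_mb = [5.0, 5.2, 5.4, 5.6, 5.8]
mobile_genes = [110, 160, 150, 260, 300]
r2 = r_squared(genome_size_mb, mobile_genes)
```

A value near 1 means most of the variation in mobile gene count is explained by a straight-line relationship with genome size; a value of 0.70 leaves roughly 30% unexplained.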

Virulence genes
Virulence genes with >90% sequence homology were considered positive in a genome. The complete virulence gene profiles of each genome are shown in S1, S2 and S3 Tables. All EHEC strains were positive for the Shiga toxin 1a (stx1a) subtype. On average, bovine and human EHEC strains were positive for more virulence genes than EPEC strains; putative non-pathotype strains were negative for all LEE-encoded, non-LEE-encoded, and pO157 plasmid-encoded genes (Table 2).
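The >90% threshold can be applied to a list of database hits as follows. This is only a sketch: the input format (gene name, percent identity pairs), the function name, and the example hits are assumptions for illustration and do not reproduce VirulenceFinder's actual output format or this study's results.

```python
def positive_genes(hits, min_identity=90.0):
    """Return genes whose best hit has identity strictly above the threshold."""
    best = {}
    for gene, identity in hits:
        # Keep the highest identity seen for each gene
        best[gene] = max(identity, best.get(gene, 0.0))
    return sorted(g for g, ident in best.items() if ident > min_identity)


# Hypothetical hits for one genome.
example_hits = [
    ("eae", 99.5),
    ("stx1a", 100.0),
    ("katP", 85.0),   # below threshold, scored negative
    ("eae", 80.0),    # weaker duplicate hit, ignored in favor of 99.5
    ("espP", 90.0),   # exactly 90% does not satisfy ">90%"
]
called = positive_genes(example_hits)
```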
The putative virulence genes that were present in the O103 strains are shown in Table 3. Of all adherence-based genes in EHEC and EPEC strains (Tables 2 and 3), only the long polar fimbriae gene (lpfA) was present in putative non-pathotype strains. The lpfA gene was also present in all human EHEC O103:H11 strains (n = 4) and in the EPEC O103:H11 strain, but was not detected in O103:H2 strains within bovine and human EHEC and bovine EPEC subgroups or within any of the human EHEC control strains (O103:H2 12009, O157:H7 Sakai, O157:H7 EDL933). ABC transporter protein MchF (mchF), MchC protein (mchC), Microcin H47 part of colicin H (mchB) and Microcin M part of colicin H (mcmA) genes were present in 5/12 (41.7%) bovine putative non-pathotype O103:H2 strains but absent in all other strains. The colicin M gene (cma) was found in 5 of 12 putative non-pathotype O103:H2 strains, but also in one bovine EHEC O103:H2 (strain 2014-5-1565C). Glutamic acid decarboxylase (gad) was present in all 75 strains. The EAST-1 toxin gene (astA), which encodes an enterotoxin, was present in all O103:H11 strains (human EHEC and bovine EPEC) in the study and in a majority of bovine EPEC O103:H2 strains (9/12), but not in any of the EHEC O103:H2 strains. Endonuclease colicin E2 gene (celb) was present in nearly half (20/43) of all bovine EHEC strains, and in the bovine EPEC O103:H11 strain, but absent from all other subgroups.

Phylogenetic relationships
A maximum likelihood phylogenetic tree, based on core genome alignment of all 75 strains, was constructed using Parsnp v.1.2. The output file was proportional branch transformed using FigTree 1.4 (Fig 2). Overall, strains clustered according to pathotypes, with one notable exception: bovine EPEC O103:H11 strain (2013-3-492A) was more closely related to a human EHEC O103:H11 (strain KSU-74) than to any of the other bovine EPEC strains included in the study (Fig 2). All EPEC O103:H2 strains clustered together and putative non-pathotype strains exhibited a similar clustering. One human EHEC O103:H2 strain (KSU-72) was more
Based on clustering patterns in Fig 2, representative strains were selected from observed serotypes (O103:H2, O103:H11, and O103:H16) within each O103 subgroup (bovine EHEC, human EHEC, bovine EPEC, and bovine putative non-pathotype) as input for BLAST Ring Image Generator (BRIG) v0.95 [37]. The draft genomes of these target strains are represented by the concentric rings in the BRIG plot; any missing portions of these rings represent nucleotide sequences missing from the target strains in comparison to a central reference strain (EHEC O103:H2 strain 12009; Fig 3). Putative non-pathotype strains (2013-3-308C and 2013-3-111C) displayed the largest degree of sequence divergence from the reference strain. As expected, the LEE island (45,325 bp), which carries the eae gene and genes for other Type III secretion effectors, was present in all EHEC and EPEC strains, but absent in the putative non-pathotype strains. Interestingly, a relatively large unknown sequence (~40,000 bp) from the reference strain was present in 2/5 bovine EHEC O103:H2 strains (2013-3-174C, 2014-5-1565C) and in 1/3 human EHEC strains (KSU-72), but absent in all other EHEC, EPEC, and putative non-pathotype strains. It is worth noting that the three strains positive for the unknown sequence were not positive for any virulence genes not found in the remaining strains tested.
Strains 2013-3-174C and 2014-5-1565C of bovine EHEC O103:H2 had higher sequence similarity with the human clinical O103:H2 reference strain than did any of the human clinical EHEC target strains.

Discussion
Escherichia coli O103 is the third most common STEC (next to O157 and O26) implicated in human STEC infections [1,38]. Based on our studies, serogroup O103 is the second most prevalent STEC (next to O157) shed in cattle feces [12,21]. Brooks et al. [38] have reported that 117 human clinical O103 isolates, submitted to CDC from 1983 to 2002, were positive for stx1 and negative for stx2, and included only four flagellar types: H2, H11, H25 and non-motile. Similarly, all Shiga toxin-producing strains of cattle origin in this study (n = 43) were positive for the stx1 gene only; however, all possessed the H2 flagellar type. The predominance of the H2 flagellar type in bovine strains is in agreement with previous reports of O103 strains in cattle and sheep [20,39,40,41]. The majority of EHEC strains (48/49; 98.0%) in our study had Shiga toxin 1a (stx1a) gene. Söderlund et al. [20] reported the Shiga toxin 1a (stx1a) subtype in five EHEC O103:H2 strains isolated from Swedish cattle. Similar to findings from previous studies [20,33], all EHEC/EPEC O103:H2 and O103:H11 strains carried epsilon and beta1 eae subtypes, respectively. All EPEC strains included in this study were considered atypical, as indicated by the absence of the EAF plasmid, a finding also in agreement with previous studies [20,42,43]. All EHEC O103 strains in this study (43 bovine and 6 human strains) had a higher number of genes on mobile elements (prophages, transposable elements, and plasmids) compared to the bovine EPEC (except for one O103:H11 strain) and putative non-pathotype strains. Significant differences in the genome size observed among the O103 subgroups are reflective of the number of genes from mobile elements.
However, one bovine EPEC O103:H11 strain (2013-3-492A) was an exception, as its genome size and number of genes on mobile elements were more comparable to EHEC strains (Fig 1); furthermore, this strain was more closely related to a human EHEC O103:H11 strain (KSU-74) than to any of the EPEC strains (Fig 2). Also, the virulence gene profile of the EPEC O103:H11 strain 2013-3-492A more closely resembled the virulence gene profiles of the EHEC O103 subgroup than that of the bovine EPEC O103 subgroup. Furthermore, the strain is positive for the stx1 bacteriophage insertion site (yehV) and the bacteriophage-yehV right and left junctions [44], suggesting that the EPEC O103:H11 strain may be capable of acquiring and/or had once acquired but lost stx gene(s). This suggests that much of the genetic diversity in E. coli O103 strains shed in cattle feces can be attributed to the loss or acquisition of mobile genetic elements [45]. Similar to the phylogenetic clustering of bovine EHEC and EPEC O103:H2 strains reported in Söderlund et al. [20], strains in this study largely clustered by pathotype (Fig 2). A genome-wide visual comparison between representative strains from observed serotypes (O103:H2, O103:H11, O103:H16) within each O103 subgroup (bovine EHEC, human EHEC, bovine EPEC, and bovine putative non-pathotype) showed clear differences in the sequence identity between target strains (Fig 3). Interestingly, two of the bovine EHEC O103:H2 strains (2013-3-174C and 2014-5-1565C) shared more sequence identity with the clinical reference strain than did the human EHEC strains included in Fig 3, which may be an indication of the virulence potential of these strains. It is clear that the EHEC and EPEC strains have acquired more genetic elements during the course of their evolution in comparison to the putative non-pathotype strains.
Although overall number of genes implicated in virulence, disease and defense was comparable among all 69 bovine strains, a closer examination revealed key differences in virulence gene profiles of O103 subgroups and serotypes within subgroups.

LEE effector genes
The chromosomal LEE pathogenicity island carries genes that encode intimin (eae), translocated intimin receptor protein (tir), and type III secretion system effector proteins (espA and espB). Studies have shown that without any one of these genes (eae, tir, espA, espB), attaching and effacing (A/E) E. coli are unable to produce their characteristic A/E lesions [46][47][48]. The espF gene is also LEE encoded, but unlike the other LEE genes that were present in all EHEC and EPEC strains, a small number of bovine EPEC (3/13) and EHEC (4/43) strains were espF-negative. Although espF contributes to the disruption of intestinal barrier function during attachment, McNamara et al. [49] have shown that the gene is not required for A/E lesion formation. Other type III effector genes (cif, espJ, and tccP) were variably present in the EHEC and EPEC strains, possibly because they are prophage-encoded genes. Although cif and espJ genes enhance attachment, in vivo and/or in vitro studies have shown that A/E lesions are not significantly inhibited in the absence of either gene [50,51]. Garmendia et al. [52] have shown that tir-cytoskeleton coupling protein gene (tccP) assists in the translocation of the intimin receptor protein during bacterial attachment. In the same study, tccP mutants were unable to trigger A/E lesions on in vitro-inoculated HeLa epithelial cells. Considering its seemingly critical importance in type III secretory system-related disease outcomes, it is surprising that not all human clinical EHEC were positive for the tccP gene. Garmendia et al. [52] reported that tir translocation was not affected in tccP mutants; therefore, it is possible that bacterial attachment and expression of other virulence factors in tccP-negative EHEC could contribute to A/E lesions.

Non-LEE effector genes
Non-LEE effector (nle) genes, including nleA, nleB and nleC, have been associated with HUS-causing strains of EHEC [53] and were present in varying proportions within EHEC and EPEC O103 subgroups in this study. In two independent studies, ΔnleA [54] and ΔnleB mutant strains of Citrobacter rodentium [55] were unable to cause mortality in inoculated mice. Wickham et al. [55] also reported a three-log decrease (10⁶ vs. 10³) in infectious dose for the nleB wild-type compared to the ΔnleB mutant, which highlights the importance of the nleB gene as it relates to the low infectious dose of EHEC strains. The nleC gene serves to down-regulate the host NF-κB signaling pathway in efforts to disrupt immune clearance of invading bacteria [56]. Although nleC has also been significantly associated with HUS-causing strains [53], it was present only in 4 of 6 human clinical EHEC strains, but in 53.5% (23/43) of bovine EHEC strains.

pO157 plasmid encoded virulence genes
The pO157 plasmid (~93 kb) carries a number of virulence genes implicated in EHEC virulence [57] and is present in nearly all clinical O157:H7 strains [58]. Major pO157 plasmid-encoded genes, ehxA, espP, etpD, katP and toxB, were present in many EHEC and EPEC O103 strains. The enterohemolysin gene (ehxA), present in all EHEC (49/49) and nearly all EPEC (12/13) strains in this study, encodes a pore-forming toxin, which elicits in vivo production of IL-1β from human mononuclear cells, a commonly expressed cytokine during HUS infections [59]. The extracellular serine protease gene (espP) was found in almost all EHEC and EPEC strains and is considered to contribute to hemorrhagic colitis via the cleavage of pepsin A and human coagulation factor V [60].
The etpD, katP and toxB genes, located on the pO157 plasmid, were less frequently present in EHEC and EPEC strains, compared to ehxA and espP genes. Schmidt et al. [61] report that EHEC type II secretion pathway (etp) genes are not commonly detected (~10%) in bovine EHEC isolated from feces. In this study, the etpD gene was present in 9 of 43 (20.9%) bovine EHEC strains, but absent in the other subgroups. Brunder et al. [62] report a close association between the presence of ehxA and the catalase peroxidase gene (katP) in EHEC O157:H7 strains. We observed a similar trend for bovine and human EHEC; however, ehxA was present in a majority (11/12) of bovine EPEC O103:H2, whereas katP was absent in all of those strains. The toxB gene, identified by Tatsuno et al. [63], is a homolog of EHEC factor for adherence gene (efa1), carried on the pO157 plasmid and is commonly present in clinical EHEC O157:H7. In a study examining the prevalence of toxB in O157 and major non-O157 EHEC and EPEC of clinical and animal origin, Tozzoli et al. [64] report that all O103 strains used in their study were negative for the gene. In the current study, 3 of 6 human EHEC strains were positive for toxB. Yet, the gene was present in only 1/43 bovine EHEC strains and in the single bovine EPEC O103:H11 strain. Although toxB is not required for formation of A/E lesions, Tatsuno et al. [63] showed that expression of toxB does lead to enhanced virulence by increasing expression of major LEE-encoded effector genes including espA, espB and tir.

Other virulence genes
Interestingly, lpfA was the only adherence-based virulence gene present in the bovine putative non-pathotype O103:H2 strains (n = 12), yet the gene was absent in all EHEC (n = 43) and EPEC O103:H2 (n = 12) strains, suggesting possible loss of the lpfA gene by the O103:H2 serotype at some point during the course of acquiring new genetic elements. The gene for increased serum survival (iss) was present in all 75 strains. The iss gene is often associated with avian pathogenic E. coli (APEC) that cause colibacillosis in poultry, and serves as a genetic marker for APEC strains [4]. Among APEC, the iss gene is carried by a ColV plasmid [65] that, in addition to conferring increased virulence and fitness traits, also encodes multidrug resistance [66].
The E. coli secreted protease island encoded gene (espI) is considered part of the family of extracellular proteases known as SPATE, or serine protease autotransporters of Enterobacteriaceae [67]. The espI gene is harbored on the O91:H- pathogenicity island and was previously reported to occur exclusively in a LEE-negative subgroup of STEC that carry a stx2d gene variant [68]. Krüger et al. [69] also report detection of the espI gene exclusively in stx2-positive (but not stx1-positive) E. coli O26:H11 strains of clinical, bovine and food origin. In our study, the espI gene was present in more than half (23/43; 53.5%) of all bovine EHEC O103:H2 that were stx1a positive; the espI gene was also present in three of 12 bovine EPEC O103:H2 strains. These results are in contrast with previous studies linking the espI gene to stx2-carrying EHEC only [68,69] and may be the first time the espI gene has been reported in bovine EHEC and EPEC O103 strains.

Plasmid and prophage sequences
Some of these plasmid sequences are originally associated with non-E. coli bacteria, including Klebsiella pneumoniae (ColRNAI and IncA/C2), Salmonella typhi (IncFIA(HI1)), Salmonella typhimurium (IncN) and Pseudomonas aeruginosa (IncP), which further highlights the mobility of these genetic elements. Many of the plasmids, including IncA/C2, IncFII, IncFII(pHN7A8), IncFII(pRSB107), IncN and IncX1, have also been associated with antimicrobial resistance determinants and/or other putative virulence-associated functions that in some cases have been the causative genetic element behind human outbreaks [70]. The IncF incompatibility family represents the majority of virulence-associated plasmids carried by E. coli [71]; therefore, it may not be surprising that IncF plasmids represented nearly half (96/218; 44.0%) of all plasmids identified in the strains used in this study.
Similarly, non-E. coli prophage sequences, including Aeromonas phage phiO18P, Burkholderia phage phiE255, Salmonella phage SEN34 and Shigella phage SfII, were found in many of the strains, which further demonstrates the mobility of these genetic elements. The most and least prophage diversity, defined by total number of different prophages carried by an O103 subgroup, was found in bovine EHEC and bovine putative non-pathotype strains, respectively, which also highlights the differences in mobile content found between these subgroups.

Conclusion
The virulence gene profiles of the bovine and human EHEC, bovine (atypical) EPEC and putative non-pathotype strains of E. coli O103 were quite diverse. The difference in the number of strains tested within each subgroup and lack of publicly available O103 genome sequences may have limited the strength of comparison. Although the in silico analysis performed here does not provide phenotypic evidence of virulence contributions, a number of major and putative virulence genes were comparable among bovine and human EHEC O103 strains, which may indicate the potential for bovine EHEC O103 to cause human infection. The bovine EPEC O103:H11 strain also shared similar virulence gene and plasmid profiles with human EHEC O103:H11 strains, raising the possibility that the EPEC may have lost its stx prophage. Regardless, the in silico data highlight the numerous virulence genes carried on mobile genetic elements (prophages, transposable elements, and plasmids) that contribute to the plasticity of bovine EHEC or EPEC. Genome size and number of genes from mobile elements were strongly correlated among the O103 subgroups. The putative non-pathotype strains had the smallest genome size, carried the fewest mobile genes overall and, perhaps related to this, lacked any specific major or putative mobile virulence genes. The EPEC strains in this study had larger genomes and were positive for a higher number of specific virulence genes compared to putative non-pathotype strains. Excluding the outlying EPEC O103:H11 strain, the EHEC subgroup exceeded the EPEC and putative non-pathotype subgroups in both these categories, which raises the question of whether progenitor EHEC bacteria are more genetically predisposed toward acquiring certain mobile elements that could confer virulence. Conversely, putative virulence genes that allow for increased EHEC survival within the environment or within a host may afford EHEC increased opportunity to acquire mobile genetic elements.
We believe that the diversity of pathotypes of E. coli O103 harbored and shed in the feces of cattle is reflective of the loss or acquisition of genes carried on mobile genetic elements. The environmental and biological mechanisms that allow for loss or acquisition of virulence genes by EHEC and EPEC and putative non-pathotype strains remain an exciting frontier for the whole-genome sequence-based analysis of E. coli pathotypes.