Comparative Genomics and Characterization of Hybrid Shigatoxigenic and Enterotoxigenic Escherichia coli (STEC/ETEC) Strains

Background: Shigatoxigenic Escherichia coli (STEC) and enterotoxigenic E. coli (ETEC) cause serious foodborne infections in humans. These two pathogroups are defined by their pathogroup-associated virulence genes: stx encoding Shiga toxin (Stx) for STEC, and elt encoding heat-labile enterotoxin (LT) and/or est encoding heat-stable enterotoxin (ST) for ETEC. This study investigated the genomics of STEC/ETEC hybrid strains to determine their phylogenetic position among E. coli and to define the virulence genes they harbor.
Methods: The whole genomes of three STEC/ETEC strains possessing both stx and est genes were sequenced using the PacBio RS sequencer. Two of the strains were isolated from patients, one with hemolytic uremic syndrome and one with diarrhea. The third strain was of bovine origin. Core genome analysis of the shared chromosomal genes and comparison with E. coli and Shigella spp. reference genomes were performed to determine the phylogenetic position of the STEC/ETEC strains. In addition, a set of virulence genes and ETEC colonization factors was extracted from the genomes. The production of Stx and ST was studied.
Results: The human STEC/ETEC strains clustered with strains representing ETEC, STEC, enteroaggregative E. coli, and commensal and laboratory-adapted E. coli. However, the bovine STEC/ETEC strain formed a remote cluster with two STECs of bovine origin. All three STEC/ETEC strains harbored several other virulence genes, apart from stx and est, and lacked ETEC colonization factors. Two STEC/ETEC strains produced both toxins and one strain produced Stx only.
Conclusions: This study shows that pathogroup-associated virulence genes of different E. coli can co-exist in strains originating from different phylogenetic lineages. The possibility that virulence genes are associated with several E. coli pathogroups should be taken into account in strain typing and in epidemiological surveillance. The development of novel hybrid E. coli strains may pose a new public health risk that challenges the traditional diagnostics of E. coli infections.


Introduction
Shigatoxigenic Escherichia coli (STEC) and other diarrheagenic E. coli (DEC) cause diarrheal disease in humans [1]. STEC cause bloody or non-bloody diarrhea. The infection may result in severe sequelae, such as hemolytic uremic syndrome (HUS). STEC produce one or two types of Shiga toxin (Stx1 and Stx2, encoded by the genes stx1 and stx2), which are responsible for the toxic effects in the host.
Several other DEC pathogroups have been established based on pathogroup-associated virulence traits [1]. Enterotoxigenic E. coli (ETEC) cause watery diarrhea by producing heat-labile (LT, encoded by elt) and/or heat-stable (ST, encoded by the porcine variant estIa and/or the human variant estIb) enterotoxin. Enteropathogenic E. coli (EPEC) produce a characteristic histopathology known as attaching and effacing on intestinal cells. Enteroinvasive E. coli (EIEC) are associated with invasive, bloody diarrhea resembling that caused by Shigella spp. Enteroaggregative E. coli (EAEC) harbor the mechanism for an aggregative adherence pattern mediated by aggregative adherence fimbriae. EAEC are increasingly recognized as diarrheal pathogens in developing countries. STEC and other DEC are able to acquire virulence genes from other pathogroups via horizontal gene transfer, leading to the development of intermediate or hybrid pathogroups [2][3]. An EAEC/STEC hybrid of serotype O104:H4 caused a large outbreak with severe disease and deaths in Germany in 2011 [4]. STEC/ETEC hybrids have recently been reported in Germany, the United States, and Slovakia [5][6][7], and some of them have been associated with human disease [7]. In our previous studies, we identified STEC/ETEC hybrid strains from patients and animals in Finland [8] and from animal-derived food in Burkina Faso [9].
E. coli is a genetically versatile species. Strains within a single pathogroup can originate from different genetic backgrounds [10][11][12][13]. Among STEC, the Locus of Enterocyte Effacement (LEE) negative strains have evolved and acquired stx-phages multiple times [14]. In addition, E. coli strains belonging to different phylogenetic lineages can independently evolve into the enterohemorrhagic form of STEC by acquiring phages and other integrative elements, such as LEE, that are essential for the virulence properties [11]. The ETEC pathogroup also consists of strains of polyphyletic origin [15]. Multilocus sequence typing (MLST) has revealed that ETEC strains originate from different evolutionary lineages, indicating that the acquisition of the elt or est genes may be enough to make an ETEC strain [15]. In addition, the chromosome of the prototypical ETEC strain H10407 is almost identical to the chromosome of E. coli K-12 strain MG1655, suggesting that the main event in the emergence of ETEC from E. coli is the acquisition of virulence plasmids carrying elt or est [16]. The variability in virulence gene and colonization factor combinations highlights the genomic diversity within the ETEC pathogroup [12]. These findings suggest that ETEC consists of a genetically heterogeneous group of strains that have gained the ETEC-associated virulence genes by horizontal gene transfer. However, recent evidence, based on the sequence analysis of 362 ETEC isolates, shows that persistent plasmid-chromosomal background combinations exist in certain phylogenetic lineages [17].
The genomics and phylogeny of hybrid E. coli strains have not been studied widely. An exception is the German outbreak strain EAEC/STEC O104:H4, which was shown to form a distinct clade with other O104:H4 strains among EAEC and E. coli, indicating that the outbreak strain has a chromosomal backbone similar to that of the EAEC O104:H4 group [18]. In a recent study, STEC/ETEC hybrid strains of several serotypes were not found to be phylogenetically related [14]. This suggests that these strains may have arisen from several genetic backgrounds.
In the present study, we investigated human and bovine STEC/ETEC hybrid strains to determine their phylogenetic position among E. coli and to define the similarities and differences in their gene contents and virulence properties related to other DEC pathogroups. We used whole genome sequencing and whole genome mapping for comparative genomics between the STEC/ETEC genomes and the reference genomes of pathogenic and commensal E. coli and Shigella spp. It is crucial to understand the phylogeny of pathogenic bacteria to evaluate how they have evolved and to monitor the emergence of potential new pathogens.

Bacterial strains and reference genomes
The whole genomes of three E. coli strains possessing both STEC- and ETEC-associated virulence genes (stx1 or stx2, and estIa) were sequenced. The strains have been described in our previous studies [8][9]. The strain IH53473 (serotype O101:H-) was isolated from a 1.9-year-old infant with HUS in Finland and the strain IH57218 (serotype O2:H27) was isolated from a 7.3-year-old child with diarrhea in Finland [8] (S1 Table). The strain FE95160 (serotype O2:H2) was isolated from a bovine intestine sample in Burkina Faso [9] (S1 Table). For comparative genomics, publicly available complete and draft genomes of different DEC pathogroups, extraintestinal pathogroups, commensal strains, laboratory-adapted strains, and Shigella spp. strains were downloaded from GenBank (http://www.ncbi.nlm.nih.gov/genbank) (S1 Table).

DNA extraction and whole genome sequencing
Genomic DNA was extracted using QIAGEN Genomic Tip 100/G (QIAGEN, Gaithersburg, MD, USA) according to the manufacturer's instructions. After the extraction, the intactness of the genomic DNA was verified by agarose gel electrophoresis and the quantity of the genomic DNA was measured by Qubit 2.0 Fluorometer (Invitrogen, Carlsbad, CA, USA).
Sequencing libraries were constructed according to the manufacturer's (Pacific Biosciences, Menlo Park, CA, USA) protocol. Sequencing was done on the PacBio RS instrument (Pacific Biosciences) with P4/C2 chemistry.

De novo genome assembly, genome annotation, and validation of the assembly by whole genome mapping
The data collected from the PacBio RS instrument were processed and filtered using the single molecule real-time (SMRT) analysis software suite (Pacific Biosciences). Data were filtered by read quality (> 0.75) and read length (> 1000 bp). When processing continuous long read (CLR) data, raw reads from the SMRT Cells were split on the adapter sequence, resulting in one subread or CLR per zero-mode waveguide (ZMW). For SMRT de novo assembly, the HGAP pre-assembly workflow was used to generate long and highly accurate sequences. This was accomplished by mapping single pass reads to seed reads, which represent the longest portion of the read length distribution. Subsequently, a consensus sequence of the mapped reads was generated, resulting in long and highly accurate fragments of the target genome. We further pruned reads from this pre-assembly pipeline that were < 3500 bp. The pre-assembled reads were output in FASTQ format and further error corrected using the PacBioToCA utility in Celera Assembler (CA 7.0). The reads were then assembled with the CA 7.0 assembler using the BOGART algorithm with a mer size of 25. Scaffolding was carried out using the CGW algorithm of the CA 7.0 assembler.
The final number of contigs per genome after the assembly was 10 for IH53473, 17 for IH57218, and 43 for FE95160. The three draft genomes were uploaded into Galaxy/CRS4 (Orione) [19] for automatic annotation with PROKKA (version 1.4.0) [20] using default settings. The whole genome sequence reads were deposited at NCBI SRA (study accession no. PRJNA269579). The draft genomes were deposited at NCBI Whole Genome Shotgun (WGS) database under accession number LFZH00000000 for IH53473, LFZJ00000000 for IH57218, and LFZI00000000 for FE95160.
Whole genome mapping (i.e. optical mapping) was used to correct the order of contigs and to detect any misassemblies. For this, the chromosomal DNA was digested using NcoI restriction enzyme. Whole genome maps were produced using Argus Optical Mapping System (OpGen Inc., Gaithersburg, MD, USA) as previously described [21]. To compare the sequence contigs to the respective whole genome map, the sequence contigs were restricted with NcoI in silico and the contigs were aligned against the map using MapSolver 3.2.0 software (OpGen Inc.).

In silico MLST, phylogrouping, and serotyping
The allelic profiles of the seven genes adk, fumC, gyrB, icd, mdh, purA, and recA, used in the published protocol for E. coli MLST [26], were extracted using the E. coli MLST application in the Ridom SeqSphere+ program (Ridom GmbH) to determine the MLST sequence types (ST) of the STEC/ETEC genomes and reference genomes.
The three STEC/ETEC strains were previously serotyped using traditional O- and H-antigen agglutination [8][9]. We now compared the previous results with the in silico serotyping results. The genes responsible for the expression of O- and H-antigens were detected from the genomes. The primers and probes used to detect the respective O- and H-sequences were previously published [29][30]. The primer and probe searches were done using Geneious 6.0.5 software (Biomatters Ltd).

Identification of prophage regions and stx-phage integration sites and testing for the production of Stx and ST toxins
PHAST tool (phast.wishartlab.com) [31] was used to identify prophage sequences in the STEC/ETEC genomes. A prophage region was considered to be intact if the completeness score was above 90, questionable if the score was between 60 and 90, and incomplete if the score was less than 60. The integration sites for the stx-phages were determined manually. The stx genes were located in the assembled contigs. Starting from the stx gene, the sequence upstream and downstream was screened for the phage-related genes using the BLAST tool [32] and Geneious 6.0.5 software (Biomatters Ltd). If the phage sequence was not contiguous, the phage was reconstructed joining the sequence contigs together with the guidance of the whole genome map. The interrupted gene adjacent to the phage integrase was designated as the phage integration site.
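The score thresholds described above can be written as a small classifier. This is a minimal sketch and not part of PHAST itself: the function name `classify_prophage` is hypothetical, and assigning scores of exactly 60 and exactly 90 to the questionable class is an assumption, since the text only specifies "above 90", "between 60 and 90", and "less than 60".

```python
def classify_prophage(score):
    """Classify a prophage region by its PHAST completeness score.

    Assumption: the boundary values 60 and 90 are treated as 'questionable'.
    """
    if score > 90:
        return "intact"
    if score >= 60:
        return "questionable"
    return "incomplete"


if __name__ == "__main__":
    # Hypothetical example scores, not values from the studied genomes.
    for s in (110, 75, 40):
        print(s, classify_prophage(s))
```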
Production and titers of Stx and ST were determined. Stx was tested on the Vero cell assay at Statens Serum Institut (Copenhagen, Denmark). ST was determined using the GM1-ELISA method as previously described [33][34]. The prototypical ETEC strain H10407 was used as a control in the ST titration.

Identification of plasmid-associated sequences
PlasmidFinder 1.2 [35] was used to identify the presence of plasmids in the three STEC/ETEC genomes. Identification was based on the detection of replicon sequences belonging to several known plasmid incompatibility (Inc) groups. The threshold for identification was set to 80%. The locations of the plasmid Inc groups were compared to the locations of the known plasmid-associated genes estIa, hlyA, espP, and astA in the contigs of the draft genomes. PROKKA annotation reports were also utilized in the survey of the possible plasmid sequences.

Comparative genomics
To generate a phylogenetic tree depicting the positions of the three STEC/ETEC strains, the genomes were compared with 73 published E. coli and Shigella spp. reference genomes, both completed and draft genomes (S1 Table). Phylogenetic analysis based on the genes common to all the genomes included in the comparison was performed using the Ridom SeqSphere+ program (Ridom GmbH). The gene nomenclature used in the analysis was based upon the strain ETEC H10407 (accession no. FN649414.1). All the annotated coding sequences were imported from the ETEC H10407 genome to create a task template for core genome MLST (cgMLST). The required thresholds for gene identification were 90% identity to the reference sequence and 99% alignment with the reference sequence. If a single target gene resulted in more than one match in a genome, the whole target was excluded from the analysis. Altogether, 1341 targets were determined as the shared genes. These 1341 targets of all the 76 genomes included in the cgMLST analysis were concatenated into 76 continuous sequences and exported from Ridom SeqSphere+ as a multi-FASTA file. The sequences were uploaded into Galaxy/CRS4 (Orione) [19] and aligned with MAFFT (version 0.1) [36]. The alignment was imported into Geneious 6.0.5 software. A UPGMA dendrogram including bootstrap confidence values was produced using the Jukes-Cantor genetic distance model within the Geneious tree builder tool. Among the reference genomes, there were five draft genomes that were previously characterized as STEC/ETEC hybrids [14,37] and 14 draft genomes that represented the 14 phylogenetic lineages L1-L14 of the ETEC pathogroup [17].
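The target selection rule and the distance model named above can be illustrated with a short sketch. It does not reproduce the Ridom SeqSphere+ or Geneious implementations: the function names and input layout are hypothetical, the thresholds are assumed to be inclusive (at least 90% identity, at least 99% aligned), "more than one match" is assumed to mean more than one hit passing both thresholds, and skipping gap columns in the distance calculation is likewise an assumption.

```python
import math

MIN_IDENTITY = 90.0  # percent identity to the reference gene (from the text)
MIN_ALIGNED = 99.0   # percent of the reference gene aligned (from the text)


def target_call(hits):
    """Return 'ok', 'missing', or 'multiple' for one target in one genome.

    hits is a list of (identity_percent, aligned_percent) tuples.
    """
    good = [h for h in hits if h[0] >= MIN_IDENTITY and h[1] >= MIN_ALIGNED]
    if len(good) > 1:
        return "multiple"
    return "ok" if good else "missing"


def shared_targets(hits_by_genome):
    """Keep only targets with exactly one passing hit in every genome.

    hits_by_genome maps genome name -> {target name: list of hits}.
    """
    genomes = list(hits_by_genome.values())
    targets = set(genomes[0])
    return sorted(
        t for t in targets
        if all(target_call(g.get(t, [])) == "ok" for g in genomes)
    )


def jukes_cantor(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned, equal-length sequences.

    Columns with a gap or ambiguous base in either sequence are skipped.
    """
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable alignment columns")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        raise ValueError("distance undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

With this rule a target is dropped as soon as any one genome lacks it or carries it in more than one passing copy, which is why the shared set (1341 targets here) is much smaller than the full list of reference coding sequences.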
NcoI restriction enzyme-based whole genome maps were used to determine the degree of genomic identity between the three STEC/ETEC chromosomes. The maps were compared with each other and a similarity percentage was calculated for each pair using MapSolver 3.2.0 software (OpGen Inc.). Whole genome maps were also used to generate a phylogenetic tree in which the three STEC/ETEC whole genome maps were compared, using the UPGMA algorithm, with in silico NcoI-restricted maps generated from completed reference chromosomal sequences (S1 Table).
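An in silico NcoI restriction of a sequence amounts to locating every occurrence of the recognition site and listing the resulting fragment lengths. The sketch below illustrates the idea only and is not the MapSolver implementation: the function name is hypothetical, and the sequence is treated as linear (as for a contig), whereas a closed circular chromosome would need its first and last fragments joined.

```python
NCOI_SITE = "CCATGG"  # NcoI recognition sequence
CUT_OFFSET = 1        # NcoI cuts after the first C (C^CATGG)


def ncoi_fragments(sequence):
    """Return fragment lengths (bp) from an in silico NcoI digest.

    Assumption: the input is a linear sequence such as a contig.
    """
    seq = sequence.upper()
    cuts = []
    pos = seq.find(NCOI_SITE)
    while pos != -1:
        cuts.append(pos + CUT_OFFSET)
        pos = seq.find(NCOI_SITE, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Hypothetical toy sequence with two NcoI sites.
    print(ncoi_fragments("AAAACCATGGTTTTTTCCATGGAA"))
```

The ordered list of fragment lengths is what can then be aligned against the experimentally measured optical map.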

De novo genome assemblies
All three STEC/ETEC genomes remained as draft genomes including several gaps when aligned to whole genome maps produced from the chromosomal DNA. The chromosome sizes derived from whole genome maps were as follows: 5 097 783 bp, 5 123 796 bp, and 4 907 103 bp for strains IH53473, IH57218, and FE95160, respectively. The sequence contigs contained the plasmid DNA but the maps were only of the chromosomal DNA. In the genome IH57218, a misassembled area inside one contig was detected when aligned against the whole genome map. The other two genomes were assembled correctly according to the alignments against the maps.

Virulence genes harbored by STEC/ETEC strains
All three STEC/ETEC strains harbored multiple virulence genes (Table 1). All the strains carried the genes clyA encoding cytolysin and shiA encoding shikimate transporter. The human strain IH53473 was the only strain possessing the gene espP, which belongs to the group of Serine Protease Autotransporters of Enterobacteriaceae (SPATE). No other SPATEs were detected. IH53473 also possessed the genes irp1, irp2, and fyuA encoding yersiniabactin biosynthetic proteins and a receptor. IH53473 was positive for the LEE pathogenicity island, which contains the gene eae encoding intimin, espA and espD encoding translocators, and escV, espF, and espH encoding type III secretion system structure and effector proteins. The human strain IH57218 was positive for Shigella enterotoxin 2. The bovine strain FE95160 was positive for astA encoding EAEC heat-stable enterotoxin I and for the aai pathogenicity island encoding a type VI secretion system.
All the strains were negative for ETEC colonization factors. However, the strains harbored some of the tested virulence genes contributing to adherence: the putative adhesin gene eaeH, the putative outer membrane autotransporter adhesin gene yfaL, the type 1 fimbria gene fimH, and the common pilus subunit gene ecpA (Table 1).
The results of in silico stx1 and stx2 subtyping were consistent with the previous results obtained by PCR: IH53473 stx2a, IH57218 stx2a, FE95160 stx1a.
The virulence gene analysis revealed that strain IH53473 possessed an estIa gene with a frameshift mutation, while the other two STEC/ETEC hybrids had intact estIa genes. To verify whether the mutation was real or a sequencing error, the gene estIa was amplified by PCR from all the strains and the PCR products were sequenced by Sanger sequencing. The result was confirmed: estIa in IH53473 had a single nucleotide deletion, which resulted in a frameshift mutation and produced a premature stop codon. The translated polypeptide would be only 53 amino acids long, whereas the full-length polypeptide consists of 73 amino acids.
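How a single nucleotide deletion can shift the reading frame and introduce a premature stop codon can be shown with a toy example. The open reading frame below is hypothetical and is not the estIa sequence; the helper names are also illustrative. Translation uses the standard genetic code.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate from the first base until the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def delete_base(dna, index):
    """Return the sequence with the base at the given 0-based index removed."""
    return dna[:index] + dna[index + 1:]


# Hypothetical toy ORF (NOT estIa): ATG GCA TTG ACC GGT AAA TAA
toy_orf = "ATGGCATTGACCGGTAAATAA"
mutant = delete_base(toy_orf, 4)  # remove the C of the second codon

print(translate(toy_orf))  # MALTGK
print(translate(mutant))   # MD
```

In the mutant, the codons downstream of the deletion are read in a shifted frame (ATG GAT TGA), so a stop codon appears after only two residues, analogous to the truncated product predicted for IH53473.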

In silico MLSTs, phylogroups, and serotypes
The STs of the human strains were of previously established types: ST330 for IH53473 and ST10 for IH57218. The ST of the bovine strain FE95160 was novel. It was submitted to the E. coli MLST database (http://mlst.warwick.ac.uk/mlst/dbs/Ecoli) and was assigned the new ST number ST4123.
Both human strains belonged to the E. coli phylogroup A. The bovine strain belonged to the cryptic clade I.
The O-grouping results were consistent with the previous results: IH53473 O101, IH57218 O2, and FE95160 O2. However, the H-typing results differed for two strains. IH53473, which had previously been typed as H-/non-motile, had primer and probe binding sites for H33. FE95160, which had previously been typed as H2, had primer and probe binding sites for H25. The H-typing result of strain IH57218 was consistent with the previous result, H27.

Identified prophage regions and stx-phage integration sites and production of Stx and ST toxins
For strain IH53473, 16 prophage regions were identified, of which nine regions were intact, three regions were incomplete, and four regions were questionable. For strain IH57218, 17 prophage regions were identified, of which 12 regions were intact and five regions were incomplete. For strain FE95160, 10 prophage regions were identified, of which seven regions were intact and three regions were incomplete. The integration sites for the stx2a-phages in IH53473 and IH57218 were identified in the wrbA locus. Both phages were similar to phage P13374 (accession no. HE664024) by BLAST search. The integration site for the stx1a-phage in strain FE95160 was between the genes ybhC and ybhB. The phage remained unidentified as no hits were found by BLAST search.

Plasmid-associated sequences
PlasmidFinder indicated several plasmid replicon sequences of known Inc groups in the STEC/ETEC genomes. IH53473 had three plasmid replicons: IncQ2, IncFII(29), and IncXI. The plasmid-associated genes estIa, hlyA, and espP were placed in the same contig as IncFII(29). IH57218 had one plasmid replicon: IncFII. The plasmid-associated genes estIa and hlyA were placed in a different contig than IncFII. FE95160 had two plasmid replicons: IncFII(pSE11) and IncFIB(AP001918). The plasmid-associated genes estIa, hlyA, and astA were placed in the same contig as IncFIB(AP001918). According to the PROKKA annotation reports of IH53473 and FE95160, several plasmid-associated genes, such as RepFIB replication protein A, plasmid partitioning protein B, and plasmid stability protein, were located in the same contigs as the virulence genes estIa, hlyA, espP, and astA. According to the PROKKA annotation report of IH57218, several plasmid-associated genes were also found in the same contig as estIa and hlyA. However, the origin of plasmid replication was not present in that contig; it was located in another one.

Comparative genomics analyses
The core genome phylogeny was inferred from the shared genes among a diverse set of E. coli and Shigella spp. genome sequences using cgMLST and sequence alignment. The analysis showed that different E. coli pathogroups are intermixed (Fig 1). For instance, STEC genomes can be found in nearly all branches of the UPGMA tree. The three STEC/ETEC strains did not form a single cluster. The two human STEC/ETEC strains IH53473 and IH57218 clustered with genomes representing ETEC, STEC, EAEC, and laboratory-adapted E. coli. If the cut-off point for the cluster was further extended within the genomes belonging to phylogroup A, other ETEC, laboratory-adapted, and commensal E. coli genomes were included in the cluster. The bovine STEC/ETEC strain FE95160 formed a remote cluster with two STEC genomes. These three genomes belonged to cryptic clade I while the other genomes included in the tree belonged to the actual phylogroups. The core genome phylogeny followed the phylogrouping results with one exception: the STEC O5:H- 97.0246 genome was separated from the rest of the phylogroup A genomes.
Whole genome maps of the STEC/ETEC strains were compared with each other. According to the map lengths, the human strains IH53473 and IH57218 both have chromosomes of approximately 5.1 Mb (Fig 2). The bovine strain FE95160 has a chromosome of approximately 4.9 Mb, which is notably shorter than the human STEC/ETEC chromosomes. Comparison of FE95160 with IH53473 and with IH57218 indicated only a few homologous regions between the chromosomes (Fig 2). Based on the restriction map similarity, the chromosome of IH53473 is expected to show approximately 69% identity with that of IH57218 and approximately 12% identity with that of FE95160, and the chromosome of IH57218 approximately 16% identity with that of FE95160 (S1 Fig).
Whole genome maps of the STEC/ETEC chromosomes were compared to in silico maps of completed reference E. coli and Shigella spp. chromosomal sequences (Fig 3). The human strains IH53473 and IH57218 clustered with strains representing ETEC, commensal, and laboratory-adapted E. coli. However, the bovine strain FE95160 formed an outgroup in the UPGMA tree. The whole genome map similarity clustering is comparable to the clustering in the cgMLST-based UPGMA tree (Fig 1). Again, the STEC genomes can be seen in more than one cluster on the UPGMA tree. However, the number of genomes included in the whole genome map comparison is smaller due to the limited number of completed genomes available.

Discussion
This study investigated the virulence gene contents of three STEC/ETEC hybrid strains and their phylogenetic position in relation to other E. coli genomes. The information obtained in this study reveals the genomic diversity of STEC/ETEC hybrid strains and contributes significantly to our understanding of genomics and virulence factor repertoire of hybrid E. coli strains.
The varying combination of multiple virulence genes harbored by the STEC/ETEC hybrids showed that the strains were a mixture of several different E. coli pathogroup-associated properties. The presence of the virulence genes stx and estIa, associated with two different pathogroups, in our strains confirmed their hybrid status. In addition, the strains harbored other virulence genes, some of which have been associated with ETEC, STEC, and EAEC. All STEC/ETEC strains were positive for clyA, which encodes cytolysin A and has been associated with the ETEC pathogroup [12]. The human strain IH53473 harbored the genes irp1, irp2, and fyuA for two yersiniabactin biosynthetic proteins and a receptor, located in a high-pathogenicity island (HPI), which is prevalent in several EAEC isolates but rarely detected in other DEC [24,38]. HPI may contribute to virulence by offering an iron scavenging system for survival in the host. One SPATE gene, espP, was also detected in IH53473. SPATEs are a family of extracellular proteases produced by species belonging to Enterobacteriaceae, and they have an impact on mucosal damage and colonization [39]. The German outbreak strain EAEC/STEC O104:H4 harbored a combination of several SPATEs, which may have contributed to its heightened virulence [18]. The bovine strain FE95160 was positive for astA encoding EAEC heat-stable enterotoxin I. Contrary to what the toxin name suggests, astA can be harbored by strains of the STEC, EAEC, EPEC, ETEC, and EIEC pathogroups [40] and even by extraintestinal pathogenic E. coli [41]. FE95160 was also positive for the aai pathogenicity island encoding a putative type VI secretion system, which was also detected in the German outbreak strain EAEC/STEC O104:H4 [18].
Figure caption (fragment): [...] marked by colors. The reference genomes STEC O139:H1 S1191, ETEC UMNF18, STEC O2:H25 7v, STEC O8:H19 MHI813, and STEC O73:H18 C165-02 were previously characterized as STEC/ETEC hybrids [14,37]. The reference genomes ETEC O6 E8, ETEC O6 E66, ETEC O78 E36, ETEC O25 E135, ETEC O115 E21, ETEC ON3 E562, ETEC O169 E344, ETEC O148 E222, ETEC O27 E220, ETEC O114 E934, ETEC O159 E159, ETEC O15 E330, ETEC O112ab E399, and ETEC ON5 E620 represent the phylogenetic lineages L1-L14 of the ETEC pathogroup, respectively [17].
All the STEC/ETEC strains possessed some of the genes eaeH, yfaL, ecpA, and fimH associated with adhesion [12,16,42,43]. These genes may contribute to adhesion, given that the strains lacked ETEC colonization factors. The genes ecpA and fimH have been associated with the ETEC pathogroup [12]. It is not uncommon for ETEC strains to be negative for ETEC colonization factors [24]. Thus, these colonization factor negative strains may have other virulence genes by which they can adhere. The human strain IH57218 was positive for the gene encoding Shigella enterotoxin 2, which can increase virulence [44]. All STEC/ETEC strains were positive for the gene shiA, which encodes shikimate transporter and lies in a pathogenicity island involved in the suppression of the host inflammatory response [45].
The phylogeny inferred from cgMLST and sequence alignment demonstrates that our STEC/ETEC hybrid strains do not form a single cluster. The phylogenetic placement of the human STEC/ETEC strains indicates a common ancestor with certain ETEC, STEC, EAEC, laboratory-adapted, and commensal E. coli strains. In contrast, the bovine strain FE95160 shares a similar genetic background with two STEC genomes of bovine origin, and they form a remote cluster in the phylogenetic tree. The phylogeny inferred from whole genome map similarity clustering supports these observations. The whole genome map comparison between the three STEC/ETEC strains is also consistent with the remote phylogenetic position of the bovine strain, since the human strains were shown to share more genetic elements with each other than with the bovine strain.
Previous studies have shown that both the STEC and ETEC pathogroups are genetically versatile [11,14,15]. Our observations on cgMLST and whole genome map similarity clustering support this. Both STEC and ETEC genomes were found in several branches of the two UPGMA trees. It has been suggested that the acquisition of the toxin genes may be all that is required to form ETEC and that no specific chromosomal factors are prerequisite for enterotoxigenicity [15]. However, von Mentzer and colleagues [17] recently described several robust phylogenetic lineages in the ETEC pathogroup. Lineages L1-L10 possessed certain colonization factor and toxin gene profiles whereas lineages L11-L14 were always colonization factor negative. Their data showed that toxin allele profiles and colonization factor profiles were associated with certain chromosomal backgrounds. Thus, we included 14 ETEC draft genomes representing the ETEC lineages L1-L14 from the study by von Mentzer and colleagues [17] in our cgMLST analysis to see whether the STEC/ETEC genomes cluster with some of these genomes. The genomes representing the major ETEC lineages L1, L2, and L4 were the closest relatives of our human STEC/ETEC strains. The second closest relatives were the genomes representing the colonization factor negative ETEC lineages L11 and L13. These results might indicate that STEC/ETEC hybrid strains also have genetic backgrounds linked with certain colonization factor and toxin gene profiles.
Even though not all of our STEC/ETEC strains clustered together, we found evidence that STEC/ETEC hybrid strains may have similarities in their chromosomal background. In our study, we combined our data with the five previously sequenced STEC/ETEC hybrid genomes in cgMLST [14,37]. Our human STEC/ETEC strains clustered with STEC O139:H1 S1191 [14], which was isolated from a pig suffering from edema disease and possesses the estIb and stx2e genes, and with ETEC O147 UMNF18 [37], which is also of porcine origin and possesses the estIa, estIb, and stx2e genes. Our bovine STEC/ETEC strain clustered with STEC O2:H25 7v [14], which was isolated from cattle feces and possesses the estIa and stx2g genes. The other two STEC/ETEC hybrids from the study by Steyert et al. [14], STEC O8:H19 MHI813 and STEC O73:H18 C165-02, did not cluster with the rest of the STEC/ETEC hybrids. We suggest that certain genetic backgrounds may favor the acquisition of ETEC virulence genes and stx-phages.
E. coli MLST ST10, which includes our human STEC/ETEC strain IH57218, is common among ETEC strains of human origin [15]. Interestingly, there is a previous report of a UPEC strain of ST330 carrying the estIa gene [46], as is the case with our human STEC/ETEC strain IH53473, although the latter had a frameshift mutation in the estIa gene. IH53473, which was shown to be positive for the SPATE gene espP, clustered together with an EAEC genome in the cgMLST-based UPGMA tree. EAEC often harbor several SPATEs, and espP is a class I cytotoxic SPATE [39]. IH53473 was shown to possess a variety of virulence factors that may have had an effect on its pathogenic potential, since the strain was isolated from a patient with HUS.
Two of the STEC/ETEC strains were able to produce both Stx and STIa. The ability to produce both toxins may result in increased virulence. All three STEC/ETEC strains had a very high Stx titer of 1:100,000 dilution. There was no difference in Stx cytotoxicity between the human and bovine STEC/ETEC strains. Human isolates possessing stx2a, stx2c, or stx2dact generally show higher cell cytotoxicity than those possessing stx2b, stx2e, or stx2g [47]. Our human STEC/ETEC strains possessing stx2a were isolated from patients with HUS or diarrhea. IH53473 also carried eae. Clinically relevant STEC, and especially eae-positive STEC, have shown high cytotoxicity levels compared to food isolates, which have shown more diverse cytotoxicity levels [47]. The human strain IH57218 and the control strain ETEC H10407 showed higher STIa production than the bovine strain. The human strain IH53473 produced Stx but not STIa. This result is consistent with the detected frameshift mutation in the estIa gene.
In silico O-grouping and H-typing may be used to replace agglutination-based serotyping. In the present study, the O-grouping results were consistent with the previous results obtained by antiserum agglutination [8][9]. However, some of the H-typing results were not consistent. Strain IH53473, which was previously typed as H-/non-motile, had primer and probe binding sites for H33. Since the strain was not motile, the phenotypic H-antigen agglutination test could not be performed. Strain FE95160 was previously typed as H2. However, the in silico method showed primer and probe binding sites for H25. This discrepancy may be due to the fact that the H-agglutination scheme [48] has only one reaction difference between H2 and H25. In silico typing seems to be a good choice especially for non-motile strains, which cannot be typed by agglutination because they do not express the H-antigen.
The genes estIa and astA may be plasmid- or chromosome-associated [1]. The locations of the plasmid replicon sequences identified in the STEC/ETEC genomes were associated with the locations of the potentially plasmid-associated genes estIa, hlyA, espP, and astA in the genomes of IH53473 and FE95160. This may indicate that the estIa genes of IH53473 and FE95160 and the astA gene of FE95160 are located on plasmids rather than on the chromosome. The assembly pipeline favors the assembly of plasmid sequences in separate contigs. The estIa gene in the IH57218 genome may also be plasmid-associated. Even though the plasmid replicon sequence was detected in a different contig from estIa, this may be due to fragmentation of the plasmid sequence into two separate contigs during the assembly.
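The reasoning above (a gene sharing a contig with a plasmid replicon sequence is more likely plasmid-borne) can be sketched as a simple lookup. This is a minimal illustration only, not the pipeline used in the study; the function name, the input format, and all contig and replicon names are hypothetical placeholders.

```python
from collections import defaultdict


def colocated_with_replicon(gene_hits, replicon_hits):
    """Map each gene to the replicons detected on the same contig.

    Both arguments are iterables of (contig_id, feature_name) pairs.
    An empty list means no replicon was found on the gene's contig,
    which does not rule out a plasmid location (the plasmid may be
    fragmented across contigs).
    """
    replicons_by_contig = defaultdict(set)
    for contig, replicon in replicon_hits:
        replicons_by_contig[contig].add(replicon)

    result = {}
    for contig, gene in gene_hits:
        # .get avoids creating empty entries for contigs without replicons
        result[gene] = sorted(replicons_by_contig.get(contig, set()))
    return result


# Placeholder input, not data from the study.
gene_hits = [("contig_2", "estIa"), ("contig_2", "astA"), ("contig_1", "stx2a")]
replicon_hits = [("contig_2", "replicon_A")]
colocation = colocated_with_replicon(gene_hits, replicon_hits)
```

A gene with an empty list here corresponds to the IH57218 situation, where estIa and the replicon sequence lie in different contigs and a plasmid location can only be suspected.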
All our STEC/ETEC strains possessed several prophage regions in their genomes. It is typical of STEC genomes to harbor prophages and other integrative elements [13]. The identified stx-phage integration sites in the STEC/ETEC genomes were the usual ones. In the IH53473 and IH57218 genomes, the stx-phage interrupted the wrbA gene. The gene wrbA, encoding tryptophan repressor binding protein, is a common integration site for stx2 phages [49]. In the FE95160 genome, the stx-phage integration site was between the genes ybhC (predicted pectinesterase) and ybhB (predicted kinase inhibitor). We screened the E. coli reference genomes for similar cases and noticed that E. coli UMN026 (accession no. NC_011751.1) and E. coli EDL933 (accession no. NC_002655.2) also possess phages at this site.
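The two integration patterns described above (a phage interrupting a gene versus a phage inserted between two genes) can be told apart from annotation coordinates. The sketch below assumes that an interrupted gene is annotated as two fragments with the same name flanking the prophage; this assumption, the function name, and the coordinates are illustrative and not taken from the study.

```python
def integration_context(prophage, genes):
    """Classify a prophage insertion from its nearest flanking features.

    prophage: (start, end) on one contig.
    genes: list of (name, start, end) features on the same contig.
    Returns ("interrupts", gene) when the same gene name flanks both
    sides, otherwise ("between", left_gene, right_gene).
    """
    start, end = prophage
    upstream = [g for g in genes if g[2] <= start]
    downstream = [g for g in genes if g[1] >= end]
    # Nearest feature on each side, or None at a contig edge
    left = max(upstream, key=lambda g: g[2])[0] if upstream else None
    right = min(downstream, key=lambda g: g[1])[0] if downstream else None
    if left is not None and left == right:
        return ("interrupts", left)
    return ("between", left, right)


# Placeholder coordinates for illustration only.
split_gene = integration_context(
    (300, 50300), [("wrbA", 100, 300), ("wrbA", 50300, 50600)]
)
intergenic = integration_context(
    (1500, 59900), [("ybhC", 100, 1400), ("ybhB", 60000, 60500)]
)
```

With real annotations, fragment naming varies between annotation tools, so the name comparison would likely need to be replaced by a sequence-based check.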
The Stx-phages and their ability to transfer genes horizontally play an important role in the evolution of E. coli and the development of STEC variants [50]. Plasmids carrying ST and LT toxin genes can also be transferred between E. coli strains [15]. However, ST can also be encoded by transposons, which are mobile genetic elements as well [1]. ETEC plasmids can carry both ETEC colonization factors and est [12,51]. Some ETEC strains are negative for colonization factors [24], as was the case with our STEC/ETEC strains. The survey of plasmid-associated sequences indicated that estIa in the STEC/ETEC hybrids may be associated with plasmids. It is therefore not surprising that both ETEC and STEC strains can arise in different phylogenetic groups and that they do not necessarily have a clonal lineage. Some commensal strains may have pathogenic potential since certain parts of their genomes may act as genetic repositories for virulence factors [10]. Acquisition of appropriate pathogenic features may transform a commensal strain into a pathogen, or a strain of one pathogroup into a hybrid pathogroup.

Conclusions
The comparative genomics of the STEC/ETEC hybrid strains showed that STEC- and ETEC-associated virulence genes can co-exist in strains originating from different phylogenetic lineages. Whole genome sequencing techniques enable fast typing and simultaneous screening of several genetic markers, making it easy to detect virulence genes associated with several pathogroups. An infection with a hybrid pathogenic strain may result in more severe disease in a patient. These strains may also have increased spreading potential. Therefore, their emergence should be taken into account in modern strain typing and in epidemiological surveillance of E. coli infections.