A Role for Tn6029 in the Evolution of the Complex Antibiotic Resistance Gene Loci in Genomic Island 3 in Enteroaggregative Hemorrhagic Escherichia coli O104:H4

In enteroaggregative hemorrhagic Escherichia coli (EAHEC) O104, the complex antibiotic resistance gene loci (CRL) found in the region of divergence 1 (RD1) within E. coli genomic island 3 (GI3) contain the bla TEM-1, strAB, sul2, tetA(A) and dfrA7 genes, encoding resistance to ampicillin, streptomycin, sulfamethoxazole, tetracycline and trimethoprim, respectively. The precise arrangement of the antibiotic resistance genes and the role of the mobile elements that drove the evolutionary events and created the CRL have not been investigated. We used a combination of bioinformatics and iterative BLASTn searches to determine the micro-evolutionary events that likely led to the formation of the CRL in GI3, using the closed genome sequences of EAHEC O104:H4 strains 2011C-3493 and 2009EL-2050 and high-quality draft genomes of EAHEC O104:H4 isolates from sporadic cases not associated with the initial outbreak. Our analyses indicate that the CRL in GI3 evolved from a progenitor structure that contained an In2-derived class 1 integron in a Tn21/Tn1721 hybrid backbone. Within the hybrid backbone, a Tn6029-family transposon, identified here as Tn6029C, abuts the sul1 gene in the 3´-conserved segment (3´-CS) of a class 1 integron, generating a unique molecular signature that has previously been observed only in pASL01a, a small plasmid found in commensal E. coli in West Africa. From this common progenitor, independent IS26-mediated events created two novel transposons, identified here as Tn6029D and Tn6222, in 2011C-3493 and 2009EL-2050 respectively. Analysis of RD1 within GI3 reveals that IS26 has played a crucial role in the assembly of regions within the CRL.


Introduction
The German outbreak of E. coli O104:H4, and a smaller number of sporadic cases of E. coli O104:H4 infection that followed a few weeks later in France, resulted in an unusually high incidence of haemolytic uremic syndrome (HUS; 850 cases), as well as bloody diarrhoea (4320 cases) and 82 deaths in total [1]. Isolates from these outbreaks, including 2011C-3493 isolated from a US patient with a history of travel to Germany, express Shiga toxin 2 (stx 2), which is typically associated with enterohemorrhagic (EHEC) strains. The outbreak strains are, however, phylogenetically related to enteroaggregative (EAEC) strains. The hybrid features of O104:H4 indicated that they belong to a new lineage known as the enteroaggregative haemorrhagic E. coli (EAHEC). Strains isolated from the German outbreak and those from the sporadic infections in France form a separate clade (Clade 1) distinct from historical enteroaggregative O104:H4 isolates, such as strain O104:H4 55989 recovered from Central Africa in 1995 [2][3][4]. EAHEC O104:H4 has also caused cases of HUS and bloody diarrhoea in the Republic of Georgia. Strains isolated in 2009 from the Republic of Georgia (2009EL-2050 and 2009EL-2071) are closely related to the Clade 1 strains but cluster distinctly from them and from the historical isolates [2]. The 2011 outbreak strains and subsequent isolates recovered from sporadic cases of HUS in France were found to be resistant to multiple antibiotics including third-generation cephalosporins (ceftiofur and ceftriaxone), ampicillin, streptomycin, trimethoprim, sulfamethoxazole, tetracycline and sulfisoxazole, severely limiting treatment options [4][5][6][7]. Unique to the 2011 outbreak strains was a large IncI1 plasmid encoding the CTX-M-15 cephalosporinase [8]. The plasmid has been sequenced [8] and extensively characterised in EAHEC O104:H4 isolates, but a detailed analysis of the antibiotic resistance genes and their arrangement in genomic island 3 (GI3) has not been carried out.
The acquisition and loss of mobile genetic elements can play an important role in the emergence of new pathogens. The acquisition of virulence genes and antibiotic resistance genes is considered a significant event in the emergence of new pathogens because of its potential to impact fitness and transmissibility [9]. Comparative whole-genome analyses of O104:H4 isolates have revealed diversity between EAHEC O104:H4 strains [2,4]. Closed genomes now exist for three EAHEC O104:H4 strains: a representative of the German outbreak, strain 2011C-3493, and two strains from the Republic of Georgia (2009EL-2050 and 2009EL-2071). Closed genome sequences provide contextual information on the precise arrangement of resistance genes and are important for understanding the molecular events that generate complex antibiotic resistance gene loci (CRL) [3,10,11]. These closed genomes allowed us to determine the precise arrangement of antibiotic resistance genes within GI3 and the molecular events that created the CRL.
GI3 is a hotspot for genetic events within O104:H4 genomes [2,3] and is unique to Clade 1 O104:H4 strains [3]. It is an important region within O104:H4 genomes because it has previously been demonstrated to be mobile [12,13]. GI3 targets a widely dispersed genomic hotspot consisting of a 23 bp target sequence, and consequently it may be significant in the emergence of future MDR pathogens in the Enterobacteriaceae. Moreover, it houses a CRL containing the mercury resistance gene cluster (mer module) and the virulence factor Ag43, implicated in biofilm formation [3]. The CRL also contains genes encoding resistance to ampicillin (bla TEM-1), streptomycin (strAB), sulfisoxazole (sul1 and sul2), tetracycline (tetA(A)) and trimethoprim (dfrA7). Notably, comparative genome studies of O104:H4 were the first to identify the bla TEM-1 -sul2-strA-strB gene cluster in a chromosomal location. Previously, this gene cluster had only been identified on plasmid backbones [14][15][16][17][18]. For example, an IncHI2 plasmid from Salmonella enterica serovar Typhimurium from Australia carries a sul2-strA-strB gene cluster together with plasmid replication genes repA and repC adjacent to a module encompassing the bla TEM-1 gene [15]. On the plasmid, these genes comprise a large part of Tn6029, a transposon flanked by direct copies of IS26 [15]. The molecular events that led to the construction of Tn6029 have been described and involve the insertion of Tn1, Tn2 or Tn3 in plasmid RSF1010 followed by the independent insertion of three copies of IS26 [15]. In 2011 a second variant of the transposon, Tn6029B [18], was described within the sequence of an IncHI1 plasmid (pHCM1 [30]) isolated from a Salmonella enterica serovar Typhi strain from Vietnam in 1993.
Here, we present the key defining features of the CRL within GI3 of the finished genomes of the 2011 German outbreak strain, 2011C-3493 (CP003289.1), and the 2009 Republic of Georgia strain, 2009EL-2050 (CP003297.1). Our analysis indicates that IS26-mediated events have played a major role in shaping GI3 in EAHEC O104:H4. We propose an evolutionary model for the CRL seen in EAHEC O104:H4 isolates, based on the genetic signatures we identified from the closed genomes and on iterative BLASTn searches of the microbial genome database.

Data mining
Defined sequences spanning regions within RD1 in strains 2011C-3493 and 2009EL-2050 were downloaded from GenBank and aligned using Mauve version 2.3.1 [29] as described below. The automated annotations available through the NCBI genome database were manually curated using the NCBI ORF finder and iterative BLASTn and BLASTp searches. The number of nucleotides spanning the priming sites of primers L1 and JL-D2 [19] was used to determine the precise location of Tn6029-family transposons within RD1, using AmplifyX version 3.1 software.

BLASTn analysis of draft O104:H4 genomes
Fifty-three draft and complete genomes of Escherichia coli O104:H4 isolates (taxid:1038927) available in the GenBank NCBI microbial genome database (on 21st July 2014) were queried with BLASTn using sequences representing different regions within the RD1 structure seen in 2009EL-2050 and 2011C-3493 and proposed to represent a progenitor in this study (see details in the Results section). The DNA sequences of fragments 1 and 2 were derived from the genome sequence of isolate 2011C-3493, while those of fragments 3, 4 and 5 were derived from 2009EL-2050. BLAST alignments that exhibited greater than 99.95% identity over 100% of the query sequence length against E. coli O104:H4 genomes were deemed significant and analysed in detail here. BLAST hits showing sequence identity against varying lengths of the input sequence spanning Tn6029 (Fragment 4) were also considered because (1) the region is predicted to be evolving rapidly [14,16,17,27,30] and (2) the region has multiple copies of an insertion element, which typically generate scaffold breaks during assembly. A visual representation of locations of the different input query start positions as it appears in S3
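The significance threshold described above can be sketched as a simple filter. This is a minimal illustration in Python and not the authors' pipeline; the `Hit` record, its field names and the three example hits are hypothetical and exist only to show how the identity and coverage criteria combine.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject: str         # hypothetical genome label
    pct_identity: float  # percent identity of the alignment
    q_start: int         # 1-based start of the alignment on the query
    q_end: int           # 1-based end of the alignment on the query
    query_len: int       # full length of the query fragment


def query_coverage(hit: Hit) -> float:
    """Percent of the query sequence covered by the alignment."""
    return 100.0 * (hit.q_end - hit.q_start + 1) / hit.query_len


def is_significant(hit: Hit, min_identity: float = 99.95) -> bool:
    """Greater than 99.95% identity over 100% of the query length."""
    return hit.pct_identity > min_identity and query_coverage(hit) >= 100.0


# Invented hits against a 2082 bp query, for illustration only.
hits = [
    Hit("genome_A", 100.0, 1, 2082, 2082),  # full length, identical
    Hit("genome_B", 99.90, 1, 2082, 2082),  # full length, identity too low
    Hit("genome_C", 100.0, 1, 1500, 2082),  # identical, but partial coverage
]

significant = [h.subject for h in hits if is_significant(h)]
print(significant)
```

Partial hits such as `genome_C` fail this strict filter, which is why the text treats the Tn6029-spanning fragment separately and also considers alignments of varying length.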

Results
Genetic features of RD1 that distinguish O104:H4 strains 2009EL-2050 and 2011C-3493
To decipher the salient features of RD1 in 2009EL-2050 (Fig. 1A) and 2011C-3493 (Fig. 1B) we carried out an in-depth sequence analysis and annotation of the regions in both strains. Our analysis showed that the basic backbone of the CRL that defines RD1 in O104:H4 strains comprises a chimeric Tn21/Tn1721 structure with a Tn1721-associated tetA-tetR-pecM module adjacent to a mercury resistance module from Tn21. In both isolates the CRL in RD1 (Fig. 1A and 1B) harbours a class 1 integron, typical of the In2 family, with a dfrA7 cassette that encodes resistance to trimethoprim [31]. However, the class 1 integron in 2011C-3493 is inverted relative to its orientation in 2009EL-2050 (see Fig. 1A and 1B). The 3´-conserved segment (3´-CS) of the class 1 integron in both strains is partially deleted by the insertion of a composite transposon (see later) and comprises qacEΔ1 and a portion of the sul1 gene (79 nucleotides at the 3´ end of the sul1 gene). The deletion in sul1 and the remaining regions of the 3´-CS probably arose by insertion of a composite IS26 transposon in the 3´-CS followed by an IS26-mediated deletion event [19]. These genetic events generated variations in the sequences within RD1 in these strains and are detailed below.
Downstream of the composite IS26 transposon within RD1 is the remnant of ΔtniA, missing 428 nucleotides of the gene, and the IRt inverted repeat, both of which represent key structural components of clinical class 1 integrons (Fig. 1). A mercury resistance module identical to that found in Tn21 is present beyond IRt. The insertion site of the class 1 integron in the mercury module found in EAHEC O104:H4 is typical of the In2 family. The CRL carries a second truncated copy of a tniA gene (indicated by ΔtniA** in Fig. 1), missing 996 nucleotides from the 3´ end of the gene. ΔtniA** (Fig. 1) and its associated IRt are found adjacent to the tetA-tetR-pecM module, in an orientation identical to that seen in Tn1721-derived transposons. These data suggest that the two copies of ΔtniA have separate origins.
Notably, the 3´-CS of the class 1 integron in both 2011C-3493 and 2009EL-2050 comprises 1210 nucleotides and includes the qacEΔ1 gene plus a partial copy of the sul1 gene. The sul1 gene is typically found as a complete ORF in the 3´-CS of most clinical class 1 integrons. An IS26-mediated deletion resulted in the loss of a region of the 3´-CS, leaving only 79 nucleotides at the 3´ end of the sul1 gene. Previously we have shown that IS26-mediated deletion events create novel signatures that can be used in tracking lateral movement of CRL within bacterial populations [16,17,19,20]. In silico PCR simulation using primers in intI1 (L1) and IS26 (JL-D2) (see Methods) is expected to generate a novel 2082 bp amplicon that can be used as a genetic signature to test for the presence of the CRL in EAHEC O104:H4 (Fig. 2A). Notably, BLASTn analysis of the sequence of the 2082 bp amplicon indicated that it is identical to a homologous fragment in pASL01a (27,072 bp) [27], a small plasmid circulating within commensal E. coli in West Africa. pASL01a has a backbone of 5,168 bp that carries the plasmid replication genes and a Tn21-derivative transposon, TnASL01a [27]. Transposon TnASL01a encodes resistance to ampicillin, streptomycin, sulfathiazole and trimethoprim, and has a class 1 integron carrying the dfrA7 (trimethoprim resistance) gene and an IS26 transposon that carries the bla TEM-1 -sul2-strA-strB gene cluster [27]. The composite IS26 transposon in pASL01a appears at precisely the same location as in RD1 of 2011C-3493, a remarkable observation that supports the hypothesis that these Tn21-derivative transposons share an evolutionary history.
BLASTn analysis of the 2082 bp amplicon sequence also generated a partial match with plasmid pAKU_1 [32]. Alignments revealed that the 2082 bp fragment is identical across the entire query sequence except for a gap of 260 nt. In pAKU_1, a portion of the orf5 gene (which is often found associated with the 3´-CS of clinical class 1 integrons) is present; this portion is missing in 2011C-3493 and 2009EL-2050. Detailed sequence similarity searches confirmed that pAKU_1 retains binding sites for primers L1 and JL-D2, and the diagnostic PCR is expected to generate a 2342 bp fragment (Fig. 2), clearly distinguishing the evolutionary history of the CRL found in pAKU_1 from those of pASL01a, 2009EL-2050 and 2011C-3493.
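The logic of the diagnostic in silico PCR can be illustrated with a short Python sketch. It is not the AmplifyX procedure used in the study: the sequences of primers L1 and JL-D2 are not given in this section, so the primers and template below are invented toy sequences, and the matching is exact-string only. The final lines simply confirm that a 260 nt insertion between the priming sites accounts for the difference between the 2082 bp and 2342 bp amplicons.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def amplicon_length(template: str, fwd: str, rev: str):
    """Length of the product bounded by fwd and rev primers, or None.

    Assumes exact primer matches on a linear template, with the reverse
    primer annealing to the opposite strand downstream of the forward one.
    """
    start = template.find(fwd)
    if start == -1:
        return None
    rev_site = reverse_complement(rev)
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return end + len(rev_site) - start


# Hypothetical toy primers and templates (NOT the real L1 / JL-D2 primers).
fwd_primer = "ATGCCGTA"
rev_primer = "GGATCCAA"
short_template = "TT" + fwd_primer + "A" * 20 + reverse_complement(rev_primer) + "CC"
long_template = "TT" + fwd_primer + "A" * 20 + "CCCCC" + reverse_complement(rev_primer) + "CC"

short_len = amplicon_length(short_template, fwd_primer, rev_primer)
long_len = amplicon_length(long_template, fwd_primer, rev_primer)
print(short_len, long_len, long_len - short_len)

# Sizes reported in the text: the 260 nt gap explains the larger product.
signature_bp = 2082
gap_nt = 260
print(signature_bp + gap_nt)
```

Because both products share the same priming sites, the size difference alone separates the two arrangements in this simplified model.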

Evidence of Tn6029-derived regions within RD1
Between the 3´-CSΔ and the IRt of the class 1 integron in outbreak isolate 2011C-3493 are coupled IS26 elements (indicated by a box in Fig. 1B), which have their respective transposase genes facing away from each other, followed by a repA-repC-sul2-strA-strB module and a third copy of IS26. The boundaries of the insertion element IS26 are defined by the presence of 14-nucleotide inverted repeats (TTTGCAACAGTGCC) at either end (IR left or IR L, and IR right or IR R). Insertion of an IS26 creates a direct repeat of eight nucleotides at the insertion site. The eight-nucleotide sequences directly abutting IS26 can therefore be used as a signature to track the evolutionary events that constructed the CRL [19]. Within the coupled IS26 structure we identified i) a deletion of 10 nucleotides of IS26 IR L and ii) the addition of eight nucleotides (CCCCATAT). As a consequence, a 12-nucleotide spacer signature, TTTGCCCCATAT, separates the two copies of IS26 in the coupled IS26 structure (Fig. 1B). Independent BLASTn analysis of the module consisting of repA-repC-sul2-strA-strB bounded by two copies of IS26 indicated 100% identity to the module seen in pASL01a, pHCM1 and other plasmids that carry Tn6029-family transposons.
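The arithmetic behind the 12-nucleotide spacer can be checked directly. The sketch below, in Python, assumes that the four nucleotides retained from the 14-nucleotide inverted repeat are the first four as the repeat is written in the text; that assumption is inferred from the reported spacer sequence rather than stated explicitly, and the function name is ours.

```python
IS26_IR = "TTTGCAACAGTGCC"   # 14-nucleotide inverted repeat quoted in the text
ADDED = "CCCCATAT"           # eight nucleotides added within the coupled structure
DELETED_COUNT = 10           # nucleotides lost from the inverted repeat


def spacer_signature(ir: str, deleted: int, added: str) -> str:
    """Spacer left after deleting `deleted` nt from the end of `ir`
    (assumed position of the deletion) and appending `added`."""
    retained = ir[: len(ir) - deleted]
    return retained + added


spacer = spacer_signature(IS26_IR, DELETED_COUNT, ADDED)
print(spacer, len(spacer))
```

Under this assumption the retained TTTG plus the added CCCCATAT reproduces the TTTGCCCCATAT signature reported for the coupled IS26 structure.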
In isolate 2009EL-2050, the region flanked by direct copies of IS26 between IRi and IRt had two separate modules, one which had ΔtnpR-bla TEM1 -ΔCR2 genes, while the other had repA-repC-sul2-strA-strB genes (Fig. 1A). Independent BLASTn analysis of the entire IS26 transposon structure containing both the ΔtnpR-bla TEM1 -ΔCR2 and repA-repC-sul2-strA-strB modules showed 99.99% sequence identity to transposon Tn6029B, first described in plasmid pHCM1 (Fig. 2B). Based on our analysis of the CRL present in the two genomes the simplest likely progenitor that can give rise independently to the two CRL via separate micro-evolutionary events within a very short time is depicted in Fig. 3. Details of these micro-evolutionary events are explained below.
A comparison of the CRL found in strains 2011C-3493 and 2009EL-2050 showed: i) differences in the number, location and orientation of IS26 elements, ii) that the orientation of the class 1 integron (5´-CS and deleted 3´-CS) in 2009EL-2050 is inverted compared to that of 2011C-3493 and iii) that the module containing the bla TEM-1 gene is absent in the outbreak strain 2011C-3493 but present in progenitor strain 2009EL-2050 (Fig. 1B and 1A respectively).
pASL01a deserves special mention in this context as it carries an identical copy of the class 1 integron containing dfrA7, with an IS26-flanked transposon inserted at the same location in the 3´-CS as is found in both O104:H4 genomes, 2009EL-2050 and 2011C-3493. Our analysis indicates that TnASL01a contains a novel variant of Tn6029. The Tn6029 variant is characterized by an 11 bp deletion in the ΔtnpR-bla TEM-1 -ΔCR2 segment seen in Tn6029B and by the inverted orientation of the module compared to the structures seen in Tn6029 or Tn6029B. We propose the name Tn6029C for the Tn6029 variant present in pASL01a. Characteristic features of Tn6029 variants and the unique attributes that distinguish Tn6029B from Tn6029C are shown in S2B Fig. Tn6029 and variants Tn6029B and Tn6029C each carry the repA-repC-sul2-strA-strB gene cluster acquired from RSF1010 and bla TEM-1 from Tn1, Tn2 or Tn3 [15,33]. The events that led to the formation of Tn6029 have been reported [15]. Tn6029B [18] differs from Tn6029 by the loss of 85 nucleotides in the ΔCR2 gene. Tn6029C was likely formed by an IS26-mediated inversion of the bla TEM-1 module followed by deletion of 11 nucleotides within the Tn2 derivative (S2B Fig.). This inversion event in Tn6029C has also reversed the characteristic eight-nucleotide direct repeat signature sequences that abut IS26 elements in Tn6029B (Fig. 2B). Apart from CTCGCGCC in pHCM1, all eight-nucleotide signature sequences adjacent to the different copies of IS26 present in Tn6029B and Tn6029C were identical to those seen in RD1 in isolates 2009EL-2050 and 2011C-3493 respectively (Fig. 2B).

Analysis of RD1 in EAHEC O104:H4 genomes
We interrogated the microbial genome database in NCBI for the presence of RD1 in GI3. Our searches were restricted to complete and draft genomes of E. coli O104:H4 isolates (taxid: 1038927) that aligned to five separate fragments spanning RD1 (see Fig. 3). The rationale for our BLASTn approach resides in our hypothesis that the CRL seen in genomes 2011C-3493 and 2009EL-2050 are the product of independent micro-evolutionary events; we therefore did not want to restrict our searches to the specific structures seen in isolates 2011C-3493 or 2009EL-2050. A summary of the BLASTn searches is presented in Table 1, while detailed results of all the BLAST analyses are presented in S1-S5 Tables. On the 20th of July 2014, the GenBank microbial genomic database consisted of 53 E. coli O104:H4 genomes, including the three completely finished genomes described by Ahmed et al. [2] and five well-scaffolded draft genomes with an N50 of 1 Mbp [3]. Notably, BLASTn analysis of Fragment 2 (Fig. 3), which includes the 2082 bp signature sequence, generated matches with 100% sequence identity in 35 of the 53 O104:H4 genomes. BLASTn searches using Fragment 1 (Fig. 3), which spans a portion of GI3 and its junction with the Tn21-specific transposition module and a small overlapping portion with Fragment 2 marking the beginning of integron In2, showed 100% identity in 42 of the 53 genomes. Fragment 4, which spans the Tn21 mer module (Fig. 3), showed 100% identity in 37 genomes. Thirty-six genomes also had the Tn1721-derived tetA-tetR-pecM module and the associated ΔtniA** gene. Table 1 depicts a summary of BLASTn searches using Fragments 1-5. Our data indicate that 22 of the 53 (~42%) O104:H4 genomes in the NCBI microbial genome database have all five fragments that span the progenitor RD1 structure depicted in Fig. 3.
Genomes Ec11-4984 (AHOU01000000), Ec11-4988 (AHOY01000000) and Ec11-4986 (AHOW01000000) carry Tn6029B in addition to the other four fragments spanning the CRL in our proposed progenitor structure (see Fig. 3). While RD1 in isolate Ec11-4984 (AHOU01000000) is split into two scaffolds, isolates Ec11-4988 and Ec11-4986 have a complete RD1 in one supercontig. Collectively our data support the presence of the progenitor structures depicted in Figs. 3 and 4 in EAHEC O104:H4 isolates from around the world. Isolate Ec12-0466 (AIPR01000000) has a significant portion of Tn6029B and all other segments that constitute the CRL in RD1 in two separate contigs [3]. The contig that has Tn6029B, Tn21 and the Tn1721 modules starts at a point just past the bla TEM-1 gene of the ΔtnpR-bla TEM-1 -ΔCR2 module. Similar scenarios were noted in 14 other incomplete genome sequences (S3 Table). Assembly of genome sequences generated by next-generation sequencing technologies is problematic, specifically in regions that have multiple copies of an insertion sequence. Given this limitation, our search for specific fragments that contain distinguishing features of the CRL identified in the two completely finished genomes (2011C-3493 and 2009EL-2050) suggests that variants of the proposed progenitor(s) are most likely present within GI3 of the majority of O104:H4 genomes.
The chimeric Tn21-Tn1721 backbone present in the CRL of RD1
The chimeric backbone structure seen in isolates 2011C-3493 and 2009EL-2050 has most likely formed by a homologous recombination event between two In2 derivatives, one of which is identical to that seen in plasmid pASL01a (NC_019091.1) [27] and the other found in Salmonella enterica plasmids like pYT2 (AB605179.1) or pSal8934b (JF274992) [34] (see S2A Fig.). The fact that the two copies of ΔtniA (ΔtniA and ΔtniA** seen in Fig. 1) found in the CRL can be traced to their exact locations in both plasmids supports this contention. pASL01a or variants of it are likely progenitors of the region encompassing the In2 integron because it has the 2082 bp (Fig. 2A) molecular signature seen in 35 of 53 sequenced EAHEC O104:H4 strains. In the outbreak strain 2011C-3493 the tetA gene has been interrupted by the insertion of IS181b at a time after the homologous recombination between the Tn21 and Tn1721 backbones, as 10-nucleotide direct repeats of the sequence at the insertion site of IS181b, directly abutting the inverted repeats of IS181b, are clearly evident. The progenitor (Fig. 3) from which RD1 in O104:H4 isolates 2011C-3493 and 2009EL-2050 evolved was likely a hybrid structure that included an In2 derivative, as the insertion site of the class 1 integron on the Tn21 backbone is characteristic of In2 integrons. Our BLASTn analysis of the O104:H4 microbial genomic database using overlapping fragments 4 and 5 (see Fig. 3), which span regions of RD1 containing the Tn21-specific and the Tn1721-derived modules of the CRL (Table 1), provides evidence that these hybrid structures are frequently present in O104:H4 EAHEC.
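The proportions quoted above follow directly from the reported counts. The short Python tally below restates them as percentages of the 53 genomes; the dictionary keys are descriptive labels of ours, and only the counts come from the text.

```python
TOTAL_GENOMES = 53

# Counts of genomes with 100% identity matches, as reported in the text.
matches = {
    "Fragment 1": 42,
    "Fragment 2 (2082 bp signature)": 35,
    "Tn21 mer module": 37,
    "tetA-tetR-pecM module": 36,
    "all five fragments": 22,
}

percentages = {
    name: round(100 * count / TOTAL_GENOMES) for name, count in matches.items()
}

for name, pct in percentages.items():
    print(f"{name}: {matches[name]}/{TOTAL_GENOMES} (~{pct}%)")
```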

Discussion
Multiple comparative genomic studies on both finished and draft genomes of EAHEC O104:H4 [4,35] indicate that RD1 within GI3 is a hotspot for micro-evolutionary events mediated by mobile genetic elements, particularly IS26 [2]. The preponderance of IS26 in GI3 indicates that clinically relevant antibiotic resistance genes known to be mobilised by IS26, including bla SHV-11 and bla SHV-12 (β-lactam resistance) [11,36,37], qnrB19 (quinolone resistance) [14,36] and aphA1 (kanamycin/neomycin resistance) [15], can readily recombine within the CRL in O104:H4. Furthermore, homologous recombination involving IS26 is well known to drive the transfer of genetic loci between chromosome and plasmid backbones [11,38], and has played a central role in the evolution of new transposon-like structures (this study; [15,40]). As a consequence, the evolution of resistance to clinically important antibiotics in O104:H4 should be monitored.
Our analyses indicate that two separate evolutionary scenarios (Fig. 4) shaped the CRL in EAHEC O104:H4 isolates 2011C-3493 and 2009EL-2050. The formation of a chimeric backbone structure by a homologous recombination event between two In2 derivatives is a key event in the creation of a Tn21/Tn1721 hybrid transposon that represented the progenitor structure shown in Fig. 3. The CRL seen in 2009EL-2050 and 2011C-3493 evolved separately from Tn6029B and Tn6029C respectively. The sequence of eight nucleotides adjacent to IS26 elements in the CRL shown in Fig. 2B provides irrefutable evidence that IS26 played a key role in the evolution of the CRL seen in EAHEC strains 2011C-3493 and 2009EL-2050. The precise events that created the CRL in 2009EL-2050 are depicted in Fig. 4A and B.
In isolate 2009EL-2050, a copy of IS26 inserted into the tnpM gene of Tn21, creating an eight base sequence duplication (Fig. 2B) at the insertion site, in the proposed progenitor. This event created inverted copies of IS26 flanking the class 1 integron allowing this region to undergo an IS26-mediated inversion. Inversion events driven by inverted copies of IS26 have been previously described [15]. These events generated a novel transposon flanked by direct copies of IS26 which we have identified here as Tn6222. Tn6222 comprises two modules, one composed of a functional class 1 integron containing a dfrA7 gene cassette and another with the bla TEM-1 gene (Fig. 4A). The class 1 integron in transposon Tn6222 has the ability to assemble more resistance genes via the acquisition of resistance gene cassettes and may continue to evolve from the resistance phenotype imparted by its present configuration.
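The eight-base duplication that marks an IS26 insertion site can be modelled with a toy example. The Python sketch below is a simplified string model only: the target sequence, the insertion position and the placeholder standing in for IS26 are all hypothetical, and the function names are ours. It shows why identical eight-nucleotide sequences on either side of an element are read as the footprint of an insertion.

```python
def insert_with_tsd(target: str, pos: int, element: str, tsd_len: int = 8) -> str:
    """Insert `element` at `pos`, duplicating the next `tsd_len` bases so
    that the same short sequence ends up flanking the element on both sides."""
    return target[: pos + tsd_len] + element + target[pos:]


def flanking_sequences(seq: str, elem_start: int, elem_len: int, tsd_len: int = 8):
    """Return the `tsd_len` bases immediately left and right of an element."""
    left = seq[elem_start - tsd_len : elem_start]
    right = seq[elem_start + elem_len : elem_start + elem_len + tsd_len]
    return left, right


# Hypothetical target and a placeholder element (not real IS26 sequence).
target = "GGGG" + "ACGTTGCA" + "CCCC"
element = "X" * 20
product = insert_with_tsd(target, pos=4, element=element)

elem_start = product.find(element)
left, right = flanking_sequences(product, elem_start, len(element))
print(left, right, left == right)
```

When an element is bounded by two different eight-nucleotide sequences instead, a later rearrangement such as a deletion or inversion is the usual inference, which is the reasoning applied to the signatures in Fig. 2B.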
The CRL in 2011C-3493 has evolved by a separate set of events mediated by the insertion of an IS26 within the boundaries of Tn6029C. IS26 increasingly plays a key role in the evolutionary events that underpin the formation of CRL in genomic islands among a range of emerging pathogens [21,39]. We propose that a copy of IS26 inserted into the IR L of the existing IS26 that abuts the bla TEM-1 gene in Tn6029C in the proposed progenitor (Fig. 4B). A subsequent deletion event removed one copy of IS26, the bla TEM-1 gene and ΔCR2, generating a new derivative of the Tn6029 family, identified here as Tn6029D (Fig. 4B). This loss of bla TEM-1 in the outbreak strain is consistent with the formation of a translocatable unit [40] as a consequence of the insertion of an IS26 at the IR L of an existing IS26 present in the progenitor structure. While Tn6029D is clearly related to Tn6029C by virtue of the 12-nucleotide signature sequence (TTTGCCCCATAT), the loss of bla TEM-1 is unique (Fig. 4B). Evidence that the insertion of IS26 has driven the creation of the CRL in 2011C-3493 is found in the characteristic eight-base sequences shown in Fig. 2B.
The progenitor structure depicted in S2A Fig. was created by a double reciprocal cross-over between the common regions in two In2 derivatives, one of which is identical to plasmid pASL01a (NC_019091.1) [27] and the other identical to that in Salmonella enterica plasmids like pYT2 (AB605179.1) or pSal8934b (JF274992) [34]. Although a similar structure is also found in pHCM1, this plasmid is unlikely to be a progenitor of the structure seen in RD1 as it lacks a class 1 integron [15,30]. The insertion of IS26 generates eight base direct repeats [41] at the site of insertion and is known to induce deletions of variable length creating unique molecular signatures. We acknowledge that one of the limitations in our evolutionary models is that they are based on polished genome sequences of only two EAHEC isolates. To counter this, we performed extensive BLASTn analysis of 53 O104:H4 E. coli genomes in the microbial genome database. Our analyses confirm the presence of our proposed ancestral structures and identified possible variants of RD1 found in the genomes of 2011C-3493 and 2009EL-2050. In our opinion the major hindrance to rigorous assessment of evolutionary history for complex resistance loci or microbial genomes as a whole is not the quality of available genome data but a lack of statistical models for the inference of genome evolution and the availability of computational tools implementing the models. As genomes evolve their loci undergo rearrangement, insertions, deletions, segmental duplication and lateral transfer. Although efforts to combine these events into a unified statistical model of evolutionary history are ongoing [42,43], no software implementing such a unified inference model is yet available.
The 2082 bp molecular signature has been observed in pASL01a (JQ480155.1), in the CRL of EAHEC O104:H4 German outbreak strain 2011C-3493 (CP003289.1) and strains 2009EL-2050 (CP003297.1) and 2009EL-2071 (CP003301.1) from patients with bloody diarrhoea in the Republic of Georgia, and in a clinical E. coli isolate (HM999792.1) from Sydney, Australia. In strain 2009EL-2071 the CRL in RD1 has been lost [2]. RD1 is located in GI3 adjacent to the selC tRNA gene in EAHEC [3]. GI3 likely presents an evolutionary advantage to strains that carry it because it carries ag43, a gene encoding the self-associating serine protease auto-transporter of Enterobacteriaceae (SPATE) that influences biofilm formation in EAHEC O104:H4 [44], and the yeeV/yeeU toxin/antitoxin system [35]. This view is supported by a study showing that kidney damage and virulence gene expression (stx 2, aggR and pgaA) correlate with the ability of O104:H4 to form biofilms in germ-free mice [45]. GI3 is potentially mobile [12] and targets 23 bp genomic sequences that are widely dispersed in E. coli genomes, suggesting it may be a significant player in future emerging pathogens.

The homologous recombination event has likely taken place between two multiple antibiotic resistance plasmids, pASL01a and plasmids from Salmonella like pYT2. The event created a CRL encoding dfrA7 (resistance to trimethoprim), bla TEM-1 (resistance to ampicillin), strAB (resistance to streptomycin), sul2 (resistance to sulfamethoxazole), qacEΔ1 (resistance to quaternary compounds), merA (resistance to mercury chloride) and tetA(A) (resistance to tetracycline). B: A cartoon depicting the structural differences and genetic signatures present in members of the Tn6029 family of transposons, i.e. Tn6029, Tn6029B and Tn6029C. The top panel shows the structure of Tn6029 and the middle panel shows that Tn6029B is characterised by an 85 bp deletion in the bla TEM module (see asterisk).
In Tn6029C (bottom panel), the bla TEM-containing module is orientated in the reverse direction compared to Tn6029 / Tn6029B and has lost an additional 11 bp from the deleted CR2 region (indicated by the asterisk). Tn6029C is the arrangement seen in pASL01a. (TIF) S1