An Investigation of the Diversity of Strains of Enteroaggregative Escherichia coli Isolated from Cases Associated with a Large Multi-Pathogen Foodborne Outbreak in the UK

Following a large outbreak of foodborne gastrointestinal (GI) disease, a multiplex PCR approach was used retrospectively to investigate faecal specimens from 88 of the 413 reported cases. Gene targets from a range of bacterial GI pathogens were detected, including Salmonella species, Shigella species and Shiga toxin-producing Escherichia coli, with the majority (75%) of faecal specimens being PCR positive for aggR associated with the Enteroaggregative E. coli (EAEC) group. The 20 isolates of EAEC recovered from the outbreak specimens exhibited a range of serotypes, the most frequent being O104:H4 and O131:H27. None of the EAEC isolates had the Shiga toxin (stx) genes. Multilocus sequence typing and single nucleotide polymorphism analysis of the core genome confirmed the diverse phylogeny of the strains. The analysis also revealed a close phylogenetic relationship between the EAEC O104:H4 strains in this outbreak and the strain of E. coli O104:H4 associated with a large outbreak of haemolytic uraemic syndrome in Germany in 2011. Further analysis of the EAEC plasmids, encoding the key enteroaggregative virulence genes, showed diversity with respect to FIB/FII type, gene content and genomic architecture. Known EAEC virulence genes, such as aggR, aat and aap, were present in all but one of the strains. A variety of fimbrial genes were observed, including genes encoding all five known fimbrial types, AAF/I to AAF/V. The AAI operon was present in its entirety in 15 of the EAEC strains, absent in three and present, but incomplete, in two isolates. EAEC is known to be a diverse pathotype and this study demonstrates a high level of diversity in strains recovered from cases associated with a single outbreak. Although the EAEC in this study did not carry the stx genes, this outbreak provides further evidence of the pathogenic potential of the EAEC O104:H4 serotype.


Introduction
The Enteroaggregative Escherichia coli (EAEC) group is a large, diverse group of diarrhoeagenic E. coli originally defined by their adherence to HEp-2 cells in a stacked brick formation [1]. Generally, EAEC are detected and identified using PCR targeting EAEC-associated virulence genes that are predominantly plasmid encoded, including a regulator of multiple plasmid virulence factors (aggR), the anti-aggregation transporter gene (aat) and the gene encoding dispersin (aap) [2][3][4]. AggR also activates the expression of the chromosomal aai genes encoding a Type VI Secretion System (T6SS) [5].
Virulence gene content associated with EAEC is highly variable between different strains, as illustrated in studies aimed at genotyping EAEC from a variety of clinical sources, healthy control groups and outbreaks [6][7][8][9]. In these studies, strains show inconsistent presence and concordance of EAEC virulence genes by PCR in specimens from symptomatic and asymptomatic cases. These data suggest that the full genetic component of this phenotype is not yet understood and that, although most of these genes are found on the aggregative virulence plasmid, their inheritance is complex.
Early research on EAEC linked these strains to persistent diarrhoea in children in developing countries but EAEC have since been shown to be a significant cause of acute diarrhoea and important in the aetiology of intestinal infections in industrialized countries [10]. Two independent, large, prospective studies of diarrhoea aetiology conducted in the UK (1993-1996) and USA (2002-2004) reported a similar EAEC prevalence in patients with diarrhoea: 4.6% (160/3506) and 4.5% (37/823) in the UK and US studies, respectively, and 1.7% in control subjects from both studies [11,12]. Clinical symptoms include watery diarrhoea, often with mucus, low grade fever, abdominal pain, nausea and vomiting [10].
Several EAEC foodborne outbreaks of gastroenteritis have been documented, notably in Japan, the UK and Italy [13][14][15]. Recently, a strain of enteroaggregative Shiga toxin-producing E. coli O104:H4 was identified as the cause of a foodborne outbreak of bloody diarrhoea and haemolytic uraemic syndrome (HUS) in Germany and France [16][17][18][19][20]. Case-control, cohort and trace back studies implicated fenugreek sprouts from Egypt as the source of the infection [21]. Detailed and timely microbiological outbreak investigations were followed by whole genome sequencing of strains of E. coli O104:H4 by various international groups [17,19,22,23].
In March 2013, a large outbreak of GI disease occurred in the North East of England and cases were linked to a food festival. Four hundred and thirteen cases reported illness including symptoms of persistent diarrhoea and abdominal pain immediately following the event, and a total of 592 cases were identified following an on-line questionnaire. One hundred and ten specimens were submitted to the regional Public Health England and local hospital laboratories. Using traditional culture methods, Salmonella enterica serotype Agona was isolated from 25 cases and 4 further cases had other Salmonella species. Cohort and trace back studies implicated contaminated fresh curry leaves from Pakistan as the source of the infection.
The low number of cases testing positive for Salmonella species raised the suspicion that this was a multi-pathogen outbreak and further testing using a pan pathogen PCR was requested by the Outbreak Control Team. Subsequently, strains of EAEC harbouring aggR were isolated from PCR positive faecal specimens. The aim of this study was to use whole genome sequencing to explore the genomic diversity of the 20 strains of EAEC harbouring aggR by determining their phylogenetic relationship, plasmid type and virulence gene content and to assess the likely contribution of each strain type to the reported symptoms of GI disease.

Microbiology
Retrospectively, 88 faecal specimens from cases associated with the outbreak were tested for the presence of other bacterial GI pathogens using a multiplex GI pathogens PCR [24]. Although the faecal specimens had been stored for over 10 weeks at 4°C, an attempt was made to isolate the pathogens detected by the multiplex PCR by testing individual colonies for the stx, ipaH and aggR target genes, associated with Shiga toxin-producing Escherichia coli (STEC), Shigella species and EAEC respectively. For faecal specimens positive for aggR, 20 colonies were picked from bacterial growth on MacConkey or Sorbitol MacConkey agar plates and retested using the same PCR. Those colonies harbouring the aggR gene were identified biochemically as E. coli and serotyped using antisera raised in rabbits to the E. coli somatic O antigens.

Library preparation and whole genome sequencing
DNA was extracted for sequencing using the Wizard kit (Promega UK). Paired-end libraries were generated using the Illumina Nextera XT sample preparation kit. Automated [25] was used to produce de novo assemblies of the sequenced paired-end fastq files. The number of contigs produced ranged from 221 to 552 per sample with N50s from 48338 to 192731 nucleotides.
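
The N50 statistic quoted for the assemblies can be illustrated with a short sketch. This is not the authors' code; it assumes the common definition of N50 (the contig length at which the cumulative length of contigs, taken from longest to shortest, first reaches half of the total assembly length), and the contig lengths in the example are made up for illustration.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is taken here as the length L such that contigs of length >= L
    together contain at least half of all assembled bases.
    """
    if not contig_lengths:
        raise ValueError("no contigs supplied")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig downwards, accumulating bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical toy assembly (lengths in nucleotides), not data from this study.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(example_lengths))
```

A higher N50 for a similar total length generally indicates a less fragmented assembly, which is why it is reported alongside the contig count.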

Phylogenetic analysis
Illumina reads were mapped to the reference EAEC strain 55989 using BWA-SW [26]. The Sequence Alignment Map output from BWA was sorted and indexed to produce a Binary Alignment Map (BAM) using Samtools [27]. GATK2 [28] was used to call variants. Pseudosequences of polymorphic positions were used to create approximate maximum likelihood trees using FastTree [29] under the General time reversible (GTR) model of nucleotide evolution.
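
The idea of a pseudosequence of polymorphic positions can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: it assumes each strain has already been reduced to a base call at every reference position, it keeps only positions where all strains have an unambiguous A, C, G or T and at least two strains differ, and the function names and toy sequences are hypothetical.

```python
def variable_columns(aligned):
    """Return indices of positions that are polymorphic across all strains.

    aligned: dict mapping strain name -> base calls in reference coordinates.
    Positions with any ambiguous or missing call (e.g. N, -) are skipped.
    """
    names = list(aligned)
    length = len(aligned[names[0]])
    if any(len(seq) != length for seq in aligned.values()):
        raise ValueError("all sequences must have the same length")
    keep = []
    for i in range(length):
        column = {aligned[name][i].upper() for name in names}
        if column <= set("ACGT") and len(column) > 1:
            keep.append(i)
    return keep


def pseudosequences(aligned):
    """Concatenate the polymorphic positions of each strain into one string."""
    keep = variable_columns(aligned)
    return {name: "".join(seq[i] for i in keep) for name, seq in aligned.items()}


# Hypothetical toy data, not sequences from this study.
toy = {
    "ref": "ACGTACGT",
    "s1": "ACGTACGA",
    "s2": "ATGTACGA",
    "s3": "ACGNACGT",
}
print(variable_columns(toy))
print(pseudosequences(toy))
```

The resulting strings, written out as a FASTA alignment, are the kind of input a tree-building program takes; discarding invariant and ambiguous positions keeps the alignment small.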

Multilocus sequence typing (MLST)
MLST types were identified by mapping the reads against all E. coli allele variants held in the Achtman MLST database (www.mlst.ucc.ie/mlst/dbs/Ecoli) using a modification of the SRST software [30].
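
Once an allele has been called at each locus, assigning a sequence type is a table lookup, which can be sketched as below. This is an illustration only and does not reproduce SRST: the seven locus names are those of the Achtman E. coli scheme as generally published, while the allele numbers, the ST labels and the function name are made up and do not correspond to real database entries.

```python
# Loci of the seven-gene Achtman E. coli MLST scheme.
LOCI = ("adk", "fumC", "gyrB", "icd", "mdh", "purA", "recA")

# Illustrative profile table only: allele numbers and ST labels are invented.
PROFILES = {
    (1, 1, 1, 1, 1, 1, 1): "ST-example-1",
    (1, 2, 1, 3, 1, 1, 2): "ST-example-2",
}


def assign_st(allele_calls, profiles=PROFILES):
    """Map a dict of locus -> allele number to a sequence type label."""
    missing = [locus for locus in LOCI if locus not in allele_calls]
    if missing:
        return "incomplete (no call for: " + ", ".join(missing) + ")"
    key = tuple(allele_calls[locus] for locus in LOCI)
    # An allele combination absent from the table is reported, not guessed.
    return profiles.get(key, "novel or unlisted profile")


calls = {"adk": 1, "fumC": 2, "gyrB": 1, "icd": 3, "mdh": 1, "purA": 1, "recA": 2}
print(assign_st(calls))
```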

Plasmid FIB/FII typing
Plasmid incompatibility groups were determined using the specific sequences for plasmid replicon types defined by Carattoli et al. 2005 [31]. These sequences were searched for using blastn against the assembled genomes. Retrieved IncF and IncI replicon sequences were extracted in silico and further characterised to sequence type level according to the new scheme described in the plasmid MLST database (pMLST: www.pubmlst.org/plasmid/).

BRIG analysis
Assembled genomes were loaded into BRIG as concentric rings [32] and compared against the pAA reference genome using blastn. pAA annotations from the GenBank file were added in the final ring.

Determination of the presence or absence of AAI operon T6SS components using BLAST
AAI operon T6SS coding genes were extracted from the reference strain 55989 GenBank file (http://www.ncbi.nlm.nih.gov/nuccore/NC_011748.1) and made into a BLAST database. Each of the assembled genomes was queried against the database using blastn to determine whether it had significant hits for each component of the AAI.
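
The presence or absence call described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes blastn was run with tabular output (-outfmt 6, the standard 12 columns) with contigs as queries and AAI genes as subjects, and the identity and coverage thresholds, the gene lengths and the present/partial/absent labels are hypothetical choices made for the example.

```python
def gene_presence(blast_lines, gene_lengths, min_identity=80.0, min_coverage=0.9):
    """Classify each gene as present, partial or absent from blastn hits.

    blast_lines: iterable of tab-separated blastn -outfmt 6 lines
                 (qseqid sseqid pident length mismatch gapopen
                  qstart qend sstart send evalue bitscore).
    gene_lengths: dict mapping gene name -> gene length in nucleotides.
    Coverage is taken from the single best hit; hits are not merged.
    """
    best = {gene: 0.0 for gene in gene_lengths}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        gene = fields[1]
        identity = float(fields[2])
        sstart, send = int(fields[8]), int(fields[9])
        if gene not in gene_lengths or identity < min_identity:
            continue
        # Subject coordinates may be reversed for minus-strand hits.
        covered = abs(send - sstart) + 1
        best[gene] = max(best[gene], covered / gene_lengths[gene])
    calls = {}
    for gene, coverage in best.items():
        if coverage >= min_coverage:
            calls[gene] = "present"
        elif coverage > 0:
            calls[gene] = "partial"
        else:
            calls[gene] = "absent"
    return calls


# Hypothetical hits and gene lengths, not results from this study.
hits = [
    "contig_12\taaiA\t99.1\t600\t5\t0\t1000\t1599\t1\t600\t0.0\t1100",
    "contig_40\taaiB\t97.0\t150\t4\t0\t10\t159\t1\t150\t1e-60\t250",
]
lengths = {"aaiA": 600, "aaiB": 500, "aaiC": 450}
print(gene_presence(hits, lengths))
```

Running the same function over every assembled genome gives a gene-by-strain matrix of the kind summarised in a presence/absence table.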

Data Submission
The short read sequence data have been deposited in the NCBI Short Read Archive under the BioProject PRJNA245029.
Table 2 note: Where the average coverage is less than 10% across the length of the gene, the cell in the table is highlighted in bold. doi:10.1371/journal.pone.0098103.t002

Detection of multiple GI pathogens by PCR
Retrospectively, 88 specimens from cases associated with the outbreak were tested for the presence of other bacterial GI pathogens using a multiplex GI pathogens PCR [24]. A variety of bacterial GI pathogens were detected by PCR from 88 of the stored faecal specimens from cases associated with the outbreak including Salmonella (3 cases), STEC (5 cases) and Shigella (29 cases). The aggR gene was identified in 65 (75%) specimens. Twenty strains of EAEC harbouring aggR were isolated from the 65 PCR positive faecal specimens. No STEC or Shigella species were isolated.

Phylogeny of EAEC isolated from the cases associated with the outbreak
Ten different serotypes and nine MLSTs were identified among the EAEC isolated from the outbreak cases (Table 1). The most commonly observed serotypes were O131:H27 (6), O104:H4 (5) and O20:H19 (2), and the most frequently identified STs, corresponding with these serotypes, were ST10, ST678 and ST278 respectively. SNP analysis confirmed that the strains were phylogenetically diverse between serotypes (Figure 1). Strains belonging to the same serotype clustered on the same branch of the tree; however, even within the same serotype, isolates were phylogenetically distinct. Figure 2 shows a phylogeny based on 3115 core SNPs of 14 strains of E. coli O104:H4 and illustrates the relationship between the EAEC O104:H4 strains in this study with sporadic strains of E. coli O104:H4 and the strain associated with the E. coli O104:H4 outbreak in Germany in 2011 [22,23]. Although none of the EAEC O104:H4 isolates in this study had the stx gene, they share a common ancestor with the German outbreak strain 280/11 and the sporadic stx-harbouring enteroaggregative strains characterised by Grad et al. [23]. All five strains of E. coli O104:H4 isolated during this study share the MDR genomic island conferring resistance to ampicillin, the sulphonamides, streptomycin and tetracycline, and the S83A gyrA mutation, in common with the German outbreak strain and the closely related EAEC/STEC sporadic isolates from France. The EAEC O104:H4 strains isolated in this study are phylogenetically integrated with strains of EAEC/STEC, suggesting either multiple gain or gain then loss of the stx phage within the O104:H4 serotype.
None of the EAEC O104:H4 isolates in this study had the stx gene or carried the extended spectrum beta lactamase (ESBL) plasmid characteristic of the 280/11 strain, although three other strains isolated during this study were identified phenotypically and genotypically as being ESBL-producers (Table 1).

Replicon types of the EAEC plasmids encoding the key enteroaggregative virulence genes (pEAEC)
Multiple replicon types were observed with multiple combinations of FII and FIB proteins, with all but three plasmids having both the FIB and FII replicon types (Table 1). Plasmids of type FIB5_FII17 were carried by strains belonging to two serotypes, O131:H27 and O20:H19. The plasmid type FIB25_FII48 harboured by the strains of EAEC O104:H4 was the same FIB/FII type described in the strains of E. coli O104:H4 linked to the 2011 German outbreak (Table 1).

pEAEC encoded virulence genes and genomic architecture
Several plasmid-encoded genes associated with EAEC have been described in previous studies. These include the transcriptional activator aggR, the anti-aggregation transporter locus aat, the anti-aggregative dispersin protein aap [2][3][4], the aggregative adherence fimbriae (AAF) [33][34][35][36][37], the serine protease autotransporter toxin SepA [38] and the recently described putative isopentenyl isomerase (IDI) enzymes [39]. Table 2 shows the number of reads that mapped to these targets in each outbreak isolate. All of the strains, apart from E. coli O111:H4 designated 1060/13, had sequence reads that mapped to aggR, aat, aap and the putative IDI enzymes. This isolate originally tested positive with the aggR PCR but subsequently tested negative following storage on Dorset Egg medium at room temperature. It is likely that this isolate lost the EAEC plasmid during storage. The serine protease sepA was present in 16 of the 20 EAEC strains isolated from the cases associated with the outbreak. Whilst the pEAEC virulence gene complement was conserved, the genomic context in terms of flanking IS elements was highly variable across the different plasmids (Figure 3).
Five types of pEAEC-associated AAF have been described [33][34][35][36][37] and all five fimbrial types were identified in the strains analysed during this study. Strains of EAEC belonging to serotypes O104:H4 and O131:H27 had AAF/I fimbriae, as seen in the aggregative plasmid of E. coli O104:H4 linked to the 2011 German outbreak. Those strains belonging to serotype O20:H19 had the Type IV fimbriae (HdaA) [36]. AAF/II, AAF/III and AAF/V fimbriae were detected in five strains belonging to five different serotypes, three of which harboured the same plasmid type, FIB33_FII1 (Table 2).

AAI operon encoding the putative T6SS
A 117 kb pathogenicity island, first described in the chromosome of EAEC 042, has been implicated as an EAEC pathogenicity factor. Twenty-five contiguous genes (aaiA-Y) in this island were previously shown to be transcriptionally activated by the plasmid-encoded AggR protein and to encode a T6SS [5]. In the EAEC strains isolated from the outbreak cases described in this study, the AAI operon was present in its entirety in the strains belonging to serotypes O104:H4, O131:H27, O20:H19 and O55:H19, whilst the island was absent in the strains belonging to the serotypes O19a:H30, O?:H21 and O63:H12 (Figure 3). Table 3 shows the distribution of the putative T6SS genes, aaiA to aaiN, in the outbreak strains. In the EAEC strains designated 1060/13 and 0214/13 (serotypes O111:H4 and O?:H19 respectively), a contig with 84% identity to the AAI operon and no homology to the NCBI non-redundant database was identified. In the EAEC O?:H19 isolate, this homologue to aai was co-located on a contig with a plasmid addiction system, suggestive of a non-chromosomal location in these strains.

Discussion
Historically, outbreaks have been associated with strains of a single pathogen exhibiting similar, if not identical, phenotypic and genotypic characteristics. However, the multiplex PCR approach to detection of GI pathogens directly from faecal specimens has provided good evidence that many individual cases of diarrhoea and outbreaks of GI disease are associated with multiple pathogens [40][41]. Although there was clear microbiological evidence that established GI pathogens, such as Salmonella and Shigella species, played a significant part, the symptoms described by the cases and the presence of aggR in 75% of the specimens retrospectively tested by PCR suggested that certain serotypes of the EAEC isolated contributed to the GI disease associated with this outbreak. However, the variety of EAEC serotypes identified in the 20 strains isolated presented a complex picture.
Table 3. The distribution of the putative T6SS genes aaiA to aaiN in the outbreak strains.
Initially, it was suggested that the variation in serotype in the outbreak strains was masking a closer phylogenetic relationship. However, the phylogenetic tree created by comparing SNPs in the core genome showed that, although strains of the same serotype were relatively closely related, those of different serotypes were diverse. EAEC belong to several lineages with different evolutionary histories, demonstrating independent acquisition of the plasmids encoding EAEC virulence genes [42]. Conversely, strains with a recent common ancestor, e.g. those that share an MLST sequence type, may have different pathotypes [43]. For example, E. coli O104:H4 ST678 has been shown to be STEC and EAEC [19]. The pathotype distribution is explained by multiple loss/gain events of pathogenicity elements.
Although strains of EAEC have been shown to harbour a wide diversity of plasmids that encode the enteroaggregative phenotype even in conserved chromosomal backgrounds [19], it was considered possible that similar plasmids would be found in the different strains of EAEC linked to this outbreak, given their spatial and temporal association. However, analysis of the plasmid genomes showed that they demonstrated a high level of variation in replicon type, gene content and genomic architecture. Some plasmid similarity was seen within strains of the same MLST and serotype but wide diversity was observed between different MLST and serotypes. The interspersing of different plasmids in the phylogeny suggests that the aggregative phenotype (specifically the presence of aggR, aat and aap) has been acquired by several different replicons of F-plasmids on multiple occasions. This level of strain and plasmid diversity has not previously been identified in isolates of EAEC from the same outbreak, although EAEC outbreaks involving more than one serotype and variation in pEAEC have been described previously [14,44].
Generally, aggR, aat and aap were conserved between strains of EAEC linked to this outbreak but a variety of fimbrial genes were identified. The presence of AAF is required for mediating the aggregative adherence seen in EAEC. To date, five non-homologous AAF fimbrial structural proteins have been described and a representative of each was identified in strains belonging to this outbreak.
The genes aaiA-P comprise a T6SS apparatus for aaiC and were the first example of a conserved chromosomal aggregative genotype whose expression is under the control of a conserved plasmid-encoded pathogenicity factor, AggR [5]. In this study, five isolates, all harbouring aggR and aat, had a missing or an incomplete AAI operon. This raises a question regarding the pathogenic potential of the aggR-positive but AAI operon deficient strains, in relation to those aggR-positive strains with complete AAI cassettes. Previous prevalence studies detecting aaiA show its presence in between 26 and 44% of phenotypically aggregative E. coli [45,46]. Animal models for investigating EAEC virulence have been described previously [47] and further virulence studies are required. Interestingly, two strains (EAEC O111:H4 and O?:H19) had incomplete AAI operons encoding a T6SS. These regions were homologous to those found in serotypes O104:H4, O131:H27, O20:H19 and O55:H19 but with a different aaiC component. In addition, there was some evidence that the operon may be plasmid-encoded in these strains.
It was suggested that certain strains of EAEC may have been carried asymptomatically by the cases before the outbreak occurred. However, it was not possible to compare the serotypes isolated following the outbreak with the serotypes of strains of EAEC currently circulating in England as there are very few data on domestically acquired strains of EAEC. Surveillance data indicate that the majority of strains of EAEC isolated in England are from cases of travellers' diarrhoea [12,50].
One hypothesis is that not all the strains of EAEC associated with this outbreak had the same level of pathogenicity and that only certain EAEC serotypes isolated contributed to the symptoms described. For example, a complete AAI operon may increase the pathogenic potential of strains of E. coli harbouring the pEAEC. Other studies have suggested that strains harbouring different fimbrial types may be more pathogenic than others. Nüesch-Inderbinen et al. (2013) [48] showed a statistically significant association of the agg3C gene with the asymptomatic state. The presence of AAF/I and AAF/II has been associated with symptomatic cases [6,49].
Importantly, although colonies of EAEC O104:H4 were isolated from only five outbreak cases, all faecal specimens were retrospectively tested by PCR for the presence of the O104 O-antigen gene (wzxO104) [51]. The PCR detected wzxO104 in faecal specimens from 36 cases. The EAEC serotype O104:H4, with and without stx2, has been previously identified as a cause of GI disease. Furthermore, although the EAEC O104:H4 in this study did not carry the stx genes, this outbreak provides further evidence of the pathogenic potential of this EAEC serotype. EAEC is known to be a diverse pathotype and this study demonstrates that this diversity can be seen within a single outbreak.