The prevalence and genomic context of Shiga toxin 2a genes in E. coli found in cattle

Shiga toxin-producing Escherichia coli (STEC) that cause severe disease predominantly carry the toxin gene variant stx2a. However, the role of Shiga toxin in the ruminant reservoirs of this zoonotic pathogen is poorly understood, and strains that cause severe disease in humans (HUSEC) likely constitute a small and atypical subset of the overall STEC flora. The aim of this study was to investigate the presence of stx2a in samples from cattle and to isolate and characterize stx2a-positive E. coli. In nationwide surveys in Sweden and Norway, samples were collected from individual cattle or from cattle herds, respectively. Samples were tested for Shiga toxin genes by real-time PCR and amplicon sequencing, and stx2a-positive isolates were whole genome sequenced. Among faecal samples from Sweden, stx1 was detected in 37%, stx2 in 53% and stx2a in 5%; among skin (ear) samples, the corresponding figures were 64%, 79% and 2%. In Norway, 79% of the herds were positive for stx1, 93% for stx2 and 17% for stx2a. Based on amplicon sequencing, the most common stx2 types in samples from Swedish cattle were stx2a and stx2d. Multilocus sequence typing (MLST) of 39 stx2a-positive isolates collected from both countries revealed substantial diversity, with 19 different sequence types. Only a few classical LEE-positive strains similar to HUSEC were found among the stx2a-positive isolates, notably a single O121:H19 and an O26:H11. Lineages known to include LEE-negative HUSEC were also recovered, including O113:H21 (sequence type ST-223), O130:H11 (ST-297), and O101:H33 (ST-330). We conclude that E. coli encoding stx2a in cattle range from strains similar to HUSEC to unknown STEC variants. Comparison of isolates from human HUS cases to related STEC from the ruminant reservoirs can help identify combinations of virulence attributes necessary to cause HUS, as well as provide a better understanding of the routes of infection for rare and emerging pathogenic STEC.


Introduction
Shiga toxin-producing Escherichia coli (STEC) are zoonotic pathogens, occurring as abundant commensals among ruminants while occasionally causing gastrointestinal disease in humans. STEC infections can lead to the rare, but severe, hemolytic uraemic syndrome (HUS), with children and the elderly being most at risk [1]. HUS can be fatal or lead to long-term sequelae with reduced kidney function or less commonly gastrointestinal or cognitive disabilities [2]. Though STEC constitute a genetically diverse group and all STEC should be considered potential agents of severe human disease [3], certain lineages appear to pose a far higher risk of causing HUS compared to other STEC. These STEC are referred to as HUSEC, i.e. E. coli previously associated with HUS [4]. In addition to the characteristics of the infecting STEC strain, the risk of developing HUS for a given patient is likely to be affected by host factors such as age and immunological status and possibly other factors like inoculum size and route of infection. Nonetheless, the identification of HUSEC strains is valuable for prioritizing interventions to reduce the exposure of humans to the most dangerous forms of STEC and to predict the progression of cases of illness. STEC carry genes encoding Shiga toxins (Stx), considered to be their primary virulence factor. Stx genes are encoded by lambdoid bacteriophages that are maintained in a lysogenic stage in the bacterial chromosome [5]. Two types of Shiga toxins are known, Stx1 and Stx2, both of which are grouped into several subtypes. The presence of genes encoding a toxin variant referred to as Shiga toxin 2a (encoded by stx 2a genes) has been repeatedly shown to be a trait shared by the majority of HUSEC strains [4,6,7]. However, while the presence of stx 2a appears to be useful as a marker for the potential to cause HUS for strains infecting humans, the role of Shiga toxin in the ruminant reservoirs of STEC is poorly understood. 
Therefore, strains that cause severe disease in humans likely constitute a small and atypical subset of the overall STEC flora in ruminants. The prevalence and characteristics of major human pathogenic serotypes like O157:H7 that include known HUSEC have been extensively investigated in many countries over the last decades. Many other virulence-associated genes have been described and intimin (eae) is probably the principal adherence factor in human pathogenic STEC. The intimin gene is encoded on a pathogenicity island called the locus of enterocyte effacement (LEE), a genomic region encoding a system for attachment to the intestinal mucosa and translocation of effector proteins into host cells [3,8]. LEE is common in pathogenic STEC, but not essential for causing severe disease [9][10][11]. LEE-negative (eae-negative) STEC strains have also been associated with severe disease such as HUS, and they probably possess alternative mechanisms for attachment, such as aggR described for STEC/EAEC O104:H4 [12] and saa as described for STEC O113:H21 [13]. Few if any studies have been performed with the aim of providing an unbiased and comprehensive view of STEC strains carrying stx 2a that occur in ruminant reservoirs. The aim of the present study was 1) to investigate the presence of stx 2a genes in cattle samples collected in Norway and Sweden, 2) investigate the presence of stx 2 variants in cattle using high throughput amplicon sequencing and 3) characterize recovered stx 2a -positive E. coli and relate their phylogeny and virulence characteristics to known HUSEC.

Development of a real-time PCR for stx 2a
A hydrolysis probe based real-time PCR assay for the detection of stx 2a (Table 1) with primers containing locked nucleic acids (Exiqon A/S, Vedbaek, Denmark) was developed by alignment of 94 sequences of stx 2 representing all the recognised subtypes, 2a through 2g [14]. The novel assay was tested against a panel of E. coli strains obtained from the EU reference laboratory for E. coli (Istituto Superiore di Sanità, Rome, Italy) and from Statens Serum Institut (Copenhagen, Denmark) with known stx 1 and stx 2 subtypes. The assay was also tested in parallel with conventional PCR for stx 2 subtyping according to the reference laboratory [15] on a panel of STEC strains to verify the performance of the method (Table 2). PCR efficiency was assessed by analysing serial dilutions of DNA from the positive control EDL933 in the Swedish lab. DNA concentration was measured using Qubit QuantIT HS kit (Invitrogen, Carlsbad, CA).
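The efficiency estimate from a dilution series can be illustrated with a short sketch. It assumes the conventional standard-curve approach, in which Ct is regressed on log10 of the template quantity and efficiency is taken as 10^(-1/slope) - 1; the source does not state the exact calculation used, and the dilution series and Ct values below are hypothetical, not data from the study.

```python
def pcr_efficiency(log10_quantity, ct):
    """Fit Ct = slope * log10(quantity) + intercept by least squares.

    Returns (slope, efficiency), with efficiency = 10**(-1/slope) - 1,
    so that perfect doubling per cycle corresponds to 1.0 (100%).
    """
    n = len(ct)
    mean_x = sum(log10_quantity) / n
    mean_y = sum(ct) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_quantity)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_quantity, ct))
    slope = sxy / sxx
    return slope, 10 ** (-1.0 / slope) - 1.0


# Hypothetical tenfold dilution series (illustrative values only)
log10_copies = [5, 4, 3, 2, 1]
ct_values = [18.0, 22.1, 26.2, 30.3, 34.4]

slope, efficiency = pcr_efficiency(log10_copies, ct_values)
print(f"slope = {slope:.2f}, efficiency = {efficiency:.1%}")
```

A slope near -3.32 corresponds to an efficiency near 100%; steeper (more negative) slopes indicate lower efficiency.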

Samples
As part of a nationwide prevalence survey of STEC in Swedish cattle, individual faecal and skin (ear) samples were collected at different slaughterhouses during 2011 to 2012. The total numbers of faecal and skin samples were 2041 and 418, respectively. Skin samples are likely to reflect bacteria previously shed by the individual animal as well as bacteria shed by other animals in the same group and transferred via the environment or direct interaction, e.g. grooming. They allow the efficient recovery of strains representative of a group of animals, as they tend to be positive more often than individual faecal samples; the collection of ears for this purpose is easily standardized [17]. The number of samples collected at each slaughterhouse was chosen to represent the geographical distribution of cattle in Sweden. In Norway, pooled faecal samples were collected in 2014 through a nationwide survey of dairy herds with more than 50 cows [18]. From each herd, faecal material was collected from ten different places and was to include all age groups present. In total, samples were retrieved from 179 dairy herds.

Sample preparation and isolation of stx 2a -positive E. coli
All samples were enriched in buffered peptone water at 37˚C for 18-20 hours. After enrichment, DNA was extracted directly from the enrichment broths using QIAamp DNA Stool Mini kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. The extracted DNA was tested for the presence of the Shiga toxin genes stx 1 , stx 2 and stx 2a as described below. Isolation of stx 2a -positive E. coli was attempted from all stx 2a -positive samples by plating onto different agar plates (i.e. MacConkey agar, Sorbitol MacConkey agar and/or CHROMagar™ O157) and incubating at 37˚C overnight. Colonies with presumptive E. coli morphology were selected for further testing for the presence of stx 2a ; DNA was extracted from colony material by boiling and tested by the stx 2a real-time PCR. Presumptive stx 2a -positive E. coli were confirmed as E. coli using MALDI-TOF-MS (Bruker, Bremen, Germany) and characterized by WGS as described below. Extracted DNA from a subset of cattle skin (ear) samples was subjected to high-throughput amplicon sequencing of stx 2 genes.
Table 1 (excerpt): primer F4f_ad, TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGCACTGTCTGAAACTGCTCCTGT, used for stx 2 amplicon sequencing (modified from Persson et al. [6]).

Real-time PCR detection of stx genes
DNA extracts from all the enriched samples were subjected to screening by real-time PCR for the detection of stx 1 and stx 2 as described by Perelle

High-throughput amplicon sequencing of stx 2 genes from skin samples
To further estimate the prevalence of stx 2 variants and combinations of variants, the sequencing protocol developed by Persson and co-workers [6,14] was adapted to the deep amplicon sequencing protocol provided by Illumina [20]. The modified primers are shown in Table 1. This assay was applied to DNA extracted from enrichment broths of a subset of 48 stx 2 PCR-positive Swedish cattle skin (ear) samples. Amplicons generated using the modified primers were barcoded using Nextera XT index kit (Illumina, San Diego, CA). An Illumina MiSeq instrument was used for sequencing 250 base pairs in both directions. Primer sequences were removed and reads were quality trimmed using Trimmomatic 0.32 [21]. Any read pair with less than 200 bp of high quality sequence from each direction was discarded. Read pairs were mapped against 89 known stx 2 variants [14], and the closest match for each read pair was determined. Samples were discarded if fewer than 100 read pairs were consistent with stx 2 , and any sequence variant supported by <5% of the total reads for a given sample was ignored to remove spurious variants generated by sequencing errors. With the minimum accepted length of 2 × 200 bp, all known variants could be unambiguously classified as stx 2 subtype a, b, c, d, e, f or g, with the exception of the sequence variants AF500189 (stx 2d , could not be distinguished from certain stx 2c ) and M59432 (stx 2c , could not be distinguished from certain stx 2a ), which occurred in zero and one sample, respectively.
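The two per-sample thresholds described above (at least 100 stx 2 -consistent read pairs, and at least 5% of a sample's reads for a variant to be reported) can be sketched as a small filter. This is an illustrative sketch only, not the pipeline used in the study: the function name, the input format (one closest-match variant label per read pair) and the variant labels are assumptions.

```python
from collections import Counter

MIN_READ_PAIRS = 100          # samples with fewer stx2-consistent read pairs are discarded
MIN_VARIANT_FRACTION = 0.05   # variants supported by <5% of a sample's reads are ignored


def call_variants(best_matches):
    """Apply the sample-level and variant-level thresholds.

    best_matches: list holding the closest-matching variant label for each
    read pair of one sample (hypothetical input format).
    Returns None if the sample is discarded, otherwise a dict mapping each
    retained variant to its fraction of the sample's read pairs.
    """
    total = len(best_matches)
    if total < MIN_READ_PAIRS:
        return None
    counts = Counter(best_matches)
    return {
        variant: count / total
        for variant, count in counts.items()
        if count / total >= MIN_VARIANT_FRACTION
    }


# Hypothetical sample: the 5 'variant_c' read pairs (2.5%) fall below the cut-off
reads = ["variant_a"] * 150 + ["variant_d"] * 45 + ["variant_c"] * 5
calls = call_variants(reads)
print(calls)
```

Reporting fractions rather than raw counts keeps the 5% rule independent of sequencing depth, which varies between samples.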

Statistical analysis
Shiga toxin gene prevalences with 95% confidence intervals were calculated by the exact binomial test using R version 3.6.2 for Windows [22].
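For readers without R, the exact (Clopper-Pearson) interval that R's binom.test reports can be sketched in plain Python. This is a minimal illustration rather than the authors' code; the helper names are hypothetical and the counts in the example are invented, not taken from the surveys.

```python
import math


def _log_pmf(k, n, p):
    # Log of the binomial probability mass, computed in log space so that
    # large n (e.g. thousands of samples) does not overflow.
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))


def _cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p) with 0 < p < 1."""
    return min(1.0, sum(math.exp(_log_pmf(i, n, p)) for i in range(k + 1)))


def _solve(f, target, iterations=50):
    """Bisection on (0, 1) for a function f that decreases as p increases."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def exact_binomial_ci(x, n, conf=0.95):
    """Clopper-Pearson confidence interval for x positives out of n samples."""
    alpha = 1.0 - conf
    # Lower bound: P(X >= x | p) = alpha/2, i.e. P(X <= x-1 | p) = 1 - alpha/2
    lower = 0.0 if x == 0 else _solve(lambda p: _cdf(x - 1, n, p), 1.0 - alpha / 2.0)
    # Upper bound: P(X <= x | p) = alpha/2
    upper = 1.0 if x == n else _solve(lambda p: _cdf(x, n, p), alpha / 2.0)
    return lower, upper


# Hypothetical counts for illustration
low, high = exact_binomial_ci(50, 100)
print(f"prevalence = {50 / 100:.1%}, 95% CI = {low:.3f}-{high:.3f}")
```

The exact interval is conservative (its coverage is at least the nominal level), which makes it a common choice for prevalence estimates near 0% or 100%.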

Genomic characterization of stx 2a -positive E. coli isolates
The phylogroup of the stx 2a -positive E. coli isolates was determined by in silico implementation of the PCR system of Clermont et al. [23] on assembled genomes. Multilocus sequence typing (MLST) [24], in silico serotyping [25] and virulence gene detection [26] were performed using Centre for Genomic Epidemiology web services [27]. Clustering of MLST data was performed using the minimum spanning tree algorithm in Bionumerics 7.6 (Applied Maths NV, Sint-Martens-Latem, Belgium), with each locus as an equivalent categorical variable. Multiple correspondence analysis was performed on virulence gene presence/absence data in R 3.5.0 [22] using the FactoMineR library.
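Treating each locus as an equivalent categorical variable means that the distance between two MLST profiles is simply the number of loci at which their alleles differ. The sketch below illustrates that distance and a threshold-based grouping. It is not the Bionumerics minimum spanning tree algorithm: it uses plain single-linkage grouping as a stand-in, the default threshold of three loci follows the clonal complex rule stated in the Results, and the sequence type labels and allele numbers are hypothetical.

```python
from itertools import combinations


def locus_differences(profile_a, profile_b):
    """Categorical distance: number of loci with different allele numbers."""
    return sum(a != b for a, b in zip(profile_a, profile_b))


def group_profiles(profiles, max_diff=3):
    """Single-linkage grouping of sequence types.

    Two sequence types are linked if they differ at no more than max_diff
    loci; groups are the connected components of those links.
    """
    parent = {st: st for st in profiles}

    def find(st):
        # Union-find root lookup with path compression
        while parent[st] != st:
            parent[st] = parent[parent[st]]
            st = parent[st]
        return st

    for a, b in combinations(profiles, 2):
        if locus_differences(profiles[a], profiles[b]) <= max_diff:
            parent[find(a)] = find(b)

    groups = {}
    for st in profiles:
        groups.setdefault(find(st), []).append(st)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical seven-locus allele profiles (labels and numbers are invented)
profiles = {
    "ST-A": (1, 1, 1, 1, 1, 1, 1),
    "ST-B": (1, 1, 1, 1, 1, 2, 2),
    "ST-C": (5, 6, 7, 8, 9, 10, 11),
    "ST-D": (5, 6, 7, 8, 9, 10, 12),
}
groups = group_profiles(profiles)
print(groups)
```

Single linkage can chain distant profiles together through intermediates, so a stricter rule (every pair within the threshold) would give smaller groups on some data sets.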

Development of a real-time PCR for stx 2a
The specificity of the novel PCR assay was confirmed at both laboratories by analyzing clinical and reference strains encoding stx 1 and/or stx 2 . All 45 stx 2a -positive strains and none of the 35 stx 2a -negative strains were identified as stx 2a -positive by the real-time PCR (Table 2). The PCR efficiency was calculated to be 75% based on the standard curve from serial dilutions of DNA from EDL933. positive for stx 2a (71.0%) and stx 2d (48.4%), with a lower proportion of samples positive for stx 2c (9.7%), stx 2g (9.7%), and stx 2b (6.5%) (Fig 1A). Most samples were positive for only a single stx 2 variant, but several were positive for two or three different variants, most commonly stx 2a together with stx 2d (Fig 1B). Only four out of the 22 samples in which stx 2a was identified by amplicon sequencing were positive for stx 2a when tested by real-time PCR.

Genomic characterization of stx 2a -positive E. coli isolates
In total, 40 stx 2a -positive E. coli isolates were retrieved from the Swedish (n = 25) and the Norwegian (n = 15) samples by plating the enriched samples onto selective agar plates and selecting one isolate per positive sample. One isolate was determined not to be stx 2a -positive after genome analysis and was excluded from further analysis, bringing the total number of included isolates to 39. Characteristics of the stx 2a -positive E. coli isolates are summarized in Fig 2 and S1 Table. Most of the isolates belonged to phylogroup B1 (n = 27), but A (n = 8) and B2 (n = 3) were also represented. One isolate could not be assigned to a phylogroup as it presented an undefined profile (+/-/+/+). A total of 19 MLST profiles were identified, of which 11 profiles formed three clonal complexes (CC) centered around ST-10 (phylogroup A, CC10, 9 isolates), ST-223 (phylogroup B1, CC155, 7 isolates) and ST-718 (phylogroup B1, 10 isolates from different predefined and undefined CCs). These three CCs were recovered from both Swedish and Norwegian cattle (Fig 3). Isolates were assigned to a CC if they differed from one another by a maximum of three loci. In silico serotyping revealed substantial diversity but was generally in agreement with MLST and phylogroup divisions (Fig 2). Virulence gene detection revealed five isolates to be "classical" LEE-positive STEC with intimin (eae), tir, tccP and a repertoire of non-LEE encoded effector genes. These isolates were

Prevalence of stx 2 and stx 2a in cattle
Shiga toxin-producing E. coli are generally considered a normal part of the healthy ruminant intestinal flora, so the high prevalence of both stx 1 and stx 2 genes observed among cattle in the present study is not surprising. As some STEC seem to be more associated with severe human infection than others, the general detection of stx genes without any knowledge of the subtypes present is not well suited for detecting severe human pathogens. The stx genes are encoded in prophage genomes, and free phage particles might increase the load of PCR-detectable stx genes in environments where stx-carrying bacteria are present. Both shedding and colonization status for ruminants with STEC can be expected to be periodic or transient; the presence of potential pathogens at the herd level can therefore be more relevant than the prevalence in individual animals. In the Swedish survey, the skin (ear) samples resulted in higher prevalences than the corresponding faecal samples. This is in concordance with previous results from Sweden [17], and indicates that skin (ear) samples represent the prevalence in an animal herd or group rather than in individual animals. The Norwegian survey analyzed pooled faecal samples collected from dairy herds, giving herd-level rather than individual-level prevalence, which is probably the reason these samples resulted in the highest stx prevalence. Due to the differences in sampling and study design, the stx prevalences in the two countries cannot be compared directly; such a comparison was not the purpose of this study.
In recent years, studies have shown that STEC strains carrying stx 2 are more pathogenic than those carrying only stx 1 , while STEC strains with the stx 2a subtype are most frequently associated with the severe disease manifestation HUS [4,6,7]. The association between stx 2a and HUS made this subtype a particular focus of the present study. As more STEC strains have been investigated by whole genome sequencing over the years, it has been shown that there is sequence diversity within the stx 2a subtype, with some variants being highly similar to sequence variants of the stx 2c subtype [14]. In the present study, a novel real-time PCR assay was developed to specifically detect stx 2a genes. This PCR was employed for primary screening in order to identify samples from which to attempt isolation of stx 2a -positive E. coli from the two countries for further characterization. As specificity was the primary concern in the development of this PCR, the sensitivity is likely to be suboptimal, and the PCR-level prevalence of stx 2a reported for the different sample types should be interpreted with caution. The poor sensitivity was a necessary trade-off to produce a sufficiently specific assay for identifying stx 2a -positive samples, given the high sequence similarity between stx 2a and other stx 2 subtypes. The real-time PCR described here has been employed successfully in two different laboratories using different PCR reagents and platforms, which indicates that the method is robust and can easily be used in different laboratories. As the real-time PCR has not been tested on spiked samples with known levels of stx 2a -positive STEC, we do not know how the Ct values relate to the number of cells. We are, however, aware that a Ct value of 45 might be a false positive. In the case of stx 2a -positive STEC, we consider it more important to avoid false negative samples than to avoid a few false positive samples.
A community profiling approach based on amplicon sequencing of partial stx 2 genes was used on a subset of the stx 2 PCR-positive samples. This revealed stx 2a and stx 2d to be the most common types of stx 2 among Swedish cattle samples, but also the presence of several other variants and combinations of variants. With these results in mind, to identify all potential STEC in a sample, especially an environmental or primary production sample, one would need to retrieve and test several isolates. This is in line with recommendations in ISO/TS-13136:2012 [19]. It is also reasonable to believe that one or more stx 1 subtypes, in addition to several stx 2 subtypes, could be present in a sample, as well as different strains with the same stx 1 or stx 2 subtype, e.g. stx 2a could be present in two different genomic backbones in the same sample. However, this was beyond the scope of this study. The amplicon sequencing approach described in this study is a powerful tool for detecting all known stx 2 variants in a sample, limited only by the inclusivity of the standard stx 2 sequencing primers. A comparison between the real-time PCR and amplicon sequencing results revealed that only four out of 22 samples positive for stx 2a by amplicon sequencing were also positive for the same subtype by PCR. Although the limit of detection for the PCR was low when evaluated using purified DNA from a defined strain, this might not be the case for mixed samples, where other subtypes of stx 2 may interfere with the amplification. However, there might also be an actual difference in analytical sensitivity between the two methods. The real-time PCR was primarily used in this study as a screening tool in order to retrieve stx 2a -positive E. coli, and therefore the discrepant results were not further investigated.

Characteristics and public health relevance of LEE-positive STEC isolates
Most STEC associated with HUS reported in the literature to date have carried stx 2a genes in combination with the locus of enterocyte attachment and effacement (LEE) [28]. In the present study, only one LEE-positive isolate was detected from the Swedish samples: an O121:H19. STEC O121:H19 is a well-known cause of severe outbreaks and sporadic infections, and is a known HUSEC [4,29]. From the Norwegian samples, one LEE-positive STEC of serotype O26:H11 was detected. This serotype is known to carry different variants of stx genes or to be stx-negative when isolated from ruminants, and stx 2a -positive strains of O26:H11 have caused several cases of HUS in Norway [30] as well as elsewhere [31]. Two isolates of O84:H2 (ST-306) with LEE and harbouring both stx 1 and stx 2a genes were also recovered from the Norwegian samples. O84:H2 has been found in cattle and as a cause of sporadic cases of diarrhoea among humans in New Zealand [32], but has to our knowledge not been linked to severe cases of illness. However, strains of the same sequence type, ST-306, with different serotypes have been linked to HUS cases in both Germany [4] and Sweden [33].

Characteristics and public health relevance of LEE-negative STEC isolates
LEE-negative STEC are rare, but perhaps underestimated as a cause of HUS, as diagnostics have historically focused on the most well-known LEE-positive serotypes. In general, LEE-negative isolates rely on alternative host tissue adhesion mechanisms, which are either poorly understood or known but historically associated with other E. coli pathotypes [9][10][11]. The most notable example of LEE-negative HUSEC is the major outbreak in Europe in 2011 of enteroaggregative E. coli O104:H4 encoding stx 2a [34], but smaller outbreaks and sporadic cases are continuously reported. Two variants of LEE-negative STEC recovered from Swedish cattle in the present study, O113:H21 (ST-223) and O130:H11 (ST-297), have caused sporadic cases of HUS in Sweden [35]. O113:H21 (ST-223) is a well-known LEE-negative HUSEC with cases reported from several countries [36], while O130:H11 has also been associated with HUS in Australia [37] and Argentina [9]. The presence of both of these strains in Swedish and Norwegian cattle should thus be considered a public health concern. Another potential HUSEC was an O163:H19, a serotype that has previously been linked to a case of HUS in the UK [38]. The remaining serotypes and STs found in the present study have not, to our knowledge, been linked to severe illness in humans. However, several known LEE-negative HUSEC could not be distinguished from strains not known to cause HUS in terms of virulence gene repertoire (Fig 4). This might be due to the selection of genes included in this study, which is biased towards LEE-positive STEC and other known pathotypes. Previous studies have had similar difficulties in identifying the virulence determinants of non-LEE HUSEC [39], but new potentially relevant markers are continuously being discovered [40].

STEC/ETEC hybrid isolates
In recent years, hybrid pathotypes of E. coli have been reported to be associated with diarrhea and HUS [32,39-42]. Several isolates obtained in this study carried genes encoding heat-stable enterotoxin 1 (sta 1 ) in addition to stx 2a and can thus be considered hybrids between the STEC and ETEC (enterotoxigenic E. coli) pathotypes. The isolates appear to be related to some extent, belonging to phylogroup A and ST-10, but they belong to multiple serotypes, with O2/O50:H27 being the most common. This lineage was found in isolates from both Norway and Sweden, and STEC/ETEC hybrids of matching serotypes have previously been isolated from patients and cattle in Italy and Finland [41,43]. A single isolate of ST-330 in the present study carried both a LEE region and the ETEC sta 1 toxin gene, and belonged to the same clonal complex as the ST-10 STEC/ETEC LEE-negative hybrids. A strain matching the description of the present finding (O101:H33, phylogroup A, ST-330, sta 1 +) has been associated with HUS in an infant in Finland in 2001 [44]; another isolate with matching phylogroup and ST, but no expressed serotype (ONT:H-), has been reported from a German HUS case [4].

Conclusions
In this study, we found a substantial proportion of stx 2a -positive samples from Swedish and Norwegian cattle. stx 2a and stx 2d were the most common variants among Swedish cattle samples analyzed by high throughput amplicon sequencing; however, other variants and combinations of variants were also seen. Amplicon sequencing is a powerful tool for detecting all known stx 2 variants in a sample, and the results underline the importance of selecting more than one stx-positive isolate from a complex sample. Isolation and characterization of 39 stx 2a -positive E. coli revealed that only a small proportion have virulence profiles similar to those of known HUSEC strains, and only a few known HUSEC lineages were identified among the stx 2a -positive isolates. We conclude that stx 2a -positive E. coli in cattle range from strains similar to HUSEC to unknown STEC variants. It is currently unclear whether most stx 2a -positive E. coli pose a risk of causing HUS in a vulnerable patient, or if this capability is an emergent property of combinations of other virulence factors including key adhesins. Because this question remains unexplored, further comparison of human and animal isolates by comparative WGS analysis would be of interest.
Supporting information S1