Genomic analysis of Shiga toxin-producing Escherichia coli from patients and asymptomatic food handlers in Japan

Shiga toxin-producing Escherichia coli (STEC) can cause severe gastrointestinal disease and colonization among food handlers. In Japan, STEC infection is a notifiable disease, and food handlers are required to undergo routine stool examination for STEC. However, the molecular epidemiology of STEC is not fully understood. We investigated the genomic characteristics of STEC from patients and asymptomatic food handlers in Miyagi Prefecture, Japan. Whole-genome sequencing (WGS) was performed on 65 STEC isolates obtained from 38 patients and 27 food handlers by public health surveillance in Miyagi Prefecture between April 2016 and March 2017. Isolates of O157:H7 ST11 and O26:H11 ST21 were predominant (n = 19 each, 29%). Non-O157 isolates accounted for 69% (n = 45) of all isolates. Among 48 isolates with serotypes found in the patients (serotype O157:H7 and 5 non-O157 serotypes, O26:H11, O103:H2, O103:H8, O121:H19 and O145:H28), adhesion genes eae, tir, and espB, and type III secretion system genes espA, espJ, nleA, nleB, and nleC were detected in 41 to 47 isolates (85–98%), whereas isolates with other serotypes found only in food handlers were negative for all of these genes. Non-O157 isolates were especially prevalent among patients younger than 5 years old. Shiga-toxin gene stx1a, adhesion gene efa1, secretion system genes espF and cif, and fimbrial gene lpfA were significantly more frequent among non-O157 isolates from patients than among O157 isolates from patients. The most prevalent resistance genes among our STEC isolates were aminoglycoside resistance genes, followed by sulfamethoxazole/trimethoprim resistance genes. WGS revealed that 20 isolates were divided into 9 indistinguishable core genomes (<5 SNPs), demonstrating clonal expansion of these STEC strains in our region, including an O26:H11 strain with stx1a+stx2a.
Non-O157 STEC with multiple virulence genes were prevalent among both patients and food handlers in our region of Japan, highlighting the importance of monitoring the genomic characteristics of STEC.

Introduction
Shiga toxin-producing Escherichia coli (STEC) cause various gastrointestinal diseases in humans, including life-threatening hemolytic uremic syndrome (HUS) [1]. Although O157:H7 is the predominant pathogenic serotype, severe infections caused by non-O157 serogroups are increasingly reported worldwide [2]. STEC transmission occurs through intake of contaminated food or via person-to-person spread, with large-scale outbreaks having been reported [1]. In Japan, food safety control measures and a STEC surveillance system were instituted after a massive STEC epidemic occurred in Sakai city in 1996, and STEC infection became a notifiable disease [3]. To prevent the spread of infection via food, the Japanese Ministry of Health, Labor and Welfare requires food handlers to undergo routine stool examination for various infectious pathogens, including STEC, and asymptomatic STEC carriers are legally restricted from working as food handlers [4]. Despite these efforts, approximately 4,000 cases of STEC infection are still reported annually in Japan [3].
Shiga toxin (Stx) is the most important STEC virulence factor. Stx has 2 subtypes with variants, which are Stx1 (stx1a, stx1c, and stx1d) and Stx2 (stx2a, stx2b, stx2c, stx2d, stx2e, stx2f, and stx2g) [5]. In addition, highly pathogenic STEC possess other virulence factors that include adhesins, other toxins, and protein secretion systems [1]. Detection of genes encoding these virulence factors in STEC strains could provide useful information about risk factors that may contribute to human disease. In recent years, there has been a worldwide increase in reports of antimicrobial resistance (AMR) among STEC strains [6]. In STEC carriers taking antibiotics, resistant STEC strains may have a selection advantage over other intestinal bacteria. Because of the public health implications of STEC infection, a comprehensive investigation of virulence and AMR factors is required to assess the potential pathogenicity and antibiotic resistance of STEC isolates from patients and asymptomatic food handlers. Some European authors have investigated the molecular characteristics of STEC isolates [7,8], but no molecular epidemiological studies have been done to assess the relationship between STEC isolates from patients and food handlers. In the present study, we investigated the molecular epidemiology of STEC infection in Miyagi Prefecture, Japan, and performed whole-genome sequencing (WGS) to characterize the genomic features of STEC isolates from patients and asymptomatic food handlers, including virulence factors and AMR genes.

Bacterial strains and clinical data
From April 2016 to March 2017, we collected all 65 epidemiologically unlinked STEC isolates detected through public health surveillance for infectious diseases in Miyagi Prefecture, which is located in central northeastern Japan and has a population of about 2.3 million. Thirty-eight isolates were obtained from fecal samples of hospital patients and 27 isolates were detected by routine stool examination of asymptomatic food handlers. Isolation of STEC from stool samples was done with sorbitol-MacConkey agar containing cefixime and tellurite in addition to conventional E. coli isolation agar (e.g., triple sugar iron agar and lysine-indole-motility medium). A latex agglutination test (VTEC-RPLA, Denka Seiken, Japan) and PCR with the EVT-1&2 and EVS-1&2 primers (TaKaRa Biomedicals, Tokyo, Japan) were used to detect Stx and Stx genes, respectively. Patient data (e.g., age, sex, and clinical manifestation) were also collected through STEC public health surveillance. Patients were divided into four age groups: infants and small children (0-4 y), older children and adolescents (5-19 y), adults (20-64 y), and older people (≥65 y) [9]. The age-specific incidence of STEC infections per 100,000 population was calculated using Miyagi Prefecture population data obtained from the National Institute of Population and Social Security Research website (http://www.ipss.go.jp/). This study was approved by the institutional review board of Tohoku University Graduate School of Medicine (IRB no. 2018-1-368).
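The age grouping and the incidence calculation described above can be illustrated with a minimal Python sketch. The function names are hypothetical, the assignment of age 65 to the oldest group is an assumption, and the population figure is the rounded value of about 2.3 million given in the text rather than the exact census data used for the reported results.

```python
def age_group(age_years: float) -> str:
    """Assign a patient to one of the four age groups used in this study."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    if age_years < 5:
        return "0-4 y"
    if age_years < 20:
        return "5-19 y"
    if age_years < 65:
        return "20-64 y"
    # Assumption: patients aged exactly 65 belong to the oldest group.
    return ">=65 y"


def incidence_per_100k(cases: int, population: int) -> float:
    """Annual incidence expressed per 100,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 100_000


if __name__ == "__main__":
    # 38 patient isolates over one year; population rounded to 2.3 million.
    overall = incidence_per_100k(38, 2_300_000)
    print(f"Overall incidence (rounded population): {overall:.2f} per 100,000")
    # Youngest (11 months) and oldest (85 years) patients reported in the study.
    print(age_group(11 / 12), age_group(85))
```

Because the population figure here is rounded, the result only approximates the incidence values reported in Table 1.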

Whole-genome sequencing
Bacterial DNA from the 65 isolates was extracted as described previously [10], and a DNA library was prepared from each sample with a NEBNext Ultra DNA Library Prep Kit for Illumina (New England Biolabs, Ipswich, MA, USA) according to the manufacturer's instructions. Then WGS was performed using a MiSeq (Illumina, San Diego, CA, USA) to generate paired-end 300-bp reads, resulting in an average of 5,556,073 read pairs per isolate. All samples showed a minimum average 30-fold coverage. The passing filter ranged from 90.88 to 96.74% (mean, 93.19%), and the average Q30 ranged from 78.30 to 87.21% (mean, 83.50%). All of the sequence data reported here have been deposited in DDBJ/EMBL/GenBank Sequence Read Archive (SRA) under accession numbers DRX149942 to DRX150006.

Genetic analysis
Sequence reads were trimmed of adaptors and filtered to remove reads shorter than 36 bp using Trimmomatic [11], followed by assembly using Platanus assembler v1.2.4 [12]. Specific genes and alleles were identified with the bioinformatic pipeline of the Center for Genomic Epidemiology (http://www.genomicepidemiology.org), using the default setting of a 90% ID threshold and 60% minimum gene length overlap, except where otherwise stated. Specifically, SerotypeFinder 1.1 [13] was used to identify serogenotypes, MLST Finder 2.0 [14] was employed for multilocus sequence typing (MLST), VirulenceFinder 1.5 [15] was used for virulence genes, and ResFinder 3.0 [16] was employed for acquired AMR genes. We also searched for the major adhesin gene saa, which cannot be detected by VirulenceFinder, using BLAST (http://blast.ncbi.nlm.nih.gov).
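To make the detection criteria concrete, the following Python sketch applies a 90% identity threshold and a 60% minimum gene length overlap to a list of alignment hits. It is an illustration only and not the implementation used by the Center for Genomic Epidemiology tools; the Hit record, the function names, and the example values (including the gene lengths) are hypothetical and are not data from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One alignment of an assembly against a reference gene (hypothetical record)."""
    gene: str
    percent_identity: float   # identity over the aligned region, 0-100
    aligned_length: int       # aligned bases of the reference gene
    reference_length: int     # full length of the reference gene


def passes_thresholds(hit: Hit, min_identity: float = 90.0,
                      min_coverage: float = 0.60) -> bool:
    """True if the hit meets both the identity and the length-overlap cutoffs."""
    coverage = hit.aligned_length / hit.reference_length
    return hit.percent_identity >= min_identity and coverage >= min_coverage


def detected_genes(hits: list[Hit], min_identity: float = 90.0,
                   min_coverage: float = 0.60) -> list[str]:
    """Sorted, de-duplicated names of genes with at least one passing hit."""
    return sorted({h.gene for h in hits
                   if passes_thresholds(h, min_identity, min_coverage)})


# Illustrative values only (not taken from the study).
example_hits = [
    Hit("eae", 99.5, 2700, 2805),    # high identity, near-full length: kept
    Hit("stx1a", 85.0, 1200, 1230),  # identity below 90%: rejected
    Hit("saa", 95.0, 500, 1600),     # overlap below 60%: rejected
]

if __name__ == "__main__":
    print(detected_genes(example_hits))
```

Both criteria must hold at once, so a short fragment with perfect identity is rejected just like a full-length but divergent homolog.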

Statistical analysis
Fisher's exact test and the two-sample t-test were used for analysis of categorical variables and continuous variables, respectively. In all analyses, P<0.05 was considered statistically significant.
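As an illustration of the categorical comparison, the sketch below implements a two-sided Fisher's exact test for a 2x2 table using only the Python standard library. The statistical software actually used in this study is not stated, so this is a sketch rather than the authors' procedure, and the function name is hypothetical. The example table uses the eae counts reported in the Results (47 of 48 isolates with major serotypes positive, 0 of the remaining 17 isolates positive).

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    The p-value is the sum of the hypergeometric probabilities of all tables
    with the same margins that are no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denominator = comb(total, col1)

    def table_probability(x: int) -> float:
        # Probability of observing x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = table_probability(a)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(
        p for p in (table_probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


if __name__ == "__main__":
    # eae-positive vs. eae-negative: major serotypes (47, 1), minor serotypes (0, 17).
    p = fisher_exact_two_sided(47, 1, 0, 17)
    print(f"P = {p:.3g}; significant at P<0.05: {p < 0.05}")
```

The exact test is preferable to a chi-squared approximation here because several cells contain very small counts.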

Serogenotyping and MLST
Among the 65 STEC isolates, serogenotyping and MLST revealed 18 different serogroups and 20 different sequence types (Fig 1). Seven sequence types had already been reported as STECs causing human disease according to Enterobase (http://enterobase.warwick.ac.uk), while the other 13 sequence types, including O103:H8 ST2836, had not previously been reported as STEC. SerotypeFinder did not detect an O-group in the one OUT (O-serotype untypable) isolate. In this isolate, no O-processing genes (wzx, wzy, wzm, wzt) were detected by BLAST. It is possible that this isolate could be assigned to serogroups O14 or O57 since O-processing genes for these serogroups have not been found in their genomes [20], or it could represent a new serogroup.

Clinical characteristics of patients and food handlers
The majority of the patients (28/38, 74%) had bloody diarrhea and 1 patient (2.6%) developed HUS, while the symptoms of the remaining 10 patients (26%) were unknown. The patients were aged between 11 months and 85 years (median: 17.5 years), whereas the food handlers were aged from 21 to 71 years (median: 54 years). Sixty-one percent (23/38) of the patients and 74% (20/27) of the food handlers were female.
The annual age-specific incidence of O157 and non-O157 infections in Miyagi Prefecture during the study period is summarized in Table 1. In this region, the overall incidence of STEC infection was 1.6 per 100,000 population, with infants and small children having the highest incidence (13.5, P<0.001 vs. each other age group). Importantly, the incidence of non-O157 infections was significantly higher in infants and small children (11.3) than in the other age groups (P<0.001 vs. each other age group), while there was no significant difference in the incidence of O157 infections among the age groups.

Virulence genes
Among a total of 76 virulence genes registered in VirulenceFinder, 44 genes (58%) were detected among the isolates and there was a median of 17 virulence genes per isolate. Isolates from the patients harbored significantly more virulence genes than isolates from the food handlers (medians of 18 and 10 virulence genes per isolate, respectively, P<0.001), and O157 isolates had significantly more virulence genes than non-O157 isolates (medians of 19 and 16 virulence genes per isolate, respectively, P = 0.013). Eight different Stx subtypes (combinations) were detected among the isolates, with stx1a-only being most frequent (n = 27, 42%), followed by stx1a+stx2a (n = 12, 18%). The distribution of virulence genes among isolates from the patients or food handlers and among O157 or non-O157 isolates is shown in S1 Fig and S1 Table. Isolates with major serotypes (the 6 serotypes found in patients: O157:H7, O26:H11, O103:H2, O103:H8, O121:H19, and O145:H28) had significantly more virulence genes than isolates with minor serotypes (the remaining serotypes, found only in food handlers) (medians of 18 and 7 virulence genes per isolate, respectively, P<0.001). The Stx subtype stx1a was significantly more frequent among isolates with major serotypes, while stx2d and stx2e were detected significantly more often in isolates with minor serotypes. Among the 48 isolates with major serotypes, adhesion genes eae, tir, and espB were detected in 47 (98%), 46 (96%), and 45 (94%) isolates, respectively, and secretion system genes espA, espJ, nleA, nleB, and nleC were detected in 46 (96%), 46 (96%), 41 (85%), 45 (94%), and 41 (85%) isolates, respectively, whereas all isolates with minor serotypes were negative for all of these genes (all P<0.001) (Table 2).
Among the isolates from patients, O157 and non-O157 isolates had a comparable number of virulence genes (a median of 17 virulence genes per isolate in both groups, P = 0.32). Stx gene stx1a, adhesion gene efa1, secretion system genes espF and cif, and fimbrial gene lpfA were significantly more frequent among non-O157 isolates from patients than among O157 isolates from patients (Table 2). In addition, stx1a, lpfA, and secretion system gene tccP were significantly more frequent in isolates from infants and small children than in isolates from patients of other age groups (S2 Table). The isolate from the 1 patient with HUS had stx2c-only, as well as adhesion genes eae, tir, and espB, and secretion system genes espA, espJ, nleA, nleB and nleC.
The additional search for saa using BLAST showed that only 5 out of 65 isolates (8%) possessed this gene. All of these isolates were non-O157 and from food handlers.

Phylogenetic analysis
Phylogenetic analysis was performed using 132,711 SNPs identified within the core genome of 71 STEC isolates (including the 6 reference strains) (Fig 2). The STEC isolates were divided into two clades, O157 and non-O157, except that the O145:H28 isolate clustered with the O157 isolates. Isolates with the same O serotype formed a cluster together, except for O103 and O8. In addition, the O103:H8 ST2836 isolates clustered with the O26:H11 isolates and were separated from the O103:H2 ST17 isolate. Within the O157:H7 cluster, isolates positive for stx1a+stx2a formed a subcluster. Isolates positive for microcin genes mcmA, mchB, mchC, and mchF were assigned to a subcluster within the O26:H11 cluster.
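The grouping of isolates into indistinguishable core genomes (<5 SNPs, as stated in the abstract) can be sketched as pairwise SNP counting followed by single-linkage clustering. The Python example below is a simplified illustration and not the pipeline used in this study: the function names and isolate labels are hypothetical, the sequences are toy data, and ambiguous bases and gaps are not handled.

```python
from itertools import combinations


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Number of differing positions between two aligned core-genome SNP strings."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(seq_a, seq_b) if x != y)


def cluster_isolates(alleles: dict[str, str], snp_cutoff: int = 5) -> list[list[str]]:
    """Single-linkage clusters of isolates differing by fewer than snp_cutoff SNPs.

    Only clusters with at least two isolates are returned.
    """
    parent = {name: name for name in alleles}

    def find(name: str) -> str:
        # Union-find root lookup with path compression.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(alleles, 2):
        if snp_distance(alleles[a], alleles[b]) < snp_cutoff:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for name in alleles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values() if len(members) > 1)


# Toy data with hypothetical isolate labels (not from the study).
toy_alleles = {
    "iso_A": "ACGTACGTAC",
    "iso_B": "ACGTACGTAT",  # 1 SNP from iso_A
    "iso_C": "TGCATGCATG",  # differs from iso_A at every position
}

if __name__ == "__main__":
    print(cluster_isolates(toy_alleles))
```

Single linkage means that two isolates can share a cluster through an intermediate isolate even if they themselves differ by 5 or more SNPs; a stricter definition would require every pair within a cluster to fall below the cutoff.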

Discussion
To the best of our knowledge, this is the first genomic epidemiological study to investigate STEC isolates from patients and asymptomatic food handlers. Isolates with 5 non-O157 serotypes (O26:H11, O103:H2, O103:H8, O121:H19, and O145:H28), which had as many virulence genes as O157 isolates, were prevalent among both patients and food handlers, whereas isolates with the remaining 12 serotypes were only found in food handlers and were negative for the major virulence genes eae, tir, espB, espA, espJ, nleA, nleB, and nleC. Non-O157 isolates were especially prevalent among children under 5 years of age. WGS analysis revealed clonal expansion of highly virulent STEC strains (e.g., O26:H11 strain with stx1a+stx2a) in our region (Miyagi Prefecture, Japan). Although the overall isolation rate of non-O157 STEC strains was reported to be 30-40% in Japan [3], more than half of the STEC isolates were non-O157 strains in our region. Similar to the increase in non-O157 isolates indicated by this study, non-O157 infections have been increasingly reported worldwide [2,21]. Notably, non-O157 infection was predominant among children under 5 years old in this study. Among the isolates from patients, adhesin gene efa1, secretion system genes espF and cif, and the gene lpfA encoding fimbriae were significantly more frequent among non-O157 isolates than O157 isolates. lpfA was reported to be involved in prolonged shedding of STEC in young children [22]. In Japan, direct person-to-person contact is the suspected route of transmission for the majority of STEC infection outbreaks among children [23]. Prolonged shedding of STEC can facilitate its spread. Our findings suggest that these genes may be associated with a high frequency of non-O157 STEC infection among children.
The isolates with major serotypes (O157:H7, O26:H11, O103:H2, O103:H8, O121:H19, and O145:H28) harbored significantly more virulence genes than the isolates with minor serotypes. Apart from O103:H8, these major serotypes have been linked to epidemics and serious infections and have been frequently detected among clinical isolates worldwide [24]. Most of the isolates with major serotypes, including the isolate from a patient with HUS, possessed adhesion genes eae, tir, and espB, and secretion system genes espA, espJ, nleA, nleB and nleC, whereas the isolates with minor serotypes were negative for all of these genes. Studies have shown that the eae gene encoding intimin, an outer membrane protein involved in close attachment, is closely linked to the pathogenesis of STEC infection, along with other genes clustering on the bacterial chromosome (such as tir, espA, espB, and espJ) that form a pathogenicity island called the locus of enterocyte effacement [8,25]. Other studies have shown that effectors outside this locus encoded by nleA, nleB, and nleC are required to form attaching and effacing lesions in the intestinal epithelium, which allow STEC to colonize the human gut [26]. Accordingly, these virulence factors may play a key role in the pathogenesis of STEC infections. The current STEC surveillance system for food handlers in Japan is only based on serotyping and detection of Stx [3]. However, we think that STEC surveillance should focus on the above-mentioned virulence genes, such as eae, tir, espB, espA, espJ, nleA, nleB and nleC.
O103:H8 ST2836 STEC with multiple virulence genes was newly detected in this study. O103:H8 ST2836 isolates formed a separate cluster from the known isolates of O103:H2 ST17 on the phylogenetic tree, suggesting that these two clusters of serogroup O103 had different origins. As previously reported, STEC are E. coli strains of different lineages that have acquired virulence genes independently at different time points [27], and STEC strains from the same O serogroups are polyphyletic since horizontal transfer of the O-antigen gene can occur among different E. coli strains [28]. These points raise the possibility that new serotypes of highly virulent STEC may emerge.
There have only been a limited number of epidemiological studies on AMR in STEC isolates [6]. The resistance genes with the highest prevalence among our STEC isolates were aminoglycoside resistance genes (e.g., aadA, aph(3')-I, and str), followed by sulfamethoxazole/trimethoprim resistance genes (e.g., sul and dfrA). In Japan, sulfamethoxazole/trimethoprim and aminoglycosides are antibacterial agents commonly used in domestic animals [29], which are the main reservoir of STEC, and the prevalence of aminoglycoside resistance among STEC isolates from cows in Japan has increased during the past decade [30]. Antimicrobial therapy is generally not recommended for STEC infection due to the possible risk of HUS, but it may be beneficial for patients with persistent diarrhea or food handlers with long-term STEC carriage [31]. Transfer of mobile genetic elements was reported to facilitate the spread of AMR genes to other bacteria [6]. Accordingly, it is important to monitor AMR in STEC isolates and prevent misuse/overuse of antibiotics based on the One Health approach [32].
WGS-based phylogenetic analysis revealed a variety of SNP variants among isolates from the same serogenotype or same ST clade, suggesting dissemination of diverse STEC strains throughout our region. The O157:H7 isolates with stx1a+stx2a formed a subcluster within the O157:H7 cluster, and the O26:H11 isolates positive for microcin genes formed a subcluster in the O26:H11 cluster. STEC isolates possessing stx1a+stx2a have been linked to outbreaks associated with a high frequency of HUS [33]. The presence of microcin genes indicates environmental plasticity of the isolates since microcin is a bactericidal antibiotic [34]. These results highlight the fact that diverse strains with differing levels of virulence can exist within the same STEC serogroup.
Our phylogenetic analysis also detected 9 clonal expansions of STEC strains, suggesting circulation of these strains among patients and food handlers in our region. One of these was a serogenotype O26:H11 ST21 strain harboring stx1a+stx2a, which differed from a newly emerging virulent O26:H11/H-ST29 STEC clade reported in Japan by Ishijima et al. [35]. In general, the majority of STEC O26:H11 isolates are only positive for stx1a [36,37], but highly virulent stx2a-containing O26:H11 strains have been increasingly reported worldwide in recent years [36]. The spread of O26:H11 strains with stx2a could pose a threat in our region. WGS has been employed to investigate the molecular epidemiology of STEC [7], since it is a powerful tool for performing high-resolution molecular typing, population structure analysis, and detailed molecular characterization of microbes [38]. Further genome-based epidemiological studies are needed to provide a better understanding of STEC isolates for assistance in developing prevention and control strategies.
This study had several limitations. First, there were only a few STEC strains from the same lineage or serotype, although we assessed all of the STEC isolates detected through public health surveillance in our region during the study period. Second, we could only obtain restricted epidemiological and clinical information. Third, while this in silico study was focused on putative virulence genes, the pathogenicity of STEC isolates needs to be clarified by in vitro and in vivo experimental studies.
In conclusion, we found that genetically diverse non-O157 isolates (O26:H11, O103:H2, O103:H8, O121:H19, and O145:H28) with as many important virulence genes as O157 isolates (including eae, tir, espB, espA, espJ, nleA, nleB and nleC) plus AMR genes (such as aminoglycoside and sulfamethoxazole/trimethoprim resistance genes) were prevalent among both patients and asymptomatic food handlers in Miyagi Prefecture, Japan. Our WGS analysis demonstrated the importance of monitoring the genomic characteristics of STEC isolates from asymptomatic food handlers in addition to symptomatic patients.