Quantification of enterohemorrhagic Escherichia coli O157:H7 protein abundance by high-throughput proteome

Enterohemorrhagic Escherichia coli (EHEC) O157:H7 is a human pathogen responsible for diarrhea, hemorrhagic colitis and hemolytic uremic syndrome (HUS). To promote a comprehensive insight into the molecular basis of EHEC O157:H7 physiology and pathogenesis, the combined proteome of EHEC O157:H7 strains, Clade 8 and Clade 6 isolated from cattle in Argentina, and the standard EDL933 (clade 3) strain has been analyzed. From shotgun proteomic analysis a total of 2,644 non-redundant proteins of EHEC O157:H7 were identified, which correspond approximately 47% of the predicted proteome of this pathogen. Normalized spectrum abundance factor analysis was performed to estimate the protein abundance. According this analysis, 50 proteins were detected as the most abundant of EHEC O157:H7 proteome. COG analysis showed that the majority of the most abundant proteins are associated with translation processes. A KEGG enrichment analysis revealed that Glycolysis / Gluconeogenesis was the most significant pathway. On the other hand, the less abundant detected proteins are those related to DNA processes, cell respiration and prophage. Among the proteins that composed the Type III Secretion System, the most abundant protein was EspA. Altogether, the results show a subset of important proteins that contribute to physiology and pathogenicity of EHEC O157:H7.


Introduction
Enterohemorrhagic Escherichia coli (EHEC) O157:H7 is a zoonotic pathogen belonging to Shiga toxin-producing E. coli (STEC) and responsible for different diseases as diarrhea, hemorrhagic colitis and hemolytic uremic syndrome (HUS). HUS is distributed worldwide and considered to be a public health problem in several countries [1,2] is the country with the highest incidence of HUS in the world, with approximately 14 cases per 100,000 in children under 5 and a report of 500 cases per year [3,4]. Cattle are the main reservoir of EHEC. Several studies have shown that most cases related to infection in human may be attributed to the high consumption of foods of bovine origin and especially ground beef is the main source of contamination [5]. Great efforts had been made to characterize strains of E. coli O157:H7 isolated from Argentinian cattle [6]. Using the analysis of simple nucleotide polymorphisms, we have classified 16 strains of STEC O157:H7 in clade 6 and 8, which are the most virulent clades [6]. In vitro and in vivo experimental results showed that the strains Rafaela II (clade 8) and 7.1 Anguil (clade 6) have a high virulence potential when compared with other strains and the standard strain EHEC O157:H7 EDL933 [7]. These results enabled us to characterize the high prevalence of strains clade 6 and 8 in the Argentinian cattle. Importantly, these two clades might contribute to a high incidence of HUS in Argentina.
The availability of whole genome sequences of different EHEC strains has enabled genomewide comparisons to identify factors that might be correlated to physiology and virulence of this pathogen [8]. In addition, the implementation of system biology approaches, such as prediction of protein-protein network, has contributed substantially in the understanding of the pathogen and interactions with its host [9].
Information about the functions and activities of the individual proteins and pathways that control these systems is essential to understand complex processes occurring in living cells. Large scale quantitative proteomics is a powerful approach used to understand global proteomic dynamics in a cell, tissue or organism, and has been widely used to study protein profiles in the field of microbiology [10]. Furthermore, the study of the abundance of proteins in different conditions or during different stages of growth or disease can provide important information about the activities of individual protein components or protein networks and pathways. The rapid growth of proteomic and genomic methods and tools has managed to reveal the basic protein inventory of a few hundred different organisms. Quantitative proteomic approaches have been applied to determine the absolute or relative abundance of proteins. This information gives insights about the biological function and properties of the cell as well as how cells respond to environmental or metabolic changes or stresses [11,12]. Quantitative proteomics analysis can contribute to the generation of datasets that are critical for our understanding of global proteins expression and modifications underlying the molecular mechanism of biological processes and disease states.
In a previous study, we reported the use of isobaric tags for comparative quantitation (TMT) method to identify the differentially expressed proteins among three EHEC O157:H7 isolates: Rafaela II (Clade 8), Anguil 7.1 (Clade 6) and EDL933 (Clade 3) [7]. The proteome differences observed among these strains are related mainly to proteins involved in both virulence and cellular metabolism; which might reflect the virulence potential of each strain [7]. The aim of the present study was to promote a more comprehensive insight into the molecular basis of EHEC O157:H7 physiology. For this purpose, we applied high-throughput proteomics to combine the proteome of three EHEC O157:H7 isolates: Rafaela II, Anguil 7.1 and EDL933 and normalized spectrum abundance factor (NSAF) approach [13] to quantify the EHEC O157:H7 proteome.

Bacterial strain and growth conditions
The EHEC O157:H7 strains Rafaela II (clade 8) and 7.1 Anguil (clade 6) isolated from cattle in Argentina and EDL933 (clade 3) strain recovered from a patient in USA were routinely maintained in Luria-Bertani broth (LB, Difco Laboratories, USA) or in LB 1.5% bacteriological agar plates, at 37˚C. For the proteomic studies, bacterial strains were cultured as previously described by Amigo et al. [7]. Overnight cultures of the different EHEC O157:H7 strains growth in LB were inoculated (1:50) in Dulbecco's modified Eagle's medium (DMEM)-F12 nutrient until reach the mid-exponential growth phase (OD 600 nm = 0.6) under a 5% CO 2 atmosphere at 37˚C.

Protein extraction and preparation of whole bacterial lysates for LC-MS/ MS
After bacterial growth, protein extractions were performed according to Amigo et al. [7]. Three biological replicates of each culture were centrifuged at 5000 x g for 20 min at 4˚C. The cell pellets were resuspended in ice-cold lysis buffer (50 mM Tris-HCl, pH 7.5, 25 mM NaCl, 5 mM DTT and 1 mM PMSF) and disrupted by three cycles in liquid N 2 and subsequently placed in boiling water. The resulting lysates were centrifuged at 30,000 × g for 10 min and precipitated with 5 volumes of ice-cold acetone at -20˚C overnight. Next, the protein pellets were resuspended in buffer containing 8 M urea, 2 M thiocarbamide and 200 mM tetraethylammonium bromide at pH 8.5. The protein concentration was determined by the Bradford assay using BSA curve as a standard. Subsequently, the samples were reduced with tris-(2-carboxyethyl)-phosphine (200 mM), alkylated with iodoacetamide (375 mM) and enzymatically digested with sequencing grade trypsin. Finally, the samples were labeled with TMT Reagents 6-plex Kit according to the manufacturer's instructions.

Liquid chromatography and mass spectrometry
The proteomic analyses were performed using High pH Reverse Phase Fractionation and Nano LC-MS/MS Analysis by Orbitrap Fusion. Firstly, the labeled peptides were pooled together and desalted using Sep-Pak SPE (Waters) to remove salt ions. The hpRP chromatography was performed with Dionex UltiMate 3000 model on an Xterra MS C18 column (3.5 um, 2.1 × 150 mm, Waters). The sample were dissolved in buffer A (20 mM ammonium formate, pH 9.5) and then eluted with a gradient of 10 to 45% buffer B (80% acetonitrile (ACN)/ 20% 20 mM NH 4 HCO 2 ) for 30 min, followed by 45% to 90% buffer B for 10 min, and a 5-min hold at 90% buffer B. Forty-eight fractions collected at 1 min intervals were merged into 12 fractions. The nano LC MS/MS analysis was carried out using a Orbitrap Fusion tribrid (Thermo-Fisher Scientific, San Jose, CA) mass spectrometer with an UltiMate 3000 RSLC nano system (Thermo-Dionex, Sunnyvale, CA). The fraction was injected onto a PepMap C18 trapping column (5 μm, 200 μm × 1 cm, Dionex) and separated on a PepMap C18 RP nano column (3 μm, 75 μm × 15 cm, Dionex). For all the analysis, the mass spectrometer was operated in positive ion mode, MS spectra were acquired across 350-1550 m/z scan mass range, at a resolution of 12,0000 in the Orbitrap with the max injection time of 50 ms. Tandem mass spectra were recorded in high sensitivity mode (resolution >30000) and made by HCD at normalized collision energy of 40. Each cycle of data-dependent acquisition (DDA) mode selected the top10 most intense peaks for fragmentation. The data were acquired with Xcalibur 2.1 software (Thermo-Fisher Scientific).

Database searching, protein identification and abundance estimation
Tandem mass spectra were extracted and charge state deconvolution and deisotoping were not performed. All MS/MS samples were analyzed using Mascot (Matrix Science, London, UK; version 2.4.1). Mascot was set up to search the EDL933_NCBI_20141031.fasta; TW14539_ex-clusive_20150310 database (unknown version, 6341 entries) assuming the digestion enzyme trypsin. Mascot was searched with a fragment ion mass tolerance of 0.020 Da and a parent ion tolerance of 8.0 PPM. Carbamidomethyl of cysteine and TMT-6plex of lysine and the n-terminus were specified in Mascot as fixed modifications. Deamidated of asparagine and glutamine and oxidation of methionine were specified in Mascot as variable modifications. Scaffold (version Scaffold_4.8.8, Proteome Software Inc., Portland, OR) was used to validate MS/MS based peptide and protein identifications. Peptide identifications were accepted if they could achieve an FDR less than 1.0% by the Scaffold Local FDR algorithm and contained at least 1 identified peptide. Label free quantification value was calculated by Normalized spectrum abundance factor (NSAF) algorithm [14].

Bioinformatics analysis
Functional annotations were assigned by the COG database [15]. Metabolic pathways were determined by analyzing proteins with the Kyoto Encyclopedia of Genes pathways and Genomes (KEGG) [16].

Global proteomic analysis and functional classification of Escherichia coli (EHEC) O157:H7 proteome
In this study we have promoted insights into EHEC O157:H7 proteome from a dataset generated with strains E. coli O157:H7 Rafaela II, Anguil 7.1 and EDL933. The strains were grown in D-MEM media and then, proteins from total bacterial lysates were extracted and digested in solution. The resulting peptides were analyzed by 2D-LC MS/MS. From this proteomic analysis, we detected 2,644 non-redundant EHEC O157:H7 proteins (S1 Table). When comparing this result with in silico data of EHEC O157:H7 genome, approximately 47% of the predicted proteome of this pathogen was identified (Fig 1A). To determine the abundance of the identified proteins, the NSAF approach [13] was used. This approach determines to significance of the expression changes based on individual protein intensities (Zybailov et al., 2006) (S1 Table).
According this analysis a dynamic range of protein abundance was generated (Fig 1B). Fifty proteins were identified as most abundant in EHEC O157:H7 proteome ( Table 1). Of the total of proteins identified, 25 proteins are encoded by genes that are present in the pO157 plasmid; however, these proteins did not show a high abundance level (S1 Table) We subsequently performed functional annotation of the identified proteins using gene ontology [15]. Cluster of orthologous group (COG) analysis grouped the identified proteins into four important functional groups: (i) metabolism, (ii) information storage and processing, (iii) cellular processes and signaling, and (iv) poorly characterized (Fig 2A). Although most of the identified proteins are related to cellular metabolism, the most abundant proteins are involved in the translation process, followed by energy metabolism and posttranslational modification, protein turnover and chaperones, which shows an intense metabolic activity mainly in the protein synthesis (Fig 2B). On the other hand, most of the less abundant proteins are involved in replication, recombination and repair (Fig 2B).
Pieper et al. [17] and Ishihama et al. [18] also conducted proteomic studies on E. coli K-12 and EHEC O157:H7 strain 86-24, respectively, to determine the absolute abundance of proteins. Thirteen proteins of the most abundant proteins in our study were also found as the most abundant proteins in E. coli K-12 (Table 2). Those proteins are related to carbohydrate metabolism, transcription, translation, posttranslational modification and signal transduction mechanisms [18]. On the other hand, only 11 proteins of the most abundant group (Table 2) were the most abundant ones in the data obtained from quantitative proteome of EHEC O157: H7 strain 86-24 [17]. Some of those proteins (e. g. TerD, TerE, EspA and DNA-damageinducible protein I) are absent from E. coli K-12. Interestingly, when comparing our results with those of Pieper et al. [17] and Ishihama et al. [18], the E. coli proteome was evaluated in different grown condition. Despite the different growth conditions, glyceraldehyde-3-phosphate dehydrogenase, translation elongation factor Tu, DNA-binding protein H-NS, alkyl hydroperoxidereductase protein C, GroEL chaperone and 50S ribosomal protein L7/L12 were detected as the most abundant proteins as well ( Table 2). These results suggest a set of proteins that may play an important role in the biology of E. coli.
We also detected shiga-toxin subunits such as StxA, StxB, Stx2a and Stx2cb; these proteins, however, were not among the most abundant proteins (Fig 1). Pieper et al. [17] also obtained similar results in EHEC 86-24 proteome. This low abundance can be associated with environmental or nutritional conditions that contribute to the bacterial lysis and consequently to the production of the toxin [17,19,20].

Metabolic network analysis
To identify the most relevant biological pathways of the identified proteins, we performed a KEGG enrichment analysis. This analysis provides a comprehensive understanding about pathways that might contribute to cellular physiology [16]. When we evaluated the most abundant proteins, we identified 10 pathways that were considered significant (p < 0.05), among them the Glycolysis / Gluconeogenesis was the most significant (Fig 2C). On the other hand, among the less abundant proteins were proteins related to ABC transport. Different studies have reported that glycolysis / gluconeogenesis pathway might influence in the colonization process of EHEC in the gastrointestinal tract of both mouse and bovine [21,22]. Although glycolysis substrates inhibit the expression of genes that are localized in locus of enterocyte effacement (LEE), this pathway plays an important role in the initial colonization and maintenance of EHEC in the mouse intestine. In addition, gluconeogenesis not only induces LEE gene expression, but contributes also to the later stages of EHEC colonization in mouse [21,23]. In our proteomic analysis, 23 proteins that composed the Glycolysis / Gluconeogenesis pathway of E. coli were identified (Fig 3). NAD-dependent glyceraldehyde-3-phosphate dehydrogenase (GAPDH) was the second most abundant protein of EHEC O157:H7 proteome ( Table 1). This important cytoplasmic protein of the Glycolysis pathway is also described as a moonlight protein, owing to the distinct functions performed by this enzyme in different cellular localization [24]. Some studies showed that GAPDH secreted by EHEC and enteropathogenic E. coli (EPEC) strains can bind to fibrinogen and epithelial cell, which could contribute to the pathogenesis of this bacterium mainly through cell adhesion [25,26]. Another protein that is also described as a moonlight protein and was detected among the most abundant proteins of the EHEC proteome is enolase ( Table 1) [27]. This glycolytic enzyme that plays an important role in the carbon metabolism also acts in the RNA degradosome process, mainly in the RNA processing and gene regulation. In E. coli, enolase-RNase E/ degradosome complex regulates bacterial morphology under anaerobic condition by inducing a filamentous form, which is observed by some pathogenic E. coli strains under oxygen limiting conditions [27]. Although, the lipid metabolism not has been detected among the KEEG pathway with significant p-value, the acyl carrier protein (ACP) was detected as the more abundant protein of EHEC O157:H7 proteome. Studies showed that E. coli contain in its genome only a copy of acpP gene, which codify to an ACP, this protein plays key role in the fatty acid biosynthesis and is also required for the growth of E. coli (Rawlings and Cronan, 1992;De Lay and Cronan, 1996). During the fatty acid biosynthesis, ACP is post translationally modified by 4'-phosphopantetheinyl (4'-PP). The acyl intermediates generated are bound to the 4'-PP thiol through a thioester linkage, which allows ACP to transport intermediaries among the fatty acid synthetic enzymes [28,29].

Information storage and processing
Most proteins described as the most abundant are involved in translation processes. Similar results had been observed in E. coli K-12 [17]. In addition, according to the KEGG enrichment analysis, the ribosome was strongly enriched (Fig 2C). We identified proteins involved in structural elements of the ribosome as well as related to initiation, elongation and terminations steps, which are required to the translation process [30]. These results show an intense metabolic activity of EHEC mainly in protein synthesis. Among these proteins, the translation elongation factor Tu was identified (EF-Tu) ( Table 1). EF-Tu could play a role in the resistance process of this bacterium in the gastrointestinal tract [31], as well as against cellular damage generated by the bile salt sodium deoxycholate [32]. Unlike E. coli K-12 [17], the proteins involved in transcription process in EHEC were identified as most abundant. CspA was identified to be among the most abundant proteins as well. This RNA chaperone is described as the major cold shock protein of E. coli. CspA binds to RNA molecules and destabilizes stem loop structures to prevent and resolve misfolding of RNA [33].

Cellular processes and signaling
Flagella are filamentous structures that contribute to pathogenesis of pathogenic E. coli, mainly in motility, adhesion and biofilm production [34]. Generally, this organelle is constituted by basal body, hook and a filament that is composed by flagelin or flagellar antigen FliC, which belongs to the H-antigens group [35,36]. FliC was detected as highly abundant ( Table 1). In addition, a study performed in EPEC showed that FliC might be involved in the inflammatory response during the EPEC infection, due to the capacity of flagelin to induce interleukin-8 (IL-8) release in T84 cells [36]. During infection, E. coli is subject to different environmental conditions, for example, temperature changes that occur both in external ambient and within host. In our proteomic analysis, DnaK, GroEL and GroES were detected among the most abundant proteins ( Table 1). Studies have shown that these proteins contribute to the resistance process of EHEC under elevated temperature [37,38]. In addition, Kudva et al. [39] demonstrated that DnaK and GroEL were induced when EHEC was grown in bovine rumen fluid, thus showing the contribution of these proteins in the adaptation of EHEC to the bovine rumen.
Other type of stress commonly found by EHEC during the infection process is oxidative stress, which is generated by reactive oxygen species (ROS) such as superoxide anion (O 2 -) hydrogen peroxide (H 2 O 2 ) and the hydroxyl radical (OH � ) produced mainly by host immune response [40]. Thus, to adapt and survive under this stress condition, this bacterium presents different anti-oxidant systems. We detected two members of peroxiredoxins (Prxs) family: periplasmic thiol peroxidase (Tpx) and alkyl hydroperoxide reductase C (AhpC) system ( Table 1). These two antioxidant systems play an important role in the scavengers of H 2 O 2 and organic hydroperoxides [41,42]. Glutaredoxin 3 (Grx3) was also among the most abundant proteins ( Table 1). Grx3 is associated with Glutaredoxin (Grx) system, whose function is to reduce disulfide bond in target proteins to control the intracellular redox environment [43]. In addition, Smirnova et al. [44] showed that glutaredoxin proteins might be involved in the resistance of E. coli to antibiotics as ampicillin. Altogether, these different systems promote an efficient pathway of antioxidant defense in EHEC that contributes to the pathogenesis of this bacterium.
The ter operon related to tellurite resistance is widely spread in several Gram positive and Gram negative pathogenic species [45,46]. In EDL933, this operon is composed by six genes (terZABCDE). Among the proteins expressed by that operon, only TerC was absent from our proteomic analysis. Interestingly, TerD and TerE proteins were among the most abundant proteins of EHEC O157:H7 proteome ( Table 1). A study performed with Uropathogenic E. coli (UPEC) isolates showed that the introduction of the ter gene cluster contributes to improve bacterial fitness inside macrophages [47]. On the other hand, Yin et al. [48] demonstrated that ter genes contribute to adherence of EHEC O157:H7 to epithelial cells. However, the true role of these genes in the EHEC pathogenesis remains unclear. Although tellurium is absent from the EHEC niche, interestingly, proteomic studies have detected tellurium resistance proteins in EHEC O157 proteome in different media and growth conditions such as D-MEM [49], minimal medium [50], CHROMagar STEC [51], bovine fluid rumen [39] and under conditions that stimulate the quorum sensing pathway [52]. Despite the several studies in this area, more efforts are necessary to unveil the true role of the tellurium resistance proteins in EHEC pathogenesis.

Locus of Enterocyte Effacement (LEE)
The LEE is a pathogenicity island of 35.6 kb that is organized into five polycistronic operons (LEE1 to LEE5) and is an additional bicistronic operon of glr regulatory proteins [53]. LEE is related to intimate adherence of EHEC to cell host and is required for attaching and effacing (A/E) lesions, followed by the translocation of effector proteins that contribute mainly to host modulation of the immune system [54]. In addition, LEE contains the genes that encode the Type III secretion system (T3SS) as well as some effectors molecules that are exported by this system. The T3SS is responsible for the translocation of effectors from within the host cell, whose are directly involved in the EHEC pathogenesis, mainly in the host modulation of the immune system [54]. In this study the EHEC strains were grown in D-MEM, a medium known to induce expression of genes encoding T3SS [55]. We identified 24 LEE-encoded proteins (Fig 1C, S1 Table). Among these proteins, the most abundant were EspA (filamentous structure of the T3SS), Tir (translocated intimin receptor), EspB (pore formation and effector activity) and EspD (outer membrane adhesin) (Fig 1C).
Interestingly, these proteins play an important role in the E. coli O157 adhesion [56,57]. On the other hand, EspA, EspB, Tir and Intimin are potential vaccine candidates against EHEC infection [58,59]. EspA, which was detected as the most abundant protein of LEE, forms a channel that connect the bacterial cytoplasm with the host cell; this exportation conduct allows the translocation of effectors from within the host cell [60]. EspB together with EspD are responsible for the formation of the translocation pore and for the effector translocation of Tir. In addition, EspB can inhibit the interaction between myosin and actin, which promotes loss of microvilli and consequently contributes to the induction of diarrhea [61]. The interaction between Tir and Intimin contributes directly to EHEC O157:H7 persistence during the infection process [62,63]. Furthermore, Tir and Intimin are involved in the modulation of host immunity. Tir might inhibit tumor necrosis factor receptor-associated factor 6 (TRAF-6)mediated by NF-κB activation [64]. Instead, intimin can induce a T-helper cell type 1 response as well as to stimulate the proliferation of spleen CD4+ T lymphocytes and cells from lymphoid tissues [65,66].

Conclusion
In this work, we applied the quantitative proteomic (TMT)-based and emPAI analyses to estimate the quantification of EHEC O157:H7 proteome of combined proteomes of two EHEC O157:H7 isolates from Argentinian cattle and of the standard strainEDL933. These comprehensive proteomic analyses generated a quantitative dataset of EHEC proteome composed of a subset of proteins involved in different biological processes. All these proteins together might form a network of factors that play an important role in the pathogenesis and physiology of this pathogen. Altogether, the results presented in this study provide insights into the functional genome of EHEC O157:H7 at the protein level and could contribute to the understating of the factors associated with the biology of this pathogen.
Supporting information S1 Table. Total list of proteins identified and quantified by NSAF approach. (XLSX)