The Core Proteome and Pan Proteome of Salmonella Paratyphi A Epidemic Strains

Comparative proteomics of multiple strains within the same species can reveal genetic variation and relationships among strains without the need to assess genomic data. As in comparative genomics, a core proteome and a pan proteome can be obtained for multiple strains grown under the same culture conditions. In this study we present the core proteome and pan proteome of four epidemic Salmonella Paratyphi A strains cultured under laboratory conditions. The proteomic information was obtained using two-dimensional gel electrophoresis (2-DE). The expression profiles of these strains were conserved, similar to the monomorphic genome of S. Paratyphi A. Few strain-specific proteins were found. Interestingly, non-core proteins fell into similar functional categories as core proteins. However, significant fluctuations in the abundance of some core proteins were also observed, suggesting that core proteins are elaborately regulated in the different strains even when they are cultured in the same environment. Therefore, core proteome and pan proteome analysis of multiple strains can demonstrate the core metabolic pathways of the species under specific culture conditions, as well as the specific responses and adaptations of the strains to the growth environment.


Introduction
Over 2500 serotypes have been reported in Salmonella, and most of them cause diarrhea. Among these serotypes, Salmonella enterica serovars Typhi and Paratyphi can lead to systemic infections in humans, known as typhoid and paratyphoid fever. These diseases cause epidemics in Asia, Africa and Latin America [1,2]. Before the 1990s, S. Typhi was the main causative agent of enteric fever in southeast Asia and in China, but in the mid-1990s the number of cases caused by S. Paratyphi A started to increase, and paratyphoid fever subsequently became the major enteric fever [3,4,5,6].
The whole genomes of some S. Typhi and S. Paratyphi A strains have been sequenced [7,8,9,10]. Genetically monomorphic genomes and relatively low sequence diversity were found, which may be the result of strict host adaptation [11]. Multilocus sequence typing (MLST) and pulsed-field gel electrophoresis (PFGE) [12] have been used to generate phylogenetic information, to analyze population variance, and to genotype S. Typhi and S. Paratyphi A. Genomic sequencing and single nucleotide polymorphism (SNP) analysis provide high-throughput, high-resolution methods for assessing genome variation [13], and have been applied to the epidemic analysis of S. Typhi strains [14,15,16,17]. All of the results showed a low level of genetic variation in S. Paratyphi A, and a high clonality of the strains involved in epidemics.
A genome comparison among different strains is used to identify the core genome and pan genome [18]. The core genome comprises the conserved genes and the survival characteristics that allow the microorganism to persist and evolve. In contrast, the pan genome includes newly transferred genes, and demonstrates the diversity of the organism. Genome comparisons help investigators discover the divergence of the same genes between different organisms. However, a genome analysis cannot show differences at the protein level, and proteins are the actual determinants of the growth and survival of the organism. Proteomic studies can illustrate the expression levels of various gene products under given culture conditions, discover the responses to different biological systems and uncover protein modifications and protein-protein interactions [19,20]. A comparison of the proteomes of different strains can indicate their shared and unique features. Besides the shared proteins, it may also help identify newly acquired gene products.
Many technologies for proteome analysis are in use [21,22]. In this study, we conducted a comparative proteomic analysis of four strains with different geospatial and temporal characteristics by performing 2-DE, and obtained their core and pan proteomes. We found that the proteome was highly conserved among the four S. Paratyphi A strains, consistent with the conserved genomes of S. Paratyphi A. However, some of the core proteins differed significantly in abundance among the strains, suggesting that protein expression varies between strains even though their genomes are strictly convergent.

Strains
Among the strains collected during the surveillance of typhoid and paratyphoid fever in China, and from the PFGE (XbaI) subtyping database, we selected three S. Paratyphi A strains isolated from patients for the 2-DE analysis: YN07077 (isolated in Yunnan province in 2007) and GZ9A05036 (isolated in Guizhou province in 2005), which have the predominant PFGE subtype, and ZJ98053 (isolated in Zhejiang province in 1998), which has a nondominant subtype. Strain ATCC 9150, a clinical isolate obtained in Malaysia in 1993, was also included for comparison; it has a different PFGE subtype from the other three strains.

PFGE
We performed PFGE according to a previously described method [23].

Protein Extraction
The protein samples used for 2-DE were prepared according to the protocol described in a previous study [24]. In brief, the strains were cultured on Columbia blood agar for 16-18 hours, then the cells were scraped from four plates (9 cm in diameter) and washed four times in ice-cold low-salt PBS. The cells were resuspended in deionized water with urea (7 M), thiourea (2 M), CHAPS (4%) and IPG buffer (1%), and DTT (1%) was then added, in a final volume of 5 ml. A protease inhibitor cocktail tablet (Roche Applied Science) was added to each sample. The samples were sonicated to lyse the cells, then 125 μg RNase A and 50 U DNase were added. The samples were kept at ambient temperature for 1 hour to allow the proteins to dissolve fully and centrifuged at 40,000×g for 1 hour; the supernatant was then collected and the protein content was quantified with the PlusOne Quant Kit. The samples (800 μg protein) were aliquoted and either used directly for IEF or frozen at −80°C until use.

2-DE and Image Scanning
Isoelectric focusing (IEF; 17 cm, pH 4-7, Bio-Rad; 18 cm, pH 6-11, Amersham Biosciences) and 12.5% sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) were performed according to the manufacturer's instructions (Bio-Rad, PROTEAN IEF CELL, Protean II Xi apparatus). Briefly, passive rehydration was performed for 4 hours and active rehydration for 8 hours at 50 V, and IEF was conducted using the following conditions: 300 V linear for 1 hour, 600 V linear for 1 hour, 1000 V linear for 1 hour, 8000 V linear for 1 hour and 8000 V rapid for 8 hours. After IEF and equilibration, the proteins were separated by SDS-PAGE, using 10 mA per strip for 30 minutes, which was then increased to 30 mA until the bromophenol blue front had just run off the lower edge of the gel. The run was then stopped, and the gel was stained with Coomassie blue G-250. The gels were scanned with a UMAX2100XL device (Umax Technologies Inc.). The same procedure was repeated three times for every sample.

In-gel Protein Digestion and Identification
The Coomassie-stained protein spots were excised, and in-gel protein digestion was conducted according to a previously described protocol [25]. Protein identification was carried out by tandem matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF/TOF) mass spectrometry (MS; 4700 MALDI-TOF/TOF Mass Spectrometer, Applied Biosystems) as described previously [26]. The spectrum of every sample was acquired in the mass range between 800 and 4000 Da using 1500 laser shots. MS/MS spectra were acquired using 2000 laser shots with air as the collision gas. The singly charged peaks were analyzed using an interpretation method provided in the 4000 Series Explorer™ software version 3.0, which selected the five most intense peaks and automatically generated the MS/MS spectra, excluding peaks associated with the matrix and those formed by trypsin autolysis. The spectra were processed and analyzed by the Global Protein Server Workstation (GPS; Applied Biosystems, Foster City, CA, USA), which uses internal Mascot v2.1 software for searching the peptide mass fingerprints. The searches were performed against the NCBI non-redundant protein database (ftp://ftp.ncbi.nih.gov/blast/db/FAST/nr.gz, updated in 2011) with the following criteria: NCBI bacteria database; trypsin digestion; methionine oxidation and iodoacetamide alkylation as the variable modifications; one missed cleavage site; and an MS mass error of 0.1 Da. Identifications with a GPS confidence interval greater than 95% were accepted. The inversion database was used to remove false positives (protein identifications are listed in Table S1 and Table S2; MS maps of some proteins are given in Attachment S2).

Data Analysis
The proteomic data were analyzed using the PDQuest™ Advanced 2-DE Analysis software program with the basic model and default parameters (Attachment S1). After the software had matched the spots, we manually revised the protein spot identification. Each spot displayed in all four gels was allocated to the core proteins, while spots displayed in only one strain were considered to be specific proteins. The data were exported using the following steps within the same window: File, Export, Export (Text) Experiment, Spot data by gel. We selected the center position option, so that the (X, Y) values for each protein could be obtained. To normalize the coordinate values, all of the core proteins in each strain were assigned the same coordinate values as in ATCC 9150, while the other shared proteins (excluding the core proteins) were assigned the coordinate values from ATCC 9150, ZJ98053 or YN07077, in that order of priority. For example, when a spot was found for ATCC 9150, its coordinate value in all strains that displayed the spot was set to that in ATCC 9150. Spots that were not present in ATCC 9150, but were found for ZJ98053, were assigned the values from ZJ98053. Spots that were found in neither ATCC 9150 nor ZJ98053, but were displayed in the gels for YN07077, were assigned the values from YN07077. Specific proteins for each strain kept their original coordinate values.
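The coordinate normalization rule can be sketched in a few lines of Python. This is a minimal illustration only, not the procedure as actually run in PDQuest: the function name `normalize_spot`, the dictionary layout and the example coordinates are assumptions introduced here.

```python
from typing import Dict, Tuple

Coord = Tuple[float, float]

# Reference priority taken from the text: ATCC 9150, then ZJ98053, then YN07077.
REFERENCE_ORDER = ["ATCC 9150", "ZJ98053", "YN07077"]


def normalize_spot(coords_by_strain: Dict[str, Coord]) -> Dict[str, Coord]:
    """Return normalized (X, Y) values for one matched spot.

    coords_by_strain (hypothetical layout) maps every strain in which the
    spot was detected to its exported centre position.
    """
    if not coords_by_strain:
        raise ValueError("a spot must be present in at least one strain")
    if len(coords_by_strain) == 1:
        # Strain-specific spot: keep the original coordinate value.
        return dict(coords_by_strain)
    for reference in REFERENCE_ORDER:
        if reference in coords_by_strain:
            # Shared spot: every strain takes the coordinate of the
            # highest-priority strain that displays the spot.
            ref_coord = coords_by_strain[reference]
            return {strain: ref_coord for strain in coords_by_strain}
    # With the four strains of this study a shared spot always includes
    # at least one reference strain, so this only guards unexpected input.
    raise ValueError("shared spot is absent from every reference strain")


# Example with made-up coordinates: a spot seen in ZJ98053 and GZ9A05036 only.
example = normalize_spot({"ZJ98053": (10.0, 20.0), "GZ9A05036": (10.4, 19.7)})
```

In this sketch a core spot needs no special case: it is present in ATCC 9150, so the first reference in the priority list applies.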
A scatter plot for pan proteins was generated using the Origin software program (Origin Lab), since each protein in each strain has a specific coordinate value (X, Y). Red represented the core proteins shared by all four strains, blue represented ATCC 9150-specific proteins, green represented ZJ98053-specific proteins, dark green represented YN07077-specific proteins, cyan represented GZ9A05036-specific proteins and black was used to indicate proteins other than the core and specific proteins.
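The colour scheme of the scatter plot follows directly from the set of strains in which a spot is present. The Python sketch below shows that mapping under stated assumptions: the helper `spot_colour` and the constant names are hypothetical, and the plots themselves were drawn in Origin, not with this code.

```python
STRAINS = ("ATCC 9150", "ZJ98053", "YN07077", "GZ9A05036")

# Colours for strain-specific spots, as listed in the text.
SPECIFIC_COLOURS = {
    "ATCC 9150": "blue",
    "ZJ98053": "green",
    "YN07077": "dark green",
    "GZ9A05036": "cyan",
}


def spot_colour(present_in):
    """Return the plotting colour for a spot given the strains that display it."""
    present = set(present_in)
    if not present or not present <= set(STRAINS):
        raise ValueError("expected a non-empty subset of the four strains")
    if present == set(STRAINS):
        return "red"  # core protein: displayed in all four gels
    if len(present) == 1:
        return SPECIFIC_COLOURS[next(iter(present))]  # strain-specific protein
    return "black"  # shared by two or three strains


core_colour = spot_colour(STRAINS)
shared_colour = spot_colour(["ZJ98053", "YN07077"])
```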
Core protein and pan protein trend lines were generated using the Origin software program. A similarity matrix was generated according to the r values produced by the PDQuest software, version 8.0.1 (Bio-Rad).
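The r values were taken from PDQuest, and how the software computes them is not described here. Purely as an illustration, and assuming that r is a Pearson correlation coefficient calculated over the quantities of matched spots in two gels, a similarity matrix could be assembled as follows. The function names and the input layout are hypothetical.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def similarity_matrix(quantities):
    """Build a symmetric matrix of r values.

    quantities (hypothetical layout) maps a strain name to the list of
    spot quantities, ordered so that index i is the same matched spot
    in every strain.
    """
    names = list(quantities)
    matrix = {name: {name: 1.0} for name in names}  # diagonal entries
    for a, b in combinations(names, 2):
        r = pearson_r(quantities[a], quantities[b])
        matrix[a][b] = r
        matrix[b][a] = r
    return matrix
```

Multiplying an entry by 100 would give a percentage comparable in form to the similarity values reported for the strain pairs, although whether PDQuest reports them in exactly this way is an assumption.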

The Core Proteome and Pan Proteome of the Epidemic Strains
2-DE was performed within two pH ranges for the four strains of S. Paratyphi A, and the scanned patterns were analyzed using the PDQuest software program (Fig. S1-S8). Within the range of pH 4-7, 849, 858, 857 and 860 spots were detected in strains ATCC 9150, ZJ98053, YN07077 and GZ9A05036, respectively, and 380, 389, 366 and 355 spots were detected in these strains within the range of pH 6-11. Any spot detected in all four strains was considered to be a core protein, and the total number of core proteins identified was 739 and 318 within the ranges of pH 4-7 and pH 6-11, respectively. The core proteins accounted for 85.9%-87.0% of the spots within the range of pH 4-7 and 81.8%-89.6% of the spots within the range of pH 6-11 in each strain, which suggested a high similarity in protein expression among S. Paratyphi A strains and indicated that the proteome was highly conserved.
Within the ranges of pH 4-7 and pH 6-11, there were 946 and 435 pan proteins, respectively, for the four strains. Core proteins accounted for 78.1% and 73.1% of the pan proteins, respectively, confirming their conservation.
To display the proportions of core proteins and strain-specific proteins in the pan proteins, we drew scatter diagrams showing the pan proteome within the two pH ranges. The principle and process have been described above. In brief, in the scatter diagrams, specific proteins in the four strains were represented by four different colors (there were no specific proteins for ZJ98053 within the pH range of 4-7). Core proteins are presented in red. The proteins other than the core proteins and specific proteins are shown in black (Fig. 1). We also present a composition map to show the proportion of core proteins and strain-specific proteins within the pan proteins.
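The proportions quoted above are simple ratios of the reported spot counts. The short Python sketch below recomputes them from those counts; the variable and function names are introduced here for illustration only.

```python
# Spot counts reported in the text for each strain and pH range.
SPOTS = {
    "pH 4-7": {"ATCC 9150": 849, "ZJ98053": 858, "YN07077": 857, "GZ9A05036": 860},
    "pH 6-11": {"ATCC 9150": 380, "ZJ98053": 389, "YN07077": 366, "GZ9A05036": 355},
}
CORE = {"pH 4-7": 739, "pH 6-11": 318}  # spots present in all four strains
PAN = {"pH 4-7": 946, "pH 6-11": 435}   # union of spots over the four strains


def percent(part, whole):
    """Percentage of whole represented by part, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


def core_coverage(ph_range):
    """Return (core share per strain, core share of the pan proteome)."""
    per_strain = {
        strain: percent(CORE[ph_range], total)
        for strain, total in SPOTS[ph_range].items()
    }
    return per_strain, percent(CORE[ph_range], PAN[ph_range])


per_strain_acidic, pan_share_acidic = core_coverage("pH 4-7")
per_strain_basic, pan_share_basic = core_coverage("pH 6-11")
```

The lowest pH 6-11 ratio, 318/389, is about 81.75%, so it can appear as 81.7% or 81.8% depending on how the value is rounded.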
The trend lines for the core and pan proteins showed how the number of proteins changed as each of the four S. Paratyphi A strains was added (Fig. 2). From strain ATCC 9150 to ZJ98053, which were isolated in 1993 and 1998, respectively, the number of pan proteins increased significantly. During this period, the incidence of paratyphoid fever increased dramatically in Southeast Asia and China. After adding strains YN07077 and GZ9A05036, the rate of increase slowed down, indicating that the proteome did not change very much. The core protein trend line decreased quickly at the beginning and then leveled off, but the core proteins still accounted for a large proportion of the total proteins in each strain, suggesting that S. Paratyphi A has a conserved proteome.
The above data reflect the expression of the proteins included in the core proteome, which comprises crucial proteins involved in the normal biological processes of the cell that maintain its survival and basic physiology. The core proteome differs from the core genome in that the latter is crucial only in theory, because transcription of its genes has not been confirmed.
Pairwise proteome comparisons among these four strains displayed varying degrees of similarity, which were somewhat consistent with the PFGE clustering. However, there were also many differences among the strains (Fig. 3). Strains YN07077 and GZ9A05036 were the closest in terms of the protein pattern (similarity of 80.4%), and they had the same PFGE pattern. Compared to ATCC 9150, ZJ98053 was more similar to YN07077 and GZ9A05036 in terms of the proteome pattern, with similarity values of 79.44% and 78.16%, respectively. Strain ATCC 9150 showed less similarity to YN07077 and GZ9A05036 (74.3% and 71.7%) than strain ZJ98053 did. Since strain ATCC 9150 was isolated in Malaysia in 1993, while strains ZJ98053, YN07077 and GZ9A05036 were from adjacent provinces in China, this suggests that the geospatial and temporal characteristics of the strains influence their proteomic pattern. In terms of PFGE subtyping, strain ZJ98053 showed a nondominant pattern, ATCC 9150 showed a subdominant pattern and strains YN07077 and GZ9A05036 showed a predominant pattern. Strain ATCC 9150 was closer to YN07077 and GZ9A05036 than to ZJ98053 in terms of PFGE clustering. The differences between the proteomic and genomic patterns were likely due to the fact that the proteomic analysis captured the more rapid proteomic response of cells adapting to their environment, while the genome may take longer to show changes.
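The core and pan trend lines correspond to a running intersection and a running union of the spot sets as strains are added one at a time. The Python sketch below illustrates that calculation on made-up spot identifiers; the function name `trend_lines` and the toy data are assumptions, and the actual curves were drawn in Origin from the PDQuest matches.

```python
from typing import Dict, List, Set, Tuple


def trend_lines(
    spots_by_strain: Dict[str, Set[str]], order: List[str]
) -> List[Tuple[str, int, int]]:
    """Return (strain, core count, pan count) after adding each strain in order."""
    core = None        # running intersection of spot sets
    pan: Set[str] = set()  # running union of spot sets
    rows = []
    for strain in order:
        spots = spots_by_strain[strain]
        core = set(spots) if core is None else core & spots
        pan |= spots
        rows.append((strain, len(core), len(pan)))
    return rows


# Toy example with hypothetical spot identifiers, not data from this study.
toy = {
    "strain A": {"s1", "s2", "s3"},
    "strain B": {"s2", "s3", "s4"},
    "strain C": {"s2", "s3", "s4", "s5"},
}
rows = trend_lines(toy, ["strain A", "strain B", "strain C"])
```

By construction the core count can only stay constant or fall and the pan count can only stay constant or rise as strains are added, which is the shape described for Fig. 2.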

Constitution of the Expressed Proteins
Among the core proteins, the largest functional category was energy metabolism, followed by protein fate, protein synthesis, cellular processes, transport and binding proteins, central intermediary metabolism, etc. (Fig. 4). The functional composition of the pan proteins other than the core proteins was slightly different from that of the core proteins. Energy metabolism was still the main category, but transport and binding proteins formed the second most common functional category (Fig. 5).

Diverse Expression Levels of the Core Proteins
Although these four S. Paratyphi A strains had a conserved proteome and shared over 80% of their proteins, differences in the abundance of some protein spots were observed among the strains. Fig. 6 shows that some spots had a higher abundance in ATCC 9150 than in the other three strains, and Fig. 7 shows that other spots had a lower abundance in ATCC 9150 than in the other three strains. For these differentially-expressed spots, the protein expression level of strain ZJ98053 was more consistent with YN07077 and GZ9A05036 than with ATCC 9150; however, its proteome had a higher regression value with strain ATCC 9150 than with the other two strains (Fig. 3).

Strain-specific Proteins
We used BLAST to compare the coding genes of all the MS-identified proteins against the ATCC 9150 genome, but did not find any newly acquired gene products. All proteins were variants of the core proteins and non-core proteins.

Discussion
In this study, the core proteome and pan proteome of four S. Paratyphi A strains cultured under laboratory conditions were compared, based on the core genome and pan genome comparison method. The previous genome comparisons revealed that S. Paratyphi A was highly clonal [10,17]. We also found that there was limited genetic diversity in terms of the level of protein expression when strains were cultured under the same conditions. In the four tested strains, the core proteins covered a large proportion (>70%) of the pan proteomes. For each strain, the core proteins covered a proportion from 81.8% to 89.6% of the global proteins. Thus, the proteome of S. Paratyphi A was also highly conserved, which was consistent with the highly clonal genome.
The PFGE cluster analysis showed that strain YN07077 had the same pattern as GZ9A05036, less similarity with ATCC 9150 and much less similarity with ZJ98053. Nevertheless, based on the regression matrix derived from the proteomic analysis, strain ZJ98053 was closer to YN07077 and GZ9A05036, and had less in common with ATCC 9150. In terms of the number of proteins, strains YN07077, GZ9A05036 and ZJ98053 shared 813 core protein spots (pH 4-7), which decreased to 739 (pH 4-7) after adding strain ATCC 9150, indicating that similar genomes do not necessarily result in similar proteomes. Although the S. Paratyphi A strains had both conserved proteomes and genomes, they displayed distinct metabolic and other characteristics that were not apparent at the genome level. Moreover, strains YN07077, GZ9A05036 and ZJ98053 were isolated from geographically close regions, which might be the epidemiological basis for the high similarity of their proteomes. Their trend lines for core proteins and pan proteins showed no large changes and there were no significant differences between their proteomes, suggesting that the genomes and expression profiles of these strains were quite conserved, and that they had undergone stable evolution.
According to the functional classification, most of the core proteins functioned mainly in the survival of the organism. Interestingly, some of the pan proteins (excluding the core proteins) fit in similar functional categories, which may reflect the high concordance of the expression profiles of these strains based on their conserved genomes. When the bacteria were grown under nutrient-rich conditions, the spread of the functional classification was nonspecific, because they were mainly experiencing routine metabolism that did not require new adaptations to improve survival.
Although S. Paratyphi A had a highly conserved proteome in terms of the protein species, some core proteins showed significant fluctuations in abundance between strains. Strains YN07077, GZ9A05036 and ZJ98053 had some protein spots that were expressed at a similar abundance, such as spots SSP 3403, SSP 3806, SSP 4302, SSP 6304 and SSP 7806 at pH 4-7 and SSP 7117 at pH 6-11, which were expressed at a much higher abundance in these three strains than in strain ATCC 9150. Both spots SSP 3403 and SSP 3806 were identified as outer membrane protein A, a surface-exposed porin protein present in high copy number [27], which may play an important role in structural stability and in the maintenance of cell morphology, but has low-efficiency porin activity [28,29,30,31]. It is exposed to, and interacts with, factors in the external environment. Variants with subtle differences in modifications might adapt to diverse environments and host immunity, and might subsequently develop into inherited, characteristic phenotypes. SSP 4302, SSP 7802 and SSP 7806 were related to σS, the central stationary-phase-specific sigma subunit of RNA polymerase [32,33]: SSP 4302 (arcA) is a negative regulator of rpoS [34], while SSP 7802 and SSP 7806 are positively regulated by rpoS [35]. rpoS has been shown to be essential for Salmonella virulence; an rpoS mutant of serovar Typhi is less cytotoxic for macrophages than the parental strain, so rpoS may be involved in the virulence of serovar Typhi [36]. Because S. Paratyphi A has an infection mechanism similar to that of S. Typhi, we speculate that differences in growth status and cytotoxicity among the bacteria might result in diverse expression of response factors in the regulatory cascade.
However, some core proteins had a higher abundance in strain ATCC 9150 than in the other three strains, such as SSP 2402, SSP 5309 and SSP 6613. The genes for SSP 2402 (rbsK) and SSP 5309 (rbsB) are located in the same operon, which participates in D-ribose transport and utilization [37]. This operon is transposable [38]. The significance of the higher expression is still unknown, but it may imply that these proteins play biological roles to varying degrees in different strains; this needs further detailed study.
ZJ98053 was in the middle in terms of its year of isolation (1998) compared with the other three strains (1993 for ATCC 9150, 2005 for GZ9A05036 and 2007 for YN07077), but it was geographically close to strains GZ9A05036 and YN07077, and it exhibited high genomic similarity to ATCC 9150 and high proteomic similarity to YN07077 and GZ9A05036. It also showed characteristics independent of all the other strains. For example, spots SSP 3204, SSP 6703 and SSP 1405 were more abundant in strain ZJ98053 than in the other three strains, which suggests that ZJ98053 might have evolved separately from the other three strains.
The above differentially-expressed core protein spots were spread throughout various metabolic pathways. The variable expression levels of core proteins revealed the metabolic diversity present in the different strains. Thus, even core proteins produced under the same culture conditions can display diverse expression levels and different modifications to exert different functions, which eventually become a characteristic genetic phenotype [39,40]. Such phenotypes were common in this study, and may have been connected to the function of the individual proteins.
With regard to the specific spots, we ran BLAST searches (http://blast.ncbi.nlm.nih.gov/Blast.cgi) of the gene sequences against the ATCC 9150 genome, and found that there were limited differences caused by differences in the genome or by pseudogenes. Most differentially-expressed spots were considered to have been caused by differences in the transcription level or post-translational modifications.
A high-throughput genome comparison can provide a detailed gene map, including the genes, their arrangement, recombination, pseudogene accumulation and similarity between strains, whereas information about the gene expression, protein modification and regulatory network cannot be obtained from such studies. Different expression profiles (including the protein species and their abundance) can be observed even when strains have the same or similar gene clusters, since large differences can arise due to differences in the gene expression, regulatory networks and protein modifications. Thus, biological studies, and interpreting the results of such studies, remain challenging even when the whole genome sequences are known. Proteomic studies can provide information about the true expression of the genes under the studied culture condition, and the core proteome reveals the conservative expression of the genomes of different strains under this condition. Further, proteomic comparisons may show the genome-based differences, and even evolutionary relationships, among the strains, even when the genome sequences are unknown.
In summary, we herein compared the core proteome and pan proteome of S. Paratyphi A strains isolated during recent epidemics. Our results may provide a new approach to analyzing the expression profiles of strains at the species level, which can help to understand their genetic differences, without requiring the genomic sequence, and can facilitate understanding their common biological processes under specific conditions, which will provide information about their fundamental metabolism and survival strategies. In addition, more sensitive and high-throughput technology, such as iTRAQ-based LC-MS/MS analyses, may make it possible to perform large scale analyses of proteomic data, and may also provide information for a powerful database that can be used to assess newly-identified or emerging strains.

Table S1 Protein identification for the global spots of strain ATCC 9150 and differential spots of strains ZJ98053, YN07077 and GZ9A05036 within pH range 4-7.