Proteomic Biomarkers Associated with Streptococcus agalactiae Invasive Genogroups

Group B streptococcus (GBS, Streptococcus agalactiae) is a leading cause of meningitis and sepsis in newborns and an etiological agent of meningitis, endocarditis, osteoarticular and soft tissue infections in adults. GBS isolates are routinely clustered in serotypes and in genotypes. At present one GBS sequence type (i.e. ST17) is considered to be closely associated with bacterial invasiveness and novel proteomic biomarkers could make a valuable contribution to currently available GBS typing data. For that purpose we analyzed the protein profiles of 170 genotyped GBS isolates by Surface-Enhanced Laser Desorption/Ionization Time-of-Flight Mass Spectrometry (SELDI). Univariate statistical analysis of the SELDI profiles identified four protein biomarkers significantly discriminating ST17 isolates from those of the other sequence types. Two of these biomarkers (MW of 7878 Da and 12200 Da) were overexpressed and the other two (MW of 6258 Da and 10463 Da) were underexpressed in ST17. The four proteins were isolated by mass spectrometry-assisted purification and their tryptic peptides analyzed by LC-MS/MS. They were thereby identified as the small subunit of exodeoxyribonuclease VII, the 50S ribosomal protein L7/L12, a CsbD-like protein and thioredoxin, respectively. In conclusion, we identified four candidate biomarkers of ST17 by SELDI for high-throughput screening. These markers may serve as a basis for further studies on the pathophysiology of GBS infection, and for the development of novel vaccines.


Introduction
Group B streptococcus (GBS), also referred to as Streptococcus agalactiae, is the leading cause of infection in newborns [1]. This bacterial pathogen is also a causative agent of invasive infections in adults such as meningitis, endocarditis, and soft tissue and osteoarticular infections [2][3][4]. Historically, the GBS isolates have been classified into ten different serotypes according to their capsule polysaccharides [5,6]. Although one of these serotypes, serotype III, is generally associated with late-onset neonatal disease [7], serotyping has turned out to be insufficient to distinguish isolates involved in other clinical outcomes of GBS infection. To improve the diagnostic and prognostic classification of GBS isolates, several molecular biology methods have been developed: multilocus enzyme electrophoresis [8], ribotyping [9,10], random amplified polymorphism DNA analysis [11], pulsed-field gel electrophoresis [12], and more recently, multilocus sequence typing (MLST) [13]. The application of MLST has contributed to better resolution of GBS isolates and the identification of bacterial genogroups more often associated with invasive infections in newborns [13]. MLST-based classification has been extended by multilocus variable number of tandem repeat analysis (MLVA) [14]. MLVA [13,14] and other genotyping studies [15,16] have shown that isolates belonging to one particular genotype cluster, the sequence type 17 (ST17), are associated with more invasive behavior, especially in the late-onset GBS disease in newborns.
A small number of genomic biomarkers of GBS virulence has recently been proposed [13,17,18] and several genes, including gbs2018, have been found to be associated with the ST17 genotype cluster [15,19]. However, these genes are found in no more than 70% of the cases of late-onset meningitis in newborns.
The principal aim of our study was to identify proteomic biomarkers of S. agalactiae genogroups that are commonly associated with invasive disease. We used the high-throughput technology Surface-Enhanced Laser Desorption/Ionization Time-of-Flight Mass Spectrometry (SELDI; SELDI ProteinChip), which allows generation and analysis of discriminating protein patterns from hundreds of samples tested in a single experiment [20,21]. Proteomic identification of the statistically significant biomarkers was facilitated by the availability of three complete genome sequences of S. agalactiae strains A909, NEM316 and 2603V/R, and the incomplete genome sequences of five strains (18RS21, 515, CJB111, COH1, H36B) (http://cmr.jcvi.org/tigr-scripts/CMR/shared/Genomes.cgi). We analyzed 170 isolates of S. agalactiae by SELDI ProteinChip analysis and found four biomarkers that were significantly associated with genogroups defined by MLST, in particular with isolates from the invasive ST17 and with isolates belonging to closely related genotypes. The purification of these four biomarkers allowed proteomic determination of their primary sequence.

Bacterial isolates, serotypes and genotyping
The 170 GBS isolates used for SELDI profiling were obtained from cerebrospinal fluid (CSF) of children with meningitis (n = 54), clinically healthy women with vaginal carriage of this bacterium (n = 54), the respiratory tract of patients with respiratory infections (n = 24), blood cultures from adult patients with endocarditis (n = 15) according to the modified Duke criteria [22], and milk samples from cases of bovine mastitis (n = 23). All GBS isolates were identified by Gram-staining, colony morphology, beta-hemolysis and Lancefield group antigen determination (Slidex Strepto Kit®, bioMérieux, Marcy l'Etoile, France). In addition, the isolates were identified according to capsular serotype with the Pastorex® rapid latex agglutination test (Bio-Rad, Hercules, USA), and by MLST and MLVA [13,14,23]. The isolates were representative of the S. agalactiae population and belong to the main clonal lineages defined by MLST [13]. MLST was performed previously and data were not available for the 24 isolates from patients with infection of the respiratory tract [14]. Briefly, PCR was used to amplify fragments of about 500 base pairs from seven housekeeping genes (adhP, pheS, atr, glnA, sdhA, glcK and tkt) as described by Jones et al. [13]. The seven PCR products were purified and sequenced, and an allele number was assigned to each fragment on the basis of its sequence. A sequence type (ST), based on the allelic profile of the seven amplicons, was assigned to each isolate. This previous work made use of the Streptococcus agalactiae MLST website (http://pubmlst.org/sagalactiae/) [24]. Based on allelic profile data, a dendrogram was drawn using BioNumerics 6.5 software (Applied Maths, Sint-Martens-Latem, Belgium). An unweighted pair group method using arithmetic averages (UPGMA) was used for cluster analysis. Three reference strains of GBS with completely sequenced genomes (NEM316, 2603V/R and A909) were used as controls.
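The step from an allelic profile to a sequence type can be illustrated with a minimal sketch. The allele numbers and the ST labels (ST-X, ST-Y) below are invented placeholders and are not taken from the PubMLST database; the distance function is one simple choice (fraction of differing loci) and is not claimed to be the coefficient used by BioNumerics.

```python
from typing import Dict, Optional, Tuple

# Locus order follows the seven housekeeping genes named in the text.
LOCI = ("adhP", "pheS", "atr", "glnA", "sdhA", "glcK", "tkt")

Profile = Tuple[int, ...]

# Hypothetical lookup table: allele numbers and ST labels are placeholders.
KNOWN_PROFILES: Dict[Profile, str] = {
    (1, 1, 2, 1, 1, 2, 2): "ST-X",
    (2, 1, 1, 4, 1, 1, 1): "ST-Y",
}


def assign_st(alleles: Dict[str, int]) -> Optional[str]:
    """Return the ST label for an allelic profile, or None if it is not in the table."""
    profile = tuple(alleles[locus] for locus in LOCI)
    return KNOWN_PROFILES.get(profile)


def allelic_distance(a: Profile, b: Profile) -> float:
    """Fraction of loci at which two profiles carry different alleles."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same loci")
    return sum(x != y for x, y in zip(a, b)) / len(a)


isolate = dict(zip(LOCI, (1, 1, 2, 1, 1, 2, 2)))
st_label = assign_st(isolate)
pair_distance = allelic_distance((1, 1, 2, 1, 1, 2, 2), (2, 1, 1, 4, 1, 1, 1))
```

A matrix of such pairwise distances is the kind of input that a UPGMA clustering step would take to build a dendrogram.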

Culture conditions
GBS bacteria were cultured for 24 hours in Todd-Hewitt broth under agitation at 37°C, and the cultures (10 ml) were centrifuged at 3000 × g for 10 min at 4°C. The cell pellet was washed in phosphate-buffered saline, pH 7.4, supplemented with PMSF at 0.2 mM final concentration. After centrifugation at 3000 × g for 10 min at 4°C, the cell pellet was immediately frozen on dry ice and stored at −80°C.

Protein extraction
The frozen cell pellets were thawed and resuspended in 1 ml of lysis buffer (16 mM Na2HPO4, 4 mM NaH2PO4, 150 mM NaCl, 1% Triton X-100) supplemented with the protease inhibitor cocktail COMPLETE (Roche, ref. 11697498001). The suspension was transferred into a FastPROTEIN BLUE tube and homogenized in a FastPrep apparatus (MP Biomedicals) according to the following protocol: six cycles (40 sec each) at power setting 6, with cooling of the tubes on ice for 5 min between cycles. After centrifugation at 15 000 × g for 15 minutes, the supernatant of each sample was divided into several aliquots and stored at −80°C.

ProteinChip array processing
Two types of ProteinChip ion-exchange arrays, Q10 and CM10, were assembled into a 96-well bioprocessor (Bio-Rad) and preactivated for 30 min with their respective buffers (100 mM Tris-HCl, pH 9.0 or 100 mM sodium acetate, pH 4.0). In the next step, 180 µl of the respective binding buffer for the array was mixed with 20 µl of the protein extract (previously adjusted to a final protein concentration of 0.5 mg/ml in all samples), and incubated for 60 min. All protein samples were tested in triplicate. After two washes with the binding buffers and one quick rinse with HPLC grade water, the spots were loaded twice with 1 µl of a saturated solution of sinapinic acid dissolved in 50% ACN (acetonitrile)/0.5% TFA (trifluoroacetic acid) (v/v). All steps were carried out at room temperature (18-20°C), using the Micromix-5 platform shaker and the robot-pipetting workstation Biomek 3000 (Beckman-Coulter). The arrays were processed in the PCS 4000 ProteinChip Reader (Bio-Rad), which was operated in positive ion mode at an ion acceleration potential of 20 kV.

Spectra processing and statistics using ProteinChip Data Manager 3.0.7 software (Bio-Rad)
After calibration and normalization of all spectra using the total ion current method, clusters of peaks with the same mass were defined at the following settings: S/N (first pass) ≥5, minimum peak threshold: 20%, mass error: 0.3%, S/N (second pass) ≥2. Three types of computer-generated statistics were used for data analysis: the non-parametric Mann-Whitney U test, the Kruskal-Wallis H test, and the method of heat maps/hierarchical clustering.
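The two preprocessing ideas named above, normalization to total ion current and grouping of peaks within a 0.3% mass error, can be sketched as follows. This is a simplified illustration, not the algorithm implemented in ProteinChip Data Manager: the function names are hypothetical, the example masses and intensities are invented, and each spectrum is simply scaled to a common target sum.

```python
from typing import List


def normalize_tic(intensities: List[float], target: float = 1.0) -> List[float]:
    """Scale a spectrum so that its summed intensity (total ion current) equals target."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("total ion current must be positive")
    return [target * value / total for value in intensities]


def cluster_masses(masses: List[float], rel_tol: float = 0.003) -> List[List[float]]:
    """Group masses; a peak joins the current cluster while it lies within
    rel_tol (0.3% by default) of that cluster's running mean mass."""
    clusters: List[List[float]] = []
    for mass in sorted(masses):
        if clusters:
            centre = sum(clusters[-1]) / len(clusters[-1])
            if abs(mass - centre) <= rel_tol * centre:
                clusters[-1].append(mass)
                continue
        clusters.append([mass])
    return clusters


# Invented peak masses (Da) pooled from several hypothetical spectra.
peaks = [7876.0, 7880.5, 7878.2, 12198.0, 12203.5, 6258.1]
clusters = cluster_masses(peaks)
cluster_sizes = [len(c) for c in clusters]
scaled = normalize_tic([2.0, 6.0, 2.0])
```

With these invented values the six peaks fall into three clusters, around 6258, 7878 and 12200 Da.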

Ion-exchange chromatography (IEX)
Protein extracts dissolved in the same lysis buffer as that used for the SELDI expression difference mapping (EDM) experiments were dialyzed overnight at 4°C, under agitation, against a 1000-fold volume of 20 mM Tris-HCl, pH 9.0. The dialyzed samples were fractionated by ion-exchange chromatography using HiTrap Q HP columns (Amersham, ref. 17-1153-01). All IEX steps were carried out at a flow rate of 1 ml/min and the column was placed in a column oven at 20°C. A stepwise elution protocol was applied: (i) an initial isocratic step with buffer A (20 mM Tris-HCl, pH 9.5) for 5 min; (ii) a linear gradient between buffer A and buffer B (buffer A with 500 mM NaCl) for 15 minutes; (iii) an isocratic step for 5 minutes with buffer B; (iv) a linear gradient for 10 minutes between buffer B and buffer C (buffer A with 1 M NaCl).
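The elution program can be written as a piecewise-linear function of time, which makes it easy to estimate the salt concentration at which a given fraction eluted. The sketch below assumes that the four steps run back to back from t = 0, that buffer A contains no NaCl, and that system dead volume is ignored; the function name is hypothetical.

```python
def nacl_mM(t_min: float) -> float:
    """Nominal NaCl concentration (mM) delivered at time t_min of the 35-min program."""
    if t_min < 0 or t_min > 35:
        raise ValueError("time outside the 0-35 min elution program")
    if t_min <= 5:           # (i) isocratic, buffer A
        return 0.0
    if t_min <= 20:          # (ii) linear gradient A -> B over 15 min
        return 500.0 * (t_min - 5) / 15
    if t_min <= 25:          # (iii) isocratic, buffer B
        return 500.0
    # (iv) linear gradient B -> C over 10 min
    return 500.0 + 500.0 * (t_min - 25) / 10


# At 1 ml/min and 1 ml fractions, fraction k spans minutes k-1 to k (assumption).
midpoint_of_gradient = nacl_mM(12.5)
```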

Reversed Phase-High Pressure Liquid Chromatography (RP-HPLC)
Fractions from the IEX containing the target protein were further subjected to RP-HPLC on Stability columns (CIL Cluzeau, France) of two formats (either C4/300 Å/5 µm/250 mm × 4.6 mm or C8/100 Å/5 µm/250 mm × 4 mm) using the Perkin-Elmer HPLC system, series 200, and two buffers: buffer A (1% ACN/0.1% TFA) and buffer B (90% ACN/0.1% TFA). Elution from the C4 column involved an initial isocratic step for 10 minutes with buffer A followed by a linear gradient between buffer A and B for 20 min. Elution from the C8 column involved an initial isocratic step with buffer A for 10 minutes, followed by a linear gradient between buffer A and B for the next 5 minutes reaching 75% of buffer B, a second isocratic step for the next 5 minutes with 75% of buffer B, a linear gradient between buffer A and B to reach 100% of B over 5 minutes, and a final isocratic step with buffer B for 10 minutes. Other conditions for both C4 and C8 columns: flow rate, 1 ml/min; temperature of the column oven, 40°C; absorbance, 280 nm; fraction size, 1 ml.

Mass spectrometry (MS)-assisted control of protein purity; tricine SDS-PAGE
The fractions from RP-HPLC were concentrated 20-fold in a vacuum centrifuge (miVac, Genevac) to a final volume of ca. 50 µl. Aliquots of 3 µl of the concentrate were spotted on gold arrays (Bio-Rad) and tested in MALDI mode using the SELDI PCS 4000 apparatus with the following acquisition protocol: focus mass 10 000, laser energy 3 000, matrix attenuation 2 500, partition 1/1, 20 shots. The fraction containing the target protein was completely dried in the miVac. It was then reconstituted in tricine SDS sample buffer containing the NuPAGE reducing agent (Invitrogen). This mixture was then divided into two samples that were heated at 40°C for 30 min and separated in parallel by 1D Tricine SDS-PAGE using home-cast Tris-tricine gels (18% T/6% C; stacking gel: 2 cm/resolving gel: 16 cm), at a constant 30 V for 1 hour followed by a constant 60 mA for the next 15 hours. Three lanes, one on each side and one in the middle of the gel, were loaded with prestained molecular weight (MW) markers (Fermentas, ref. SM1861). After completion of the electrophoresis, the MW markers served to indicate the approximate position of the target proteins in the unstained gel; thirty 1 mm-thick gel slices covering two adjacent lanes expected to contain the same protein were excised. Each gel slice was further divided into two equal parts, each corresponding to one lane. The target protein was extracted from one of these half-slices by passive elution as described previously [25], and the mass of the passively eluted target protein was confirmed on gold arrays. The proteins in the corresponding second half-slice were subjected to trypsin digestion and LC-MS/MS microsequencing as described below.

Trypsin digestion
The protein-containing slices were destained in a solution of 25 mM NH4HCO3/50% ACN and rinsed twice in ultrapure water. They were then shrunk in 100% ACN for 10 min. ACN was removed and the gel pieces were dried at room temperature, covered with the trypsin solution (10 ng/µl, in 40 mM NH4HCO3 and 10% ACN), rehydrated at 4°C for 10 min, and incubated overnight at 37°C, with rotary shaking. The supernatants were collected, and an extraction solution of ACN/HCOOH/H2O (47.5:5:47.5, vol:vol:vol) was poured onto the gel slices, which were agitated for 15 min. This extraction step was repeated twice. The supernatants were pooled, concentrated in a vacuum centrifuge to a final volume of 25 µl, acidified by addition of 1.5 µl of 5% HCOOH, and stored at −20°C.

NanoLC-MS/MS analysis
Peptide mixtures from each gel slice were analyzed on an Ultimate 3000 Nano LC system (Dionex) coupled to a nanospray LTQ Orbitrap XL mass spectrometer (ThermoFinnigan, San Jose, CA). Ten microliters of peptide digests were loaded onto a 300 µm i.d. × 5 mm C18 PepMap™ trap column (LC Packings), at a flow rate of 30 µl/min. The peptides were eluted from the trap column onto an analytical 75 µm i.d. × 15 cm C18 PepMap column (LC Packings). The mobile phases were a mix of solvent A (0.1% HCOOH/5% ACN) and solvent B (0.1% HCOOH/80% ACN). Elution was performed using a 5-40% linear gradient of solvent B for 35 min. The separation flow rate was set at 200 nl/min. The acquisition in data-dependent mode alternated between an MS survey scan over an m/z range of 300-1700 and five to ten MS/MS scans, with collision-induced dissociation (CID) as activation mode. The MS/MS spectra were acquired using a 2-m/z unit ion isolation window and a normalized collision energy of 35%. The dynamic exclusion duration was set at 30 sec and monocharged ions were rejected.

Database search and processing of results
SEQUEST was used through a Bioworks 3.1.1 interface (ThermoFinnigan, San Jose, CA) to search a subset of the NCBI non-redundant database restricted to Streptococcus agalactiae entries. Peak lists were created using extract-msn (BioWorks 3.3.1 Thermo Scientific) with the default settings. Data files in the DTA specific format (DTA stands for the extension ".dat") were generated from the MS/MS spectra that attained a minimal intensity (n ≥ 100) and a sufficient number of ions (n ≥ 5). The DTA file generation allowed the averaging of several MS/MS spectra corresponding to the same precursor ion with a tolerance of 50 ppm. Spectra from peptides with molecular masses between 600 Da and 4500 Da were retained. The search parameters were as follows: the mass accuracy of the monoisotopic peptide precursor was set to 10 ppm and that of the peptide fragments to 0.5 amu. Only b-ions and y-ions were considered for mass calculation. The oxidation of methionine (+16 Da) was considered as a variable modification. Two missed trypsin cleavages were allowed. Only peptides with Xcorr values higher than 2.0 (double charge), 2.5 (triple charge) and 3.0 (more than three charges) were retained. In all cases, we required the peptide p-value to be lower than 0.001 and the DeltaCn value to be above 0.1. All protein identifications were based on the detection of a minimum of two distinct peptides. Using these parameters, we did not detect any false positives. Shared peptides were only counted for the proteins that had the most matching peptides.
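The acceptance criteria listed above amount to a per-peptide filter followed by a per-protein count. The sketch below applies them to a handful of invented peptide-spectrum matches; the class name, function names, protein labels and peptide strings are all hypothetical, and the rule for shared peptides is deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class PeptideMatch:
    protein: str
    sequence: str
    charge: int
    xcorr: float
    delta_cn: float
    p_value: float


def passes_filters(m: PeptideMatch) -> bool:
    """Charge-dependent Xcorr cut-off, p-value < 0.001 and DeltaCn > 0.1."""
    if m.charge < 2:  # monocharged ions were rejected
        return False
    threshold = 2.0 if m.charge == 2 else 2.5 if m.charge == 3 else 3.0
    return m.xcorr > threshold and m.p_value < 0.001 and m.delta_cn > 0.1


def identified_proteins(matches: Iterable[PeptideMatch]) -> List[str]:
    """Proteins supported by at least two distinct peptides that pass the filters."""
    peptides: Dict[str, Set[str]] = {}
    for m in matches:
        if passes_filters(m):
            peptides.setdefault(m.protein, set()).add(m.sequence)
    return sorted(p for p, seqs in peptides.items() if len(seqs) >= 2)


# Invented matches, for illustration only.
matches = [
    PeptideMatch("protein_1", "PEPTIDEA", 2, 2.4, 0.30, 1e-5),
    PeptideMatch("protein_1", "PEPTIDEB", 3, 2.9, 0.25, 1e-4),
    PeptideMatch("protein_2", "PEPTIDEC", 2, 3.1, 0.40, 1e-6),  # only one distinct peptide
    PeptideMatch("protein_2", "PEPTIDED", 1, 3.5, 0.40, 1e-6),  # singly charged
    PeptideMatch("protein_3", "PEPTIDEE", 2, 1.8, 0.30, 1e-5),  # Xcorr too low
    PeptideMatch("protein_3", "PEPTIDEF", 4, 3.2, 0.05, 1e-5),  # DeltaCn too low
]
accepted = identified_proteins(matches)
```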

Assay on reproducibility
In order to evaluate reproducibility, we first checked to what extent the selected cell growth and extraction conditions affected the stability of the phenotypic protein profiles obtained with the SELDI ProteinChip. The S. agalactiae reference strain NEM316 was cultured independently six times under standard growth conditions, and the bacterial cells harvested were subjected to the same extraction procedure, as described in Materials and Methods. The samples were tested on two types of arrays, CM10 and Q10, using identical spectrum acquisition protocols within the range 3000-20000 Da. The protein patterns of the six cultures were perfectly reproducible with respect to the presence of all peaks with S/N > 2. The coefficients of variation between the global protein patterns were less than 20% for both arrays, and thus within the acceptable range for the SELDI methodology. The protein extract of isolate NEM316 was used as an additional intra- and inter-array control and was systematically tested in all expression difference mapping (EDM) experiments.
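For reference, the coefficient of variation of one peak across replicate cultures is the standard deviation divided by the mean, expressed in percent. The short sketch below uses invented intensities for six replicates and the sample standard deviation; whether the sample or population form was used in the study is not stated, so that choice is an assumption.

```python
from statistics import mean, stdev
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Sample coefficient of variation, in percent."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean intensity is zero; CV is undefined")
    return 100.0 * stdev(values) / m


# Invented peak intensities from six hypothetical replicate cultures.
replicates = [10.0, 12.0, 11.0, 9.0, 10.0, 8.0]
cv_percent = coefficient_of_variation(replicates)
```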

EDM and candidate biomarker selection
The protein patterns of 170 isolates of GBS and of three reference strains of the same taxon were obtained on two types of ProteinChip arrays, CM10 and Q10. The spectral data obtained were subjected to statistical analysis: univariate p-value tests (Mann-Whitney and Kruskal-Wallis) and multivariate heat maps/hierarchical clustering.
The initial multivariate analysis (Figure 1) provides a global view of 164 protein biomarkers found in the five GBS groups of isolates (bovine mastitis, respiratory infections, meningitis, endocarditis, vaginal carriage) that are clustered by mass and intensity. Some combinations of biomarkers (arbitrarily grouped into the yellow- and white-bordered rectangles on Figure 1) allow one or two GBS groups to be discriminated from the others, but none of these combinations was completely homogeneous in terms of overexpression or underexpression (i.e. red or green boxes in the respective rectangles). Another practical consideration was that the number of candidate biomarkers was too large for parallel purification and identification by an academic laboratory like ours.
In order to limit the number of the candidate biomarkers while retaining the most statistically discriminant for further identification, we analyzed the same data with the non-parametric Kruskal-Wallis H test. This test allows simultaneous comparison of the five GBS groups of isolates. In contrast to the previous heat map analysis, the Kruskal-Wallis H test compares the intensity of single biomarkers between the five GBS groups.
In this way, 20 candidate biomarkers were selected with p < 0.01 on the Q10 surface and 26 candidate biomarkers with p < 0.01 on the CM10 surface (Table S1). However, none discriminated between all five groups. Most of these biomarkers were overexpressed in one group of isolates, underexpressed in another, and expressed at an intermediate level in the other groups of isolates. Several candidate biomarker patterns were identified with at least two-fold differences in intensity (which is a semiquantitative mode of measurement for protein expression), e.g.: (i) overexpressed in the endocarditis isolates and underexpressed in the vaginal carriage isolates: p14884 (CM10), p16015 (CM10), p9860 (Q10) and p8889 (Q10); (ii) overexpressed in the bovine mastitis isolates and underexpressed in the vaginal carriage and the meningitis isolates: p6258 (CM10), p5787 (Q10), p6946 (Q10), p8144 (Q10), p8941 (Q10), p9762 (Q10); (iii) overexpressed in the bovine mastitis isolates and underexpressed in the respiratory infection isolates: p12205 (Q10); (iv) overexpressed in the respiratory infection isolates and underexpressed in the meningitis isolates (all detected on Q10): p4744, p4926, p6118, p10464, and p19375; (v) overexpressed in the meningitis isolates and underexpressed in the respiratory infection isolates (all detected on Q10): p6100, p6182, p7878, and p12200.
As a third statistical analysis, we subjected the spectral data to the non-parametric Mann-Whitney test, which can be applied only to two groups of variables. Therefore, we clustered the data into two groups according to their membership of the highly virulent ST17 genotype: ST17 and non-ST17, irrespective of the isolate origin. An MLST analysis has been reported describing the membership of ST17 for 146 of the GBS isolates included in this study. These isolates belong to the vaginal carriage, meningitis, endocarditis, and bovine mastitis groups. MLST analysis has not been performed on the 24 GBS isolates from respiratory tract infections, and therefore the SELDI spectral data of this group were excluded from the Mann-Whitney test analysis.
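The two-group comparison can be made concrete with a minimal pure-Python sketch of the Mann-Whitney U statistic. It uses average ranks for ties and a normal approximation with tie correction (no continuity correction) for the two-sided p-value; for small groups an exact test would be preferable. The peak intensities are invented, and this is not claimed to reproduce the computation performed by ProteinChip Data Manager.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic for x, two-sided p-value by normal approximation)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2          # mean of ranks i+1 .. j
        t = j - i                           # size of this tie group
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:                            # all values identical
        return u_x, 1.0
    z = (u_x - mu) / math.sqrt(var)
    return u_x, math.erfc(abs(z) / math.sqrt(2))


# Invented intensities of one peak in hypothetical ST17 and non-ST17 isolates.
st17 = [5.1, 6.3, 7.0, 6.8]
non_st17 = [1.2, 2.4, 0.9, 3.1, 2.0]
u_stat, p_value = mann_whitney_u(st17, non_st17)
```

In this invented example every ST17 value exceeds every non-ST17 value, so U takes its maximum, the product of the two group sizes.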
We selected for further structural identification four protein biomarkers, identified both by the Mann-Whitney U test (clustering by ST17 genotype, Table 1), and by the Kruskal-Wallis H test (clustering according to the origin of the S. agalactiae isolates; scatter plots on Figure 2). These biomarkers were designated p6258, p7878, p10464 and p12200 where ''p'' stands for protein and the numbers indicate the mass in Daltons. The biomarker p7878 was of particular interest since it was not only discriminant with a ca. 7-fold difference in average intensity between the meningitis isolates and the respiratory infection isolates, but displayed more than a 2-fold difference between the meningitis isolates and the vaginal carriage isolates.
The level of expression of the biomarkers p6258, p7878, p10464 and p12200 in the 146 S. agalactiae isolates from the vaginal carriage, meningitis, endocarditis and bovine mastitis groups, as well as from the three reference GBS strains (NEM316, A909 and 2603V/R) was associated with the sequence types (STs) and the MLST groups, as defined by UPGMA analysis (Figure 3). The presence or the absence of these four biomarkers (Figure 3, right part) coincided with the nine MLST groups (A to I) ( Figure 3, left part). Thus, the presence of p6258 was associated with isolates in particular STs, e.g. ST1 with seven peaks in seven tested isolates (7/7), ST7 (3/3), ST22, ST61, and ST67 (2/2). The biomarker p7878, present in all except one of the 46 isolates belonging to ST17, is also found in all isolates of MLST group A and the closely related MLST group B. The p7878 biomarker is also detected in four isolates which are not clustered in these two MLST groups: ST26 (two isolates), ST300 and ST302 (each with one isolate). The biomarker p10464 is expressed by all of the 14 isolates of ST7, ST10 and ST12, and all 22 isolates belonging to MLST groups C, D and E, as well as in some isolates belonging to MLST group G (particularly ST1 and ST2). However, p10464 was not detected in any of the ST19 isolates of MLST group F (the closest to MLST group G). The biomarker p12200 is present in almost all isolates of MLST groups A and B (54 out of 56), similar to p7878. However, p12200 is also found in isolates of MLST groups E and F, whereas p7878 is not. Finally, the distribution of p10464 and p12200 among sequence types and MLST groups indicates that the presence of these two proteins appears to be mutually exclusive ( Figure 3).

Mass spectrometry (MS)-assisted purification and identification of biomarkers
The four selected biomarker proteins were purified in several steps including ion-exchange chromatography (IEX), reversed phase-high pressure liquid chromatography (RP-HPLC), and one-dimensional sodium dodecyl sulfate-polyacrylamide gel electrophoresis (1D SDS-PAGE). Throughout purification, all fractions obtained were systematically tested for the presence of the target protein by SELDI (the IEX fractions) or matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI) (the RP-HPLC fractions and those of passive elution from the 1D SDS-PAGE gel slices). The rationale for this approach of mass spectrometry (MS)-assisted purification is that the first liquid chromatography step of fractionation by IEX is carried out under conditions very similar to those of the SELDI profiling experiment. For example, the biomarker p7878 was initially detected in the SELDI expression difference mapping (EDM) experiments on Q10 anion-exchange chips whose active group is a quaternary ammonium. For the initial IEX purification of p7878, we used an anion exchange resin with very similar chemistry to that of the Q10 SELDI chips (i.e. prepacked HiTrap Q columns) and the binding buffer was of the same composition and pH as that used in SELDI profiling (i.e. 100 mM Tris-HCl, pH 9). Forty IEX fractions obtained after application of a linear gradient of 0-500 mM NaCl were tested in SELDI for the presence of the target protein (p7878) under the same conditions as those of the initial EDM, i.e. the same ProteinChip array type, binding buffer, acquisition protocol with identical laser intensity, focus mass, matrix attenuation, spot partition, number of shots kept, etc. In this way, two or three consecutive fractions containing p7878 were selected.
During the next RP-HPLC step, the protein-containing fractions were dissolved in ACN/TFA, which allows their rapid concentration under SpeedVac conditions, as well as quality control tests on reusable GOLD arrays (in MALDI mode), in the same mass spectrometer (a PCS 4000) and with the same acquisition protocol. Finally, the proteins were extracted in organic solvent by passive elution from one half of each gel slice obtained from the 1D SDS-PAGE step, which allows rapid SpeedVac concentration and quality control by MALDI. Only after this final confirmation, both of the target protein mass and of the absence of contaminants, was the other half of the gel slice used for sequencing by LC-MS/MS (liquid chromatography coupled to tandem mass spectrometry). For all mass spectrometry experiments (SELDI and MALDI), the spectra were calibrated using a set of reference proteins of known mass. This resulted in only small differences, less than 0.05%, in the masses observed for the target proteins; for example, p7878 in all IEX, HPLC and 1D SDS-PAGE fractions displayed mass variations of no more than ±3 Da.
The masses of the proteins identified by LC-MS/MS were compared in silico to those of peptides in the proteome database for the three complete genome sequences of S. agalactiae strains A909, NEM316 and 2603V/R, and the incomplete genome sequences of five strains (18RS21, 515, CJB111, COH1, H36B). Table 2 shows that the criteria for successful identification (number of different peptides, sequence coverage) are largely fulfilled, and summarizes selected characteristics of the four biomarkers, which were identified as CsbD-like protein (p6258), the small subunit of exodeoxyribonuclease VII (exoDNase VII) (p7878), thioredoxin (p10464), and the L7/L12 subunit of ribosomal protein 50S or RpL7/L12 (p12200).
The BLAST (Basic Local Alignment Search Tool) results of the first 100 sequences with closest homology/identity (Table 3) show that, with the exception of the CsbD-like protein, the three proteins we identified as biomarkers in S. agalactiae are highly conserved and share homology or identity with proteins in other streptococci. Comparison between the proteins in beta-hemolytic S. agalactiae and S. pyogenes reveals full sequence identity for thioredoxin, 80% maximum identity for the small subunit of exodeoxyribonuclease VII, and 77% maximum identity for the ribosomal protein L7/L12. These three proteins in S. agalactiae are most similar (72-83% identity) to those in alpha-hemolytic streptococci (S. pneumoniae, S. mutans, S. sanguinis) and enterococci (E. faecium). The maximum identity of these proteins between the taxon S. bovis and other taxa of streptococci is highest for the thioredoxin (83%), and slightly lower for the small subunit of exoDNase VII (81%) and RpL7/L12 (74%). The CsbD-like protein displays roughly 50% maximum identity with its structural homologs in other Streptococcus taxa (Table 3).

Discussion
We studied a large selection of 170 representative isolates of S. agalactiae to identify proteomic biomarkers that could better characterize their MLST genotypes [14]. Moreover, isolates belonging to the main STs have been found to be associated with particular clinical manifestations: isolates belonging to ST1, ST10 and some genotypically similar STs are often associated with infections in adults [26]; isolates from ST23, ST19 and other closely related STs are associated with vaginal carriage and early infections in newborns [27,28]; and isolates from ST17 and genotypically related isolates are associated with the late-onset infections in newborns [13,29]. Although it is tempting to associate these proteomic biomarkers with the different STs and MLST groups of S. agalactiae in the context of pathology, our hypothesis requires further experiments using other methods.
Despite extensive genotyping studies, relatively limited information is available about the proteins responsible for GBS virulence and invasiveness [19,29,30,31]. Mass spectrometry can be used to search for proteins with biological activities, and for bacterial classification. Several techniques are currently in use, including MALDI-TOF-MS, LC-ESI-MS and SELDI-TOF-MS. When applied to complex protein samples (e.g. bacterial extracts), MALDI and SELDI experiments detect ca. 100 different proteins, and typical LC-ESI-MS experiment may reveal more than 500 proteins. However, LC-ESI-MS is labor intensive, time-consuming and requires multiple replicates [32]. We used SELDI, in which proteins are defined by their m/z, and protein abundance was estimated in a semi-quantitative manner. Previous MALDI-based studies classified S. agalactiae isolates only according to patterns of protein masses [33,34]. By contrast, we determined protein abundance as being differently distributed among groups of isolates, and identified the primary sequences of some of these potential biomarkers. The predicted masses of the putative biomarkers are slightly higher than those identified by SELDI ( Table 2); this may be due to some low-level fragmentation resulting from either the FastPrep sonication of bacteria or proteolysis in the aqueous buffers used. Also, to maximize the purification yield, we used the S. agalactiae isolates with the highest detectable levels of the protein for each of the four biomarkers, but none of these isolates was a reference isolate with a completely sequenced genome. The BLAST data (Table 3) indicate that despite the homology, there may be differences in the primary sequences of the proteins produced by different S. agalactiae isolates.
According to the EMBL-EBI database (pfam05532), CsbD is a bacterial protein produced in response to general stress. Its expression in Bacillus subtilis is mediated by sigma-B [σ(B)], an alternative sigma factor controlling the general stress regulon [35]. Sigma-B is activated in response to numerous physical stress stimuli and conditions of energy starvation. The exact role of CsbD in the stress response is unclear. In Escherichia coli, the putative stress-response protein YjbJ, identified by MALDI-TOF-TOF-MS/MS, is considered to belong to the CsbD family, but no particular function has been assigned to this protein [32]. We found no reports concerning CsbD-like proteins in S. agalactiae.
Figure 2. Expression levels of four biomarker proteins produced by S. agalactiae isolates clustered into five groups according to origin/clinical outcome. X-axis: bovine mastitis (red), endocarditis (blue), meningitis (green), respiratory infections (magenta), vaginal carriage (cyan). Y-axis: protein abundance expressed as absolute intensity (µA/laser pulse). Each point represents the mean intensity of one sample tested in duplicate. P-values obtained with the Kruskal-Wallis H test: p6258, <1 × 10⁻⁶; p7878, 0.000059; p10464, 0.09447; p12200, 0.000004. doi:10.1371/journal.pone.0054393.g002
Table 1. Biomarkers selected by SELDI profiling of two groups of S. agalactiae strains, positive or negative for ST17.
Thioredoxins are a family of small redox-active proteins that undergo reversible oxidation/reduction and help to maintain the redox state of cells. They serve as cofactors for a number of enzymes involved in the detoxification of reactive oxygen or nitrogen species, and in many thioredoxin reductase-catalyzed reductions they act in a manner similar to glutathione in thioltransferase reactions. In bacteria, thioredoxins contribute to various important functions such as DNA synthesis (thioredoxin is a hydrogen donor for ribonucleotide reductase), protein disulfide reduction, prevention of oxidative stress, protein repair by methionine sulfoxide reduction, and sulfur assimilation through the reduction of sulfate to sulfite [36]. Eukaryotic thioredoxin may be a secreted growth factor or a chemokine for immune cells, which implies potential applications in cancer therapy [37,38]. There are structural differences between the bacterial thioredoxin reductases, which have low molecular weights, and their mammalian counterparts, which have high molecular weights. These differences could be exploited for the treatment of infections using inhibitors specific for bacterial thioredoxin reductases [37].
The biomarker protein p12200, overexpressed in ST17 isolates of S. agalactiae, was identified as the ribosomal protein L7/L12 (RpL7/L12), which has been extensively studied in other bacterial species. RpL7/L12 is a ribosomal 50S protein found in two forms: RpL7 is the acetylated form of RpL12 [39]. In E. coli, two RpL7/L12 molecules dimerize and associate with another ribosomal protein, RpL10, to form a stalk complex interacting with translation factors during protein biosynthesis [40][41][42][43][44]. RpL7/L12 was identified with the translation factor EF-Ts in culture supernatants of group A streptococci of the M1 and M3 serotypes, suggesting secretion or specific release [44]. The overexpression of this ribosomal protein in S. agalactiae isolates may be associated with modulation of the translation of proteins including virulence factors. RpL7/L12 is believed to be a component of a ''divisome'' protein complex, together with the translation factor EF-Ts and the glucan-binding protein GbpB, involved in cell wall synthesis and expansion in E. coli [45], Streptococcus suis [46] and Streptococcus mutans [47]. RpL7/L12 is a highly antigenic and immunogenic protein [48][49][50][51][52] and its high degree of conservation among bacterial species is the cause of many serological cross-reactions. For example, RpL7/L12 purified from the gastric pathogen Helicobacter pylori cross-reacts with serum antibodies from Helicobacter pylori-negative patients [53]. Mucosal immunization of mice with RpL7/L12, glyceraldehyde-3-phosphate dehydrogenase and four other protein antigens of S. pneumoniae enhances bacterial clearance; this protein is therefore a promising vaccine candidate [54].
Figure 3. Expression levels of p6258, p7878, p10464 and p12200 in 149 S. agalactiae isolates genotyped by multilocus sequence typing (MLST) and represented as a dendrogram showing genetic relationships among the different sequence types (STs). The dendrogram, based on MLST data, was constructed using BioNumerics 6.5 software (Applied Maths, Sint-Martens-Latem, Belgium). Cluster analysis was based on an unweighted pair group method using arithmetic averages (UPGMA). The nine main MLST groups (A to I) are indicated on the right of the dendrogram as vertical dotted lines. The allelic profiles corresponding to each sequence type (ST), the number of isolates belonging to each ST, the origin of the isolate, the serotype and the presence (+) or absence (−) of the biomarkers of interest are reported in detail. Within each ST, the presence of the biomarker is indicated as a ratio between the number of isolates with a detectable corresponding peak in SELDI and the number of all isolates in the ST. A grey scale also illustrates the prevalence of the biomarker (dark grey when the peak corresponding to the biomarker is detected in all isolates; light grey when the peak corresponding to the biomarker is detected in only some isolates). doi:10.1371/journal.pone.0054393.g003
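The UPGMA clustering named in the Figure 3 legend can be illustrated with a short sketch. This is not the BioNumerics implementation: the function names are illustrative, the four seven-locus allelic profiles and their labels (A to D) are invented, and the distance between two profiles is assumed to be the fraction of loci carrying different alleles.

```python
from itertools import combinations


def allelic_distance(p, q):
    """Fraction of loci at which two allelic profiles differ (assumed metric)."""
    return sum(a != b for a, b in zip(p, q)) / len(p)


def upgma(profiles):
    """Cluster profiles by UPGMA.

    profiles: dict mapping a label to a tuple of allele numbers.
    Returns the merges in order as (cluster_a, cluster_b, distance).
    """
    sizes = {name: 1 for name in profiles}  # cluster label -> member count
    dist = {
        frozenset((a, b)): allelic_distance(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }
    merges = []
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        pair = min(dist, key=dist.get)
        a, b = sorted(pair)
        d = dist.pop(pair)
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = f"({a},{b})"
        # Distance to every other cluster is the size-weighted mean, which
        # equals the arithmetic average over all member pairs.
        for c in sizes:
            d_ac = dist.pop(frozenset((a, c)))
            d_bc = dist.pop(frozenset((b, c)))
            dist[frozenset((merged, c))] = (
                size_a * d_ac + size_b * d_bc
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
        merges.append((a, b, d))
    return merges


if __name__ == "__main__":
    # Hypothetical allelic profiles (NOT real sequence types from the study).
    example_profiles = {
        "A": (1, 1, 1, 1, 1, 1, 1),
        "B": (1, 1, 1, 1, 1, 1, 2),
        "C": (2, 2, 2, 1, 1, 1, 1),
        "D": (2, 2, 2, 2, 1, 1, 1),
    }
    for left, right, height in upgma(example_profiles):
        print(f"merge {left} + {right} at distance {height:.3f}")
```

Each merge height is the mean pairwise distance between the members of the two joined clusters, which is what places closely related sequence types on neighboring branches of a dendrogram.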
The most significantly differentially expressed biomarker identified in our study (more than 4-fold more abundant in ST17 than in other isolates) is the small subunit of exodeoxyribonuclease VII (exoDNase VII). It belongs to the family of exonuclease VII small subunits (Exonuc_VII_S superfamily; NCBI PRK00977). ExoDNase VII (EC 3.1.11.6) is composed of one large and four small subunits and catalyzes exonucleolytic cleavage in either the 5′→3′ or the 3′→5′ direction to yield 5′-phosphomononucleotides. Its main biological function is its contribution to DNA mismatch repair (MMR) (http://www.ncbi.nlm.nih.gov/biosystems/5044). MMR is a highly conserved biological pathway that plays a key role in maintaining genome stability. The Escherichia coli MMR pathway has been extensively studied [55]: MMR corrects DNA mismatches generated during DNA replication, thereby preventing mutations from becoming permanent in dividing cells. MMR also suppresses homologous recombination and was recently shown to play a role in DNA damage signaling [55].
In the study by Lartigue et al. [34], some ST17 isolates were distinguished from others according to their MALDI protein spectra. Interestingly, two proteins defined by their mass only (i.e. MW of 6258 Da and 7625 Da) seem to be very close to two candidate biomarkers identified in our study: CsbD-like (6258 Da) and exoDNase (7878 Da); this previous report agrees with our findings of exoDNase overexpression in ST17 isolates and CsbD-like protein underexpression in the isolates of other sequence types. However, the conditions of protein extraction and mass spectrometry processing differ between the two studies, and Lartigue et al. [34] did not purify or sequence the proteins, so the identity of the proteins they describe remains unknown.
In conclusion, we have identified four candidate biomarkers that are differentially associated with different genotypes. These findings seem particularly relevant to ST17 and ST17-related genotypes. The underexpression of thioredoxin and CsbD-like protein in some groups of isolates merits further study. The two other candidate biomarkers, RpL7/L12 and exoDNase, are overexpressed in ST17 isolates. One of the four biomarkers identified is exodeoxyribonuclease VII, an enzyme contributing to the maintenance of genomic stability and adaptive plasticity; we found it to be more than four times more abundant in isolates of the highly invasive ST17 than in other isolates of S. agalactiae. Although the literature suggests possible involvement of these proteins in the pathological mechanisms of infection by S. agalactiae, functional studies comparing isogenic mutants would be needed to provide meaningful insights. Their identification, and the availability of their gene sequences, means that it may be possible to produce them as recombinant proteins. This would allow specific antibodies to be raised against the whole molecules and/or against synthetic peptides mimicking their antigenic regions. Potentially, depending upon the biomarker, these reagents could be used for the development of novel vaccines or protein arrays.