The Construction and Evaluation of Reference Spectra for the Identification of Human Pathogenic Microorganisms by MALDI-TOF MS

Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) is an emerging technique for the rapid and high-throughput identification of microorganisms. There remains a dearth of studies in which a large number of pathogenic microorganisms from a particular country or region are utilized for systematic analyses. In this study, peptide mass reference spectra (PMRS) were constructed and evaluated from numerous human pathogens (a total of 1019 strains from 94 species), including enteric (46 species), respiratory (21 species), zoonotic (17 species), and nosocomial pathogens (10 species), using a MALDI-TOF MS Biotyper system (MBS). The PMRS for 380 strains of 52 species were new contributions to the original reference database (ORD). Compared with the ORD, the new reference database (NRD) allowed for 28.2% (from 71.5% to 99.7%) and 42.3% (from 51.3% to 93.6%) improvements in identification at the genus and species levels, respectively. Misidentification rates were 91.7% and 57.1% lower with the NRD than with the ORD for genus and species identification, respectively. Eight genera and 25 species were misidentified. For genera and species that are challenging to accurately identify, identification results must be manually determined and adjusted in accordance with the database parameters. Through augmentation, the MBS demonstrated a high identification accuracy and specificity for human pathogenic microorganisms. This study sought to provide theoretical guidance for using PMRS databases in various fields, such as clinical diagnosis and treatment, disease control, quality assurance, and food safety inspection.


Introduction
The rapid and accurate identification of pathogens plays an important role in public health-related fields, such as the clinical diagnosis of infections, the prevention and control of infectious diseases, and food safety inspection. An ideal diagnostic method for infectious diseases should not only be fast, reliable, and safe but should also generate easily interpreted results at a reasonable cost. Existing pathogen diagnostic methods include immunological approaches, molecular diagnostic techniques, and identification through phenotypic characteristics and/or biochemical reactions. Although these diagnostic methods offer various advantages, none of them ideally satisfies all pathogen diagnostic requirements or is suitable for high-throughput pathogen diagnosis. Therefore, the exploration of new diagnostic methods remains a focus of research efforts. Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) is an emerging technique for the rapid and high-throughput identification of microorganisms. Typically, a new approach must undergo long-term and thorough verification before it is accepted for widespread global use. At present, scientists around the world continue to evaluate and improve the capabilities of MALDI-TOF MS for the identification of various samples, including clinical isolates [1][2][3], yeasts, fungi [4][5][6], and aerobic, microaerophilic, and anaerobic microorganisms [7][8][9]. Researchers are also striving to accurately assess the capabilities of the MALDI-TOF MS approach for the identification of specific pathogenic genera and species [10][11][12][13][14][15][16][17][18]. However, there remains a dearth of studies in which numerous pathogenic microorganisms from a particular country or region are utilized for systematic analyses of the MALDI-TOF MS system. These analyses are essential for the detailed evaluation of this system.
In this study, peptide mass reference spectra (PMRS) were constructed from numerous human pathogens, including enteric, respiratory, zoonotic, and nosocomial pathogens, that were selected based on infectious disease classifications and the importance, reported risks, and potential risks posed by each pathogen. These PMRS were used to enhance an existing database that had previously included spectra for numerous Western strains. Furthermore, in this investigation, systematic analyses of the enhanced database were conducted to evaluate the accuracy and specificity of pathogen identification. This study provides theoretical evidence for the applicability of the MALDI-TOF MS approach in various fields, such as clinical practice, the prevention and control of infectious diseases, quality assurance, and food safety inspection.

Materials and Methods
The selection and confirmation of pathogens
In this study, a total of 1019 strains of enteric, respiratory, zoonotic, and nosocomial pathogens were selected based on infectious disease classifications and the importance, reported risks, and potential risks posed by the infectious diseases associated with these pathogens. These 1019 strains included 74 international standard strains and 945 (92.7%) isolated strains. In particular, 921 (90.3%) of the selected strains were Chinese isolates obtained in different years from various geographical locations. All strain identities were confirmed by molecular and biochemical analyses, and strains were cultured in accordance with appropriate standard culture methods.

Sample preparation and data acquisition
Samples of the examined strains were pre-extracted using previously described procedures [2]. Samples from liquid Mycoplasma, Leptospira, and Borrelia burgdorferi cultures were subjected to the following treatment prior to protein extraction. Cultures were collected and centrifuged at 12,000 × g at 4 °C for 10 min, and the resulting supernatants were discarded. Cell pellets were resuspended in sterile physiological saline and then centrifuged at 12,000 × g at 4 °C for 10 min; the resulting supernatants were again discarded. A Microflex LT (Bruker Daltonics, Bremen, Germany) MALDI-TOF MS system was used in this study, and the database of PMRS consulted in this investigation was provided by the Biotyper 3.0 software package (Bruker Daltonics, Bremen, Germany). MALDI-TOF MS Biotyper system (MBS) parameters were established and sample acquisition was performed in accordance with previously described procedures [2]. For the construction of each reference spectrum, twenty spots were applied to a sample target to acquire twenty spectra. For validation, two spots were prepared for each strain. The highest peak intensity of each spectrum exceeded 10,000.

The construction of PMRS
PMRS were constructed as previously described [19]. The parameters used were as follows: desired mass error for the main spectra projection (MSP), 200; desired peak frequency minimum, 25%; and maximum desired peak number for the MSP, 70. For each database entry, 20 individually measured mass spectra were imported into the MSP.
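The following is a minimal sketch of how the three parameters above might interact when 20 peak lists are merged into one reference peak list. It is not the Biotyper algorithm, which the text does not describe. The function name `build_reference_peaks`, the greedy clustering rule, and the interpretation of the mass error value of 200 as parts per million are all assumptions made for illustration.

```python
def build_reference_peaks(spectra, mass_error_ppm=200.0,
                          min_frequency=0.25, max_peaks=70):
    """Merge peak lists into one consensus list (illustrative only).

    spectra: list of peak lists; each peak is an (mz, intensity) tuple.
    mass_error_ppm: ASSUMED unit (ppm) for the mass error value of 200.
    Returns a list of (mean_mz, mean_intensity, frequency), sorted by m/z.
    """
    # Pool all peaks, remembering which spectrum each peak came from.
    pooled = sorted(
        (mz, intensity, idx)
        for idx, peaks in enumerate(spectra)
        for mz, intensity in peaks
    )

    # Greedy clustering: a peak joins the current cluster when it lies
    # within the tolerance of the first (lowest m/z) peak of that cluster.
    clusters = []
    for mz, intensity, idx in pooled:
        if clusters:
            anchor_mz = clusters[-1][0][0]
            if (mz - anchor_mz) / anchor_mz * 1e6 <= mass_error_ppm:
                clusters[-1].append((mz, intensity, idx))
                continue
        clusters.append([(mz, intensity, idx)])

    reference = []
    for cluster in clusters:
        # Frequency = fraction of spectra contributing to this cluster.
        sources = {idx for _, _, idx in cluster}
        frequency = len(sources) / len(spectra)
        if frequency >= min_frequency:
            mean_mz = sum(p[0] for p in cluster) / len(cluster)
            mean_intensity = sum(p[1] for p in cluster) / len(cluster)
            reference.append((mean_mz, mean_intensity, frequency))

    # Keep the most intense peaks up to the cap, then return in m/z order.
    reference.sort(key=lambda p: p[1], reverse=True)
    return sorted(reference[:max_peaks])
```

Under these assumptions, a peak observed in 4 of 20 spectra (20%) would be dropped, whereas a peak observed in 5 of 20 spectra (25%) would be retained.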

The evaluation of PMRS
A cross-validation method was utilized in which database searches were performed that excluded reference spectra for the strain that was to be identified. Species containing only one strain were not validated. Identification was determined using Biotyper system parameters in which all scores ranged from 0 to 3, and identifications with scores higher than 1.7, 2.0, and 2.3 were regarded as genus identifications, species identifications, and species identifications at a high confidence level, respectively. Identification performances were classified into the following three categories: a lack of identification capability, misidentification, and accurate identification. In this study, these categories were used for database analyses to evaluate the identification accuracy and specificity (the misidentification rate) generated using the original reference database (ORD) and the new reference database (NRD) of the Biotyper system at both the genus and species levels.
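The score thresholds and the cross-validation rule described above can be sketched as follows. The function names, the category labels, and the tuple layout of a match are hypothetical. Only the thresholds (1.7, 2.0, and 2.3), the score range of 0 to 3, and the exclusion of reference spectra from the query strain come from the text.

```python
def classify_score(score):
    """Map a Biotyper-style score (0 to 3) to an identification level."""
    if not 0.0 <= score <= 3.0:
        raise ValueError("score must lie between 0 and 3")
    if score > 2.3:
        return "species (high confidence)"
    if score > 2.0:
        return "species"
    if score > 1.7:
        return "genus"
    return "no identification"


def cross_validated_best_match(query_strain, scored_matches):
    """Return the best match after excluding the query strain's own spectra.

    scored_matches: list of (reference_strain, species, score) tuples
    (a hypothetical layout for the results of one database search).
    Returns None when no other reference remains.
    """
    candidates = [m for m in scored_matches if m[0] != query_strain]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m[2])
```

For example, a query whose only high-scoring reference is its own spectrum would fall back to the next-best reference from a different strain, which is why species represented by a single strain could not be validated in this way.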

Statistical analysis
The SPSS 19.0 software package was used to analyze experimental data. The same set of spectra was used in queries of the two different databases (ORD and NRD) examined in this experiment, and statistical analyses were performed by utilizing the resulting scores as the two variables for paired t-tests. P < 0.05 was regarded as a significant difference.
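As an illustration of the paired design, the sketch below computes the paired t statistic for scores obtained from the same spectra against two databases. The study itself used SPSS. The function name and the example scores are invented for illustration. The P value would additionally require the Student's t distribution with the returned degrees of freedom, which is omitted here.

```python
import math
import statistics


def paired_t_statistic(scores_ord, scores_nrd):
    """Return (t, degrees_of_freedom) for paired score lists."""
    if len(scores_ord) != len(scores_nrd):
        raise ValueError("paired samples must have equal length")
    # Each difference pairs the two scores of one and the same spectrum.
    diffs = [nrd - ord_ for ord_, nrd in zip(scores_ord, scores_nrd)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are required")
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    if sd_diff == 0:
        raise ValueError("differences have zero variance")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Invented example scores for three spectra (not data from the study).
t_value, dof = paired_t_statistic([1.5, 1.0, 2.0], [2.5, 3.0, 2.0])
```

Pairing is appropriate here because each spectrum yields one score per database, so the test operates on per-spectrum differences rather than on two independent samples.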

Results
The construction of the PMRS and supplementation of the ORD
In this study, PMRS were constructed from 1019 pathogen strains (886 bacterial strains, 64 Mycoplasma strains, and 69 spirochaetal strains) from 94 species (83 bacterial species, 6 Mycoplasma spp., and two Spirochaeta spp.) in 31 genera. The examined strains included 479 strains from 46 species of enteric pathogens, 211 strains from 21 species of respiratory pathogens, 152 strains from 17 species of zoonotic pathogens, and 177 strains from 10 species of nosocomial pathogens (the raw peak lists of the strains used in this study, excluding the highly pathogenic microorganisms, are supplied as File S1). The ORD was enhanced by the addition of not only novel PMRS for 23 species in four genera (Brucella, Leptospira, Bartonella, and Mycoplasma) but also supplementary PMRS for 380 strains from 52 species, which were used to address deficiencies in database content (Table 1).

Evaluations of MBS identification accuracy
Of the 1019 pathogen strains used in this study, 720 strains (70.7%) and 517 strains (50.7%) were accurately identified at the genus and species levels, respectively, in searches of the ORD. Cross-validation analyses indicated that of the 995 strains remaining after the exclusion of the 24 strains that were the only examined strain for a species, 711 strains (71.5%) and 510 strains (51.3%) were accurately identified at the genus and species levels, respectively, in searches of the ORD (which included 3995 PMRS). In contrast, 992 strains (99.7%) and 931 strains (93.6%) were accurately identified at the genus and species levels, respectively, in searches of the NRD (which included 5014 PMRS). To exclude differences in identification accuracy produced by a lack of appropriate PMRS at the species level, analyses were performed using only pathogens with PMRS in the ORD. In these analyses of 631 strains, 572 strains (90.6%) and 510 strains (80.8%) were accurately identified at the genus and species levels, respectively, in searches of the ORD, whereas 630 strains (99.8%) and 620 strains (98.3%) were accurately identified at the genus and species levels, respectively, in searches of the NRD (Figure 1).
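The percentages above follow directly from the strain counts. The short sketch below recomputes the cross-validated rates for the 995 strains and the resulting gains, expressed as differences between the two percentages. The helper names are hypothetical and the counts are taken from the text.

```python
def rate(correct, total):
    """Identification rate as a percentage, rounded to one decimal."""
    return round(100.0 * correct / total, 1)


def point_gain(before, after, total):
    """Gain in identification rate, in percentage points."""
    return round(100.0 * (after - before) / total, 1)


TOTAL = 995  # strains remaining after excluding single-strain species

genus_ord, genus_nrd = 711, 992      # accurate genus identifications
species_ord, species_nrd = 510, 931  # accurate species identifications

print(rate(genus_ord, TOTAL), rate(genus_nrd, TOTAL))
print(rate(species_ord, TOTAL), rate(species_nrd, TOTAL))
print(point_gain(genus_ord, genus_nrd, TOTAL))
print(point_gain(species_ord, species_nrd, TOTAL))
```

Note that the gains are differences between percentages (percentage points), not relative changes.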

Analyses of MBS identification specificity
Of the 995 examined pathogen strains, 12 strains (1.2%) and 126 strains (12.7%) were misidentified at the genus and species levels, respectively, in searches of the ORD. At the genus level, 11 Shigella strains were misidentified as Escherichia strains, and one Cronobacter strain was misidentified as an Enterobacter strain; at the species level, major misidentifications included 31 species from 8 genera, including Salmonella, Shigella, Cronobacter, Enterobacter, Vibrio, Aeromonas, Burkholderia, and Acinetobacter (Table 1, Figure 2 A1). In searches of the NRD, one strain was misidentified at the genus level (0.1%); one Shigella strain was misidentified as Escherichia coli. At the species level, 54 strains (5.4%) were misidentified; specifically, Vibrio cholerae and Vibrio mimicus strains were easily misidentified as strains of the closely related species Vibrio albensis, poor species identification was observed for Cronobacter spp., and Enterobacter cloacae and Enterobacter asburiae were indistinguishable. In addition, suboptimal species specificity was observed for various Salmonella serotypes, four Shigella serogroups, and Brucella species. In total, species-level misidentifications involved pathogenic strains from 8 genera and 25 species (Table 1, Figure 2 A2). The 631 examined strains with PMRS in the ORD were all correctly identified at the genus level in searches of either the ORD or the NRD. At the species level, searches of the ORD led to the misidentification of 12 strains (1.9%), including Enterobacter cloacae, Salmonella Stanley, Vibrio mimicus, Aeromonas caviae, and Aeromonas hydrophila (Table 1, Figure 2 B1), whereas searches of the NRD led to the misidentification of four strains (0.6%), including Enterobacter cloacae, Aeromonas hydrophila, and Acinetobacter pittii (Table 1, Figure 2 B2).
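The relative reduction in misidentifications can be worked out from the counts above, assuming that the reduction is expressed relative to the number of misidentified strains in the ORD searches of the same 995 strains. The symbols $n_{\mathrm{ORD}}$ and $n_{\mathrm{NRD}}$ are introduced here for illustration only.

```latex
\[
  \text{relative reduction} = \frac{n_{\mathrm{ORD}} - n_{\mathrm{NRD}}}{n_{\mathrm{ORD}}}
\]
\[
  \text{genus level: } \frac{12 - 1}{12} \approx 91.7\%,
  \qquad
  \text{species level: } \frac{126 - 54}{126} \approx 57.1\%
\]
```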

Discussion
Studies by other researchers and our preliminary investigations have demonstrated that database quality is a key factor affecting the application of an MBS for microorganism identification. A complete database of microorganism PMRS is a prerequisite for ensuring that the MBS will exhibit strong identification capabilities [11][14][19][20][21]. The pathogens examined in this study included 94 different pathogen species (bacterial species, Mycoplasma spp., and Spirochaeta spp.) that accounted for 60.6% of the 155 pathogenic species documented in the "Catalog of Human-borne Pathogenic Microorganisms". This catalog, which was published by the Ministry of Health of the People's Republic of China, includes various bacterial, fungal, Chlamydia, Mycoplasma, Rickettsia, and Borrelia species. With respect to diseases, enteric pathogens (46 species, 479 strains), respiratory pathogens (21 species, 211 strains), zoonotic pathogens (17 species, 152 strains), and nosocomial pathogens (10 species, 177 strains) were included in this study. The pathogens of interest included highly pathogenic species, such as Vibrio cholerae and Brucella spp., and common foodborne pathogens (Campylobacter jejuni, Staphylococcus aureus, the enterohemorrhagic Escherichia coli O157:H7, Salmonella spp., Shigella spp., Vibrio cholerae, Vibrio parahaemolyticus, Proteus spp., and Bacillus cereus). The wide range of pathogen PMRS constructed in this study contributed to not only supplementing the Biotyper ORD at the genus and species levels but also adding data for numerous new pathogens to this database; thus, our results provide fundamental support for the application of the MBS in various fields, such as the clinical diagnosis of infections, the prevention and control of infectious diseases, food safety inspection, and quality assurance for export and import goods.
Previous studies reported MBS identification accuracies of 79.7-100% at the species level [11][15][16][17][18][22][23][24][25][26]. However, in this study, the accurate identification rates for 1019 strains in searches of the ORD (with 3995 PMRS) were 70.7% at the genus level and only 50.7% at the species level. The expansion and enhancement of the ORD to form the NRD (with 5014 PMRS) resulted in improvements of 28.2% (from 71.5% to 99.7%) and 42.3% (from 51.3% to 93.6%) in identification rates at the genus and species levels, respectively. This result fully validated the strength of MBS identification capabilities. Two major factors affect MBS identification accuracy. The first of these factors is the pathogen PMRS included in the database. The ORD could not satisfactorily identify varietal strains because of the low number and diversity of PMRS in this database. For example, Acinetobacter pittii and Acinetobacter nosocomialis, which are species with low pathogenicity, are frequently misidentified as Acinetobacter baumannii in clinical diagnostics [27], leading to the adoption of inappropriate therapeutic approaches. However, in this study, a complete PMRS database enabled Acinetobacter baumannii to be accurately distinguished from Acinetobacter pittii and Acinetobacter nosocomialis.
Figure 1. Comparisons of the identification accuracies generated using the ORD and the NRD for pathogenic microorganisms. A: The identification of 1019 strains using the ORD; B: The identification of pathogens (after excluding 24 strains from single-strain species) using the ORD and the NRD; C: The identification of pathogens (after excluding 24 strains from single-strain species and strains without reference spectra in the ORD) using the ORD and the NRD. doi:10.1371/journal.pone.0106312.g001
The second factor affecting MBS identification accuracy is that MBS databases lack PMRS for diverse high-pathogenicity strains, including various Brucella, Leptospira, and Bartonella strains; thus, these strains cannot be identified using the MBS. The majority of MBS users can provide only limited contributions to database improvement due to strain-related resource limitations. MBS usage demands vary by field, and pathogen identification procedures that solely rely on commercial databases cannot meet the identification needs of certain fields. The ideal solution to this issue is to develop a global platform based on a MALDI-TOF MS identification system that allows the merits of this system, such as its rapidity, accuracy, and applicability for high-throughput procedures, to be fully realized through data sharing.
High specificity is critical for pathogen identification. The consequences of pathogen misidentification are typically more harmful than the consequences of an inability to identify a pathogen because identification results greatly affect subsequent treatment and control strategies in a variety of fields, including clinical diagnosis, the prevention and control of infectious diseases, and food safety inspection. The main genus-level misidentifications produced by the MBS were the misidentification of Shigella and Cronobacter strains as Escherichia and Enterobacter strains, respectively. At the species level, the misidentification of Salmonella and Brucella species accounted for 53.7% and 25.9%, respectively, of the total misidentification events.
Figure 2. A statistical chart indicating misidentification rates before and after database expansion. A: Distributions of pathogen misidentifications (after excluding 24 strains from single-strain species) generated using the ORD and the NRD; A1: The distribution of pathogen misidentifications generated using the ORD for 995 strains; A2: The distribution of pathogen misidentifications generated using the NRD for 995 strains; B: Distributions of pathogen misidentifications (after excluding 24 strains from single-strain species and strains without reference spectra in the ORD) generated using the ORD and the NRD; B1: The distribution of pathogen misidentifications generated using the ORD for 631 strains; B2: The distribution of pathogen misidentifications generated using the NRD for 631 pathogenic strains. Green indicates the strains that were misidentified using the ORD but accurately identified by searching the NRD. Bold text indicates strains that were misidentified by the NRD. doi:10.1371/journal.pone.0106312.g002
Identification of Pathogenic Microorganisms by MALDI-TOF MS. PLOS ONE | www.plosone.org
A clear understanding of the differences in identification specificity among different pathogen peptide mass spectra will facilitate the development of scientific approaches to solve identification specificity-associated problems. Neither the commercial Biotyper database nor the SARAMIS database contains Shigella PMRS; as a result, all tested Shigella strains would be identified as Escherichia coli strains, leading to misidentifications that do nothing to improve the identification capabilities of the system. We found that a genetic algorithm (GA) model built using the ClinProTools software can distinguish between Shigella and Escherichia coli based on peptide mass spectra (data not shown), as has also been reported recently [28]. In this study, only 8 of the 54 species-level misidentifications were observed among Salmonella with numerous PMRS. Numerous PMRS were also available for Brucella melitensis and Brucella abortus, and these two species could be specifically identified. Overall, the use of the NRD instead of the ORD reduced misidentification rates by 91.7% and 57.1% at the genus and species levels, respectively. Thus, the database enhancements produced by this study significantly reduced misidentification rates at both the genus and species levels. The study results suggest that instead of excluding strains with low identification specificity from MBS databases, large quantities of PMRS must be constructed for these strains. Depending on the database of the user, additional tests to verify the identities of these strains may be required, and alternative identification methods should be utilized as needed.
Of the 1019 pathogen strains identified in this study, 90.3% were Chinese isolates; moreover, 91.9% of the 593 Chinese isolates with PMRS in the ORD produced spectra matching the PMRS for the Chinese isolates that were newly incorporated into the NRD. Furthermore, statistical assessments revealed significant geographical distribution patterns (Table 2), most notably for Helicobacter pylori. After the incorporation of PMRS for Chinese isolates into the NRD, the identification rates for 56 Helicobacter pylori strains in this study increased from 5% to 93%. Microorganism identification using the MBS involves the examination of peptides or small protein molecules with molecular weights of 2-20 kDa; the majority of these molecules are ribosomal proteins [21][29]. It has been demonstrated that the toxicity and pathogenicity of diverse pathogens can vary with geographical location. This geographical specificity suggests that for certain strains, ribosomal proteins might differ in various locations; as a result, these strains might exhibit varying identification specificities in different regions, and different identification potentials for these strains might be observed if the same pathogen identification system is used in different countries. The geographical specificity of peptide mass spectra of various pathogens suggests that MBS might be an ideal tool to not only classify certain pathogens but also track the source and spread of these pathogens.
As natural and societal factors change, continued increases in the global flow of populations and materials could promote the global transmission of pathogenic microorganisms. Rapid and accurate diagnoses are vital to the battle against infectious human diseases, as these diagnoses can provide the basis for effective treatment and evidence to support the development of disease prevention strategies. MALDI-TOF MS will inevitably become the most common, accurate, high-throughput, and economical tool for the identification of the majority of pathogenic microorganisms. An ideal way to improve this identification approach is to develop a MALDI-TOF MS-based global platform that allows the merits of this system, such as its rapidity, accuracy, and applicability for high-throughput procedures, to be fully realized through data sharing.

Supporting Information
File S1 The original peak lists of the strains used in this study (excluding the highly pathogenic organisms). (ZIP)