A MALDI-TOF MS database with broad genus coverage for species-level identification of Brucella

Brucella are highly infectious bacterial pathogens responsible for a severely debilitating zoonosis called brucellosis. Half of the human population worldwide is considered to live at risk of exposure, mostly in the poorest rural areas of the world. Prompt diagnosis of brucellosis is essential to prevent complications and to control outbreaks, but identification of Brucella isolates may be hampered by the lack of rapid and cost-effective methods. Nowadays, many clinical microbiology laboratories use Matrix-Assisted Laser Desorption Ionization–Time Of Flight mass spectrometry (MALDI-TOF MS) for routine identification. However, the lack of reference spectra in the currently commercialized databases does not allow the identification of Brucella isolates. In this work, we constructed a Brucella MALDI-TOF MS reference database using VITEK MS. We generated 590 spectra from 84 different strains (including rare or atypical isolates) to cover this bacterial genus. We then applied a novel biomathematical approach to discriminate different species. This allowed accurate identification of Brucella isolates at the genus level with no misidentifications, in particular none as the closely related and less pathogenic Ochrobactrum genus. The main zoonotic species (B. melitensis, B. abortus and B. suis) could also be identified at the species level with an accuracy of 100%, 92.9% and 100%, respectively. This MALDI-TOF reference database will be the first Brucella database validated for diagnostic use and accessible to all VITEK MS users in routine practice. This will improve the diagnosis and control of brucellosis by allowing rapid identification of these pathogens.


Introduction
Brucella are important pathogens in both medical and veterinary contexts. These Gram-negative bacteria can be transmitted from their animal reservoir to humans, usually by ingestion of contaminated milk products or by direct contact, causing brucellosis. This zoonosis is a severely debilitating illness characterized by intermittent fever, chills, sweats, weakness, myalgia, osteoarticular or obstetrical complications and endocarditis.
This disease is largely unreported and the true incidence of human brucellosis is thus unknown [1]. According to the World Health Organization (WHO), half a million new cases are reported each year, most of them in the poorest rural areas of the world [2]. Indeed, while the disease has been successfully prevented in most industrialized countries, it remains a significant burden in the Mediterranean region, all over Asia, sub-Saharan Africa, and certain areas in Latin America. Approximately half of the human population worldwide is considered to live at risk of exposure [3]. Moreover, due to the low dose required to cause infection (10-100 colony-forming units) and the potential for aerosol dissemination, Brucella was considered a potential bioterrorism agent early in the 20th century [1] and its possession and use are still strictly regulated in many countries.
Currently, the Brucella genus consists of eleven recognized species plus several isolates that have not yet been officially designated. The major zoonotic species are B. melitensis, B. abortus and B. suis, which are subdivided into biovars by a set of phenotypic characteristics including lipopolysaccharide (LPS) epitopes, phage sensitivity, dye sensitivity and a battery of biochemical tests. These three species are also the most common in domestic livestock. B. melitensis is responsible for the majority of human cases in the Mediterranean basin, the Arabian Peninsula, Latin American countries and Asia, while B. abortus is more prevalent in the United States, Northern Europe and Africa [4]. B. suis and B. canis infections are more sporadic in humans. Very rare human infections have also been reported with B. inopinata [5,6], B. ceti [7,8] and B. neotomae [9,10].
Clinical microbiology laboratories play a key role in the diagnosis and management of human brucellosis and should be able to provide a rapid and exact identification of Brucella spp. Currently, the most suitable tool for identification of bacteria is Matrix Assisted Laser Desorption/Ionization-Time of Flight Mass Spectrometry (MALDI-TOF MS). This method provides rapid, sensitive and cost-effective identification and is currently replacing phenotypic microbial identification. Its accuracy, however, largely depends on the coverage of the database of the commercially available MALDI-TOF MS systems. With regard to Brucella, identification was not possible because this genus was not represented in the databases of the two main MALDI-TOF MS system manufacturers (i.e. bioMérieux and Bruker) [11][12][13]. Only the Bruker Security Relevant (SR) database, or custom databases developed in some laboratories, can identify these highly pathogenic bacteria, but access to these databases is not possible in some countries due to export restriction regulations [13][14][15]. Moreover, only B. melitensis is included in the SR database.

Bacterial strains
The bacterial strains used for the construction of the database are listed in Tables 1 and S1. Each of these strains was cultivated on several different media (S1 Table). The bacterial isolates used for the external evaluation and their culture conditions are listed in Tables 2 and S2. All strains used in this study were previously characterized using an established workflow (phenotypic assays, Multiple-Locus Variable number tandem repeat Analysis (MLVA), whole-genome sequencing) [16].

MALDI-TOF MS samples
Samples used to build the spectra database were prepared according to a previously established inactivation protocol [19] consisting of resuspending two full loops of bacteria (i.e. multiple colonies) in 200 μL of solvent mix, vortexing (10 sec), centrifuging (10,000 g, 2 min) at room temperature, removing 190 μL and resuspending in the 10 μL of solvent left in the tube. For the external evaluation study, this protocol was simplified by suspending only one loop of bacteria in 100 μL of solvent mix, vortexing (10 sec) and incubating at room temperature (20-25˚C, 3 minutes). Bacteria were efficiently inactivated by this method and the biomass concentration of the samples allowed identification by MALDI-TOF MS, demonstrating that the centrifugation step in the original protocol was not required.

MALDI-TOF MS analysis
One μL of each sample was applied to a single well of a disposable, barcode-labeled target slide (VITEK MS-DS, bioMérieux), overlaid with 1 μL of a saturated solution of alpha-cyano-4-hydroxycinnamic acid matrix in 50% acetonitrile and 2.5% trifluoroacetic acid (VITEK MS-CHCA, bioMérieux), then air dried. For the database construction, several independent measurements were recorded for each strain (see S1 Table for the different culture conditions).
For instrument calibration, an Escherichia coli reference strain (ATCC 8739) was directly transferred to designated spots on the target slide using the procedure recommended by the manufacturer.

Construction and optimization of the database
The database was built as previously described [20]. Briefly, peak lists were binned by assigning each peak within the mass range of 3,000-17,000 Da to one of 1,300 bins. A predictive model was then established for each species using the Advanced Spectra Classifier (ASC) algorithm developed by bioMérieux (La Balme les Grottes, France). The outcome of this procedure was the assignment of a dimensionless weight for each bin and for each species. As a result, a specific pattern of weights for the 1,300 bins was obtained and combined for all species in a weighted bin matrix. For optimization, the spectral data were partitioned into 5 complementary subsets. One round of cross-validation involved a learning phase on 4 subsets ("training set") and a validation of the predictive model on the remaining subset ("testing set"). Five rounds of cross-validation were performed by permutation, and the results from the five rounds were combined. To assess the accuracy of the database and calculate its performance in cross-validation, individual spectra were re-used as templates for identification. The ASC algorithm compares the acquired spectrum to the specific pattern of each organism/organism group in the database and calculates a percent probability, or confidence value (%ID), which represents the similarity in terms of presence/absence of specific peaks between spectra. A perfect match provides a %ID of 99.9%. %ID values from >60% to 99.8% are considered good. Scores <60% are considered to give no valid identification.
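The binning and five-fold partitioning steps described above can be illustrated with a short Python sketch. It is not the ASC implementation, which is proprietary. Equal-width bins, presence/absence encoding, random (unstratified) assignment of spectra to subsets, and the function names bin_index, binned_spectrum and five_fold_rounds are all assumptions made for this example.

```python
import random

MASS_MIN = 3000.0   # Da, lower bound of the mass range given in the text
MASS_MAX = 17000.0  # Da, upper bound of the mass range given in the text
N_BINS = 1300       # number of bins given in the text


def bin_index(mass):
    """Return the 0-based bin of a peak mass, or None if it is out of range.

    Assumption: all bins have the same width (the text does not specify this).
    """
    if not (MASS_MIN <= mass <= MASS_MAX):
        return None
    width = (MASS_MAX - MASS_MIN) / N_BINS
    # Clamp so that a peak exactly at MASS_MAX falls in the last bin.
    return min(int((mass - MASS_MIN) / width), N_BINS - 1)


def binned_spectrum(peak_masses):
    """Build a presence/absence vector over the bins for one peak list."""
    vector = [0] * N_BINS
    for mass in peak_masses:
        idx = bin_index(mass)
        if idx is not None:
            vector[idx] = 1
    return vector


def five_fold_rounds(n_spectra, seed=0):
    """Yield (training, testing) index lists for 5 rounds of cross-validation.

    Assumption: spectra are shuffled at random, without stratifying by species.
    """
    indices = list(range(n_spectra))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::5] for i in range(5)]  # 5 complementary subsets
    for k in range(5):
        testing = folds[k]
        training = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield training, testing
```

With the 590 Brucella spectra mentioned in this study, five_fold_rounds(590) would give five rounds in which every spectrum is used exactly once for testing. The full database partition used by the authors may differ.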
The VITEK MS system renders the following types of identification results: "Single Choice", when the spectrum acquired presents a high level of similarity (%ID >60 to 99.9%) with only one specific pattern in the database; "Low discrimination", when the spectrum acquired presents a high level of similarity with 2 to 4 specific patterns in the database; or "No Identification", when the spectrum acquired either does not match with any pattern in the database, or presents a high level of similarity to more than 4 specific patterns. During cross-validation, identification was considered as correct when the result was consistent with the reference identification. Low discrimination results were considered as correct if the expected identification was included in the matches. A misidentification was defined as discordant organism identification between the cross-validation result and the reference identification.
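The decision rules above can be summarized in a minimal Python sketch. It assumes that "a high level of similarity" means a %ID above 60 for each matching pattern, and that a %ID of exactly 60 does not count as a match; the text does not state either point. The function names and the %ID values in the usage example are hypothetical.

```python
def result_type(percent_ids, threshold=60.0):
    """Classify one spectrum from its per-pattern %ID values.

    percent_ids maps a pattern name to its %ID.
    Assumption: a pattern "matches" when its %ID is strictly above the threshold.
    """
    matches = sorted(name for name, pid in percent_ids.items() if pid > threshold)
    if len(matches) == 1:
        return "Single Choice", matches
    if 2 <= len(matches) <= 4:
        return "Low discrimination", matches
    # Either no match at all, or more than 4 matching patterns.
    return "No Identification", []


def cross_validation_call(percent_ids, reference):
    """Score a result against the reference identification."""
    kind, matches = result_type(percent_ids)
    if kind == "No Identification":
        return "no ID"
    # Low discrimination counts as correct if the expected name is among the matches.
    return "correct" if reference in matches else "discordant"
```

For example, cross_validation_call({"B. ceti": 70.0, "B. pinnipedialis": 65.0}, "B. ceti") returns "correct", because the expected identification is included in a low discrimination result.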

Evaluation of performances by external validation
External spectra were generated from bacteria cultivated with different growth conditions (media, incubation time, etc) to mimic possible inter-laboratory variations. To reflect clinical laboratory practice, inactivated samples were spotted in duplicate, and analyzed with the updated database. If only one of the two spectra allowed a correct identification, the isolate was considered correctly identified. The cut-off for identification confidence was as described above.
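The duplicate-spot rule can be sketched in Python as follows. The text only specifies that one correct spectrum out of two is enough for a correct call; how a discordant spot and a "no ID" spot are combined is an assumption of this sketch, as are the function names.

```python
def isolate_call(spot_calls):
    """Combine the per-spot calls obtained for one isolate spotted in duplicate.

    spot_calls is a pair of strings among "correct", "discordant" and "no ID".
    """
    if "correct" in spot_calls:
        # Rule from the text: one correct spectrum is enough.
        return "correct"
    if "discordant" in spot_calls:
        # Assumption: a discordant spot takes precedence over a "no ID" spot.
        return "discordant"
    return "no ID"


def percent_of(calls, kind):
    """Percentage of isolates with the given call, rounded to one decimal."""
    return round(100.0 * sum(1 for c in calls if c == kind) / len(calls), 1)
```

Applying isolate_call to each isolate and then percent_of to the resulting list gives per-isolate percentages of the kind reported for the external validation.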

Results
To update the MALDI-TOF MS VITEK database, we used 84 Brucella strains, either reference strains or well characterized clinical/veterinary isolates (Tables 1 and S1), to generate independent spectra covering the Brucella genus. After initial selection based on quality criteria such as peak resolution, signal to noise ratio, number of peaks, absolute signal intensity, and intraspecific similarity, 590 spectra were retained and submitted for biomathematical analyses using an iterative system (bioMérieux patented ASC algorithm).
Using an optimization process, we next evaluated the possibility of discriminating between different Brucella species and biovars. Discrimination between the different species was obtained, with the exception of B. ceti and B. pinnipedialis. These two species could not be clearly separated, as illustrated by the intertwining of their spectra on a dendrogram (Fig 1). Distinguishing the different biovars of B. melitensis and B. abortus was not possible. Discrimination between several B. suis biovars was obtained (S1 Fig).
After optimization, cross-validation was performed to evaluate the performance of the updated database, which contains 37,902 spectra covering 1,095 bacterial species including Brucella. This mathematical method is used to assess how accurately the database can perform. Correct identification at the genus level was obtained in 97.29% of cases (Table 3). Importantly, the remaining 2.71% of results corresponded to "no ID", but never to an incorrect identification. At the species level, the performance varied between the different classes. For the three main zoonotic species (B. melitensis, B. abortus and B. suis), correct identification was obtained for 96.06%, 100% and 89.34% of spectra, respectively.
Finally, as an external validation, the database was challenged with the MALDI-TOF spectra from 48 independent Brucella isolates, and 2 strains of Ochrobactrum, which are "near neighbors" of the Brucella genus (Tables 2 and S2).
The implemented database allowed correct identification at the genus level in 88.4% of cases, all the other results being "No Identification" but never misidentification as another genus (Tables 4 and 5). At the species level, the performance varied. For B. melitensis, B. abortus, and B. suis, correct identification was obtained for 100%, 92.3% and 100% of strains, respectively. It should be noted, however, that only one extra B. suis isolate was available to be tested in the external validation.
Interestingly, the rare clinical isolate 02/611, described as B. ceti-like after molecular characterization [7], was indeed identified within the B. ceti/B. pinnipedialis class. Also, both the bullfrog (B13-0095) and the Australian rodent (NF2637) isolates were identified as B. inopinata, in agreement with previous work showing that these belong to the atypical Brucella clade of this genus [27,28]. The two isolates belonging to "B. abortus biovar 7", a rare biovar of this species, were identified as B. abortus. Finally, the recombinant 16M strain overexpressing the green fluorescent protein (GFP) was correctly identified as B. melitensis using this database. Moreover, using different culture conditions for 16M did not affect its identification by MALDI-TOF MS (S3 Table).

Discussion
A major asset of this MALDI-TOF MS database is its ability to identify Brucella isolates at the species level, which is essential for following epidemiological outbreaks. Obtaining such a resolution was very challenging for this genus, as highlighted in previous studies [29], because of the high similarity between species at the genetic level [30]. Discrimination between species was made possible using a patented approach to differentiate closely related species using internal calibration and a two-step algorithm. This was not sufficient to distinguish the two species of Brucella from marine mammals (B. ceti and B. pinnipedialis). This is in agreement with a recent Multi-Locus Sequence Analysis (MLSA) showing that the taxonomy is inconsistent with the phylogeny of these two species, and that taxonomic rearrangement should be envisaged [31]. This MALDI-TOF MS database is however able to discriminate eight different Brucella species, which include the most common in human or animal disease.
The updated database allowed correct identification of Brucella isolates at the genus level in 88.4% of cases. It is important to mention that none of them was identified as Ochrobactrum spp., a misidentification that is common with other standard identification methods [32][33][34] and was recently reported using the VITEK MS database currently available [35]. Analysis at the species level gave only one discordant result, corresponding to cross-identification between two Brucella species. Such a result would have no consequence for human medicine, as identification at the genus level is sufficient to prescribe the appropriate treatment. As for all MALDI-TOF databases, the limitation of this system is its inability to identify non-clinically validated species or species not included in the database. However, the large coverage of the Brucella genus (in particular the most common species) in this database makes this risk very minor. Diminution of the performance at the genus and/or species level was due to "no ID" results for some rare and/or atypical Brucella spp. (B. neotomae strain 5K33, B. microti strain CCM4915, and the rodent isolate NF2653), several strains from marine mammals, and the vaccine strain B. abortus RB51. These results were not due to the quality of MALDI-TOF spectra, which was good (based on the number of spectral peaks,  [31].
Importantly, the MALDI-TOF database allowed the correct identification as Brucella of several recently discovered "atypical" isolates [5,6,28,36]. These strains represent a serious problem for diagnostic laboratories, as they are not identified as Brucella using classical phenotypic tests. It is possible that similar strains have been isolated in the past but misidentified. Very little is known concerning the ability of these new species to cause disease in humans or livestock. The possibility of identifying these isolates as Brucella will thus be important for both human and animal health.
Fig 1. Proximity of B. ceti and B. pinnipedialis MALDI-TOF MS spectra. Cluster analysis, using correlation-based dissimilarity, was performed to assess the discriminating power of MALDI-TOF MS between the spectra corresponding to the species B. ceti (in red) or B. pinnipedialis (in blue). The threshold of 50 common peaks, which is considered as a minimum for considering that spectra are different, is shown as a dotted line. https://doi.org/10.1371/journal.pntd.0006874.g001
Identification of Brucella by MALDI-TOF MS
Table 3. Results of identification in cross validation studies. Identification (ID) results, either at genus or species/class level, were classified as correct (either "Single Choice" or "Low Discrimination"), discordant or no-identification (No ID). This table gives the % of each type of identification result for the indicated species/classes using the updated database.
Table 4. Identification results in external validation. Identification (ID) results, either at genus or species/class level, were classified as correct, discordant or no-identification (No ID). This table gives the % of strains for which each type of identification result was obtained using the updated database. The number (n) of strains tested for each species/class is also indicated in parentheses. N/A = not applicable.

Overexpression of an exogenous protein (GFP) did not affect the identification of B. melitensis 16M. This is important since recombinant Brucella strains are common tools in research laboratories and could potentially infect lab workers. Moreover, the use of such Brucella strains as vaccines has been proposed, since the presence of anti-GFP antibodies would allow vaccinated animals to be distinguished from naturally infected ones [37].
In conclusion, this updated MALDI-TOF MS database is a new diagnostic tool that allows the identification of Brucella. It combines precision of identification (broad coverage of the Brucella genus together with species-level identification) and widespread availability. After integration in the VITEK MS (v3.2), this will be the first Brucella database validated for diagnostic use with CE accreditation and accessible to all users in routine practice. This will allow accurate diagnosis and timely treatment of brucellosis. Because these highly infectious pathogens also cause one of the most frequent laboratory-acquired infections [38], their rapid identification by MALDI-TOF MS will decrease the risk of accidental infection of laboratory workers. A paradox of global health, however, is that the countries where brucellosis is endemic may not have access to MALDI-TOF MS. This could be circumvented by the use of the in-tube inactivation method described earlier [19], which will allow the shipment of previously infectious samples to mass spectrometry platforms.
Supporting information
S1 Fig. Discrimination between the different B. suis biovars. Multidimensional Scaling (MDS) analysis of MALDI-TOF spectra obtained with B. suis isolates. The similarity between spectra is represented as distances, which depend on the presence/absence of peaks and their intensity in the compared spectra. Results are presented on the first three dimensions. The color code used for each biovar is indicated in the figure.