Table 1.

Experimental data used throughout the paper.

Each dataset consists of 10 MRSA and 10 MSSA samples.

Fig 1.

MALDI-TOF positive-ion mass spectra of MSSA (A) and MRSA (B) acquired in reflectron mode. Bacteria were grown in TSB medium, washed three times with double-distilled water, and loaded onto the MALDI target, followed by addition of the matrix directly onto the bacteria.

Fig 2.

Two workflows used in this paper.

(A) Workflow suggested by Gibb et al. [15]. (B) Workflow that allows selecting important features and building a statistical model for supervised learning. With this workflow, the feature matrix generated from the testing data is consistent with the feature matrix generated from the training data.

Fig 3.

Results based on the first workflow.

Dark circles: MSSA samples; grey circles: MRSA samples. Nodes with the same shape represent technical replicates. (A) Hierarchical clustering (HC) based on the feature matrix extracted with MALDIquant, using a correlation-based distance metric. (B) Optimized binary matrix output from BinDA (using only the top 100 features).
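
The correlation-based distance used for the clustering in panel (A) can be sketched as follows. This is an illustration of the metric only, not the paper's actual R/MALDIquant pipeline, and the replicate intensity vectors are hypothetical:

```python
# Sketch: correlation-based distance (1 - Pearson correlation) between
# feature-intensity vectors, as used for the hierarchical clustering in (A).
# Intensity values below are hypothetical.
from math import sqrt

def correlation_distance(x, y):
    """1 - Pearson correlation: ~0 for highly correlated profiles, up to 2 for anti-correlated."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

# Two technical replicates of one sample should sit close together,
# while an unrelated profile should be far away.
rep1 = [10.0, 52.0, 7.0, 33.0]
rep2 = [11.0, 50.0, 8.0, 35.0]
other = [40.0, 5.0, 60.0, 2.0]
print(correlation_distance(rep1, rep2))   # near 0
print(correlation_distance(rep1, other))  # near 2
```

Under this metric, replicates with proportionally similar peak intensities cluster together even if their absolute intensities differ.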

Fig 4.

The impact of the number of top-ranked features used on statistical model performance.

Training and testing datasets were created by splitting either datasets (A) or samples (B). Error bars were calculated as mean ± sd/√32 (A) or mean ± sd/√80 (B).
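
The error-bar formula in the caption, mean ± sd/√n, is the standard error of the mean. A minimal Python sketch with hypothetical accuracy values (the paper's own analysis was done in R):

```python
# Sketch: error bars as mean +/- sd/sqrt(n), i.e. the standard error of
# the mean, with n = 32 (dataset splits) or n = 80 (sample splits).
# The accuracy values below are hypothetical.
from math import sqrt
from statistics import mean, stdev

def sem_error_bar(values):
    """Return (mean, standard error) for a list of accuracy values."""
    m = mean(values)
    se = stdev(values) / sqrt(len(values))
    return m, se

accuracies = [0.91, 0.88, 0.93, 0.90, 0.87, 0.92, 0.89, 0.94]
m, se = sem_error_bar(accuracies)
print(f"{m:.3f} ± {se:.3f}")
```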

Fig 5.

Comparisons of top-ranked features using both the Jaccard index (A-B) and MSE (C-D). The error bars for the points on the solid lines (control) were obtained as mean ± sd/√1000, but they are too small to be seen. Note also that in A and B the line plots for the two workflows nearly overlap. Training and testing datasets were created by splitting either biological replicates (A, C) or samples (B, D). Top-ranked features were compared between I_BI and I_RF (I_BI*I_RF) or between P_BI and P_RF (P_BI*P_RF). Solid bars were calculated as mean ± sd/√32 (A, C) or mean ± sd/√80 (B, D).
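
The Jaccard index used in panels (A-B) measures overlap between two top-ranked feature lists (e.g. the features selected by BinDA vs. random forest). A short sketch; the m/z values below are hypothetical:

```python
# Sketch: Jaccard index |A ∩ B| / |A ∪ B| between two sets of
# top-ranked features. Feature m/z values are hypothetical.
def jaccard_index(features_a, features_b):
    """Fraction of features shared between two selections (1 = identical, 0 = disjoint)."""
    a, b = set(features_a), set(features_b)
    return len(a & b) / len(a | b)

top_binda = [2411, 3007, 4511, 5032, 6888]
top_rf    = [2411, 3007, 4511, 5555, 7777]
print(jaccard_index(top_binda, top_rf))  # 3 shared of 7 total ≈ 0.43
```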

Fig 6.

Classification accuracy rates as a function of the number of datasets (A-B) or samples (C-D) used in the training procedure. (A) One of the seven datasets was tested, while a subset of the remaining datasets was used for training; we also ensured that the technical replicate of the testing data was not included in training. (B) Similar to (A), but the testing data were not aligned to the reference peaks extracted from the training data. (C) 24 of the 80 samples were used for testing, where the 80 samples come from the 4 biological replicates; samples were split so that biological replicates of the same sample did not appear in both training and testing data. (D) Similar to (C), but the testing data were not aligned to the reference peaks extracted from the training data. Error bars in panels (C-D) were calculated as mean ± sd/√40. Error bars are not shown in panels (A-B) because a different number of tests was performed in each case.

Fig 7.

Density plot (bandwidth = 50) of the top 100 ranked features based on 4 variations of the second workflow.

Here we pick 4 biological replicates from the 7 datasets and repeatedly build models and rank features based on 3 of the biological replicates. (A) I_BI; (B) I_RF; (C) P_BI; (D) P_RF.
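
The density plot in Fig 7 can be understood as a Gaussian kernel density estimate over the m/z positions of the top-ranked features, with the stated bandwidth of 50. A minimal sketch (the feature positions below are hypothetical, and the paper's plot was produced in R):

```python
# Sketch: Gaussian kernel density estimate over feature m/z positions,
# bandwidth = 50, as in Fig 7. Feature positions are hypothetical.
from math import exp, sqrt, pi

def kde(x, points, bandwidth=50.0):
    """Gaussian KDE evaluated at x for the given feature positions."""
    n = len(points)
    return sum(exp(-0.5 * ((x - p) / bandwidth) ** 2)
               for p in points) / (n * bandwidth * sqrt(2 * pi))

features = [3000, 3010, 3050, 5500, 5520]
print(kde(3020, features))  # high density inside the 3000-3050 cluster
print(kde(4000, features))  # near zero between clusters
```

A larger bandwidth smooths nearby selected features into one peak; a smaller one resolves them individually.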
