RAId_aPS: MS/MS Analysis with Multiple Scoring Functions and Spectrum-Specific Statistics

doi:10.1371/journal.pone.0015438

Figure 1.

Illustration of APP mass grid with internal structure.

In addition to showing the basic mass grid, this figure illustrates, using peptide length as an example, the possibility of including additional structure in the (raw) score histogram associated with each mass index. The basic idea of obtaining the score histogram via dynamic programming is explained in the Method section. The key step in incorporating additional structure is to let the (weighted) count associated with each (raw) score be further categorized by the lengths of the partial peptides reaching each mass index. In the end, one applies the length correction factor to the raw score to obtain the real score histogram. Evidently, one may also keep track of the number of () peaks accumulated within the raw score histogram. Again, the factorial contribution can be added at the end, prior to the construction of the final score histogram.
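The bookkeeping described in this caption can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the RAId_aPS implementation: the three-residue alphabet with nominal integer masses, the `peak_score` lookup (score gained when a partial peptide lands on a mass index), the use of unweighted counts, and the example correction `raw / length` are all hypothetical choices made for the sake of the example.

```python
from collections import defaultdict

# Hypothetical toy alphabet with nominal integer residue masses (Da).
RESIDUE_MASS = {"G": 57, "A": 71, "S": 87}

def score_histograms(target_mass, peak_score):
    """Dynamic programming over the mass grid.

    hist[m][(raw_score, length)] is the number of partial peptides whose
    residues sum to mass index m, split by raw score and by length.
    Counts are unweighted here (an assumption of this sketch).
    """
    hist = [defaultdict(int) for _ in range(target_mass + 1)]
    hist[0][(0, 0)] = 1  # the empty peptide
    for m in range(target_mass + 1):
        if not hist[m]:
            continue  # no partial peptide reaches this mass index
        for mass in RESIDUE_MASS.values():
            nxt = m + mass
            if nxt > target_mass:
                continue
            gain = peak_score.get(nxt, 0)  # hypothetical per-index score
            for (score, length), count in hist[m].items():
                hist[nxt][(score + gain, length + 1)] += count
    return hist

def apply_length_correction(hist_at_mass, correction):
    """Collapse (raw_score, length) bins into the final score histogram."""
    final = defaultdict(int)
    for (raw, length), count in hist_at_mass.items():
        final[correction(raw, length)] += count
    return dict(final)

hist = score_histograms(128, {57: 1, 71: 2})
final = apply_length_correction(hist[128], lambda raw, n: raw / n)
```

Keeping the length as part of the histogram key is what allows the correction to be deferred to the very end, as the caption describes; the same pattern would apply to a peak-count key for a factorial contribution.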

Figure 2.

Example processed spectra from different scoring functions versus the original spectrum.

The centroid spectrum used has a parent ion mass of Da. In panel (A), the original spectrum is displayed; (B) shows the processed spectrum generated by the filtering protocol of RAId_DbS scoring function; (C) exhibits the processed spectrum generated by the filtering protocol of K-score; while (D) and (E) correspond respectively to the processed spectra produced by XCorr and Hyperscore.

Figure 3.

Histograms of correlations between filtering strategies.

Raw centroid spectra from the ISB data set [28] are used in this plot. Each raw spectrum gives rise to four processed spectra, one from each of the four filtering strategies. The mass fragments of every filtered spectrum are then read onto a mass grid, and the spectrum is viewed as a vector with non-vanishing components only at the populated mass indices. One then normalizes each filtered spectrum vector to unit length. The inner product of any two filtered spectral vectors represents the correlation between them. When the spectral quality does not pass a method-dependent threshold, the corresponding filtering protocol may turn the raw spectrum into a null spectrum and skip the database search. For a given pair of filtering methods and a raw spectrum, if each of the two filtering methods produces a nonempty filtered spectrum, one may turn those filtered spectra into spectral vectors and compute their inner product, i.e., their correlation. For each pair of filtering methods, these inner products are accumulated and plotted as a correlation histogram. All six pairwise combinations are shown.
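The correlation described above amounts to a cosine similarity between sparse vectors. A minimal sketch follows, assuming a hypothetical grid spacing `bin_width` of 1 Da and assuming that intensities falling on the same mass index are summed (the caption does not specify either choice). A null spectrum yields no correlation value, mirroring the rule that only pairs of nonempty filtered spectra contribute to the histogram.

```python
import math

def spectrum_vector(peaks, bin_width=1.0):
    """Place (m/z, intensity) peaks on a mass grid and normalize to unit length.

    Returns a sparse vector as {mass_index: component}; empty for a null spectrum.
    """
    vec = {}
    for mz, intensity in peaks:
        idx = int(round(mz / bin_width))
        vec[idx] = vec.get(idx, 0.0) + intensity  # assumption: sum within a bin
    norm = math.sqrt(sum(v * v for v in vec.values()))
    if norm == 0.0:
        return {}
    return {i: v / norm for i, v in vec.items()}

def correlation(peaks_a, peaks_b, bin_width=1.0):
    """Inner product of two unit-length spectral vectors, or None if either is null."""
    a = spectrum_vector(peaks_a, bin_width)
    b = spectrum_vector(peaks_b, bin_width)
    if not a or not b:
        return None
    return sum(v * b.get(i, 0.0) for i, v in a.items())

filtered_a = [(100.2, 3.0), (200.1, 4.0)]
filtered_b = [(100.4, 1.0)]
c = correlation(filtered_a, filtered_b)
```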

Figure 4.

Score correlations.

A subset of the ISB centroid data set [28] was used to perform this evaluation. For each scoring function, when the best hit per spectrum (as reported by the analysis program for which the scoring function was originally developed) is a true positive, that candidate peptide is scored again using the corresponding scoring function implemented in RAId_aPS. Each true positive best hit thus gives rise to two scores, which are plotted using the following rule: the first score is used as the ordinate while the second score (from RAId_aPS) is used as the abscissa. Including spectra, panel A is for the RAId score. Panel B is for Hyperscore and contains spectra. The result of K-score is shown in panel C with spectra. Shown with spectra, panel D documents the results for XCorr.

Figure 5.

E-value accuracy assessment.

The agreement between the reported E-value and the textbook definition is examined using centroid data (A1–A4 subsets of the ISB data set). The random database size used is 500 MB. The molecular weight range considered while searching the database is . In each panel, the dashed lines, corresponding to and , are used to provide a visual guide regarding how close to or far from the theoretical curve the experimental curves are.
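The textbook definition referred to here is that an E-value of e for a hit means one expects e random hits of equal or better quality per query. When a random database is searched, every hit is a false hit, so the average number of hits per spectrum with reported E-value at or below a cutoff should be close to that cutoff. A minimal sketch of this check, where the function name and the nested-list input layout are assumptions made for illustration:

```python
def observed_hits_per_spectrum(evalues_per_spectrum, cutoff):
    """Average number of random-database hits per spectrum with E-value <= cutoff.

    evalues_per_spectrum holds one list of reported E-values per query spectrum.
    For accurate statistics the returned value should be close to `cutoff`.
    """
    total = sum(1 for evalues in evalues_per_spectrum
                for e in evalues if e <= cutoff)
    return total / len(evalues_per_spectrum)

# Two toy spectra searched against a random database.
reported = [[0.1, 0.5, 2.0], [0.3, 5.0]]
observed = observed_hits_per_spectrum(reported, cutoff=0.5)
```

Plotting the observed value against the cutoff over a range of cutoffs gives a curve of the kind shown in this figure, with the diagonal as the theoretical line.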

Figure 6.

ROC curves for the centroid data (A1–A4 of the ISB data set [28]).

For each of the four scoring functions considered, a set of ROC curves is shown. These ROC curves include the results from running the designated program associated with that scoring function, the results from running RAId_aPS in the database search mode, and the results from combining with each of the three other scoring functions. Panel (A) shows the results from RAId score, whose designated program is RAId_DbS. Panel (B) displays the results from K-score, whose designated program is X!Tandem. Panel (C) exhibits the results from XCorr, which is mostly employed by SEQUEST. Panel (D) presents the results from Hyperscore, whose designated program is also X!Tandem. Instead of using only XCorr (like RAId_aPS), SEQUEST first selects the top candidates using SP score. As shown in panel (C), for centroid data there is an advantage to filtering candidates with the SP score. However, it is also seen that by combining XCorr with either RAId score or Hyperscore, equally good results can be attained without introducing the SP score heuristics.
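For readers who want to reproduce curves of this kind, the sketch below accumulates true and false positive counts as the E-value cutoff is relaxed. The input layout (pairs of E-value and a true-positive flag) and the choice of cumulative counts rather than rates on the axes are assumptions of this example; ties in E-value are not treated specially.

```python
def roc_points(hits):
    """Cumulative (false_positives, true_positives) as the cutoff is relaxed.

    hits: iterable of (evalue, is_true_positive) pairs.
    """
    points = []
    false_count = true_count = 0
    for _evalue, is_true in sorted(hits, key=lambda h: h[0]):
        if is_true:
            true_count += 1
        else:
            false_count += 1
        points.append((false_count, true_count))
    return points

toy_hits = [(0.01, True), (0.5, False), (0.1, True), (2.0, False)]
curve = roc_points(toy_hits)
```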

Figure 7.

ROC curves for the centroid data (A1–A4 of the ISB data set [28]) when considering only the best hit per spectrum.

For each of the four scoring functions considered, a set of ROC curves is shown. These ROC curves consider only the best hit per spectrum from running the designated program associated with that scoring function, the best hit per spectrum from running RAId_aPS in the database search mode, and the best hit per spectrum from combining with each of the three other scoring functions. Panel (A) shows the results from RAId score, whose designated program is RAId_DbS. Panel (B) displays the results from K-score, whose designated program is X!Tandem. Panel (C) exhibits the results from XCorr, which is mostly employed by SEQUEST. Panel (D) presents the results from Hyperscore, whose designated program is also X!Tandem. Instead of using only XCorr (like RAId_aPS), SEQUEST first selects the top candidates using SP score. As shown in panel (C), for centroid data there is an advantage to filtering candidates with the SP score. However, it is also seen that by combining XCorr with either RAId score or Hyperscore, equally good results can be attained without introducing the SP score heuristics.
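Restricting the evaluation to the best hit per spectrum is a simple filtering step applied before the curve is built. A minimal sketch, assuming hits are given as hypothetical `(spectrum_id, evalue, is_true_positive)` tuples and that "best" means smallest E-value:

```python
def best_hit_per_spectrum(hits):
    """Keep only the lowest-E-value hit for each spectrum.

    hits: iterable of (spectrum_id, evalue, is_true_positive) tuples.
    """
    best = {}
    for spectrum_id, evalue, is_true in hits:
        if spectrum_id not in best or evalue < best[spectrum_id][1]:
            best[spectrum_id] = (spectrum_id, evalue, is_true)
    return list(best.values())

all_hits = [("s1", 0.5, False), ("s1", 0.01, True), ("s2", 1.0, False)]
top_hits = best_hit_per_spectrum(all_hits)
```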

Table 1.

An output example of the combined E-value from RAId_aPS.

Figure 8.

Illustration of RAId_aPS performance when combining three different scoring functions.

Panel (A) shows the results from the profile data (NHLBI data set [4]), while panel (B) exhibits the results from the centroid data (A1–A4 of the ISB data set [28]). Panel (C) shows the results from the profile data but keeping only the best hit per spectrum, while panel (D) exhibits the results from the centroid data but keeping only the best hit per spectrum.

Figure 9.

Example score PDF (normalized histogram) output by RAId_aPS.

An MS spectrum of parent ion mass Da is queried with default parameters, and the resulting score PDFs for RAId, K-score, XCorr, and Hyperscore are shown in panels A, B, C, and D, respectively. The number of APP within 3 Da of the parent ion mass is about .

Figure 10.

Example of reanalyzing output files from other search engines by combining them with the statistical significance assignment from RAId_aPS.

In this example, we use the Mascot output files resulting from querying profile spectra (panel (A), the NHLBI data set) and centroid spectra (panel (B), A1–A4 of the ISB data set [28]) against NCBI's nr database, with proteins highly homologous to those present in the mixture removed. Since each data set is from a known mixture of proteins, it is possible to remove the proteins homologous to the true positives from the nr database. We then combine the calibrated E-value [4] of Mascot with the E-value obtained from RAId_aPS when either RAId score, Hyperscore, K-score or XCorr is used.
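This caption does not spell out how two E-values are merged, so the following is only one plausible sketch and should not be read as the RAId_aPS procedure. It assumes that each E-value is first converted to a P-value through P = 1 - exp(-E), that the P-values are independent and equally weighted, and that they are combined through the distribution of a product of independent uniform variables (equivalent to Fisher's method). The function names are hypothetical.

```python
import math

def combine_pvalues(pvalues):
    """P-value of the product of L independent uniform P-values.

    Uses F(tau) = tau * sum_{n=0}^{L-1} (-ln tau)^n / n!, with tau the product.
    """
    tau = math.prod(pvalues)
    if tau <= 0.0:
        return 0.0
    x = -math.log(tau)
    term = total = 1.0
    for n in range(1, len(pvalues)):
        term *= x / n  # builds x^n / n! incrementally
        total += term
    return tau * total

def combine_evalues(evalues):
    """Combine E-values via P = 1 - exp(-E) (an assumption of this sketch)."""
    pvalues = [-math.expm1(-e) for e in evalues]
    p = combine_pvalues(pvalues)
    if p >= 1.0:
        return math.inf
    return -math.log1p(-p)  # back-transform: E = -ln(1 - P)

combined = combine_evalues([1e-3, 1e-3])
```

Two hits that are each moderately significant reinforce one another under this rule, which is the qualitative behavior one would want when two search engines agree on the same peptide.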

Table 2.

Example retrieval tests based on FDR.
