Proteomics biomarker discovery for individualized prevention of familial pancreatic cancer using statistical learning

doi:10.1371/journal.pone.0280399

Fig 1.

Averaged ROC curves of test results of the first simulation experiment.

The shaded areas represent the one standard deviation interval of the corresponding methods, estimated via 100 realizations.
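
The caption does not say how the ROC curves were averaged. A common approach is vertical averaging: interpolate each realization's TPR onto a common FPR grid, then take the mean and standard deviation at each grid point. The sketch below assumes that approach. The function names (`roc_points`, `average_roc`, `trapezoid_auc`) and the simulated scores are hypothetical and are not the authors' code.

```python
import numpy as np


def roc_points(labels, scores):
    """Empirical ROC curve: (FPR, TPR) pairs from sweeping a threshold over the scores."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="mergesort")  # highest score first
    sorted_labels, sorted_scores = labels[order], scores[order]
    tps = np.cumsum(sorted_labels)
    fps = np.cumsum(~sorted_labels)
    # Keep one point per distinct threshold (the last position of each tie run).
    last_of_tie = np.concatenate([np.diff(sorted_scores) != 0, [True]])
    tpr = np.concatenate(([0.0], tps[last_of_tie] / tps[-1]))
    fpr = np.concatenate(([0.0], fps[last_of_tie] / fps[-1]))
    return fpr, tpr


def average_roc(realizations, grid_size=101):
    """Vertically average ROC curves over realizations on a common FPR grid.

    realizations: iterable of (labels, scores) pairs, one per simulated dataset.
    Returns the FPR grid, the mean TPR, and the mean -/+ one standard deviation band.
    """
    grid = np.linspace(0.0, 1.0, grid_size)
    curves = []
    for labels, scores in realizations:
        fpr, tpr = roc_points(labels, scores)
        tpr_on_grid = np.interp(grid, fpr, tpr)
        tpr_on_grid[0], tpr_on_grid[-1] = 0.0, 1.0  # pin the curve ends
        curves.append(tpr_on_grid)
    curves = np.vstack(curves)
    mean = curves.mean(axis=0)
    sd = curves.std(axis=0, ddof=1)
    lower = np.clip(mean - sd, 0.0, 1.0)
    upper = np.clip(mean + sd, 0.0, 1.0)
    return grid, mean, lower, upper


def trapezoid_auc(x, y):
    """Area under a piecewise-linear curve (trapezoidal rule)."""
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))


# Usage with 100 hypothetical realizations of a noisy score.
rng = np.random.default_rng(0)
realizations = []
for _ in range(100):
    y = rng.integers(0, 2, size=200)
    realizations.append((y, y + rng.normal(0.0, 1.0, size=200)))
grid, mean_tpr, lower, upper = average_roc(realizations)
print(round(trapezoid_auc(grid, mean_tpr), 3))
```

Vertical averaging treats FPR as the controlled quantity. Threshold averaging is a possible alternative, and it can yield a slightly different band.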

Fig 2.

Summary of numbers of selected variables and selection rates of relevant variables.

(A) Numbers of selected variables per simulation in 100 realizations. (B) Selection rates of relevant variables per simulation in 100 realizations.
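
The caption does not define the selection rate exactly. The sketch below assumes one plausible reading: within each realization, the rate is the share of the truly relevant variables that were selected. Panel (A) then corresponds to the size of each selected set. The function `summarize_selection` and the example selections are hypothetical.

```python
def summarize_selection(selections, relevant):
    """Per realization: number of selected variables and share of relevant variables selected.

    selections: list of iterables of selected variable indices, one per realization.
    relevant: indices of the truly relevant (signal) variables in the simulation design.
    """
    relevant = set(relevant)
    n_selected = [len(set(s)) for s in selections]
    rates = [len(set(s) & relevant) / len(relevant) for s in selections]
    return n_selected, rates


# Hypothetical output of a selection method on three realizations.
selections = [{0, 1, 5}, {0, 2}, {0, 1, 2, 7}]
n_selected, rates = summarize_selection(selections, relevant={0, 1, 2})
print(n_selected)  # [3, 2, 4]
print([round(r, 3) for r in rates])  # [0.667, 0.667, 1.0]
```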

Fig 3.

Number of selected stable variables in all scenarios using adaptive lasso and glmboost.

The colors indicate the number of overlapping jittered points (red: low density; black: high density). A total of 100 realizations of simulated datasets were drawn for each scenario.

Fig 4.

Summary of RVDR of each subset in all scenarios using adaptive lasso and glmboost.

The colors represent the number of subsets yielding at least one stable variable (blue: low; orange: high). A total of 100 realizations of simulated datasets were drawn for each scenario.

Fig 5.

Averaged ROC curves of the prediction performance of glmboost (blue) and gamboost (orange).

The averaged ROC curves were estimated from 40 subsamples generated by repeated stratified 4-fold cross-validation in the three comparisons of the FaPaCa sample. The shaded areas represent the one standard deviation intervals. The average AUCs and their standard deviations are shown in the bottom-right corner.
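
With 4 folds, the 40 subsamples correspond to 10 repetitions of a shuffled, stratified 4-fold split. The caption does not state the number of repeats, so 10 is inferred from 40 / 4. The sketch below builds such splits with NumPy only. The function name, the seed, and the class sizes are hypothetical.

```python
import numpy as np


def repeated_stratified_kfold(y, n_splits=4, n_repeats=10, seed=0):
    """Return (train_idx, test_idx) pairs: n_repeats shuffled, stratified K-fold partitions."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    splits = []
    for _ in range(n_repeats):
        fold_of = np.empty(len(y), dtype=int)
        offset = 0
        for cls in np.unique(y):
            members = rng.permutation(np.flatnonzero(y == cls))
            # Deal the shuffled members of this class round-robin over the folds,
            # continuing where the previous class stopped so fold sizes stay balanced.
            fold_of[members] = (np.arange(len(members)) + offset) % n_splits
            offset += len(members)
        for k in range(n_splits):
            splits.append((np.flatnonzero(fold_of != k), np.flatnonzero(fold_of == k)))
    return splits


# Hypothetical two-group sample: 20 controls (0) and 12 cases (1).
y = np.array([0] * 20 + [1] * 12)
splits = repeated_stratified_kfold(y)
print(len(splits))  # 40
print([int(y[test].sum()) for _, test in splits[:4]])  # [3, 3, 3, 3]
```

scikit-learn offers the same scheme as `RepeatedStratifiedKFold(n_splits=4, n_repeats=10)`. The hand-written version above only makes the stratification step explicit.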

Fig 6.

Number of selected assays by glmboost and gamboost in all scenarios.

Each boxplot summarizes the results of 40 subsamples generated by repeated stratified 4-fold cross-validation in the respective scenarios L-HisSig (red), w.o-HisSig (green) and w.o-L (yellow).

Fig 7.

Summary of the stability selection results using glmboost and gamboost.

Stability selection results using glmboost (bottom) and gamboost (top) in the scenarios L-HisSig (red), w.o-HisSig (green) and w.o-L (yellow). The grey line marks the corresponding cut-off level under the assumption of a unimodal distribution. Assays with a selection probability above the cut-off (black font) or slightly below it (red font) are annotated. The results of stability selection are summarized in S2 Table.
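
Stability selection estimates each variable's selection probability as the fraction of subsamples in which it was selected, and keeps variables above a cut-off. The figure's cut-off relies on a unimodality assumption, which gives a tighter error bound. For simplicity, the sketch below swaps in the original Meinshausen and Bühlmann bound, E(V) ≤ q² / ((2π_thr − 1) p), solved for π_thr. For the same tolerated number of false selections, this bound is generally more conservative. All names and numbers are hypothetical.

```python
import numpy as np


def selection_probabilities(selected_per_subsample, n_vars):
    """Fraction of subsamples in which each variable was selected."""
    counts = np.zeros(n_vars)
    for selected in selected_per_subsample:
        counts[np.array(sorted(set(selected)), dtype=int)] += 1
    return counts / len(selected_per_subsample)


def mb_cutoff(q, p, pfer):
    """Threshold pi_thr from the Meinshausen-Buhlmann bound E(V) <= q^2 / ((2 pi_thr - 1) p).

    q: number of variables selected per subsample; p: number of candidate
    variables; pfer: tolerated expected number of falsely selected variables.
    """
    pi_thr = 0.5 * (1.0 + q**2 / (pfer * p))
    if not 0.5 < pi_thr <= 1.0:
        raise ValueError("no valid threshold: reduce q or increase pfer")
    return pi_thr


# Hypothetical: 100 candidate assays, 8 selected per subsample, PFER of 1.
cutoff = mb_cutoff(q=8, p=100, pfer=1)
probs = selection_probabilities([{0, 3}, {0, 5}, {0, 3}, {0}], n_vars=6)
stable = np.flatnonzero(probs >= cutoff)
print(cutoff, probs, stable)
```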

Fig 8.

The estimated probability of being classified as ‘L’ in scenario w.o-L using PCSK9 with gamboost.

Blue and orange solid lines represent the predictions of the fitted gamboost and the bootstrapped mean predictions, respectively. The dotted lines indicate the bootstrapped 95% confidence interval. The left plot shows observations with status w.o and the right plot observations with status L.
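
The bootstrapped mean curve and its 95% interval can be obtained in a few steps:

1. Refit the model on resampled data.
2. Predict on a fixed grid of marker values.
3. Take the mean and the 2.5th and 97.5th percentiles at each grid point (a percentile interval).

Fitting gamboost is out of scope here. The sketch below therefore uses a hypothetical stand-in, a Gaussian-kernel smoother of the 0/1 status, to illustrate only the bootstrap step. The function names, bandwidth, and simulated data are assumptions.

```python
import numpy as np


def kernel_prob(x_train, y_train, x_grid, bandwidth):
    """Stand-in model: Gaussian-kernel weighted average of the 0/1 outcome at each grid point."""
    d = (x_grid[:, None] - x_train[None, :]) / bandwidth
    w = np.exp(-0.5 * d**2)
    return (w @ y_train) / w.sum(axis=1)


def bootstrap_curve(x, y, x_grid, bandwidth=0.5, n_boot=500, seed=1):
    """Fitted curve, bootstrap mean, and 95% percentile interval on x_grid."""
    rng = np.random.default_rng(seed)
    fitted = kernel_prob(x, y, x_grid, bandwidth)
    boots = np.empty((n_boot, len(x_grid)))
    for b in range(n_boot):
        idx = rng.integers(0, len(x), size=len(x))  # resample observations with replacement
        boots[b] = kernel_prob(x[idx], y[idx], x_grid, bandwidth)
    mean = boots.mean(axis=0)
    lower, upper = np.percentile(boots, [2.5, 97.5], axis=0)
    return fitted, mean, lower, upper


# Hypothetical marker levels and status (1 = L, 0 = w.o).
rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, size=80)
y = (rng.random(80) < 1.0 / (1.0 + np.exp(-2.0 * x))).astype(float)
x_grid = np.linspace(-2.0, 2.0, 41)
fitted, mean, lower, upper = bootstrap_curve(x, y, x_grid)
```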

Table 1.

The estimated odds ratios of FGF-BP1, PCSK9, PLA2G7, and MSLN via linear base-learners in each scenario.
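
For a linear term with coefficient β on the log-odds scale, increasing the assay value by Δ multiplies the odds by exp(βΔ). This holds regardless of the intercept or the starting value. The sketch below checks that identity numerically. The `scale` argument is a hypothetical hook: it assumes the fitting software might parametrize the linear predictor on a different scale (for example, half the log-odds, in which case `scale=2.0`). Check the software's documentation before rescaling.

```python
import math


def odds_ratio(beta, delta=1.0, scale=1.0):
    """Odds ratio for a delta-unit increase of a predictor with linear coefficient beta.

    scale converts the model's predictor scale to log-odds (1.0 for a plain logit model).
    """
    return math.exp(scale * beta * delta)


def logistic(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def empirical_odds_ratio(intercept, beta, x, delta=1.0):
    """Ratio of odds at x + delta and at x under a logit model."""
    p0 = logistic(intercept + beta * x)
    p1 = logistic(intercept + beta * (x + delta))
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))


# Hypothetical coefficient 0.4 per unit; a 2-unit increase multiplies the odds by exp(0.8).
print(odds_ratio(0.4, delta=2.0), empirical_odds_ratio(-1.0, 0.4, x=3.0, delta=2.0))
```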

Fig 9.

The estimated classification results and the standard errors using gamboost in scenarios L-HisSig and w.o-HisSig.

(A) and (C): The estimated classification results by gamboost in scenarios L-HisSig and w.o-HisSig. The colors represent the tendency of prediction results for the lesion status (blue = w.o, yellow = L and brown = HisSig). The contours indicate the estimated response with an interval width of 0.2. (B) and (D): The estimated standard errors of classification results in both scenarios obtained by bootstrapping. Red regions represent high estimated standard errors (low certainty). The contours represent the estimated standard errors with an interval width of 0.1.
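
Panels (B) and (D) report bootstrap standard errors over a two-dimensional grid of assay values. Each grid point's standard error is the standard deviation of its predictions across bootstrap refits. The sketch below shows that computation. It uses a hypothetical two-dimensional Gaussian-kernel smoother in place of gamboost, and the names, bandwidth, grid size, and data are all assumptions.

```python
import numpy as np


def kernel_prob_2d(X, y, grid_points, bandwidth):
    """Stand-in model: Gaussian-kernel weighted average of the 0/1 outcome in two dimensions."""
    diff = grid_points[:, None, :] - X[None, :, :]
    w = np.exp(-0.5 * np.sum(diff**2, axis=2) / bandwidth**2)
    return (w @ y) / w.sum(axis=1)


def bootstrap_se_grid(X, y, n_grid=25, bandwidth=0.5, n_boot=200, seed=2):
    """Fitted probabilities and bootstrap standard errors on an n_grid x n_grid grid."""
    rng = np.random.default_rng(seed)
    g1 = np.linspace(X[:, 0].min(), X[:, 0].max(), n_grid)
    g2 = np.linspace(X[:, 1].min(), X[:, 1].max(), n_grid)
    G1, G2 = np.meshgrid(g1, g2)
    pts = np.column_stack([G1.ravel(), G2.ravel()])
    fitted = kernel_prob_2d(X, y, pts, bandwidth).reshape(G1.shape)
    boots = np.empty((n_boot, pts.shape[0]))
    for b in range(n_boot):
        idx = rng.integers(0, len(y), size=len(y))  # resample observations with replacement
        boots[b] = kernel_prob_2d(X[idx], y[idx], pts, bandwidth)
    se = boots.std(axis=0, ddof=1).reshape(G1.shape)  # per-point bootstrap standard error
    return G1, G2, fitted, se


# Hypothetical two-assay data with a binary lesion status.
rng = np.random.default_rng(4)
X = rng.normal(0.0, 1.0, size=(60, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(0.0, 0.5, size=60) > 0).astype(float)
G1, G2, fitted, se = bootstrap_se_grid(X, y)
```

Contours like those in the figure could then be drawn with, for example, matplotlib's `contour(G1, G2, se, levels=...)` using a spacing of 0.1.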
