Which Is a More Accurate Predictor in Colorectal Survival Analysis? Nine Data Mining Algorithms vs. the TNM Staging System

doi:10.1371/journal.pone.0042015

Table 1.

Variables Available for Analysis.

Table 2.

Nine algorithms used in the construction of prediction models.

Figure 1.

The optimization and prediction system.

SEER prediction result A comprises the 9 × 2 predictive results from models trained with nine data mining algorithms, each combined with one of two variable selection methods, and tested on the SEER testing dataset with all 20 variables. SEER prediction result B comprises the 9 × 2 predictive results tested on the SEER testing dataset restricted to the 14 variables available in both the SEER and CMU-SO datasets. The CMU-SO prediction result comprises the 9 × 2 predictive results tested on the CMU-SO testing dataset with the same 14 variables.

Figure 2.

ROC curves for the two testing datasets.

A. Comparison of the predictive accuracy of three prognostic models (ANFIS with GA, NB with BSFS, and the AJCC 7th edition TNM staging system) on the SEER testing dataset with 14 variables. B. Comparison of the predictive accuracy of three prognostic models (LR with BSFS, ANFIS with GA, and the AJCC 7th edition TNM staging system) on the CMU-SO testing dataset.
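
As an illustration of how such a comparison can be quantified, the area under the ROC curve (AUC) can be computed from outcomes and scores with the rank-based (Mann-Whitney) formulation. The sketch below is not the authors' code: the function name `auc_from_scores`, the toy data, and the choice to treat an ordinal stage as a score are all assumptions made for the example.

```python
def auc_from_scores(labels, scores):
    """Area under the ROC curve via the Mann-Whitney rank formulation.

    labels: sequence of 0/1 outcomes (1 = event of interest).
    scores: sequence of model scores, higher = more likely to be 1.
    A tie between a positive and a negative case counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from SEER or CMU-SO.
outcomes = [1, 1, 1, 0, 0, 0]
model_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]  # continuous model output
stage_scores = [3, 2, 1, 2, 1, 1]              # ordinal stage used as a score (assumption)

auc_model = auc_from_scores(outcomes, model_scores)
auc_stage = auc_from_scores(outcomes, stage_scores)
print(f"model AUC = {auc_model:.3f}, stage AUC = {auc_stage:.3f}")
```

A coarse ordinal score such as a stage produces many ties, which is why the half-credit rule for ties matters when comparing it against a continuous model output.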

Table 3.

AUC calculated by testing prediction models on SEER.

Table 4.

AUC calculated by testing prediction models on CMU-SO.

Figure 3.

Survival rates at eight risk levels.

Comparison of predicted survival rates with real-world survival rates at eight risk levels. The predicted survival rates come from a model built with LR together with BSFS.
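
One way to produce this kind of comparison is to sort cases by predicted survival probability, split them into eight groups, and compare the mean predicted rate with the observed survival fraction in each group. The sketch below assumes equal-sized groups; the caption does not say how the eight risk levels were defined, and the function name `risk_level_table` and the toy data are hypothetical.

```python
def risk_level_table(pred_survival, survived, n_levels=8):
    """Split cases into n_levels equal-sized risk levels by predicted survival.

    Returns one tuple per level: (level, mean predicted rate, observed rate, count).
    Level 1 holds the lowest predicted survival, i.e. the highest risk.
    Equal-sized grouping is an assumption, not the source's stated method.
    """
    if len(pred_survival) != len(survived):
        raise ValueError("inputs must have the same length")
    if len(pred_survival) < n_levels:
        raise ValueError("need at least one case per risk level")
    pairs = sorted(zip(pred_survival, survived))
    n = len(pairs)
    rows = []
    for level in range(n_levels):
        start = level * n // n_levels
        stop = (level + 1) * n // n_levels
        group = pairs[start:stop]
        mean_pred = sum(p for p, _ in group) / len(group)
        observed = sum(s for _, s in group) / len(group)
        rows.append((level + 1, mean_pred, observed, len(group)))
    return rows


# Hypothetical toy data (16 cases), not taken from the study.
toy_pred = [0.05, 0.10, 0.20, 0.25, 0.30, 0.35, 0.45, 0.50,
            0.55, 0.60, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95]
toy_survived = [0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1]

rows = risk_level_table(toy_pred, toy_survived)
for level, mean_pred, observed, count in rows:
    print(f"level {level}: predicted {mean_pred:.3f}, observed {observed:.3f}, n={count}")
```

Close agreement between the predicted and observed columns across levels would indicate a well-calibrated model; large gaps at particular levels would show where the predictions drift.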
