
Fig 1.

Flowchart of the IDFO approach.

This flowchart contains three basic steps: (i) PRP ranking of molecular features, (ii) model construction and (iii) feature optimization and validation.


Fig 2.

An overall scheme of the prediction pipeline.

This pipeline contains four main procedures. I) Data processing. We assembled a collection of 3198 tumor samples from five types of human cancer, comprising four types of molecular data: lncRNA, microRNA, mRNA, and DNA methylation. Each type of molecular data in each cancer was processed into an array-based matrix using CAPM preprocessing methods. II) Feature ranking. Molecular features associated with prognosis were analyzed and sorted according to the PRP algorithm. For each type of molecular data, we selected the 100 top-weighted signatures as the initial feature set in each of the five cancers. III) Model building and feature selection. We adopted five classifiers in combination with four feature selection algorithms to establish the prognosis prediction baseline. During feature selection, each group of features was trained with 10,000 randomizations, and the best-performing feature panel (the one with the highest bootstrap accuracy) was chosen for each molecular data type per cancer. IV) Validation. To evaluate the predictive power of each molecular data type, the best-performing features were further applied to an independent test in each cancer cohort.
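The feature-ranking step (II) keeps the top-weighted features per data type. A minimal sketch of that selection is given here, assuming a weight has already been computed for each feature; the PRP algorithm itself is not reproduced, and the function name, feature names, and weights are all hypothetical.

```python
def top_weighted(weights, k=100):
    """Return the k feature names with the largest weights, highest first.

    `weights` maps feature name -> ranking weight. How the weights are
    computed (PRP in the caption) is outside this sketch.
    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


# Hypothetical toy weights for one data type in one cancer.
weights = {"feat_a": 0.12, "feat_b": 0.87, "feat_c": 0.45, "feat_d": 0.45}

# The caption uses k = 100; k = 3 keeps the toy example readable.
panel = top_weighted(weights, k=3)
print(panel)
```

With the toy weights, the highest-weighted feature comes first and the two tied features follow in alphabetical order.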


Fig 3.

The performances of diverse molecular signatures in training (A-E) and testing (F) data sets across five TCGA cancers.

(A-E) Best predictive models for each molecular data type in five human cancers across different feature selection strategies (columns indicate feature selection strategies: LASSO, PFS, SVM-RFE, RF-IS, and MCS; rows indicate molecular signatures). (F) Test set accuracies of four types of molecular signatures in five TCGA cancers (rows indicate molecular data types, columns indicate cancer types). To distinguish training from testing results, training results are shown in blue (Fig 3A–3E) and testing results in red (Fig 3F). * BRCA = breast invasive carcinoma; COAD = colon adenocarcinoma; LUSC = lung squamous cell carcinoma; UCEC = uterine corpus endometrioid carcinoma; OV = ovarian serous cystadenocarcinoma.


Fig 4.

Comparison of the predictive performance of integrated multi-type molecular data and single molecular data in cancer outcome stratification.

(A) BRCA (Noverlaps = 178), (B) COAD (Noverlaps = 161), (C) LUSC (Noverlaps = 97), (D) OV (Noverlaps = 145), (E) UCEC (Noverlaps = 84). For the respective models in each type of cancer, we performed 10,000 random splits into 2/3 training and 1/3 testing sets using the IDFO pipeline. The dotted red box indicates the significantly improved performance of two integrated models in (D) OV and (E) UCEC compared with the individual data type models (two-sided Wilcoxon signed rank test, P < 0.01); the dotted blue box indicates the three individual data type models: mr, lnr and meth. The integrated group is composed of both double-combination and triple-combination molecular signature models. The individual group contains the three individual molecular data type models. The gray line across the seven boxes shows the predictive patterns of the integrated and individual groups. Noverlaps is the number of samples present in all three molecular data profiles (mRNA, lncRNA and DNA methylation); lnr = lncRNA, mr = mRNA, meth = DNA methylation, mr+lnr = mRNA + lncRNA, mr+meth = mRNA + DNA methylation, lnr+meth = lncRNA + DNA methylation, mr+lnr+meth = mRNA + lncRNA + DNA methylation.
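The two bookkeeping steps in this caption, finding the samples shared by all three profiles (Noverlaps) and repeatedly splitting them 2/3 to 1/3, can be sketched as follows. This is an illustration under stated assumptions, not the IDFO pipeline: the function names and sample IDs are hypothetical, and model fitting and the Wilcoxon test are omitted.

```python
import random


def overlapping_samples(*profiles):
    """Return the sorted sample IDs that appear in every profile."""
    shared = set(profiles[0])
    for profile in profiles[1:]:
        shared &= set(profile)
    return sorted(shared)


def random_split(samples, train_fraction=2 / 3, rng=None):
    """Shuffle a copy of `samples` and split it into (train, test)."""
    rng = rng or random.Random()
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical sample IDs for the three profiles of one cancer.
mrna = ["s1", "s2", "s3", "s4", "s5", "s6", "s7"]
lncrna = ["s1", "s2", "s3", "s4", "s5", "s6", "s8"]
meth = ["s0", "s1", "s2", "s3", "s4", "s5", "s6"]

common = overlapping_samples(mrna, lncrna, meth)
n_overlaps = len(common)

# The caption uses 10,000 repetitions; 5 are enough for a toy run.
rng = random.Random(0)  # fixed seed only so the toy run is repeatable
splits = [random_split(common, rng=rng) for _ in range(5)]
train, test = splits[0]
print(n_overlaps, len(train), len(test))
```

Each repetition would then train on `train` and score on `test`, giving one accuracy value per split for each model being compared.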


Fig 5.

Survival analysis on IDFO predictors of four types of molecular data in five cancers.

Kaplan-Meier overall survival curves of the two outcome groups classified by MCPHR models using IDFO-identified predictors for each molecular data type in each cancer. (a) the BRCA lncRNA cohort; (b) the BRCA DNA methylation cohort; (c) the BRCA microRNA cohort; (d) the BRCA mRNA cohort; (e) the COAD lncRNA cohort; (f) the COAD DNA methylation cohort; (g) the COAD microRNA cohort; (h) the COAD mRNA cohort; (i) the LUSC lncRNA cohort; (j) the LUSC DNA methylation cohort; (k) the LUSC microRNA cohort; (l) the LUSC mRNA cohort; (m) the OV lncRNA cohort; (n) the OV DNA methylation cohort; (o) the OV microRNA cohort; (p) the OV mRNA cohort; (q) the UCEC lncRNA cohort; (r) the UCEC DNA methylation cohort; (s) the UCEC microRNA cohort; (t) the UCEC mRNA cohort. The difference in outcome between the two groups was tested using Kaplan-Meier survival analysis. Likelihood ratio = the likelihood ratio test.
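For readers unfamiliar with the curves in these panels, the standard Kaplan-Meier (product-limit) estimator can be sketched in a few lines. This is the textbook estimator, not the authors' code; the function name and the follow-up data are hypothetical, and the group comparison test is not included.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times:  follow-up time for each patient
    events: 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        at_risk -= removed
    return curve


# Hypothetical toy cohort: five patients, two of them censored.
times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

In a two-group figure such as this one, the estimator would be run once per outcome group and the two step curves plotted together.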
