COPS: A novel platform for multi-omic disease subtype discovery via robust multi-objective evaluation of clustering algorithms

doi:10.1371/journal.pcbi.1012275

Fig 1.

COPS benchmark workflow.

Starting from multi-omic data, ANF, iCluster+ and IntNMF yield clustering results directly. MOFA is used to obtain a lower-dimensional representation that is then clustered with standard algorithms, while MKKM-MR is used to fuse kernels before applying kernel k-means. For pathway-based kernels, cancer-associated KEGG pathways were used to define multiple pathway kernels for each omic. Finally, clusters were evaluated with respect to clustering stability, survival relevance, and accuracy, as well as the trade-offs among them.

Table 1.

List of benchmarking datasets from TCGA, with the gold-standard subtype and the survival type considered for each (OS: overall survival; PFI: progression-free interval).

Fig 2.

Stability of cancer clustering results.

The boxplots show Jaccard-index-based stability for different multi-omics clustering approaches and different numbers of clusters (k) across 10 repeats of 5-fold cross-validation. The box spans the interquartile range (first to third quartile) and the middle line marks the median, while the whiskers extend to the most extreme value within 1.5 times the inter-quartile range from the box edges.
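
As a rough illustration of what a Jaccard-index-based stability score measures, the sketch below compares a reference clustering with a clustering obtained on a resampled dataset, restricted to the samples they share. It is a minimal sketch under assumptions: the function names and the rule of matching each reference cluster to its best-overlapping counterpart are illustrative choices, not necessarily the exact procedure implemented in COPS.

```python
from typing import Dict, Hashable, Sequence, Set


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard index |a & b| / |a | b|; defined as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def _clusters(labels: Sequence[Hashable]) -> Dict[Hashable, Set[int]]:
    """Group sample positions by cluster label."""
    groups: Dict[Hashable, Set[int]] = {}
    for index, label in enumerate(labels):
        groups.setdefault(label, set()).add(index)
    return groups


def jaccard_stability(reference: Sequence[Hashable],
                      resampled: Sequence[Hashable]) -> float:
    """Mean, over reference clusters, of the best Jaccard match in `resampled`.

    Both label sequences must describe the same samples in the same order
    (hypothetical convention chosen for this sketch).
    """
    if len(reference) != len(resampled):
        raise ValueError("label sequences must have the same length")
    if not reference:
        raise ValueError("label sequences must not be empty")
    ref_groups = _clusters(reference)
    res_groups = list(_clusters(resampled).values())
    best = [max(jaccard(members, other) for other in res_groups)
            for members in ref_groups.values()]
    return sum(best) / len(best)
```

Because the score depends only on which samples are grouped together, relabeling the clusters leaves it unchanged; a value of 1 means every reference cluster was recovered exactly.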

Fig 3.

Cluster survival results.

The boxplots show the p-value of a likelihood-ratio test between Cox proportional hazards (PH) models, constructed so that known covariates are accounted for. Values are shown for different multi-omics clustering approaches and different numbers of clusters (k) across 10 repeats of 5-fold cross-validation. The box spans the interquartile range (first to third quartile) and the middle line marks the median, while the whiskers extend to the most extreme value within 1.5 times the inter-quartile range from the box edges. The red line represents the threshold p = 0.05.
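
For readers unfamiliar with the test, a likelihood-ratio comparison of two nested Cox PH models can be written as follows. This is a sketch under an assumption not stated in the caption: that a clustering with k clusters enters the larger model as a categorical covariate (k - 1 indicator terms) alongside the known covariates, while the smaller model contains the known covariates only. The symbols \ell_0, \ell_1 and \Lambda are introduced here for illustration.

```latex
\Lambda = 2\,\bigl(\ell_1 - \ell_0\bigr), \qquad
p = \Pr\!\left(\chi^2_{k-1} \ge \Lambda\right)
```

Here \ell_0 and \ell_1 denote the maximized partial log-likelihoods of the covariates-only model and the covariates-plus-clusters model, respectively. Under that assumption, a small p suggests that the clusters carry survival information beyond what the known covariates already explain.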

Fig 4.

Cancer subtype agreement.

The boxplots show the adjusted Rand index (ARI) between the clusters and gold-standard subtypes on the y-axis. For each dataset, only the clustering result corresponding to the known number of subtypes is shown for the considered multi-omics clustering approaches across 10 repeats of 5-fold cross-validation. The box spans the interquartile range (first to third quartile) and the middle line marks the median, while the whiskers extend to the most extreme value within 1.5 times the inter-quartile range from the box edges.
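
The ARI itself is a standard quantity that can be computed from the contingency table of two labelings. The following is a minimal, dependency-free sketch of the usual pair-counting formula; the function name and the handling of degenerate inputs are choices made for this illustration, not taken from COPS.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_a: Sequence[Hashable],
                        labels_b: Sequence[Hashable]) -> float:
    """Adjusted Rand index between two labelings of the same samples."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: treated as trivially identical

    # Pairs of samples placed together in both labelings, in A, and in B.
    together_both = sum(comb(n, 2) for n in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(n, 2) for n in Counter(labels_a).values())
    together_b = sum(comb(n, 2) for n in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs  # agreement expected by chance
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings have a single cluster
    return (together_both - expected) / (maximum - expected)
```

An ARI of 1 means the clusters match the reference subtypes exactly up to relabeling, values near 0 correspond to chance-level agreement, and negative values indicate agreement below chance.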

Fig 5.

Multi-objective clustering result evaluation.

Each point represents a clustering result obtained using a different method and number of clusters. The x and y axes represent, respectively, the median survival p-value and the Jaccard-index clustering stability, calculated across resampled datasets. The line connects the set of non-dominated results.
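
The set of non-dominated results can be illustrated with a short sketch. It assumes, as the axes suggest, that a lower survival p-value and a higher stability are both preferred; the `Result` type, the function names, and the numbers in the usage example are hypothetical and are not results from the study.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    name: str         # hypothetical label, e.g. method and number of clusters
    p_value: float    # median survival p-value (lower is better)
    stability: float  # median Jaccard stability (higher is better)


def dominates(a: Result, b: Result) -> bool:
    """True if `a` is at least as good as `b` on both objectives and better on one."""
    no_worse = a.p_value <= b.p_value and a.stability >= b.stability
    strictly_better = a.p_value < b.p_value or a.stability > b.stability
    return no_worse and strictly_better


def non_dominated(results: List[Result]) -> List[Result]:
    """Keep the results that no other result dominates, in input order."""
    return [r for r in results
            if not any(dominates(other, r) for other in results)]


if __name__ == "__main__":
    # Made-up values for illustration only.
    candidates = [
        Result("A", 0.01, 0.60),
        Result("B", 0.05, 0.90),
        Result("C", 0.05, 0.50),
        Result("D", 0.20, 0.95),
    ]
    for result in non_dominated(candidates):
        print(result.name, result.p_value, result.stability)
```

In this toy example, C is dropped because A has both a lower p-value and a higher stability, while A, B and D each represent a different trade-off and therefore remain.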

Table 2.

List of methods available in COPS, grouped by type of method and by the required inputs, which correspond to the prior biological knowledge that is integrated. Different feature extraction and clustering algorithms can be used in multiple configurations.

Table 3.

List of metrics available in COPS, grouped by metric type and the purpose for which they can be applied.

The metrics highlighted in bold were employed in this study.
