Cluster analysis on high dimensional RNA-seq data with applications to cancer research - An evaluation study

doi:10.1371/journal.pone.0219102

Table 1.

Summary of data sets.

Table 2.

Contingency table.

Fig 1.

Principal component analysis to visualize separation between subtypes.

Plots of the first two principal components, with the subtypes marked in different colors. Breast: ER+ (red) and ER- (blue); Brain: IDHnocodel (red) and IDHcodel (blue); Kidney: type 1 (red) and type 2 (blue); Stomach: CIN (red) and MSI (blue).

Fig 2.

Performance of clustering methods and selection methods.

Adjusted Rand index of each clustering result compared with the gold standard partition. Figure A shows results for the clustering methods, where each box contains observations from the five selection methods. Figure B shows results for the selection methods, where each box contains observations from the six clustering methods.
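
The adjusted Rand index used throughout these figures can be computed from the contingency table of two partitions. The following is a minimal sketch of the standard formula, not the code used in the study; the function name `adjusted_rand_index` and the toy labels are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same samples."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both partitions must label the same samples")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no sample pairs to compare; treated as perfect agreement

    # Contingency table cells and its row/column margins.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Numbers of sample pairs that fall in the same cell, row, or column.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2        # agreement of a perfect match
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical toy labels: a gold standard and a clustering that differs
# only in how the two clusters are named.
gold = ["ER+", "ER+", "ER-", "ER-"]
clustering = [2, 2, 1, 1]
score = adjusted_rand_index(gold, clustering)
```

The index does not depend on cluster names, so a clustering that recovers the gold standard partition under different labels still scores 1, while values near 0 indicate agreement no better than chance.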

Fig 3.

Sample size.

Boxplots of the adjusted Rand index for different numbers of observations and a symmetric distribution of the subtypes. Each box contains mean adjusted Rand index values (taken over all 30 clustering approaches) for 10 replicates.

Fig 4.

Similarity between clustering approaches for different sample sizes.

Average pairwise adjusted Rand index (apARI) between clustering approaches for different numbers of observations.
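
As a rough illustration of apARI, the sketch below averages the adjusted Rand index over every unordered pair of partitions. It assumes apARI is the plain mean over all pairs, which the caption does not spell out; the helper names `ari` and `average_pairwise_ari` and the toy partitions are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def ari(a, b):
    """Adjusted Rand index between two labelings of the same samples (n >= 2)."""
    cells = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    rows = sum(comb(c, 2) for c in Counter(a).values())
    cols = sum(comb(c, 2) for c in Counter(b).values())
    expected = rows * cols / comb(len(a), 2)
    max_index = (rows + cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case: no variation to compare
    return (cells - expected) / (max_index - expected)


def average_pairwise_ari(partitions):
    """Mean ARI over all unordered pairs of partitions (assumed apARI)."""
    pairs = list(combinations(partitions, 2))
    if not pairs:
        raise ValueError("need at least two partitions")
    return sum(ari(a, b) for a, b in pairs) / len(pairs)


# Hypothetical output of three clustering approaches on four samples.
partitions = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same grouping as the first, different cluster names
    [0, 1, 0, 1],  # a different grouping
]
similarity = average_pairwise_ari(partitions)
```

A high value means the approaches tend to produce the same partition, regardless of whether that partition matches the gold standard.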

Fig 5.

Distribution of subtypes.

Adjusted Rand index for subtype fractions of 10%–50%. Each box contains mean adjusted Rand index values (taken over all 30 clustering approaches) for 10 replicates.

Fig 6.

Gender difference.

Adjusted Rand index for Brain, Kidney and Stomach when dividing samples by gender. All data sets had a symmetric distribution (i.e. 50% of each subtype).
