Table 1.
Summary of data sets.
Table 2.
Contingency table.
Fig 1.
Principal component analysis to visualize separation between subtypes.
Plots of the first two principal components, with the subtypes marked in different colors. Breast: ER+ (red) and ER- (blue); Brain: IDHnocodel (red) and IDHcodel (blue); Kidney: type 1 (red) and type 2 (blue); Stomach: CIN (red) and MSI (blue).
Fig 2.
Performance of clustering methods and selection methods.
Adjusted Rand index for each clustering result compared with the gold standard partition. Figure A shows results for the clustering methods, where each box contains observations from the five selection methods. Figure B shows results for the selection methods, where each box contains observations from the six clustering methods.
Fig 3.
Boxplots of adjusted Rand index for different numbers of observations and a symmetric distribution of the subtypes. Each box contains mean adjusted Rand index values (taken over all 30 clustering approaches) for 10 replicates.
Fig 4.
Similarity between clustering approaches for different sample sizes.
Average pairwise adjusted Rand index (apARI) between clustering approaches for different numbers of observations.
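As an illustration of the quantity named in this caption, the following is a minimal sketch of how an average pairwise adjusted Rand index could be computed, assuming that apARI is the unweighted mean of the adjusted Rand index over all pairs of partitions. The function names `adjusted_rand_index` and `average_pairwise_ari` are hypothetical, and this is not necessarily how the figure was produced.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same samples."""
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must label the same number of samples")
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: treated as trivially identical

    # Contingency counts and their marginals.
    cell_counts = Counter(zip(labels_a, labels_b))
    row_counts = Counter(labels_a)
    col_counts = Counter(labels_b)

    index = sum(comb(c, 2) for c in cell_counts.values())
    sum_rows = sum(comb(c, 2) for c in row_counts.values())
    sum_cols = sum(comb(c, 2) for c in col_counts.values())

    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (index - expected) / (max_index - expected)


def average_pairwise_ari(partitions):
    """Assumed definition: mean ARI over all unordered pairs of partitions."""
    pairs = list(combinations(partitions, 2))
    if not pairs:
        raise ValueError("need at least two partitions")
    return sum(adjusted_rand_index(a, b) for a, b in pairs) / len(pairs)
```

Under this assumed definition, a value near 1 means the clustering approaches produce nearly identical partitions, while a value near 0 means they agree no more than expected by chance.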
Fig 5.
Adjusted Rand index for subtype fractions 10%–50%. Each box contains mean adjusted Rand index values (taken over all 30 clustering approaches) for 10 replicates.
Fig 6.
Adjusted Rand index for Brain, Kidney and Stomach when dividing samples by gender. All data sets had a symmetric distribution (i.e. 50% of each subtype).