Fig 1.

Illustration of heterogeneous cluster structures in two contexts (datasets).

Each context corresponds to a different data source (gene expression, ribosome profiling, proteomics, etc.) describing the same set of biological samples. In the first context, there are two distinct local clusters. In the second context, there is only a single local cluster. Overall, there are two global clusters, defined by the combined behaviour across the two contexts.

Fig 2.

Example of the simulated data for p = 0 and p = 0.5, which show different degrees of dependence.

The x axis corresponds to the data in the first dataset (context); the y axis corresponds to the data in the second dataset (context). The two subfigures show the two extreme situations: (a) For p = 0, we get two global clusters, and cluster membership in one dataset fully determines cluster membership in the other. (b) For p = 0.5, we get four global clusters, and cluster membership in one dataset is independent of cluster membership in the other.
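
The caption does not state how the memberships are generated. One reading that is consistent with both extremes is that each sample's cluster label in the second context equals its label in the first context, except that it is flipped with probability p. The sketch below illustrates that assumed mechanism only; the function name, the two-cluster setup and the seed are hypothetical and are not taken from the article.

```python
import random


def simulate_memberships(n_samples, p, seed=0):
    """Return (context-1 label, context-2 label) pairs for each sample.

    ASSUMPTION: the context-2 label copies the context-1 label and is
    flipped with probability p. This is an illustrative guess, not the
    article's documented simulation procedure.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_samples):
        z1 = rng.randrange(2)                     # label in the first context
        z2 = 1 - z1 if rng.random() < p else z1   # flipped with probability p
        pairs.append((z1, z2))
    return pairs


# Each distinct (z1, z2) pair is one occupied global cluster.
dependent = simulate_memberships(1000, p=0.0)
independent = simulate_memberships(1000, p=0.5)
n_global_dependent = len(set(dependent))
n_global_independent = len(set(independent))
```

Under this assumption, p = 0 can only produce the pairs (0, 0) and (1, 1), while p = 0.5 makes the second label a fair coin flip that carries no information about the first, so all four combinations appear.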

Fig 3.

ARI comparing global clustering of simulated datasets for varying values of p (see Fig 2).

Each point shows the result of one algorithm applied to one dataset; the plot also shows the loess curve for each method. Higher values correspond to better agreement between the estimated cluster assignments and the true cluster membership.
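
For readers unfamiliar with the adjusted Rand index (ARI), the following is a minimal sketch of the standard pair-counting definition, written with the Python standard library only. It is not the code used to produce the figure, and the function name is illustrative.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat clusterings of the same samples."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare; treat as trivially identical

    # Count sample pairs that share a cell, a row, or a column of the
    # contingency table between the two clusterings.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both clusterings are one big cluster
    return (sum_cells - expected) / (max_index - expected)


same_partition = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed_partition = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The ARI depends only on which samples are grouped together, so relabelled but identical partitions score 1, while partitions that agree no better than chance score around 0 or below.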

Fig 4.

ARI comparing global clustering of simulated datasets with misspecified number of clusters for varying values of p, when we set the number of global clusters to 5.

Each point shows the result of one algorithm applied to one dataset; the plot also shows the loess curve for each method. Higher values correspond to better agreement between the estimated cluster assignments and the true cluster membership.

Fig 5.

Sizes of global clusters identified in the breast cancer dataset from TCGA, using the model with 3 context-specific clusters and up to 18 global clusters.

Fig 6.

Survival curves for global clusters in the breast cancer dataset from TCGA, using the model with 3 context-specific clusters and up to 18 global clusters.

The differences between the survival curves are significant with p = 0.0382 using the log-rank test.

Fig 7.

PCA projection of the global clusters in individual contexts in the breast cancer dataset from TCGA, from the model with 3 context-specific clusters and up to 18 global clusters.

The colours correspond to the colours used in the survival curves in Fig 6. (a) Gene expression context. (b) DNA methylation context. (c) miRNA expression context. (d) RPPA context.

Fig 8.

PCA projection of the local clusters identified in individual contexts in the breast cancer dataset from TCGA, from the model with 3 context-specific clusters and up to 18 global clusters.

(a) Context 1 represents the gene expression dataset which contains three local clusters. (b) Context 2 represents the DNA methylation dataset and contains two local clusters. (c) Context 3 represents the miRNA expression dataset with only 1 cluster. (d) Context 4 corresponds to the RPPA dataset, which contains three local clusters.

Fig 9.

PCA projections of the global clusters identified in individual contexts in the breast cancer dataset from TCGA.

The two highlighted clusters differ only in the gene expression context but they are merged in the other contexts. (a) Context 1 represents the gene expression dataset where the two clusters are separate. (b) Context 2 represents the DNA methylation dataset. (c) Context 3 represents the miRNA expression dataset. (d) Context 4 corresponds to the RPPA dataset.

Fig 10.

Survival curves for clusters in Fig 9.

The highlighted clusters have significantly different survival probabilities (p = 0.012, log-rank test).
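
As a rough illustration of how a log-rank comparison of two groups works, the sketch below implements the textbook two-group statistic using only the standard library. It is not the analysis code behind the figure; the function name and the toy survival times are hypothetical.

```python
from math import erfc, sqrt


def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi-square statistic, p-value).

    times_*  : follow-up time of each sample
    events_* : True if the event was observed, False if censored
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk in A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk in B
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    p_value = erfc(sqrt(chi2 / 2))  # chi-square survival function, 1 d.o.f.
    return chi2, p_value


# Toy data (hypothetical): group A fails early, group B fails late.
chi2, p_value = logrank_two_groups(
    [1, 2, 3, 4, 5], [True] * 5,
    [6, 7, 8, 9, 10], [True] * 5,
)
```

At each event time the test compares the number of events observed in one group with the number expected if both groups shared the same hazard; comparing more than two groups at once requires the multivariate form of the statistic, which this sketch does not cover.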

Fig 11.

Log likelihoods (a) and survival p-values (b) of models with different numbers of global clusters.

The first significant difference in survival corresponds to the model with the highest log likelihood.

Fig 12.

Average number of occupied clusters across different numbers of global clusters.

The number of occupied clusters is the number of global clusters with at least one sample assigned to them, averaged across the MCMC iterations. The figure shows both the total number of occupied clusters and the number of clusters with more than 5 samples assigned to them.
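
The averaging described above can be sketched as follows, assuming the sampler output is available as one list of global cluster labels per MCMC iteration. That data layout, the function name and the toy trace are assumptions made for illustration.

```python
from collections import Counter


def mean_occupied_clusters(assignments_per_iteration, min_size=1):
    """Average number of clusters holding at least `min_size` samples.

    assignments_per_iteration: one sequence of global cluster labels per
    MCMC iteration (assumed layout, one label per sample).
    """
    counts = []
    for assignments in assignments_per_iteration:
        sizes = Counter(assignments)
        counts.append(sum(1 for size in sizes.values() if size >= min_size))
    return sum(counts) / len(counts)


# Hypothetical trace: three iterations, four samples each.
trace = [
    [0, 0, 1, 2],
    [0, 0, 0, 1],
    [3, 3, 3, 3],
]
mean_any = mean_occupied_clusters(trace)              # clusters with any sample
mean_pairs = mean_occupied_clusters(trace, min_size=2)
```

Calling the function with min_size=6 gives the second quantity in the figure, the clusters with more than 5 samples.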

Fig 13.

Consistency between global clustering results for different numbers of global clusters with 3 context-specific clusters, as measured by the ARI.

Fig 14.

Consistency between local clustering results for different numbers of global clusters with 3 context-specific clusters, as measured by the ARI.

The ARI values show several local optima. (a) Gene expression context. (b) DNA methylation context. (c) miRNA context. (d) RPPA context.

Fig 15.

Consistency between global clustering results for different numbers of local context-specific clusters, as measured by the ARI.

The compared models were trained with 18 global clusters and 3 to 5 context-specific clusters.

Fig 16.

Deviance information criterion (DIC) as a method for selecting the number of clusters.

The plot shows the DIC for a range of numbers of global clusters when the number of local clusters is set to three. The DIC is minimized for 18 global clusters.
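
As a reminder of how DIC-based selection works, here is a minimal sketch using the commonly cited definition, DIC = mean deviance + effective number of parameters, where the latter is the mean deviance minus the deviance at the posterior mean. Whether the article uses exactly this variant is not stated in the caption, and all numbers below are made up for illustration.

```python
def dic(deviance_samples, deviance_at_posterior_mean):
    """DIC from posterior deviance samples (standard definition, assumed here)."""
    mean_deviance = sum(deviance_samples) / len(deviance_samples)
    # Effective number of parameters: how much the average fit is worse
    # than the fit at the posterior mean.
    p_d = mean_deviance - deviance_at_posterior_mean
    return mean_deviance + p_d


def select_by_dic(dic_by_n_clusters):
    """Return the number of clusters with the smallest DIC."""
    return min(dic_by_n_clusters, key=dic_by_n_clusters.get)


# Hypothetical values, not taken from the figure.
example_dic = dic([10.0, 12.0, 14.0], deviance_at_posterior_mean=11.0)
best_k = select_by_dic({16: 105.0, 18: 100.0, 20: 103.0})
```

Lower DIC is better: the criterion rewards fit through the mean deviance and penalises complexity through the effective number of parameters.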

Fig 17.

Consistency of results and survival p-values for clusters identified in the lung cancer dataset, with a range of numbers of global clusters and 3 local clusters.

(a) Consistency of results with respect to the ARI between different settings of numbers of global clusters. (b) p-values corresponding to the different numbers of global clusters.

Fig 18.

Deviance information criterion (DIC) for selecting the number of clusters in the lung cancer dataset.

The plot shows the DIC for a range of numbers of global clusters when the number of local clusters is set to three. The DIC is minimized for 53 global clusters.

Fig 19.

Consistency of results and survival p-values for clusters identified in the kidney cancer dataset, with a range of numbers of global clusters and 3 local clusters.

(a) Consistency of results with respect to the ARI between different settings of numbers of global clusters. (b) p-values corresponding to the different numbers of global clusters.

Fig 20.

Deviance information criterion (DIC) as a method for selecting the number of clusters in the kidney cancer dataset.

The plot shows the DIC for a range of numbers of global clusters when the number of local clusters is set to three. The DIC is minimized for 16 global clusters.

Fig 21.

Illustration of the concepts of global and local clusters.

The first dataset contains two clusters, 1 and 2; the second dataset contains three clusters, A, B and C. The combined structure contains six potential global clusters, each corresponding to one combination of local cluster assignments across the two contexts.
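
The count of six follows from taking every combination of one local cluster per context. A short sketch, with an illustrative helper name:

```python
from itertools import product


def enumerate_global_clusters(*local_cluster_sets):
    """All combinations of one local cluster per context."""
    return list(product(*local_cluster_sets))


context_1 = [1, 2]            # local clusters in the first dataset
context_2 = ["A", "B", "C"]   # local clusters in the second dataset
global_clusters = enumerate_global_clusters(context_1, context_2)
# [(1, 'A'), (1, 'B'), (1, 'C'), (2, 'A'), (2, 'B'), (2, 'C')]
```

The number of potential global clusters is the product of the local cluster counts, so it grows quickly as contexts are added; three local clusters in each of four contexts already gives 81 combinations.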

Fig 22.

Graphical model representation of the fully combinatorial context-dependent clustering model.

Fig 23.

Graphical model representation of the decoupled context-dependent integrative clustering model.
