Figure 1.
(a) Heatmap of the breast cancer data sorted by PSDF cluster assignment, compared with another integrative method, iCluster, and with the PAM50 subtypes, which are based on expression alone.
Copy number and expression features are each ranked from high to low by their probability of being used in the MCMC sampling, as indicated on the left. (b–d) Posterior similarity matrices (red: high posterior probability that two patient samples cluster together; blue: low posterior probability).
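As a minimal sketch of how a posterior similarity matrix of this kind can be estimated, assume the MCMC output is available as a list of cluster-label vectors, one per retained sample. The function name `posterior_similarity` and the toy labels are hypothetical and are not taken from the PSDF implementation. Each entry is the fraction of samples in which two patients share a cluster label, which does not depend on how the labels are numbered in any one sample.

```python
def posterior_similarity(label_samples):
    """Estimate pairwise co-clustering probabilities from MCMC label samples.

    label_samples: list of label vectors, one per MCMC sample; each vector
    holds one cluster label per patient (assumed input layout).
    """
    n_samples = len(label_samples)
    n_items = len(label_samples[0])
    counts = [[0.0] * n_items for _ in range(n_items)]
    for labels in label_samples:
        for i in range(n_items):
            for j in range(n_items):
                # Count the samples in which patients i and j share a cluster.
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    # Convert counts to fractions of the total number of samples.
    return [[c / n_samples for c in row] for row in counts]


# Hypothetical toy input: 4 MCMC samples of cluster labels for 3 patients.
samples = [
    [0, 0, 1],
    [0, 0, 1],
    [1, 1, 1],
    [0, 1, 1],
]
psm = posterior_similarity(samples)
```

In this toy input, patients 0 and 1 share a cluster in three of the four samples, so their entry is 0.75; high entries would be drawn in red and low entries in blue.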
Figure 2.
Kaplan-Meier survival curves for the PSDF, iCluster, and PAM50 results, with their p-values (log-rank test), for breast cancer-specific survival.
Figure 3.
Network modules and enrichment maps as part of the functional follow-up analysis for the breast cancer subtypes: (A) Subtype-specific network modules for PSDF 1 and 2.
The node color in the network modules indicates the type of alteration relative to this cluster: red - copy number gain or over-expression, green - copy number loss or under-expression. The shape of the nodes indicates the type of data: square - copy number, round - expression. (B) KEGG pathway enrichment maps for PSDF 1 and 2. The node colors indicate the significance of the enrichment and the thickness of the edges indicates the amount of overlap between pathways.
Figure 4.
(a) Heatmap of the prostate cancer data sorted by PSDF cluster assignment, compared with another integrative clustering method, iCluster, and with the TS subtypes, which are based on copy number data alone.
Copy number and expression features are each ranked from high to low by their probability of being used in the MCMC sampling, as indicated on the left. Color codes for the heatmap are the same as in Fig. 1. (b–d) Posterior similarity matrices (red: high posterior probability; blue: low posterior probability).
Figure 5.
Comparison of the prostate cancer clustering results from our method with those from iCluster and the TS subtypes, using survival curves and p-values (log-rank test) for biochemical recurrence, as well as the distribution of Gleason grade (GG), an important prognostic factor in prostate cancer.
Figure 6.
Prostate cancer subtype-specific network modules and enrichment maps: (a–b) Subtype-specific network modules for PSDF 7 and 5.
The node color in the network modules indicates the type of alteration relative to this cluster: red - copy number gain or over-expression, green - copy number loss or under-expression. The shape of the nodes indicates the type of data: square - copy number, round - expression. (c–d) KEGG pathway enrichment maps for PSDF 7 and 5 module genes. The node colors indicate the enrichment significance and the thickness of the edges indicates the amount of overlap between pathways.
Figure 7.
Graphical representation of the PSDF model presented in this paper.
The indicator variables allow the model to perform data fusion on a sample-by-sample basis, defining the states fused (
) and unfused (
). The prior probability of fusion is defined by
and is set in all cases to
for the results in this paper. The
parameters are binary switches that select individual features in each data set. The number of clusters is given by the number of unique values assigned to the
variables, which denote cluster membership in a given context. The
parameters are mixture weights for the Dirichlet Processes and are marginalised analytically.
and
are concentration hyperparameters for the Dirichlet Processes and are sampled as part of the MCMC procedure.
Table 1.
Results from the simulation study.