TSCCA: A tensor sparse CCA method for detecting microRNA-gene patterns from multiple cancers

doi:10.1371/journal.pcbi.1009044

Fig 1.

Illustration of TSCCA to identify cancer-related miRNA-gene functional modules.

(A) Prepare the matched miRNA and gene expression data of 33 cancer types from TCGA. (B) Compute a cancer-miRNA-gene Pearson correlation tensor of size p × q × M, where p, q and M represent the numbers of genes, miRNAs and cancers, respectively. (C) Estimate multiple sparse latent factors (u_i, v_i and w_i, i = 1, ⋯, r); the non-zero genes in u_i, the non-zero miRNAs in v_i and the non-zero cancers in w_i together define a cancer-miRNA-gene module.
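Step (B) can be illustrated with a minimal sketch, assuming that each cancer type comes with a samples-by-genes matrix and a matched samples-by-miRNAs matrix. The function name `correlation_tensor`, the toy dimensions and the random data are hypothetical and are not taken from the paper.

```python
import numpy as np

def correlation_tensor(gene_expr, mirna_expr):
    """Build a p x q x M tensor of gene-miRNA Pearson correlations.

    gene_expr[k]  : array of shape (n_k, p), samples x genes for cancer k
    mirna_expr[k] : array of shape (n_k, q), the same samples x miRNAs
    """
    slices = []
    for G, R in zip(gene_expr, mirna_expr):
        # Standardize each column (zero mean, unit variance) within the cancer type.
        Gz = (G - G.mean(axis=0)) / G.std(axis=0)
        Rz = (R - R.mean(axis=0)) / R.std(axis=0)
        # For standardized columns, the Pearson correlation is the mean of products.
        slices.append(Gz.T @ Rz / G.shape[0])
    return np.stack(slices, axis=2)

rng = np.random.default_rng(0)
sizes = [30, 40, 25]   # hypothetical sample counts for M = 3 cancer types
p, q = 6, 4            # hypothetical numbers of genes and miRNAs
genes = [rng.normal(size=(n, p)) for n in sizes]
mirnas = [rng.normal(size=(n, q)) for n in sizes]
X = correlation_tensor(genes, mirnas)
print(X.shape)  # (6, 4, 3)
```

Each frontal slice `X[:, :, k]` is the gene-by-miRNA correlation matrix of one cancer type, which is the layout the caption describes.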

Fig 2.

Application to the TCGA data from multiple cancers.

(A) Number of cancer patients or samples for each of the 33 cancer types from TCGA used in this study. (B) Correlation between the modularity scores of the identified modules (y-axis) and the corresponding singular values (objective function values; x-axis), with PCC r = 0.98. (C) Distribution of modularity scores. The modularity scores of the identified modules are significantly greater than those of random modules (permutation test P < 0.05/50 for each identified module). (D) Among the 1793 genes from all the identified modules, 328 are reported to be related to cancer (hypergeometric test P = 1.47e-06). (E) Among the 122 miRNAs from all the identified modules, 73 are reported to be related to cancer (hypergeometric test P = 3.38e-03).
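The enrichment P-values in (D) and (E) come from a hypergeometric test. The sketch below shows the right-tailed form of that test using only the standard library. The function name `hypergeom_right_tail` and the toy counts are hypothetical; the background population sizes behind the reported P-values are not given in the caption, so those values are not reproduced here.

```python
from math import comb

def hypergeom_right_tail(k, N, K, n):
    """P(X >= k) when n items are drawn without replacement
    from N items, K of which are marked (e.g. known cancer genes)."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Toy example with made-up counts: 50 background genes, 10 cancer genes,
# a module gene set of size 12 that contains 6 cancer genes.
p_value = hypergeom_right_tail(k=6, N=50, K=10, n=12)
print(p_value)
```

A small P-value indicates that the overlap between the module genes and the cancer gene list is larger than expected under random sampling.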

Fig 3.

Heatmap of cancer-miRNA-gene modules identified by TSCCA in the TCGA dataset.

The top half of (A) shows module 1 (rows correspond to genes, columns to miRNAs) and the lower half of (A) shows a random module for comparison. The same layout is used for module 2 and module 5 in (B) and (C), respectively. (A), (B) and (C) show three different co-expression patterns.

Fig 4.

Heatmap showing W, the output matrix of Algorithm 2 (see S1 Text), when applied to the TCGA data.

Each column corresponds to a module and each row corresponds to a cancer type; |W_ij| reflects the co-expression intensity between the genes and the miRNAs within module j in cancer i. A hierarchical clustering method was used to group the rows (cancer types) into four clusters.
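The row clustering can be sketched as follows with SciPy. The caption does not state the linkage method or distance metric, so average linkage with Euclidean distance on |W| is an assumption, and the matrix `W` below is random placeholder data with a hypothetical number of modules.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
n_cancers, n_modules = 33, 50   # n_modules is a hypothetical choice
W = rng.normal(size=(n_cancers, n_modules))  # placeholder for the real W

# Cluster cancer types (rows) on the absolute weights.
# Linkage method and metric are assumptions, not taken from the paper.
Z = linkage(np.abs(W), method="average", metric="euclidean")

# Cut the dendrogram so that at most four clusters are formed.
labels = fcluster(Z, t=4, criterion="maxclust")
print(labels)
```

Cancer types that share a label have similar co-expression intensity profiles across the modules.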

Fig 5.

Illustration of two cancer-miRNA-gene modules identified by TSCCA in the TCGA dataset.

The results for module 1 are shown in (A), (B), (C) and (D), and the results for module 4 are shown in (E), (F), (G) and (H). (A) Bar plot showing the modularity scores of module 1 and of a random module for different cancer types. (B) Top enriched GO BP terms for the genes within module 1. (C) Cancer gene enrichment, gene-gene interaction enrichment and miRNA-gene interaction enrichment of module 1; the corresponding P-values were computed using the right-tailed hypergeometric test. (D) Largest connected miRNA-gene subnetwork of module 1 (7 miRNAs, 84 genes and 538 edges), in which the miRNAs directly regulate 21 genes and these 21 genes regulate 63 other genes. The same layout is used for module 4 in (E), (F), (G) and (H). (H) Largest connected miRNA-gene subnetwork of module 4 (7 miRNAs, 75 genes and 309 edges), in which the miRNAs directly regulate 24 genes and these 24 genes regulate 51 other genes.

Fig 6.

Statistical analysis of PCCs of module miRNAs/genes using permutation test.

(A) The average absolute gene-gene PCC among the genes within each module (permutation test P < 0.01). (B) The corresponding results for the miRNAs within each module.
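A permutation test of this kind can be sketched as below, assuming that the null distribution is built from random gene sets of the same size as the module. The function names, the number of permutations and the toy data are hypothetical; the paper's exact permutation scheme is not described in the caption.

```python
import numpy as np

def mean_abs_pcc(expr):
    """Mean absolute Pearson correlation over all distinct column pairs
    of a samples x features matrix."""
    C = np.corrcoef(expr, rowvar=False)
    iu = np.triu_indices_from(C, k=1)
    return np.abs(C[iu]).mean()

def permutation_pvalue(expr, module_idx, n_perm=200, seed=0):
    """Compare a module's mean |PCC| with that of random same-size sets."""
    rng = np.random.default_rng(seed)
    observed = mean_abs_pcc(expr[:, module_idx])
    null = np.empty(n_perm)
    for b in range(n_perm):
        random_idx = rng.choice(expr.shape[1], size=len(module_idx), replace=False)
        null[b] = mean_abs_pcc(expr[:, random_idx])
    # The add-one correction keeps the estimated P-value strictly positive.
    return observed, (1 + np.sum(null >= observed)) / (1 + n_perm)

# Toy data: 100 samples, 200 genes; the first 10 genes share a latent factor.
rng = np.random.default_rng(2)
expr = rng.normal(size=(100, 200))
factor = rng.normal(size=(100, 1))
expr[:, :10] = factor + 0.5 * rng.normal(size=(100, 10))

observed, p_value = permutation_pvalue(expr, np.arange(10))
print(observed, p_value)
```

The same function applies unchanged to a samples-by-miRNAs matrix for panel (B).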

Fig 7.

Survival analysis of modules.

(A) Bipartite graph between the identified modules and the cancer types, with edges based on -log₁₀(BH-adjusted P-value). For each identified module and each cancer type within the module, we first extracted the first principal component (PC1) of the expression matrix of the miRNAs and genes within the module for that cancer type. We then divided the samples of the cancer type into two groups at the median value of PC1, and a P-value was computed using the log-rank test. In the graph, we kept only the edges between modules and cancer types with adjusted P < 0.05. (B) Some cancer-miRNA-gene modules relate to survival time. For a given cancer type and a given module, Kaplan-Meier survival curves were drawn for each group, and “+” denotes a censored patient. Each sub-figure corresponds to one module and one cancer type. For example, Module 11 has a significant P = 3.2e-09 for LGG (cancer type), written as “M11-LGG, P = 3.2e-09”.
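The procedure in (A) can be sketched end to end as follows: PC1 scores, a median split, a two-group log-rank test and a Benjamini-Hochberg adjustment. This is a minimal illustration under stated assumptions, not the authors' implementation; all function names and the simulated data are hypothetical.

```python
import numpy as np
from math import erfc, sqrt

def first_pc_scores(expr):
    """Sample scores on PC1 of a samples x features matrix."""
    centered = expr - expr.mean(axis=0)
    U, S, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, 0] * S[0]

def logrank_pvalue(time, event, group):
    """Two-group log-rank test. event: 1 = death, 0 = censored; group: boolean."""
    obs_minus_exp, var = 0.0, 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n, n1 = at_risk.sum(), (at_risk & group).sum()
        died = (time == t) & (event == 1)
        d, d1 = died.sum(), (died & group).sum()
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    stat = obs_minus_exp ** 2 / var
    # Survival function of a chi-square variable with 1 degree of freedom.
    return erfc(sqrt(stat / 2))

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    order = np.argsort(p)
    scaled = p[order] * len(p) / np.arange(1, len(p) + 1)
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty_like(p)
    out[order] = np.minimum(adjusted, 1.0)
    return out

# Simulated module: a latent factor drives both expression and survival time.
rng = np.random.default_rng(3)
n = 200
z = rng.normal(size=n)
expr = np.outer(z, rng.uniform(0.5, 1.5, size=20)) + rng.normal(size=(n, 20))
time = rng.exponential(scale=np.exp(z))
event = (rng.random(n) < 0.8).astype(int)   # about 20% censored

pc1 = first_pc_scores(expr)
group = pc1 > np.median(pc1)                # median split on PC1
p_value = logrank_pvalue(time, event, group)
print(p_value)
```

In practice, one P-value per module and cancer type pair would be collected and passed through `bh_adjust` before applying the 0.05 threshold.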

Fig 8.

Comparison of results from different algorithms on the simulated data and TCGA data.

(A) A synthetic miRNA-gene correlation tensor, which contains four matrices with the same number of genes (rows) and miRNAs (columns) and includes three true modules framed by rectangular boxes of different colors. The input to the tested methods is a shuffled version of this tensor, obtained by shuffling its genes (rows) and miRNAs (columns). (B) Comparison of different methods in terms of CE ± std and Recovery ± std on the simulated data. The Recovery and CE scores are computed over repeatedly generated synthetic tensors.
