Fig 1.

Overall pipelines.

(A) X1 denotes the renormalized gene expression profiles and X5 denotes the gene expression profiles reconstructed from the latent features. N is the number of genes, and M is the number of samples. The deep autoencoders were trained on the renormalized gene expression profiles of luminal-A breast cancer samples in METABRIC, and the TCGA BRCA samples were used as the validation set. (B) The latent features of each of the 679 METABRIC samples were generated in the second hidden layer of the deep autoencoders, and (C) the samples were divided into subgroups using the latent features as input features for unsupervised learning. (D) Kaplan-Meier analysis was performed to compare the prognostic differences (recurrence-free survival rate) between the subgroups, and (E) the prognostic differences were validated using the recurrence-free survival data of 415 TCGA samples.

Fig 2.

The t-SNE (t-distributed Stochastic Neighbor Embedding) plot of 64-dimensional latent features and the Kaplan-Meier survival curve of 679 METABRIC luminal-A breast cancer samples.

(A) The two-dimensional t-SNE plot of the latent features generated by the deep autoencoders for the 679 METABRIC luminal-A breast cancer samples. The samples assigned to BPS-LumA (the better prognostic subgroup) and WPS-LumA (the worse prognostic subgroup) are colored green and orange, respectively. (B) The green and orange curves indicate BPS-LumA and WPS-LumA, respectively. The x-axis shows recurrence-free survival in months and the y-axis shows survival probability.
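The survival probability plotted in panel (B) is the kind of quantity a Kaplan-Meier (product-limit) estimator produces. The following is a minimal sketch of that estimator in plain Python, not the authors' code; the function name `kaplan_meier` and the input encoding (1 = recurrence observed, 0 = censored) are assumptions made for illustration.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up time per sample (e.g. recurrence-free months)
    events : 1 if recurrence was observed at that time, 0 if censored
             (this encoding is an assumption of the sketch)
    Returns a list of (time, survival_probability) at each event time.
    """
    event_counts = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)          # samples leaving the risk set at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = event_counts.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk samples surviving time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]         # drop events and censored samples
    return curve
```

Running the function once per subgroup gives one step curve per subgroup, which can then be compared as in the figure.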

Fig 3.

The t-SNE plot of 64-dimensional latent features of all samples in the METABRIC and TCGA BRCA datasets and the Kaplan-Meier survival curve of the 415 TCGA BRCA luminal-A breast cancer samples.

(A) The t-SNE plot of all samples in the METABRIC and TCGA BRCA datasets. The METABRIC and TCGA BRCA samples are shown as circles and squares, respectively. The samples assigned to BPS-LumA and WPS-LumA are colored green and orange, respectively. (B) The Kaplan-Meier survival curve of the 415 TCGA samples, each assigned to the closer prognostic subgroup (BPS-LumA or WPS-LumA) in the latent space. (C) The mean log2-transformed expression levels and (D) the log-transformed median absolute deviation of individual genes (N = 17,202) in the METABRIC (microarray, x-axis) and TCGA (RNA-seq, y-axis) datasets, shown as scatter plots. Each dot in (C) and (D) represents one gene.
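The caption for panel (B) says each TCGA sample was assigned to the closer subgroup in the latent space but does not state how closeness was measured. The sketch below assumes Euclidean distance to the centroid of each subgroup's latent features; the function names and that distance choice are illustrative assumptions, not the authors' stated procedure.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length latent vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def assign_to_closer_subgroup(sample, centroids):
    """Return the subgroup name whose centroid is nearest to `sample`.

    centroids : dict mapping subgroup name -> centroid vector
    Distance is Euclidean (an assumption of this sketch).
    """
    return min(centroids, key=lambda name: math.dist(sample, centroids[name]))
```

In practice the vectors would be the 64-dimensional latent features, with centroids computed from the METABRIC samples of each subgroup.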

Fig 4.

The Kaplan-Meier survival curves when the samples are clustered using the gene expression profiles or the low-dimensional features generated using PCA.

The Kaplan-Meier survival curves when the 679 METABRIC samples were divided into two clusters using (A) all 17,202 genes, (B) the top 5,000 most variable genes, the 64-dimensional PCA features of (C) all 17,202 genes and (D) the top 5,000 most variable genes, and the 2-dimensional PCA features of (E) all 17,202 genes and (F) the top 5,000 most variable genes.
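Selecting the "top 5,000 most variable genes" amounts to ranking genes by a variability score and keeping the first k. The caption does not say which score was used, so the sketch below assumes population variance across samples; `top_variable_genes` and the dictionary layout are hypothetical names chosen for illustration.

```python
from statistics import pvariance


def top_variable_genes(expression, k):
    """Return the k gene names with the largest variance across samples.

    expression : dict mapping gene name -> list of expression values,
                 one value per sample
    Variance is an assumed variability measure; another score such as
    the median absolute deviation could be substituted in the key.
    """
    ranked = sorted(expression, key=lambda g: pvariance(expression[g]), reverse=True)
    return ranked[:k]
```

The selected genes would then be the input to clustering directly, or to PCA before clustering, as in panels (B), (D), and (F).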

Fig 5.

The module size and the proportion of genes overlapping with the DEGs in each co-expressed module.

(A) The bar graph of the module size (the number of genes in each module). (B) The bar graph of the percentage of genes in each module that overlap with the DEGs. The lightcyan module is highlighted in yellow. (C) The network plot of the lightcyan module. The DEGs are colored yellow and the node size indicates the connectivity of each node.
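The two bar graphs in panels (A) and (B) correspond to two simple per-module quantities: the gene count and the share of those genes that are DEGs. A minimal sketch of both, assuming modules are given as lists of gene names (the function name and data layout are hypothetical):

```python
def module_summary(modules, degs):
    """Return {module: (size, percent_overlapping_DEGs)}.

    modules : dict mapping module name -> iterable of gene names
    degs    : iterable of differentially expressed gene names
    """
    deg_set = set(degs)
    summary = {}
    for name, genes in modules.items():
        gene_set = set(genes)                      # ignore duplicate entries
        overlap = len(gene_set & deg_set)
        summary[name] = (len(gene_set), 100.0 * overlap / len(gene_set))
    return summary
```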

Fig 6.

The comparison with previous luminal-A breast cancer stratification methods.

(A) Netanely’s method, (B) Poudel’s method. The x-axis shows the subgroups identified in the previous studies and the y-axis indicates the percentage of BPS-LumA and WPS-LumA samples belonging to each of them. BPS-LumA and WPS-LumA are colored green and orange, respectively.
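A comparison of this kind can be computed as a cross-tabulation of two label vectors. The sketch below assumes the percentages are taken within each of the BPS-LumA and WPS-LumA subgroups, which is one reading of the caption; the function name and the example labels used to call it are hypothetical.

```python
from collections import Counter


def subgroup_percentages(ours, previous):
    """Percentage of each of our subgroups falling in each previous subgroup.

    ours     : list of labels per sample (e.g. "BPS-LumA" / "WPS-LumA")
    previous : list of labels per sample from an earlier stratification
    Returns {(our_label, previous_label): percent}, normalized within
    each of our subgroups (an assumption about the intended denominator).
    """
    pair_counts = Counter(zip(ours, previous))
    totals = Counter(ours)
    return {
        (o, p): 100.0 * count / totals[o]
        for (o, p), count in pair_counts.items()
    }
```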
