Discovering Subgroups of Patients from DNA Copy Number Data Using NMF on Compacted Matrices

doi:10.1371/journal.pone.0079720

Table 1.

Summary of copy number lesions that are considered.

Figure 1.

Size of the matrices passed to NMF as a function of the maximum Hamming distance at which columns are considered equivalent.

The matrix with the DLBCL samples of Data set 1 as rows and their DNA copy number profiles as columns was subjected to the compaction procedure, which merges columns that are similar under the Hamming distance. The higher the maximum Hamming distance allowed when merging columns, the fewer columns remain. Matrices of these dimensions are then used as input to NMF.
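
The effect of the distance threshold on matrix size can be illustrated with a minimal sketch. It assumes a greedy merge in which each column joins the first earlier representative within the allowed Hamming distance; the function name `compact_columns`, the greedy order, and the returned merge counts are illustrative assumptions, not the procedure as specified in the paper.

```python
import numpy as np

def compact_columns(X, max_dist):
    """Greedy column compaction (illustrative assumption, not the paper's exact rule).

    A column is merged into the first representative whose Hamming distance
    to it is at most max_dist; otherwise it becomes a new representative.
    Returns the representative columns and how many columns each one absorbed.
    """
    X = np.asarray(X)
    reps = []    # indices of representative columns
    counts = []  # number of original columns merged into each representative
    for j in range(X.shape[1]):
        for k, r in enumerate(reps):
            # Hamming distance = number of rows where the two columns differ
            if np.count_nonzero(X[:, j] != X[:, r]) <= max_dist:
                counts[k] += 1
                break
        else:
            reps.append(j)
            counts.append(1)
    return X[:, reps], np.array(counts)

# Toy matrix: 3 samples (rows) by 4 genomic features (columns).
X = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 1, 0, 1]])

for d in (0, 1):
    compacted, weights = compact_columns(X, d)
    print(d, compacted.shape, weights.tolist())
```

Raising `max_dist` can only keep or reduce the number of representatives in this sketch, which mirrors the trend the figure reports.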

Figure 2.

Comparison of Compact-NMF and Full-NMF over the DLBCL data of Data set 1.

The matrix with the DLBCL samples of Data set 1 as rows and their DNA copy number profiles as columns was used to test the different ways of running NMF. Full-NMF is the procedure that runs over all the data, while Compact-NMF is our procedure that merges similar columns. The graphs show the ratio of the divergence (the objective function we minimize) of Compact-NMF to that of Full-NMF, so higher values mean greater error for Compact-NMF. Results are shown for different factorization ranks.
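
The ratio plotted here can be sketched as follows, assuming the divergence is the generalized Kullback-Leibler divergence commonly used as an NMF objective; the caption itself does not name the divergence. The helper names `generalized_kl` and `divergence_ratio` are hypothetical, and the reconstructions are assumed to be already expressed on the full matrix.

```python
import numpy as np

def generalized_kl(V, WH, eps=1e-12):
    """Generalized KL divergence D(V || WH) = sum(V*log(V/WH) - V + WH).

    Entries with V == 0 contribute only the WH term (0 * log 0 is taken as 0).
    """
    V = np.asarray(V, dtype=float)
    WH = np.asarray(WH, dtype=float)
    log_term = np.zeros_like(V)
    mask = V > 0
    log_term[mask] = V[mask] * np.log(V[mask] / (WH[mask] + eps))
    return float(np.sum(log_term - V + WH))

def divergence_ratio(V, WH_candidate, WH_reference):
    """Ratio of the candidate's divergence to the reference's divergence.

    Values above 1 mean the candidate reconstructs V worse than the reference.
    """
    return generalized_kl(V, WH_candidate) / generalized_kl(V, WH_reference)

# Toy data: two hypothetical reconstructions of the same matrix.
V = np.array([[1.0, 2.0], [3.0, 4.0]])
WH_full = 1.1 * V       # stands in for a Full-NMF reconstruction
WH_compact = 1.5 * V    # stands in for a Compact-NMF reconstruction
print(divergence_ratio(V, WH_compact, WH_full))
```

The same ratio applies unchanged to the Standard-NMF comparison by swapping in that method's reconstruction as the candidate.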

Figure 3.

Comparison of Standard-NMF and Full-NMF over the DLBCL data of Data set 1.

The matrix with the DLBCL samples of Data set 1 as rows and their DNA copy number profiles as columns was used to test the different ways of running NMF. Full-NMF is the procedure that runs over all the data, while Standard-NMF is the procedure that keeps only one column from each group of similar columns. The graphs show the ratio of the divergence (the objective function we minimize) of Standard-NMF to that of Full-NMF, so higher values mean greater error for Standard-NMF. Results are shown for different factorization ranks.

Figure 4.

Frequency plots of CN aberrations of DLBCL patients of Data set 1, according to the clustering results.

The DLBCL samples of Data set 1 were divided into three subgroups according to the clustering results at rank 3. Clusters 1, 2 and 3 contain 86, 54 and 26 cases, respectively. The frequency of samples with a copy number gain is shown in red and the frequency of losses in blue (losses are drawn on the negative axis for plotting purposes only).
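
The per-cluster frequencies described above can be computed along these lines. The sketch assumes copy number calls encoded as +1 (gain), 0 (normal) and -1 (loss), which is an illustrative encoding rather than one stated in the caption; `aberration_frequencies` is a hypothetical helper name.

```python
import numpy as np

def aberration_frequencies(calls, labels):
    """Per-cluster gain and loss frequencies for each genomic feature.

    calls  : samples x features array with +1 = gain, 0 = normal, -1 = loss
             (assumed encoding).
    labels : cluster label for each sample.
    Returns {cluster: (gain_freq, loss_freq)}; loss frequencies are negated
    so they can be drawn below the axis, as in the frequency plots.
    """
    calls = np.asarray(calls)
    labels = np.asarray(labels)
    freqs = {}
    for c in np.unique(labels):
        sub = calls[labels == c]              # samples assigned to cluster c
        gain = (sub > 0).mean(axis=0)         # fraction of samples with a gain
        loss = -(sub < 0).mean(axis=0)        # negated fraction with a loss
        freqs[int(c)] = (gain, loss)
    return freqs

# Toy example: 3 samples, 3 features, 2 clusters.
calls = np.array([[1, 0, -1],
                  [1, -1, 0],
                  [0, 0, -1]])
labels = np.array([1, 1, 2])
freqs = aberration_frequencies(calls, labels)
```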

Table 2.

Association between the clusters and the molecular subtypes in the DLBCL data sets.

Figure 5.

Prognostic significance of NMF-identified clusters among R-CHOP-21 treated DLBCL.

Panels A and B show Kaplan-Meier estimates of overall survival (OS; left panel, log-rank test p-value = 0.063) and progression-free survival (PFS; right panel, log-rank test p-value = 0.034) in R-CHOP-21 treated DLBCL patients from Data set 1.

Table 3.

Association between the clusters and the molecular subtypes in the breast cancer data set.

Table 4.

Association between the clusters and the molecular subtypes in the medulloblastoma data set.
