Fig 1.
Plate notation for LDA, CTM, and MMCTM.
Graphical models: a LDA, b CTM, and c MMCTM, with d descriptions of their variables. See S1 Text for detailed descriptions.
Fig 2.
Predictive log-likelihood benchmark.
SNV and SV signature per-mutation predictive log-likelihood means ± standard error (n = 50) for: a 2–12 signatures, b a range of mutation count fractions, and c MMCTM with an estimated or fixed Gaussian covariance matrix. NMF: applied to raw counts; NMF-norm: applied to normalized counts.
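As an illustration of the summary statistic plotted here, the following minimal sketch computes a mean and its standard error from repeated runs. The function name and the use of the sample standard deviation (n − 1 denominator) are assumptions; the caption does not state which estimator was used.

```python
import math
import statistics


def mean_and_standard_error(values):
    """Return (mean, standard error of the mean) for a sequence of numbers.

    Assumption: the standard error is the sample standard deviation
    (n - 1 denominator) divided by sqrt(n).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate a standard error")
    mean = statistics.fmean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return mean, standard_error
```

With n = 50 replicates, as in this figure, `values` would hold the 50 per-run predictive log-likelihoods for one method and one setting.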
Fig 3.
Low-dimensional classifier input benchmark.
Accuracy means ± standard error (n = 50) are shown for training with SNV (left), SV (middle), and both SNV and SV (right) signature probabilities. NMF: applied to raw counts; NMF-norm: applied to normalized counts.
Fig 4.
Signature probability mean absolute errors on synthetic data.
Shown are mean absolute errors per method and per signature (n = 20) for estimated signature probabilities compared to reference probabilities. The experiment was repeated with full mutation counts and with 1% SNVs & 10% SVs. Data are represented as Tufte-like boxplots with the following elements: points (medians), gaps (first to third quartiles), whiskers (extend to the most extreme value no further than 1.5× the interquartile range from the gap edge), dashes (outliers). NMF: applied to raw counts; NMF-norm: applied to normalized counts.
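The boxplot elements described in this caption can be sketched as follows. This is a minimal illustration, not the plotting code used for the figure. The function names are hypothetical, and NumPy's default linear-interpolation quantiles are an assumption, since the caption does not specify a quantile method.

```python
import numpy as np


def mean_absolute_error(estimated, reference):
    """Mean absolute difference between estimated and reference probabilities."""
    est = np.asarray(estimated, dtype=float)
    ref = np.asarray(reference, dtype=float)
    return float(np.mean(np.abs(est - ref)))


def tufte_box_stats(values):
    """Compute the elements of a Tufte-like boxplot for one group of values.

    point    -> median
    gap      -> first to third quartile
    whiskers -> most extreme values within 1.5 x IQR of the gap edges
    dashes   -> values beyond the whiskers (outliers)
    """
    x = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])  # assumed quantile method
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = x[(x >= low_fence) & (x <= high_fence)]
    outliers = x[(x < low_fence) | (x > high_fence)]
    return {
        "median": float(median),
        "q1": float(q1),
        "q3": float(q3),
        "whisker_low": float(inside.min()),
        "whisker_high": float(inside.max()),
        "outliers": outliers.tolist(),
    }
```

In this figure, each group passed to `tufte_box_stats` would be the 20 mean absolute errors for one method and one signature.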
Fig 5.
BRCA-EU mutation signature analysis.
a SNV mutation signatures. SNVs are organized according to the SNV type (color). Within each type, SNVs are further organized into the pattern of flanking nucleotides (A—A, A—C, …, T—G, T—T). b SV mutation signatures. SVs are grouped by type (DEL: deletion, DUP: tandem duplication, INV: inversion, TR: translocation). c Heatmap of relative signature probabilities in BRCA-EU samples. Each heatmap column represents a single sample, and is composed of the SNV and SV signature probabilities output from the MMCTM model. The values for each signature (row) have been standardized, producing z-scores. Heatmap display has been truncated to ±3. Samples have been hierarchically clustered according to their transformed signature probabilities and cluster labels are indicated with colors underneath the dendrogram. The number of samples in each cluster is indicated in parentheses in the cluster legend. ER, PR, and HER2 positive status, BRCA1/2 mutation or methylation status, other gene driver mutation status, HRDetect prediction, and MMRD status are indicated with black bars. Grey cells represent missing data for annotation tracks. Samples with zero mutations for a mutation type also have grey signature probability cells. d Correlation heatmap between SNV and SV signatures. e Annotation associations for sample clusters. Upward- and downward-pointing triangles indicate enrichment and depletion, respectively. Adjusted p-values >0.05 are not shown. Colors correspond to cluster colors indicated in the heatmap.
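The row standardization and display truncation described for panel c can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the authors' code. The function name is hypothetical, the sample standard deviation (`ddof=1`) is an assumption, and representing samples with zero mutations as NaN is also an assumption. Any transformation applied to the probabilities before clustering is not specified in the caption and is not shown.

```python
import numpy as np


def standardize_rows(signature_probs, clip=3.0):
    """Z-score each row (signature) across samples, then truncate to +/- clip.

    signature_probs: 2-D array, rows = signatures, columns = samples.
    NaN entries (assumed here to mark samples with no mutations of that
    type) are ignored when computing the row mean and standard deviation
    and remain NaN in the output.
    A row with zero variance yields non-finite values.
    """
    p = np.asarray(signature_probs, dtype=float)
    row_mean = np.nanmean(p, axis=1, keepdims=True)
    row_sd = np.nanstd(p, axis=1, ddof=1, keepdims=True)  # assumed estimator
    z = (p - row_mean) / row_sd
    return np.clip(z, -clip, clip)
```

Truncation affects only the displayed color scale: an extreme sample is drawn at ±3 rather than stretching the scale for every other sample in the row.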
Fig 6.
Ovarian cancer mutation signature analysis.
a SNV mutation signatures. SNVs are organized according to the SNV type (color). Within each type, SNVs are further organized into the pattern of flanking nucleotides (A—A, A—C, …, T—G, T—T). b SV mutation signatures. SVs are grouped by type (DEL: deletion, DUP: tandem duplication, INV: inversion, FBI: foldback inversion, TR: translocation). c Heatmap of relative signature probabilities in ovarian cancer samples. Each heatmap column represents a single sample, and is composed of the SNV and SV signature probabilities output from the MMCTM model. The values for each signature (row) have been standardized, producing z-scores. Heatmap display has been truncated to ±3. Samples have been hierarchically clustered according to their transformed signature probabilities and cluster labels are indicated with colors underneath the dendrogram. The number of samples in each cluster is indicated in parentheses in the cluster legend. Samples from the ICGC OV-AU project are indicated with black bars, as are microsatellite instability (MSI) and gene mutation status. Samples with zero mutations for a mutation type also have greyed signature probability cells. The number of SNVs for a POLE mutant sample has been truncated to 40k in the barplot; the actual number is 596,135. d Annotation associations for sample clusters. Upward- and downward-pointing triangles indicate enrichment and depletion, respectively. Adjusted p-values >0.05 are not shown. Colors correspond to cluster colors indicated in the heatmap. e Kaplan-Meier curves for HGSC samples only. f Risk table for HGSC samples only. Kaplan-Meier curve plots and risk tables share x-axes. g Correlation heatmap between SNV and SV signatures.