Variant calling enhances the identification of cancer cells in single-cell RNA sequencing data

doi:10.1371/journal.pcbi.1010576

Fig 1.

Heatmap visualizing inferred CNV results obtained using CopyKAT [10].

Columns represent individual cells, and rows represent genes arranged by chromosomal position. Alternating bars on the y-axis indicate chromosomes. Cells are clustered hierarchically within each patient using expression score matrices. Pink brackets indicate cells with low inferred CNV. Panel (A) shows results for the TNBC dataset and panel (B) for the CRC dataset.

Fig 2.

Heatmap indicating the alteration status of 736 cells for the 25 most frequent oncogenic, predicted oncogenic, and likely oncogenic alterations in the CRC dataset.

Alterations are annotated using OncoKB. Absence of an alteration is noted when a cell has a read depth of at least 5 for all bases corresponding to the residue. For residues without an oncogenic alteration and with read depths less than 5 for all corresponding bases, the presence or absence of an alteration is not characterized (labeled as “Insufficient coverage”). Common recurrent driver mutations are shown in bold.
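
The per-residue labeling rule in this caption can be sketched as a small classifier. This is a minimal illustration, not the authors' code: the names `residue_status`, `Status`, and `MIN_DEPTH` are hypothetical, and it assumes (1) a detected oncogenic alteration is shown regardless of depth and (2) residues with mixed depths (some bases at or above 5, some below), which the caption does not address, are treated as insufficiently covered.

```python
from enum import Enum
from typing import Sequence

MIN_DEPTH = 5  # read-depth cutoff stated in the Fig 2 caption


class Status(Enum):
    ALTERED = "Oncogenic alteration"
    ABSENT = "No alteration"
    INSUFFICIENT = "Insufficient coverage"


def residue_status(has_oncogenic_alt: bool, base_depths: Sequence[int]) -> Status:
    """Classify one cell at one residue, given the depth of each base in that residue.

    Hypothetical helper; the assumptions below are labeled, not taken from the paper.
    """
    if has_oncogenic_alt:
        # Assumption: an observed oncogenic alteration is reported whatever the depth.
        return Status.ALTERED
    if base_depths and all(d >= MIN_DEPTH for d in base_depths):
        # Every base of the residue is covered by at least MIN_DEPTH reads.
        return Status.ABSENT
    # Assumption: any residue not fully covered at MIN_DEPTH (including the
    # mixed-depth case) is left uncharacterized.
    return Status.INSUFFICIENT
```

For example, `residue_status(False, [7, 9, 5])` yields `Status.ABSENT`, while `residue_status(False, [2, 0, 1])` yields `Status.INSUFFICIENT`.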

Fig 3.

Relationship between putative driver alteration counts and inferred CNV for cells from normal tissues (left) and from the tumor datasets (right).

(A) Cells from the cancer datasets (right), grouped by primary tumor site, and normal cells, shown together (left). (B) TNBC cells, grouped by patient, shown in comparison to normal cells, grouped by tissue type. (C) CRC cells, grouped by patient, shown in comparison to normal cells, grouped by tissue type. Higher mean absolute CNV values indicate predicted structural alterations resulting in copy number variation, and lower values suggest limited CNV. Dashed rectangles in (A) indicate regions of interest: groups of cells that might be identified as cancer cells by either CNV inference or putative driver alteration count. The dashed rectangles are bounded at their lower ends by the 99th percentile values for the 4,415 normal cells amenable to both CopyKAT and variant analysis. The dashed polygon in (B) indicates cells of interest, with either high CNV scores or high putative driver counts, that might be selected for downstream analyses. In (A), the first, second, and third quartiles for tumor cells are indicated by dashed lines and bold labels along the respective axes.
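
The thresholding behind the dashed rectangles can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the published pipeline: `mean_abs_cnv` and `flag_cells` are hypothetical names, the CNV matrix is assumed to be genes x cells as in Fig 1, NumPy's default linear-interpolation percentile is assumed, and a cell is flagged when it lies strictly above the normal-cell 99th percentile on either axis.

```python
import numpy as np


def mean_abs_cnv(cnv: np.ndarray) -> np.ndarray:
    """Per-cell mean absolute CNV score from a genes x cells matrix (assumed layout)."""
    return np.abs(cnv).mean(axis=0)


def flag_cells(normal_cnv, normal_drivers, tumor_cnv, tumor_drivers, q=99.0):
    """Flag tumor-dataset cells above the normal-cell q-th percentile on either axis.

    Hypothetical helper. Returns (flags, cnv_cutoff, driver_cutoff).
    """
    # Cutoffs are derived only from normal cells, as in the Fig 3 caption.
    cnv_cut = np.percentile(normal_cnv, q)
    drv_cut = np.percentile(normal_drivers, q)
    # Assumption: strict ">" defines "above" the rectangle's lower bound.
    high_cnv = np.asarray(tumor_cnv) > cnv_cut
    high_drv = np.asarray(tumor_drivers) > drv_cut
    # A cell is of interest if either signal is high (union of the two regions).
    return high_cnv | high_drv, cnv_cut, drv_cut
```

Because driver counts are integers, whether the boundary is inclusive matters: a cell whose count equals the normal-cell cutoff is excluded here, and the comparison operator would need to match the paper's actual choice.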

Table 1.

Correlation coefficients and statistical significance for the relationship between inferred CNV and putative driver alteration counts among cells from each patient.

p-values less than 0.05 are shown in bold.
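
A per-patient correlation table of this kind could be produced as sketched below. The caption does not say which coefficient or test was used, so this sketch assumes Spearman's rank correlation with a two-sided permutation p-value; `rankdata`, `spearman`, `permutation_pvalue`, and `per_patient` are hypothetical helpers written with NumPy only. Patients whose cells all share one driver count (or one CNV value) give an undefined coefficient (`nan`).

```python
import numpy as np


def rankdata(x) -> np.ndarray:
    """Average ranks (1-based), so tied values share the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y) -> float:
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return float(np.corrcoef(rankdata(x), rankdata(y))[0, 1])


def permutation_pvalue(x, y, n_perm=2000, seed=0) -> float:
    """Two-sided permutation p-value for Spearman's rho (assumed test)."""
    rng = np.random.default_rng(seed)
    observed = abs(spearman(x, y))
    y = np.asarray(y)
    hits = sum(abs(spearman(x, rng.permutation(y))) >= observed for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)


def per_patient(cnv, drivers, patient):
    """Return {patient_id: (rho, p)}; p < 0.05 corresponds to the bold entries."""
    cnv, drivers, patient = map(np.asarray, (cnv, drivers, patient))
    results = {}
    for pid in np.unique(patient):
        mask = patient == pid
        results[pid] = (spearman(cnv[mask], drivers[mask]),
                        permutation_pvalue(cnv[mask], drivers[mask]))
    return results
```

If SciPy is available, `scipy.stats.spearmanr` returns the same coefficient together with an analytic p-value and could replace the permutation step.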

Table 2.

Gene set enrichment analysis results showing the top 10 enriched cancer hallmark gene sets, ranked by enrichment score, for groups of driver-enriched cells versus all others.

Top section shows enrichment when comparing cells with high putative driver counts to cells with low putative driver counts for the CRC dataset. Middle section shows enrichment when comparing ERBB2+/PIK3CA+ cells to cells lacking the characteristic ERBB2 or PIK3CA mutations for the CRC dataset. Bottom section shows enrichment when comparing cells with high putative driver counts to cells with low putative driver counts for the TNBC dataset.

Fig 4.

Flowchart depicting a cancer-cell-filtering process.

Additional steps proposed in this work, including variant calling and analysis, are shown by the green dashed rectangle. Solid rectangles indicate inputs and outputs, and hexagons indicate processes.