CNAViz: An interactive webtool for user-guided segmentation of tumor DNA sequencing data

doi:10.1371/journal.pcbi.1010614

Fig 1.

CNAViz enables user-guided segmentation for improved copy-number calling.

(a) The genome of cancer cells (gray circles) is affected by CNAs (colored dots). DNA sequencing reads obtained from these cancer cells are aligned to a human reference genome, which is partitioned into bins, each defined by its start and end positions on a given chromosome. For each bin, two signals are measured from the DNA sequencing reads: the RDR, which is proportional to the total number of copies of the bin in the genome, and the BAF, which measures allelic imbalance. (b) Local segmentation algorithms combine neighboring bins with similar RDR (top plot) and BAF (bottom plot, where allelic imbalance, measured as 0.5 − BAF, is shown instead of BAF) into segments. Differences across datasets may lead to overclustering. (c) Global segmentation algorithms cluster bins with similar RDR and BAF values across the entire genome, disregarding genomic position; this may produce spurious clusters and miss focal CNAs. (d) CNAViz lets the user unify local and global segmentation approaches to obtain a more accurate segmentation.
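
As a rough illustration of the two per-bin signals in (a), the sketch below computes RDR, BAF, and the plotted allelic imbalance (0.5 − BAF) for two toy bins. The definitions are simplified assumptions, not taken from this caption: RDR is the tumor read count divided by the matched-normal read count, each normalized by its sample's total reads, and BAF is the fraction of reads supporting the less-covered allele, pooled over the bin's heterozygous SNPs (so BAF ≤ 0.5). The `Bin` type, its fields, and the counts are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Bin:
    # Hypothetical bin record: genomic interval plus read counts.
    chrom: str
    start: int
    end: int
    tumor_reads: int
    normal_reads: int
    # (allele A count, allele B count) at each heterozygous SNP in the bin
    snp_counts: List[Tuple[int, int]]


def rdr(b: Bin, total_tumor: int, total_normal: int) -> float:
    """Tumor/normal read-depth ratio, corrected for overall sequencing depth."""
    return (b.tumor_reads / total_tumor) / (b.normal_reads / total_normal)


def baf(b: Bin) -> float:
    """Minor-allele fraction pooled over the bin's heterozygous SNPs (<= 0.5)."""
    a_total = sum(a for a, _ in b.snp_counts)
    b_total = sum(bb for _, bb in b.snp_counts)
    return min(a_total, b_total) / (a_total + b_total)


def allelic_imbalance(b: Bin) -> float:
    """Allelic imbalance as plotted on the x-axis: 0.5 - BAF."""
    return 0.5 - baf(b)


# Two toy 50 Kb bins: a balanced gain and an allelically imbalanced loss.
bins = [
    Bin("chr1", 0, 50_000, 200, 100, [(30, 30), (25, 35)]),
    Bin("chr1", 50_000, 100_000, 100, 100, [(10, 50), (12, 48)]),
]
total_t = sum(b.tumor_reads for b in bins)
total_n = sum(b.normal_reads for b in bins)

for b in bins:
    print(b.chrom, b.start,
          round(rdr(b, total_t, total_n), 3),
          round(baf(b), 3),
          round(allelic_imbalance(b), 3))
```

The first bin has elevated RDR with nearly balanced alleles, while the second has reduced RDR and strong allelic imbalance, which is the kind of contrast the scatter and linear plots are meant to expose.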

Fig 2.

CNAViz provides a variety of options, modes, and plots to help the user create an effective segmentation.

(a) Buttons for import/export options and a demo dataset, and for importing driver genes or using the existing Cancer Gene Census (COSMIC) driver genes. This panel also includes drop-down menus for the chromosome, the color of the selected bins (default is black), and the point size of each bin. (b) Checkboxes controlling the 2D scatter and 1D linear plots. (c) Buttons that open pop-ups with analytics, automated functions, and the cluster assignment history. (d) A table summarizing all clusters assigned so far and the percentage of bins in each cluster; the user can also change the color of any cluster ID. (e) The toolbar, which floats at the top center of the screen, lists the different modes (Zoom, Pan, Select, Deselect) and their hotkeys; the help button is in the top right. (f) Scatter plot with RDR on the y-axis and allelic imbalance on the x-axis. Hovering over a point in the scatter plot displays a tooltip with information about the corresponding bin, including its genomic position, bin size, RDR, allelic imbalance, and cluster ID; the hovered bin's position on the linear plots is also marked with a black bar. (g) RDR and allelic imbalance linear plots with genomic position on the x-axis. (h) When points are selected, the selected bins turn dark blue on all plots. The cluster composition of the selected points is shown in a table below the plots, with each row colored to match the corresponding cluster in the plots. (i) A second sample; the selected bins are synced across the two samples and across the 2D scatter and 1D linear plots. (j) Driver genes are displayed as red dots along the x-axis of the linear plots. Hovering over a red dot previews the driver gene as a green vertical bar. Clicking a driver gene locks it in place, displaying it as an orange bar with the driver gene symbol above it.

Fig 3.

CNAViz provides the user with a variety of analysis tools and automated functions to help generate an accurate segmentation.

(a) Average silhouette coefficient bar plot; the average of the silhouette scores for each cluster is displayed above the bar plot. Average Euclidean distance bar plot, showing the average inter-cluster distance from each cluster to the cluster selected in the drop-down menu above the plot. (b) Centroid table, listing each cluster and the RDR and BAF values of its centroid in each sample. This pop-up also provides the automated Merge function, which lets the user set RDR and BAF thresholds per sample; clusters whose centroids are closer than these user-defined thresholds are merged. See Automation for further details. (c) The Absorb Bins pop-up lets the user select "From" clusters and "To" clusters. Each bin in the "From" clusters is evaluated against a user-defined threshold and reassigned to the closest legal "To" cluster. See Automation for further details.
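
The Merge and Absorb Bins functions in (b) and (c) can be sketched as below. The caption defers the exact rules to the Automation section, so every detail here is an assumption: a cluster's centroid is the mean (RDR, BAF) of its bins in each sample; two clusters merge when their centroids differ by less than the RDR threshold and less than the BAF threshold in every sample, with merges applied transitively (union-find, keeping the smaller cluster ID); and a "From" bin is reassigned to the nearest "To" centroid by Euclidean distance in (RDR, BAF) space summed over samples, where "legal" is interpreted as lying within the same per-sample thresholds. All function names are hypothetical and do not reflect CNAViz's code.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# values[i][s] = (RDR, BAF) of bin i in sample s; labels[i] = cluster ID of bin i
Point = List[Tuple[float, float]]


def centroids(values: List[Point], labels: List[int]) -> Dict[int, Point]:
    """Mean (RDR, BAF) of each cluster in each sample."""
    groups: Dict[int, List[Point]] = defaultdict(list)
    for v, c in zip(values, labels):
        groups[c].append(v)
    n_samples = len(values[0])
    return {
        c: [(sum(p[s][0] for p in pts) / len(pts),
             sum(p[s][1] for p in pts) / len(pts)) for s in range(n_samples)]
        for c, pts in groups.items()
    }


def within(a: Point, b: Point, rdr_thr: List[float], baf_thr: List[float]) -> bool:
    """True if a and b are closer than the RDR and BAF thresholds in every sample."""
    return all(abs(a[s][0] - b[s][0]) < rdr_thr[s] and abs(a[s][1] - b[s][1]) < baf_thr[s]
               for s in range(len(a)))


def merge(values: List[Point], labels: List[int],
          rdr_thr: List[float], baf_thr: List[float]) -> List[int]:
    """Merge clusters whose centroids are within the thresholds (transitively)."""
    cents = centroids(values, labels)
    parent = {c: c for c in cents}

    def find(c: int) -> int:
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    ids = sorted(cents)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if within(cents[a], cents[b], rdr_thr, baf_thr):
                ra, rb = find(a), find(b)
                parent[max(ra, rb)] = min(ra, rb)  # keep the smaller ID
    return [find(c) for c in labels]


def absorb(values: List[Point], labels: List[int], from_ids: Set[int], to_ids: Set[int],
           rdr_thr: List[float], baf_thr: List[float]) -> List[int]:
    """Move each 'From' bin to the nearest 'To' centroid within the thresholds, if any."""
    cents = centroids(values, labels)
    new = list(labels)
    for i, (v, c) in enumerate(zip(values, labels)):
        if c not in from_ids:
            continue
        best, best_d = None, math.inf
        for t in to_ids:
            if t == c or t not in cents or not within(v, cents[t], rdr_thr, baf_thr):
                continue  # not a legal target under the assumed rule
            d = math.sqrt(sum((v[s][0] - cents[t][s][0]) ** 2 +
                              (v[s][1] - cents[t][s][1]) ** 2 for s in range(len(v))))
            if d < best_d:
                best, best_d = t, d
        if best is not None:
            new[i] = best  # bins with no legal target keep their cluster
    return new


# Usage: one sample; clusters 0 and 1 are nearly identical, cluster 2 is distinct.
values = [[(1.00, 0.50)], [(1.02, 0.49)], [(1.01, 0.48)], [(2.0, 0.25)], [(1.9, 0.26)]]
labels = [0, 0, 1, 2, 2]
print(merge(values, labels, rdr_thr=[0.1], baf_thr=[0.05]))
print(absorb(values, labels, from_ids={1}, to_ids={0, 2}, rdr_thr=[0.1], baf_thr=[0.05]))
```

Requiring the thresholds to hold in every sample keeps a merge from collapsing clusters that look alike in one sample but differ in another, which matters for the multi-sample view in Fig 2(i).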

Fig 4.

By using CNAViz, users are able to produce more accurate segmentations of simulated data, both in de novo mode and when refining a given segmentation.

(a) A two-dimensional plot of RDR (y-axis) and allelic imbalance (x-axis, measured as 0.5 − BAF) of 50 Kb genomic bins (points). Colors represent the ground-truth segments/clusters. The table shows performance metrics for each method. (b) Comparison of HATCHet's global segmentation solution before (left plots) and after user refinement (HATCHet + CNAViz, right plots). (c) Comparison of ASCAT's local segmentation solution before (left plots) and after user refinement (ASCAT + CNAViz, right plots). Within each of (b) and (c), all plots display the same genomic bins, colored according to each method's inferred segmentation.

Fig 5.

Manual editing using CNAViz results in more accurate identification of CNA status of breast cancer driver genes compared to an existing segmentation algorithm.

(a) DNA sequencing data of two tumor samples (DCIS and INV) obtained from each of three breast cancer patients (P5, P6, and P10) analyzed by [27]. (b) The number of correctly identified CNAs for breast cancer driver genes (y-axis) across all samples of the three patients, using either the existing segmentation algorithm HATCHet (yellow) or manual refinement of the HATCHet results with CNAViz (green). The number of correct driver genes is listed above each bar. (c) The number of breast cancer driver genes with each type of CNA inferred by HATCHet (columns in top table) or HATCHet + CNAViz (columns in bottom table), compared with the classification obtained from high-resolution CNAs measured in matched single-cell sequencing data (rows in both tables). (d) The CNAs (y-axis) inferred by HATCHet + CNAViz for two distinct sub-populations of cancer cells identified in Patient 10, shown in orange and purple and offset by 0.15 for visual clarity.
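
Panels (b) and (c) amount to comparing, gene by gene, the CNA type inferred from bulk data with the type derived from matched single-cell data. A minimal sketch of that bookkeeping follows, assuming each driver gene has already been assigned one CNA-type label in both sources. The labels "gain", "loss", and "neutral" and the gene names are placeholders, not the paper's categories or data; rows of the table are the single-cell reference and columns are the inferred calls, mirroring the layout described for (c).

```python
from collections import Counter
from typing import Dict, Tuple


def compare_calls(inferred: Dict[str, str],
                  reference: Dict[str, str]) -> Tuple[int, Counter]:
    """Return (number of genes whose inferred CNA type matches the reference,
    counts of (reference type, inferred type) pairs, i.e. a confusion table)."""
    table: Counter = Counter()
    correct = 0
    for gene, ref in reference.items():
        call = inferred.get(gene, "missing")  # genes with no call are tallied separately
        table[(ref, call)] += 1
        correct += int(call == ref)
    return correct, table


# Placeholder genes and labels, for illustration only.
reference = {"GENE_A": "gain", "GENE_B": "loss", "GENE_C": "neutral"}
inferred = {"GENE_A": "gain", "GENE_B": "neutral", "GENE_C": "neutral"}

correct, table = compare_calls(inferred, reference)
print(correct)
for (ref, call), n in sorted(table.items()):
    print(f"reference={ref:8s} inferred={call:8s} n={n}")
```

Summing the diagonal of the table (pairs where both labels agree) gives the per-method count reported in a bar chart like (b).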