A Bayesian method to infer copy number clones from single-cell RNA and ATAC sequencing

doi:10.1371/journal.pcbi.1011557

Fig 1.

The CONGAS+ approach.

A,B. Two CNA-associated tumour subclones C₁ and C₂, evolutionarily nested (C₂ descends from C₁), together with a normal population N. Subclone C₁ is associated with a loss of heterozygosity (LOH), C₂ with an amplification (AMP). C. CNA profile for C₁, C₂ and N. In the first segment all the populations are diploid, in the second both C₁ and C₂ have a single-copy genome, and in the third segment C₂ has a triploid genome. D. Clone proportions in a sequencing assay. E. Inference from RNA and ATAC distributions poses different challenges. One of the two (here RNA) might show weaker multimodal signals, making clustering a more challenging task. A joint assay has the advantage of combining the best of the two data types. Here we depict a stronger bimodal signal in ATAC. F. On top, cartoon ATAC/RNA signal for each segment. In the first, all clones have similar signals; in the second, normal cells have more ATAC/RNA; in the third, C₂ has more ATAC/RNA. RNA signals are more overdispersed, as in panel (E). At the bottom, Bayesian categorical priors for the segment values, with most mass at 2 (diploid).
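As a rough illustration of the categorical prior in panel F, the sketch below builds a prior over copy number states with most of its mass at 2 (diploid). The function name, the range of states (1 to 5) and the 0.6 diploid mass are assumptions chosen for illustration; the caption does not give the values used by CONGAS+.

```python
def diploid_centred_prior(max_copies=5, diploid_mass=0.6):
    """Categorical prior over copy numbers 1..max_copies, peaked at 2.

    Both arguments are hypothetical illustration values, not CONGAS+ defaults.
    """
    states = list(range(1, max_copies + 1))
    # Spread the remaining probability mass evenly over the non-diploid states.
    rest = (1.0 - diploid_mass) / (len(states) - 1)
    return {c: (diploid_mass if c == 2 else rest) for c in states}


prior = diploid_centred_prior()
```

With these illustrative settings, the prior assigns the diploid state more probability than all other states combined, so a segment is moved away from 2 copies only when the data support it.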

Fig 2.

CONGAS+ graphical models.

A. Design of CONGAS+, which can be applied to experimental settings where scRNA-seq and scATAC-seq data are obtained from independent cell splits, or from a multiomics assay (i.e., both measures come from the same set of cells). The input segmentation for CONGAS+ can follow arm-level CNAs, or the profile obtained from an optional bulk sequencing assay. B,C. Probabilistic graphical models representing observed and latent (i.e., inferred) variables for the flat CONGAS+ model (B) and its multiomics extension (C). Colours encode the ATAC- and RNA-specific variables, while variables in black are shared. Parameters are learnt via stochastic variational inference in Pyro [26].

Fig 3.

Simulated scATAC-seq/scRNA-seq data.

A. Segment breakpoints and copy number values of a synthetic dataset with K = 3 clusters. Only chromosomes with one or more CNAs are displayed in the plot. B,C. Low-dimensional representation of the scATAC-seq and scRNA-seq profiles in panel (A); cells are coloured by simulated clone. D,E. Data distributions for a segment of chromosome 3, with a loss of heterozygosity in one clone, and for a segment of chromosome 8, with an amplification in a different clone. F,G. Probability density functions estimated by CONGAS+ (F) and data histogram (G) for the chromosome 8 segment in panel (E).

Fig 4.

Results on simulated data.

A. Adjusted Rand Index (ARI) between simulated cell labels and CONGAS+ clustering assignments, for 90 datasets with 1500 cells for scRNA-seq and 1500 for scATAC-seq. CONGAS+ performance is compared with copyKAT [19] and copy-scAT [22]. B. Mean Absolute Error (MAE) between simulated and inferred copy number profiles. C. Computation times (in seconds) for CONGAS+ with up to 100,000 cells, on CPU and GPU. D. Example counts of a bootstrap sample for an amplified segment with a bimodal ATAC signal and a unimodal RNA signal (see Fig 1E). E. ARI boxplot for copyKAT, CONGAS and CONGAS+ (computed on scRNA and scATAC separately) in a test simulated as in panel (D).
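For readers unfamiliar with the two metrics in panels A and B, the following is a minimal, dependency-free sketch of the standard ARI and MAE definitions. The function names and the toy labels are illustrative assumptions; this is not the benchmarking code used for the figure.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, pred):
    """Standard ARI between two label vectors of equal length."""
    total = comb(len(truth), 2)
    if total == 0:
        return 1.0  # fewer than two cells: trivially identical partitions
    # Pair counts from the contingency table and from its margins.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_a = sum(comb(c, 2) for c in Counter(truth).values())
    sum_b = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def mean_absolute_error(true_cn, inferred_cn):
    """Mean absolute difference between two copy number profiles."""
    return sum(abs(t - i) for t, i in zip(true_cn, inferred_cn)) / len(true_cn)


# Toy example: cluster names differ but the partition is identical.
ari_same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
mae_toy = mean_absolute_error([2, 1, 3], [2, 2, 3])
```

ARI is invariant to how clusters are named, which is why it suits a comparison between simulated clones and inferred clusters; MAE, by contrast, compares copy number values segment by segment.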

Fig 5.

CONGAS+ shrinkage effect.

A,B. Segments with bimodal signal (tumour versus normal) in both scRNA-seq and scATAC-seq of the Basal Cell Carcinoma (BCC) sample SU008 [38, 39]. C. Adjusted Rand Index (ARI) for CONGAS+ inference as a function of the shrinkage coefficient λ. Higher values of λ favour RNA over ATAC, and vice versa. The maximum ARI is achieved for low λ and ATAC. D,E. ATAC and RNA profiles on the segments in panel (C) show that for low λ cells are split into 2 ATAC clusters. In RNA, instead, regardless of λ the cells can be split, as suggested in Fig 1E. F-H. From sample SU006 [38, 39], instead, we obtain a good clustering in both RNA and ATAC data.
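One way to picture the role of λ is as a weight that trades off the two modality likelihoods. The sketch below assumes a convex combination of log-likelihoods; the caption states only that higher λ favours RNA over ATAC, so the exact functional form, the function name and the numbers are assumptions made for illustration.

```python
def weighted_log_likelihood(loglik_rna, loglik_atac, lam):
    """Combine per-modality log-likelihoods with a shrinkage weight.

    Assumed form: lam * RNA + (1 - lam) * ATAC, so lam = 1 uses RNA only
    and lam = 0 uses ATAC only. This is a hypothetical illustration.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * loglik_rna + (1.0 - lam) * loglik_atac


# Toy values: sweep lam to see which modality dominates the objective.
sweep = {lam: weighted_log_likelihood(-10.0, -20.0, lam) for lam in (0.0, 0.5, 1.0)}
```

Under this assumed form, a low λ lets the ATAC term drive the clustering, matching the behaviour described for panel C.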

Fig 6.

ATAC/RNA CONGAS+ analysis versus scDNA-seq.

A,B. Mapping between scDNA-seq clones (ground truth) detected from a gastric cancer cell line (SNU601 [20]), and clusters inferred by CONGAS+ (λ = 0.5) from independent ATAC/RNA data, using the segmentation of the most prevalent clone from scDNA-seq. The largest cluster per mapping is highlighted to denote that there is almost a one-to-one mapping between the analyses, as is also suggested by the mean absolute deviation between copy number profiles of the two analyses. C. CNA profiles for the matched analyses are largely in agreement, excluding small segments on chromosomes 3, 4 and 20. D,E. A low-dimensional UMAP representation shows good overlap between analyses. F,G. Comparison between the ATAC count distribution on the p-arm of chromosome 20, coloured by ground truth clones and CONGAS+ clusters. H. RNA distribution on the p-arm of chromosome 20, as in panels F-G, shows concordance between ATAC and RNA. I,J. Differential gene expression volcano plot (Wilcoxon test) for two CONGAS+ clusters, and binding motifs associated with differentially accessible ATAC peaks in both clusters.

Fig 7.

Application of CONGAS+ to B-cell lymphoma multimodal data.

A. Cell types annotated in a low-dimensional UMAP representation of ∼6400 cells with single-cell RNA and ATAC data from a 10x multiomics assay [47]. B,C. UMAP coloured according to the two clusters inferred by CONGAS+ (multiomics model) from RNA/ATAC data, using an arm-level segmentation. The analysis perfectly separates tumour and normal cells. D. Copy number profiles for the two clusters identified by CONGAS+; segments with no lines have the same copy number in all clusters. E,F. Normalised counts for the q-arm of chromosome 3, where the tumour is amplified, and the p-arm of chromosome 6, where the tumour has a loss. G,H. Differential testing for RNA counts (G) and ATAC peaks (H) across the two clusters. I. Comparison between the clustering assignments of the multiomics and flat CONGAS+ models.

Fig 8.

CNA-associated drug resistance with CONGAS+.

CONGAS+ application to a prostate cancer dataset from [55], composed of a mixture of four cell lines with 7600 scRNA-seq cells and 8800 scATAC-seq cells. A. Cartoon representing the design of the drug resistance experiment. B. Distribution of the 5 clusters inferred by CONGAS+ across the original cell lines. C. Copy number profiles inferred by CONGAS+ for each cluster. D,E. Density plot and histogram of normalised counts, coloured according to cluster assignments, for chromosome 1p (D), where an amplification event is private to cluster C3, and chromosome 6p (E), where an amplification is shared by clusters C3 and C5. F,G. Volcano plots showing differentially expressed genes between C1 and the rest of the cells (F), and between cells in C3 and cells in C2 and C4 (G). H. Phylogenetic tree inferred with MEDICC2 [60], using CNAs inferred by CONGAS+. Tips are labelled according to the inferred cluster, and edges are labelled with the number of events that accumulate in the corresponding branch.
