ChromaFactor: Deconvolution of single-molecule chromatin organization with non-negative matrix factorization

doi:10.1371/journal.pcbi.1012841

Fig 1.

NMF provides interpretable decomposition of single-molecule chromatin conformation datasets.

a. Matrices representing the median all-by-all Euclidean distance (nm) between genomic loci in single molecules at a 30 kb locus in Drosophila melanogaster actively transcribing (left) and not transcribing (middle) the abd-A gene from Mateo et al.[5] (n = 16,320 molecules). The rightmost panel shows the difference between the two distance matrices, indicating two domains with elevated local interactions and reduced distal interactions in populations transcribing abd-A. b. Bulk trends in contact change are challenging to observe in single cells actively transcribing (left) and not transcribing (right) abd-A. c. Non-negative matrix factorization (NMF) decomposes a dataset of single-cell distance matrices into a template matrix with interpretable chromatin domain boundaries and a contribution matrix describing the weight of each template in each cell. d. Three templates produced when NMF is applied to distance matrices at the abd-A locus.
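
The decomposition described in panel c can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses generic multiplicative-update NMF written in NumPy, runs on synthetic data rather than the imaging dataset, and the names `nmf`, `n_templates`, and the toy matrix sizes are hypothetical.

```python
import numpy as np

def nmf(X, n_templates, n_iter=500, seed=0, eps=1e-9):
    """Factor non-negative X (cells x features) as W @ H using
    multiplicative updates that reduce squared reconstruction error."""
    rng = np.random.default_rng(seed)
    n_cells, n_features = X.shape
    W = rng.random((n_cells, n_templates)) + eps     # contribution matrix
    H = rng.random((n_templates, n_features)) + eps  # template matrix
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic stand-in for single-cell distance matrices (assumed sizes):
# 200 cells, 10 loci, each cell a non-negative mix of 3 hidden patterns.
rng = np.random.default_rng(1)
n_cells, n_loci = 200, 10
true_templates = rng.random((3, n_loci * n_loci))
true_weights = rng.random((n_cells, 3))
X = true_weights @ true_templates  # one flattened loci x loci matrix per row

W, H = nmf(X, n_templates=3)
templates = H.reshape(3, n_loci, n_loci)  # view each template as loci x loci
rel_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

In this sketch, row `i` of `W` holds the template weights for cell `i`, and `W[i] @ H` reshaped to `(n_loci, n_loci)` gives a low-rank reconstruction of that cell's distance matrix.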

Fig 2.

Visualization of NMF outputs and their relationship to single-cell behavior.

a. UMAP visualization of contribution matrix, colored by the template with the predominant contribution in each cell. b. Depiction of cell coordinates from selected individual cells. c. Component contributions for each cell, emphasizing high weight for templates 5, 1, and 0. d. Distance matrices corresponding to each cell. e. Denoised reconstructions of the distance matrices, created by multiplying the template contribution of a cell shown in (c) by the template matrix. f. NMF templates 5, 1, and 0, which had the highest weight contributions for the three individual cells in (c). g. Median contact distance across all cells in the dataset with the highest weight contributions to templates 5, 1, and 0, respectively.

Fig 3.

NMF templates are significantly correlated with transcription.

a. Application of random forest models to predict cell transcription from the contribution matrix alone. b. A random forest can modestly predict transcription of abd-A, Abd-B, and Ubx, demonstrating that the components capture salient information for transcription. c. Random forest feature importance highlights templates 0, 1, and 5 as most important for predicting transcription. d. Several components, including 0, 1, 5, and 14, have significantly different contribution weights in transcribing and non-transcribing cells (FDR < 0.001, Benjamini-Hochberg correction). e. UMAP visualization of the component contribution matrix, with cells colored by high (blue) or low (red) contribution of components 0, 1, 5, and 14. f. Mean distance between abd-A and nearby enhancers at the same locus across the subsets of cells with high and low component contributions. g. Median contact of cells with high and low component contributions, encompassing the subset of cells which may be responsible for changes in contact observed in bulk.
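
The Benjamini-Hochberg correction mentioned in panel d can be illustrated with a short sketch. The caption does not state which statistical test produced the per-component p-values, so the p-values below are hypothetical, as is the helper name `benjamini_hochberg`; only the FDR < 0.001 threshold comes from the caption.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values, one per NMF component (not taken from the paper).
p_values = [0.0001, 0.04, 0.03, 0.2, 0.00002]
q_values = benjamini_hochberg(p_values)

# Components passing the FDR < 0.001 threshold used in the caption.
significant = [i for i, q in enumerate(q_values) if q < 0.001]
```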

Fig 4.

Interpretable templates at the HLCS locus in IMR-90 cells.

a. Average chromatin contacts in cells actively transcribing (left) and not transcribing (middle) the HLCS gene within the surrounding 10 Mb region. The right panel shows the difference in contact patterns, highlighting a stronger boundary in actively transcribing cells. b. Templates generated by applying non-negative matrix factorization (NMF) to this cell population. Among the 20 components, five differ significantly between cells transcribing and not transcribing the HLCS gene (FDR < 0.1, Benjamini-Hochberg correction). c. The directionality indices of templates 0 and 11 correspond with the location of the transcribed HLCS gene. d-e. Insulation scores of templates 3 and 4 align with a compartment shift in IMR-90 cells.
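
An insulation-style profile of the kind referenced in panels d-e can be sketched as below. The caption does not give the exact definition used, so this is one common sliding-window formulation applied to a toy distance matrix; the function name `insulation_profile`, the window size, and the toy values are assumptions. Because the input here holds distances rather than contact frequencies, a domain boundary appears as a peak in the profile.

```python
import numpy as np

def insulation_profile(matrix, window):
    """For each position i, average the window x window block linking the
    `window` loci upstream of i to the `window` loci starting at i."""
    n = matrix.shape[0]
    profile = np.full(n, np.nan)  # undefined where the window does not fit
    for i in range(window, n - window + 1):
        profile[i] = matrix[i - window:i, i:i + window].mean()
    return profile

# Toy distance matrix (nm): two domains of four loci each, with short
# within-domain distances and long between-domain distances.
n_loci = 8
domain = np.array([0, 0, 0, 0, 1, 1, 1, 1])
D = np.where(domain[:, None] == domain[None, :], 100.0, 300.0)
np.fill_diagonal(D, 0.0)

profile = insulation_profile(D, window=2)
boundary = int(np.nanargmax(profile))  # position with the largest cross-window distance
```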
