Figures
Abstract
The genome is organized into distinct chromatin compartments with at least two main classes, a transcriptionally active A and an inactive B compartment, broadly corresponding to euchromatin and heterochromatin. Chromatin regions within the same compartment preferentially interact with each other over regions in the opposite compartment. A/B compartments are traditionally identified from ensemble Hi-C contact frequency matrices using principal component analysis of their covariance matrices. However, defining compartments at the single-cell level from sparse single-cell Hi-C data is challenging, especially since homologous copies are often not resolved. To address this, we present MaxComp, an unsupervised method, for inferring single-cell A/B compartments based on 3D geometric considerations in single-cell chromosome structures—derived either from multiplexed FISH-omics imaging or 3D structure models derived from Hi-C data. By representing each 3D chromosome structure as an undirected graph with edge-weights encoding structural information, MaxComp reformulates compartment prediction as a variant of the Max-cut problem, solved using semidefinite graph programming (SPD) to optimally partition the graph into two structural compartments. Our results show that the population average of MaxComp single-cell compartment annotations closely matches those derived from ensemble Hi-C principal component analysis, demonstrating that compartmentalization can be recovered from geometric principles alone, using only the 3D coordinates and nuclear microenvironment of chromatin regions. Our approach reveals widespread cell-to-cell variability in compartment organization, with substantial heterogeneity across genomic loci. When applied to multiplexed FISH imaging data, MaxComp also uncovers relationships between compartment annotations and transcriptional activity at the single-cell level. In summary, MaxComp offers a new framework for understanding chromatin compartmentalization in single cells, connecting 3D genome architecture, and transcriptional activity with the cell-to-cell variations of chromatin compartments.
Author summary
Chromosome conformation capture and imaging techniques have revealed that the genome spatially segregates into at least two functional compartments. Ensemble Hi-C contact frequency maps display checkerboard-like patterns, indicating that chromatin regions fall into two major compartments—likely a result of phase separation—where regions within the same compartment preferentially interact, often over long genomic distances, while interactions with the opposite compartment are minimized. Principal component analysis (PCA) of ensemble Hi-C data is commonly used to identify these compartments. However, because the compartment annotations are derived from a cell population, this method cannot provide information about compartments in single cells. In this study, we introduce an unsupervised graph-based method to predict A/B compartments in single cells, which utilizes only structural information in single cells. Our results demonstrate that ensemble PCA-based compartment annotations can be reproduced as population averages of our single-cell predictions. Our results reveal substantial cell-to-cell heterogeneity in compartmentalization, with notable variability across genomic regions. Applying our method to multiplexed FISH tracing data links single-cell compartment annotations with gene transcriptional activity, and enables exploration of how local chromatin structure relates to compartment identity. Compared to existing methods, our approach achieves superior compartmentalization scores, offering a robust and interpretable framework for analyzing genome architecture in single cells.
Citation: Zhan Y, Musella F, Alber F (2025) MaxComp: Predicting single-cell chromatin compartments from 3D chromosome structures. PLoS Comput Biol 21(5): e1013114. https://doi.org/10.1371/journal.pcbi.1013114
Editor: Alexandre V. Morozov, Rutgers University: Rutgers The State University of New Jersey, UNITED STATES OF AMERICA
Received: September 20, 2024; Accepted: May 5, 2025; Published: May 23, 2025
Copyright: © 2025 Zhan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All code used to perform the algorithm and the analysis are publicly available on Github: https://github.com/alberlab/MaxComp.
Funding: This work was supported by the National Institutes of Health (UM1HG011593 to F.A). The funder had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: I have read the journal's policy and the authors of this manuscript have the following competing interests: F.A. is a shareholder of EarlyDX, Agoura Hills, CA, USA. All other authors declare no competing interests.
Introduction
The development of chromosome conformation caption [1–3] and imaging techniques [4–8] has greatly enhanced our understanding of local chromatin folding and higher-order organization across scales, from nucleosomes to chromosome territories. Hi-C studies have revealed key structural features, such as chromatin loops and topological associating domains (TADs) [9–13]. Moreover, chromatin segregates into at least two major compartments, likely driven by protein- and nucleic acid-mediated phase separation [14–16]. A widely used approach to determine chromatin compartmentalization involves applying principal component analysis to the correlation matrix derived from ensemble Hi-C contact frequency matrices [1,2]. The first principal component typically distinguishes A and B compartments: chromatin regions with positive eigenvalues (A compartment) are associated with open, transcriptionally active chromatin, while those with negative eigenvalues (B compartment) correspond to more condensed, transcriptionally inactive heterochromatin. These A/B chromosome compartment profiles correlate strongly with other markers of chromatin state, including histone modification patterns from ChiP-seq data [17] and genomic features such as CG content and CpG density [18].
Most current analysis of chromatin compartments rely on ensemble datasets, describing chromatin properties averaged over a large population of cells. With the development of single-cell genomic technologies such as single-cell Hi-C [19–22] and multiplexed FISH-based chromosome tracing techniques [4–8], it has become increasingly important to characterize chromatin compartments at the single-cell level. Applying ensemble level compartment annotations to single-cell structures is problematic due to substantial structural variability between individual cells. Therefore, methods that infer chromatin compartments from single-cell information are essential for uncovering the relationship between genome structure and gene function at the individual cell level. The approaches can also identify genomic regions with high cell-to-cell variability in compartmental states.
To address this challenge, several approaches have been developed that infer A/B compartments from either single-cell Hi-C contact [22–24] or distance matrices derived from single-cell chromatin structures [16,22]. For instance, deep learning-based approaches like scGHOST have been proposed to learn A/B compartment patterns from labeled training data in single-cell Hi-C maps [24]. Other strategies rely on features such as local CG content of a chromatin region for single-cell compartment annotations [22]. However, these approaches often require either high-quality datasets or ground truth annotations for supervised learning, limiting their general applicability.
Here we provide a strategy to determine single cell chromatin compartments based solely on geometric properties of chromosome structures, generated either from computational modeling [25] or super-resolution multiplexed FISH imaging [7,8]. Our approach is based on the assumption that chromosome regions within the same compartment tend to be spatially closer to one another than to regions in opposing compartments. We first embed each chromosome structure into a weighted graph, with nodes representing chromatin regions and edges connecting nodes. Edge weights are derived from geometric features calculated from the spatial distances between the chromatin regions, their sequence distance and their relative geometric locations from nuclear speckles. Compartment segregation is then formulated as a “Max-cut problem”, a graph optimization problem that identifies a division of nodes into two groups such that the sum of weights of edges crossing the cut is maximized [26,27]. We refer to this approach as MaxComp.
The resulting single-cell compartment assignments derived from MaxComp not only conform well with those obtained from ensemble Hi-C matrices but also agree with considerations about structural organization and transcriptional activities at the single-cell level.
We assessed our single-cell compartment predictions by comparing the frequency with which each chromatin region is assigned to the A or B compartment across the cell population to the first principal component (PC1) values obtained from traditional PCA analysis of ensemble Hi-C data. The strong correlation between these profiles confirms that MaxComp’s single-cell compartment predictions are consistent with established ensemble-level compartment annotations. To further validate our method, we calculated compartmentalization scores, which quantify the structural segregation of compartments. MaxComp consistently achieved the highest scores outperforming other methods, such as Hi-C-based PCA, distance-based PCA using average distance matrices, and a method based on average CG content within spatial neighborhoods. Moreover, our single-cell compartment assignments agree with known properties of active and inactive chromatin, including preferences in nuclear positions, chromatin fiber condensation levels, and proximities to nuclear bodies, which confirmed expected trends in euchromatin and heterochromatin regions. Finally, we assessed single-cell compartment annotations with single-cell transcription data, which confirmed that genes predicted to be in the A compartment show a higher transcriptional activity than when the same genes are assigned to the B compartment, further supporting the functional relevance of our predictions.
Our results demonstrate that single-cell compartments can be inferred from geometric features of single-cell chromosome structures. The MaxComp approach is robust and can be successfully applied even on chromosome structures imaged at low resolution and coverage. Notably, MaxComp can determine A and B compartment annotations without relying on statistical models, prior knowledge, or deep learning frameworks, which require large training datasets.
Importantly, we observe substantial cell-to-cell variability in compartment annotations for certain chromatin regions. This highlights the limitations of ensemble-based annotations and underscores the importance of determining chromatin states at the single-cell level to capture biological heterogeneity.
Results
Formulating compartmentalization as a Max Cut problem
Our goal is to classify chromatin regions at a defined base-pair resolution (e.g., 200kb) into active (A) and inactive (B) compartments in single-cell chromosome structures, using only geometric features derived from single-cell data—without relying on ensemble-based compartment annotations from Hi-C data. The assumption is that chromatin within the same compartment shows increased interaction propensity, resulting in overall closer spatial proximity, while chromatin in opposing compartments is more spatially separated. We also expect that chromatin is more likely surrounded by neighboring regions of the same compartment in 3D space, reflecting spatial segregation between A and B compartments.
Moreover, locations within the nuclear environment can also be indicative of chromatin compartments. For instance, nuclear speckles—interchromatin granule clusters enriched in pre-mRNA splicing factors—are known to be associated with actively transcribed genes. Studies have shown that chromatin near nuclear speckles often harbors highly expressed genes [28]. Therefore, we expect regions of the A compartment to localize more frequently in the vicinity of nuclear speckles than regions of the inactive B compartment [29,30] (Fig 1A).
(A) The assumption of chromatin compartmentalization is based on two parts: Firstly, chromatin regions belong to the same compartment have higher contacts than those from the different compartment; Secondly, chromatin regions of compartment A are spatially closer to nuclear speckles than regions of compartment B. (B) Every single-cell structure is transformed to an undirected graph which can be represented by an adjacency matrix whose edge weights are decided by its pairwise distances and speckle distances. Max Cut is then applied to the matrix to generate two partitions of nodes which are the prediction of the single-cell compartments of the structure. Ensemble compartment frequency can be calculated by combining a population of single-cell profiles.
To achieve our goal, we represent each single-cell 3D chromosome structure (either from structure models or imaging experiments) as an undirected graph where each chromatin region is a node
connected by edges with weights that encode information about their 3D distance and differences in their nuclear microenvironment. Specifically, edge weights between genomic regions
and
are derived from their spatial distance normalized by the expected distance given their genomic separation, along with the z-score difference in their distances to the nearest nuclear speckle (Methods). Our objective is then to partition this graph into two subgraphs in a way that maximizes the total weight of edges between them. Nodes within the same subgraph are then assigned to the same compartment -A or B-based on their average distance to speckle. This task can be achieved by solving a graph theory problem named the maximum-cut problem (Max-cut), which is NP-hard [31,32]. By solving the Max-cut problem for a given single-cell structure graph, we assign A/B compartment annotations to all chromatin regions. Each chromosome structure is then characterized by a single-cell compartment profile, resulting in a compartment profile that captures compartmentalization at the single-cell level.
Relaxation and approximation of the Max-cut algorithm in MaxComp
We formulate the prediction of chromatin compartments in single cells as a Max-cut problem, which aims to divide a graph into two subgraphs such that the total weight of edges between them is maximized. However, the Max-cut problem is NP-hard, and cannot be solved exactly by a polynomial-time algorithm [31]. Besides greedy approaches, multiple approximation algorithms have been developed to reach a relatively high approximation ratio. Goemans and Williamson [26,28] have proven that certain relaxation and random projection techniques can increase the approximation ratio to about 0.878 to find a near optimal solution for each given Max-cut problem in polynomial time. The relaxation converts the original quadratic programming problem—where each node is represented by an indicator for its compartment type—into a vector programming problem, where nodes are represented by vectors. This transformed problem can be reformulated as a semidefinite programming (SPD) problem, which aims to optimize a linear function subject to positive semidefinite constraints. In our framework, given the Laplacian matrix , which is a matrix representation of the graph
representing the target chromosome structure (Methods), the goal is to find a symmetric and positive semidefinite matrix
containing information of node labels with diagonal elements set to one so that we can maximize the objective function
(Methods). We demonstrate that maximizing intra-compartment similarity while simultaneously minimizing similarity between compartments can be accomplished with the same objective function. This confirms that single-cell compartment annotations are uniquely determined by
, which encodes both spatial geometry and nuclear context (Methods).
Ideally, we aim to obtain a strictly positive semidefinite matrix , enabling Cholesky decomposition to generate
, where
contains row vectors representing nodes (i.e., genomic regions) in the hyperspace. However, generating a strictly semidefinite (SD) matrix requires lengthy computation times due to small convergence thresholds, especially on large graphs with hundreds of nodes. Therefore, we employ an approximation strategy to generate a close SD matrix with a larger convergence threshold. Subsequently, we apply lower-diagonal-lower (LDL) decomposition, a variant of Cholesky decomposition, which decomposes the target matrix into two triangular matrices and a diagonal matrix to obtain an approximated SD matrix
, facilitating the discovery of approximated row vectors (Methods). We prove that the difference in Euclidean norm between matrix
containing approximated row vectors and matrix
is strictly governed by the difference in Euclidean norm between approximated matrix
and matrix
, ensuring minimal errors during approximation (Methods). Furthermore, Goemans and Williamson [26,27] introduced a random projection approach to iteratively generate hyperplanes, dividing all row vectors of
into two groups labeled as compartment A and B. Using the adapted version of this algorithm, we are able to perform MaxComp several times faster, which is particularly suitable for high-resolution chromosome structures with several hundreds of chromatin regions.
MaxComp prediction of single cell compartments from 3D chromosome structures
We apply our MaxComp approach to three different datasets. First, we use single-cell 3D genome structures generated by the integrative genome modeling (IGM) platform [25], using Hi-C [3], Lamin B1 DamID [33] and SPRITE [34] data as input information [25]. These structures are resolved at 200kb base-pair resolution and also predict the spatial distance of each chromatin region to the nearest nuclear speckle in each single cell, using a Markov clustering approach as described in [30]. Second, we apply MaxComp to 3D genome structures of human IRM90 cells from DNA multiplexed error-robust fluorescence in situ hybridization (MERFISH) experiments, a chromosome tracing method that images 3D genome structures for more than 7,000 single cells at a coverage of around 3Mb [7]. Thus, 3D coordinates are available for chromatin regions spaced approximately every 3Mb along the linear genome, providing their spatial positions within the nucleus. The method also imaged the locations of nuclear bodies within the same cell, allowing to estimate also the spatial distance of each chromatin region to nuclear bodies. Finally, we also apply our method to chromosome structures from DNA sequential fluorescence in situ hybridization (seqFISH+) imaging [8], which traces mouse embryonic stem cell (mESC) chromosomes at 1Mb coverage for 444 imaged cells (888 chromosome copies) and also provides relative locations of nuclear speckles within the same imaged cells.
Applying MaxComp to chromosome structures from IGM genome structure modeling
First, we apply MaxComp on chromosome structures extracted from whole genome structures of H1-hESC cells generated by integrative modeling [25]. These structures also predict positions of nuclear bodies, such as nuclear speckles and nucleoli [25,30]. Specifically, we took 500 3D structures of chromosome 6 and 500 structures of chromosome 10 from single-cell H1-hESC whole genome structures. We then applied MaxComp to generate a single-cell compartment profile vector for each chromosome structure (Methods) (Fig 1B). The input graph of a structure can be represented by an adjacency matrix, where the entry in row and column
represents the weight of the edge connecting vertex
and vertex
. Adjacency matrices and predicted single-cell compartment profile vectors vary considerably across individual structures (Fig 2A and Fig 2B, left column). Noticeably, along the chromosome sequence, single-cell compartment annotations show more frequent transitions between A and B compartments (Fig 2B, left column) compared to the patterns derived from ensemble Hi-C derived (Fig 2B, right column). These transitions vary between individual cells, reflecting underlying structural heterogeneity. Despite this variability along the linear genome, visualization of single-cell compartments in 3D structures reveals pronounced spatial segregation of A/B compartments, with extended clusters of chromatin regions belonging to the same compartment forming distinct spatial domains (3D structures in Fig 2B,C, left panels).
(A) The predicted compartments for structure 35, 115, 255, 361, 443 of H1-hESC Chr6 together with the input adjacency matrices of the MaxComp approach. (B) The compartment profile and 3D visualization of structure 35, 115, 255, 361, 443 of H1-hESC Chr6 colored by compartments (red in compartment A and blue in compartment B) from both experiment and the MaxComp prediction showed together with the nucleus envelope. (C) Predicted speckles (green) showed together with the single-cell examples colored compartments labeled by MaxComp and the Hi-C-based PCA.
MaxComp-predicted single-cell compartments reconstitute ensemble Hi-C compartment profiles
We first validate our MaxComp predictions by comparing the compartment frequency of each chromatin region across cells (Methods) with the PC1 values obtained from ensemble Hi-C matrices through principal component analysis (Fig 3A). The absolute value of a chromatin region’s compartment frequency of a given chromatin region reflects the fraction of times it is consistently assigned to either the A or B compartment across all single cells (Methods). We observed high correlations (Pearson ’s correlation >= 0.9) between predicted normalized compartment frequencies and PC1 values for all studied chromosomes (Fig 3A,B,C). These results demonstrate that population-averaged single-cell chromosome compartment annotations from MaxComp closely reconstitute ensemble-level compartment profiles derived from Hi-C, providing independent validation [1,3]. Importantly, this agreement arises despite MaxComp relying solely on geometric features of 3D structures, without using ensemble Hi-C contact data or PCA (Fig 2B).
(A) The experimental profile obtained from the Hi-C-based principal component analysis and the compartment profile predicted by MaxComp on 500 modeled structures of H1-hESC Chr6. (B) The experimental profile obtained from the Hi-C-based principal component analysis and the compartment profile predicted by MaxComp on 500 modeled structures of H1-hESC Chr10. (C) Scatter plot between the normalized compartment frequencies of MaxComp and the PC1 values together with the Pearson’s correlation coefficient between the two samples on each chromosome. (D) Scatter plots of MaxComp single-cell compartment variabilities against PC1 values from the Hi-C-based PCA annotations for each chromosome. Each dot represents a genomic region colored by its value of cell-to-cell radial position variability in their radial positions (Left) or cell-to-cell speckle distance variability (Right). (E) Illustration of A-associated locus and B-associated locus acting as anchors at speckles and envelope during chromatin folding. (F) Comparison of compartment profile (p-value = 8.84e-139 and 4.74e-140), compartment variability (p-value>0.05), radial position variability (p-value>0.01) and speckle distance variability (p-value = 4.36e-59 and 5.21e-61) between top 20% PC1 locus and bottom 20% PC1 locus of H1-hESC Chr6 and Chr10. (G) Comparison of speckle distances (p-value = 1.37e-55 and 3.64e-152), lamina distances (p-value = 3.24e-63 and 1.57e-60), radial position (p-value = 3.24e-63 and 1.57e-60), radius of gyration (p-value = 3.12e-44 and 4.71e-50) and interchromosomal contacts (p-value = 2.87e-53 and 3.02e-73) between compartment A beads and compartment B beads on 500 structures of H1-hESC Chr6 and Chr10. (H) Comparison of compartmentalization scores between the Hi-C-based PCA annotation, the MaxComp prediction, the distance-based PCA annotation and the CG phasing prediction for each structure from H1-hESC Chr6 and Chr10 showed together with the corresponding violin plots and illustration.
We observe that chromatin regions with intermediate ensemble Hi-C PC1 values (40–60 percentile; i.e., middle PC1 quintiles) show relatively low absolute compartment frequencies (close to 0) and thus show a significantly higher variability in their compartment assignments across cells (Methods) (Fig 3D left panel, Fig 3E) than regions with large absolute PC1 values (p-value = 3.47e-09 and 3.12e-09 for Chr6 and p-value = 8.66e-22 and 4.19e-28 for Chr10 Fig 3F). Notably, these regions with intermediate PC1 values also show significantly greater variability in their radial positions in the nucleus than regions with high absolute PC1 values (p-value = 8.35e-04 and 7.30e-04 for Chr10, Fig 3D radial position variability panel). These observations suggest that such regions may have higher transcriptional heterogeneity between cells than regions with consistently high absolute PC1 values, which tend to have the same dominant compartment assignments in the large majority of cells [30]. However, this remains a speculative interpretation at this point.
Therefore, regions with the highest and lowest PC1 value quintiles show the highest absolute compartment frequencies (Fig 3D, left panels) and thus, the lowest compartment variabilities between cells than regions with intermediate PC1 value quintiles (p-value = 1.47e-03 and 1.86e-03 for Chr6 and p-value = 6.23e-14 and 3.91e-09 for Chr10, Compartment Variability panel in Fig 3F) (Fig 3D, left panels). These regions show also lower variability in their radial positions in the nucleus between cells and tend to be located either at the nuclear exterior lamina compartment or in the nuclear interior close to nuclear speckles—supporting previous findings that such regions serve as structural anchors for genome organization [30] (Fig 3D) (radial position variability panel in Fig 3F). For instance, regions with highest normalized A-compartment frequencies show generally lower speckle distance variability and high affinity to nuclear speckles (Fig 3D) (speckle distance variability panel in Fig 3F).
Next, we evaluate the robustness of the MaxComp pipeline across varying population sizes ranging from 10 to 500 single-cell chromosome structures. We found that as few as 200 structures are sufficient to yield a Pearson’s correlation of at least 0.90 between MaxComp-predicted normalized compartment frequency profiles and ensemble Hi-C-derived compartment profiles (S2(ABC) Fig). Even with only 10 structures the Pearson’s correlation remains moderate at 0.64 (S2(ABC) Fig). However, larger population sizes show the best performance and overall smoother normalized average compartment frequency profiles.
In summary, these results demonstrate that single-cell compartments can be determined from 3D structural information alone, and that MaxComp performs robustly across different chromosomes.
Single-cell compartment predictions are consistent with expected chromatin structure properties
We further assess our single-cell compartment predictions by measuring several structural properties of chromatin with predicted A and B compartments, including nuclear radial positions, chromatin fiber condensation, and distances to nuclear bodies.
Our analysis confirms several expected trends for euchromatic A and heterochromatin B compartment chromatin: First, we find chromatin predicted in A compartment to have significantly larger radius of gyration (p-value = 3.12e-44 for Chr6 and 4.71e-50 for Chr10), meaning that these regions are less condensed than B compartment chromatin (Radius of gyration panel in Fig 3G). Also, compartment A regions have a significantly higher number of interchromosomal contacts in single cells than compartment B regions (p-value = 2.87e-53 for Chr6 and 3.02e-73 for Chr10). Thus, A compartment chromatin in single cells is more frequently located at the exterior of chromosome territories (Fig 3G).
Also, chromatin in the A compartment are located more interior in the nucleus with smaller radial positions than those in the B compartment (p-value <= 1.57e-60 for Chr6,10) and subsequently smaller speckle distances (p-value <= 1.37e-55 for Chr6,10) (Fig 3G).
These trends hold across additional analyzed chromosomes (e.g., Chr8, Chr12, Chr15, Chr18), which show very similar results and maintain high concordance with Hi-C-based PC1 profiles (Pearson’s correlation >= 0.9 between predicted chromosome frequencies and Hi-C based PC1 profiles) (S3(ABC) Fig and S4(ABC) Fig).
Single-cell compartments show high compartmentalization scores
To further validate our results, we calculate a compartmentalization score based on chromatin-chromatin contacts in single cell structures. The contact-compartmentalization score (CCS) is defined as defined as the fraction of chromatin-chromatin contacts that occur within the same compartment (intra-compartment contacts), relative to the total number of contacts regardless of compartment assignment (Methods) (Fig 3H). This value is averaged over all single-cell chromosome structures. A higher CCS score indicates a stronger spatial segregation between A and B compartments, representing a more favorable and well-defined compartment state. We compare the contact compartmentalization scores from MaxComp with three alternative methods, namely the aforementioned ensemble Hi-C-based PCA analysis [2,3,8], a distance-based PCA analysis [4,35] and CG phasing [22] (Methods). The distance-based PCA method (DM) uses the average distance matrix calculated from all single-cell chromosome structures and applies principal component analysis to define A/B compartment annotations. Similar to Hi-C-based PCA analysis, the resulting compartment annotations are then assigned uniformly across all single-cell chromosome structures. The CG phasing method assigns a target region to the A compartment based on the total amount of CG DNA content from all chromatin regions within its 3D spatial neighborhood.
MaxComp predictions achieve the highest average CCS compartment score among the three methods, outperforming ensemble Hi-C-based PCA (HiC) and distance-based PCA (DM) compartment predictions (Fig 3H). Additionally, we also tested a compartment score based on distances rather than of chromatin-chromatin contacts. The distance-compartmentalization score (DCS) is defined as the ratio of the average distances between chromatin regions within the same compartment and the average distances between all chromatin regions, regardless of their compartment annotations (Methods) (Fig 3H). Here, a smaller DCS score indicates a more favorable spatial compartment segregation. Our results show that MaxComp compartment annotations produce a better spatial segregation between the two compartments at single-cell level, as evidenced by the smallest DCS score compared to the other methods (Fig 3H).
Applying MaxComp to multiplexed FISH imaging datasets reveals the relationship between single-cell compartments and transcription signals
Next, we evaluate MaxComp on chromosome structures imaged by integrated multiplexed FISH experiments [7,8]. These chromosome tracing experiments provide chromosome structures at considerably sparser coverage. For instance, chromosomes of human IMR90 cells are imaged at 3Mb step size by DNA MERFISH [7], while chromosomes in mESC cells are imaged at 1Mb step size in DNAseqFISH+ [8]. By combining multiplexed FISH chromosome tracing with immunofluorescence imaging these methods can also detect the locations of nuclear speckles and nucleoli in the same cells. Moreover, the datasets are also integrated with RNA-MERFISH and RNAseqFISH + , providing information about active transcription of specific genes in the same imaged cells [7,8].
We tested our method to 7,000 structures of chromosome 6 and chromosome 10 from IMR90 cells, imaged using DNA MERFISH [7]. Chromosome 6 is represented by a total of 55 imaged loci. Despite the relatively sparse coverage, MaxComp performs well in predicting A/B compartment annotations. For instance, the averaged single-cell normalized compartment frequency profiles predicted by MaxComp show high correlations with the ensemble Hi-C-based PCA compartment profiles (Pearson ’s correlation >= 0.8) (Fig 4A,B,C). We also observe lower compartment variability for chromatin regions with high or low PC1 values, derived from independent Hi-C data analysis [2,3,8] (Fig 4D). Also, A compartment chromatin shows significantly smaller speckle distances (p-value = 0.0 for Chr6 and 0.0 for Chr10) and larger distances to the nuclear lamina (p-value = 1.58e-102 for Chr6 and 2.20e-119 for Chr10) compared to B compartment chromatin (Fig 4E).
(A) The experimental profile obtained from the Hi-C-based principal component analysis and the compartment profile predicted by MaxComp on 7,000 DNA-MERFISH structures [7] of IMR90 Chr6. (B) The experimental profile obtained from the Hi-C-based principal component analysis and the compartment profile predicted by MaxComp on 7,000 DNA-MERFISH structures of IMR90 Chr10. (C) Scatter plot between the normalized compartment frequencies of MaxComp and the PC1 values together with the Pearson’s correlation coefficient between the two samples on each chromosome. (D) Scatter plot of compartment variabilities against PC1 values from the Hi-C-based PCA annotations on each chromosome. Each dot represents a genomic region colored by its value of speckle distance variability. (E) Comparison of speckle distances (p-value = 0.0 and 0.0), lamina distances (p-value = 1.58e-102 and 2.20e-119) and numbers of imaged nascent transcripts (p-value = 6.23e-06 and 4.76e-60) between compartment A beads and compartment B beads on 7,000 DNA-MERFISH structures of IMR90 Chr6 and Chr10. (F) Comparison of compartmentalization scores between the Hi-C-based PCA annotation, the MaxComp prediction, the distance-based PCA annotation and the CG phasing prediction for each structure on IMR90 Chr6 and Chr10 showed together with the corresponding violin plots. (G) The 3D visualization of 6 selected DNA-MERFISH structures of IMR90 Chr6 colored by compartments (red in compartment A and blue in compartment B) from both ensemble PC1 and Max-cut prediction.
A benefit of chromosome tracing by DNA MERFISH is the ability to concurrently measure nascent gene transcription for a selected group of genes in the same cell by RNA MERFISH imaging [7]. Interestingly, we find for all tested genes a higher nascent transcription signal (i.e., number of imaged nascent transcripts in a cell) in those structures where the gene is predicted to be in the A compartment in comparison to cells where the same gene locus is predicted to be in the B compartment (p-value = 6.23e-06 for Chr6 and 4.76e-60 for Chr10) (Fig 4E). Overall genes in the A compartment are more likely associated with active transcription, supporting studies about the role of nuclear compartmentalization for gene transcription [36]. However, our results also indicate that transcription can also occur for genes in the B compartment, in support of recent studies using RD-SPRITE, which simultaneously maps 3D genome structure and nascent RNA transcription genome-wide [37] (S6(A) Fig). Overall, the average transcription profile (i.e., number of imaged transcription spots averaged across the cell population) shows good correlation with the predicted A/B compartment profile as well as with the gene density profile (S6(BC) Fig). Similar results are also found for other chromosomes, such as chromosome 10 (S7(ABC) Fig).
Finally, we find both compartmentalization scores, CCS and DCS, are significantly better for compartments predicted by MaxComp compared to those predicted by the distance based PCA (DM), single-cell CG phasing method, or ensemble based Hi-C PCA (Hi-C) (Fig 4F), confirming our previous observations.
Next, we applied MaxComp to chromosome tracing data from DNA SeqFISH+ of the mESC cell line [8]. These structures were imaged genome-wide at 1Mb coverage. Also here, we found similar results with high Pearson’s correlations >= 0.8 between PCA-based compartment profiles from ensemble Hi-C and MaxComp predicted normalized compartment frequency profiles in both studied chromosomes. Also here, the CCS and DCS compartmentalization scores show good spatial segregation for the MaxComp compartments (Fig 5A,B,C,D,E). The CG method shows similar performance to MaxComp, although with a lower contact-based compartmentalization score. However, MaxComp performs substantially better when calculating the distance-based compartmentalization score (Fig 5E).
+ compartments by MaxComp and its comparison with other methods on mESC Chr5 and Chr15 (A) The experimental profile obtained from the Hi-C-based principal component analysis and the compartment profile predicted by MaxComp on 886 SeqFISH+ structures [8] of mESC Chr5. (B) The experimental profile obtained from the Hi-C-based principal component analysis and the compartment profile predicted by MaxComp on 884 SeqFISH+ structures of mESC Chr15. (C) Scatter plot between the normalized compartment frequencies of MaxComp and the PC1 values together with the Pearson’s correlation coefficient between the two samples on each chromosome. (D) Scatter plot of compartment variabilities against PC1 values from the Hi-C-based PCA annotations on each chromosome. Each dot represents a genomic region colored by its value of speckle distance variability. (E) Comparison of compartmentalization scores between the Hi-C-based PCA annotation, the MaxComp prediction, the distance-based PCA annotation and the CG phasing prediction for each structure on mESC Chr5 and Chr15 showed together with the corresponding violin plots. (F) The 3D visualization of 6 selected SeqFISH+ structures of mESC Chr5 colored by compartments (red in compartment A and blue in compartment B) from both experiment and the MaxComp prediction. (G) Log fold change of the average transcription level (number of mRNA transcript spots detected) of cells with both copies labeled with A (A cells) against the level of cells with both copies labeled with B (B cells) for each gene.
We also analyzed transcription data for 42 genes, whose nascent transcription levels have been determined in single cells [8]. To analyze these data, we divide the structures in two groups: first, structures where both homologous gene copies are predicted to be in the A compartment (A cells) and second, those structures where both gene copies were predicted to be in the B compartment (B cells). For each of these cells the nascent transcription level (number of mRNA transcript spots detected) for each of the 42 genes was measured by RNAseqFISH+ [8]. For each gene, we then calculated the transcription ratio as the average nascent transcription level for the genes in A cells over the average transcription level of the same gene in B cells (Methods). We found that 67% of all measured genes have positive transcription ratio, indicating that genes in compartment A (A cells) are expressed at higher levels in structures where the gene is predicted to be than those in compartment B (B cells) (Fig 5G). Additionally, the distributions of normalized transcription levels for all 42 genes in A compartment are statistically significantly different from those in the B compartment (paired t-test p-value < 0.05) (S8 Fig).
Discussion
Experimental evidence from Hi-C and imaging studies shows that chromatin segregates spatially into at least two distinct functional compartments. Compartment A is associated with open, transcriptionally active euchromatin while compartment B corresponds to more condensed heterochromatin. The spatial segregation manifests in Hi-C contact frequency maps as a checkerboard-like pattern, reflecting preferential interactions of chromatin within the same compartment and reduced interactions between opposing compartments. Principal component analysis of ensemble Hi-C contact frequency maps has been widely used to classify chromatin into A and B compartments, based on the first principal component (PC1) values. However, because this analysis is based on averaged data across thousands of cells, it cannot reveal compartment information at the single-cell level. To better understand the underlying principles of chromatin compartment formation, it is important to detect compartments at single-cell level, and study if and how chromatin compartments vary between individual cells.
To address this problem, we developed MaxComp, a graph-theory-based method that determines chromatin compartments in single cells using only geometric properties from 3D chromosome structures. By representing each chromosome structure as an undirected graph, MaxComp optimally partitions the graph into two compartment subgraphs, such that the resulting single-cell A/B compartment annotations across the cell population yield compartment frequency profiles that closely match those derived from principal component analysis of ensemble Hi-C. Furthermore, chromatin regions assigned to single-cell A and B compartments exhibit distinct structural characteristics—such as radial positioning and local chromatin condensation (measured by the radius of gyration)—that are consistent with the expected differences between euchromatin and heterochromatin.
We demonstrate that our method is robust across different structural resolutions and can be successfully applied to low-coverage chromosome structures obtained from DNA MERFISH and DNAseqFISH+ tracing experiments. When combined with nascent mRNA imaging, our analysis shows that most genes predicted to be part of the A compartment consistently show higher transcriptional activity compared to the same genes located in the B compartment. These results demonstrate that single-cell compartment annotations derived solely from geometric features of 3D chromosome structures are sufficient to capture biologically meaningful compartment organization at the single-cell level.
Our analysis shows that the cell-to-cell variability in single-cell compartment annotations is strongly correlated with the ensemble-Hi-C PC1 values. Chromatin regions with high absolute PC1 values show low compartment fluctuations. Instead, chromatin regions with intermediate PC1 values (i.e., small absolute values) show the highest fluctuations in compartment annotations between individual cells. Interestingly, these are also chromatin regions that show relatively high variability in their radial positions between individual cells. However, the high compartment fluctuations of these specific chromatin regions may also indicate a limitation of the two-state compartment definition and that these regions may not be functionally clustered into either A or B compartments.
Moreover, our single-cell compartment annotations show good performance against other prediction methods, yielding higher compartmentalization scores and thus higher spatial segregation of A and B compartment chromatin.
Overall, our study emphasizes the importance of defining compartments at single-cell level, which can be achieved by using only geometric considerations in 3D chromosome structures. These structures can be derived from structure modeling or chromosome tracing experiments.
Biologically, our method offers unique opportunities to relate single-cell structural features with distinct functional properties, for instance, enabling more precise identification of transcriptionally active regions in individual cells, not achievable from ensemble-based compartment analysis. Furthermore, our results can provide insights into the heterogeneity of gene expression across single cells, which could play an important role in cell differentiation. Using imaging data in complex tissues, our single-cell compartment predictions may help explore whether compartment states change along continuous spatial trajectories across a tissue.
Importantly, MaxComp is an unsupervised, de novo approach that does not rely on pre-trained models or ensemble-based labels and is broadly applicable across chromosome structures with varying coverage and resolution. In the future, higher-resolution structures from imaging are needed to further assess the quality of compartment predictions. Currently, our method relies exclusively on intra-chromosomal distances to infer single-cell compartments, as it is designed to analyze individual chromosomes. Incorporating inter-chromosomal interactions into this framework is a promising direction for future work, with the potential to enhance prediction accuracy and extend the method’s applicability to genome-wide single-cell compartment profiling.
Materials and methods
Definition of Max Cut
Given an undirected graph , with
as vertices of the graph and
as edges connecting vertices. A cut in
is a defined as a subset of the graph
. Let
, then the Max-cut problem is finding the cut
such that
the sum of weights of edges connecting set
and set
is maximized. Because the Max-cut problem is NP-hard, multiple approximation algorithms exist, including greedy algorithms and local search. The efficiency of an approximation algorithm can be defined by the approximation ratio:
where is the result of the approximate algorithm
and
is the optimal result of the problem. Sahni and Gonzalez [31] developed an approximation algorithm achieving
, while other approximations reach slightly higher approximation ratios [27]. The approximation ratio can be substantially improved by formulating the original problem as a quadratic programming problem [26,27].
Quadratic programming problem
The relaxation and reformulation of the problem follows the approach by Goemans and Williamson [26,27]. Since graph cutting is equivalent to partitioning its nodes into two groups, we use an indicator variable to represent the group assignment of node
. The objective is to maximize the total weight of edges connecting nodes within both group partitions. This leads to a quadratic programming formulation of the Max-Cut problem:
Once the optimal indicators are determined, nodes can be partitioned into two groups based on their assignments:
Vector programming problem
Instead of representing each node with a one-dimensional indicator, we relax the label of node to a vector
. The Max-Cut problem can then be relaxed as:
This semidefinite relaxation enables efficient approximation through convex optimization techniques.
Semidefinite programming problem
In the vector programming formulation, each dot product can be represented as
. Defining a matrix
, we can rewrite the problem as:
Since
, letting
, we have
. The matrix
is symmetric by the commutative properties of the dot product. Moreover, for any nonzero vector
,
implying that is positive semidefinite.
We can reformulate the Max-cut problem as a semidefinite programming (SDP) problem:
symmetric and positive semidefinite
where is the graph Laplacian matrix,
is the adjacency matrix,
is the degree matrix and
is the Frobenius inner product between matrix
and
.
Formulation of the compartment prediction problem
To predict chromatin compartments from single-cell chromosome structures, we aim for chromatin regions (i.e., beads) in compartment A to preferentially contact other beads in compartment A more frequently, and likewise for beads in compartment B. That is, intra-compartment contacts should be maximized, and inter-compartment contacts minimized. We represent each single-cell chromosome structure as a graph where nodes represent beads and edge weights between every two nodes are derived from a combination of their spatial distance and their relative distances from nuclear bodies, capturing aspects of nuclear architecture (see section “Transforming 3D structures to graphs”).
Using a semidefinite programming framework, we want to maximize the sum of weights for inter-compartment edges, which contribute , and minimize the sum of weights of intra-compartment edges connecting nodes from the same compartment, which contribute
. Then the original problem can be formulated as:
where and
are factors to balance inter-compartment and intra-compartment information with
. The programming problem in matrix form can be formulated as:
subject to , and
is symmetric and positive semidefinite
is the unit matrix. Since
is a constant that doesn’t contain unknown variables, we conclude that the objective function and the constraints of the original problem to find single-cell compartments is the same as the regular Max-cut problem. We implement this SDP using the python package cvxpy [38] and cvxopt https://github.com/cvxopt/cvxopt, with the SCS solver [39], to compute the resulting matrix
.
Decomposition by approximation and random projection
Since , we can recover the vector matrix
from
using Cholesky decomposition. However, Cholesky requires
to be strictly positive semidefinite, which may not hold due to the convergence threshold in the optimization. Instead, we apply LDL decomposition on to generate a symmetric matrix
, which is very close to a positive semidefinite matrix:
where is a lower unit triangular matrix,
is a diagonal matrix with diagonal entries
. If
is an approximately positive semidefinite matrix,
may contain some negative entries with small values. To enforce non-negativity, we set
if
and
elsewhere to obtain a new diagonal matrix
. We define the square root matrix of
as:
Since we know , we can generate a new matrix
by:
Hence, this yields a Cholesky-like decomposition by approximating the diagonal entries of and formulate the final vectors by:
so that
The difference between and
can be evaluated by Euclidean norm of their difference and reverse triangular inequality:
According to the property of Euclidean norm, we have for any matrix
:
While the difference between and
is measured by Euclidean norm of their difference and triangular inequality:
Overall, we have the following relationship:
where denotes the approximation error between
and
. Hence, we have proven that if we can make the approximated matrix
as close to matrix
as possible and reduce the error as much as possible, the resulting matrix
generated from decomposition will also be very close to matrix
. Let
, where each
indicates each node
embedding in hyperspace. To partition nodes into two groups, we apply a random hyperplane with normal vector
, to cut the hyperspace into two sub-hyperspaces, where all vectors are divided into two groups with positive or negative dot products
and
. Using the two groups to form a vector
, where
if
and
if
, we calculate the objective value of the Max-cut problem by:
To obtain the best cut, we choose to apply random rounding multiple times until the objective value exceeds a threshold. Goemans and Williamson prove that the approximation ratio of random projection is about 0.878 which improves the performance substantially over other approximations [26,27]. Accordingly, we set the threshold to be , where
indicates the optimal result.
Transforming 3D structures to graphs
We generated a population of H1-hESC 3D genome structures with our integrative genome modeling (IGM) platform [25] using data from Hi-C [3], DamID [33] and SPRITE [34] experiments. We generated 1,000 diploid whole-genome structures containing 3D coordinates of 30,332 200 kb regions in each cell. Each chromosome structure in each cell is then embedded as a graph with each chromatin region as a node, and edges connecting nodes with weights derived from a combination of the spatial distance between two genomic regions and their distances from nuclear bodies, capturing aspects of nuclear architecture.
The compartment prediction depends on how we construct these graphs from 3D genome structures by choosing proper edge weights. Firstly, the graph is undirected. Secondly, since two chromatin regions with closer spatial distance have generally higher contact probabilities, we assign smaller edge weights between nodes if their spatial distances in 3D are smaller than than expected 3D distance by their sequence distance alone. Varoquaux et al [40] demonstrates that the expected spatial distance between two regions
,
and the difference in their sequence position
follows
at small ranges, with
,
as the sequence position of chromatin regions i and j in a chromosome.
The observed spatial distances between pairs of chromatin regions in a genome structure are calculated using the Euclidean distance between their 3D coordinates:
where and
are 3D coordinates of region
and region
. 3D coordinates are obtained from our genome structure modeling or derived from multiplexed FISH imaging (e.g., DNA MERFISH or SeqFISH+ experiments).
We consider the ratio of observed spatial distance against expected spatial distance as part of the edge weight so that chromatin regions at longer distances than expected are more likely to be classified into different compartments.
Nuclear speckles act as hubs for pre-mRNA processing and highly transcribed genes are often located in close proximity to speckles. Thus, distances to nuclear speckles can act as potential indicators for active chromatin compartments. Hence, we incorporate a scaling factor to define edge weights that contribute the similarity of two chromatin regions
and
with respect to their distances to their closest nuclear speckles.
and
define the z-score normalized distances between chromatin regions i and j and their nearest nuclear speckles. Therefore, edge weights between two chromatin regions are further normalized by the similarity of their speckle affinity.
The distance between a chromatin region and its nearest speckle is obtained from the genome structure models or from DNA-MERFISH datasets (3 Mb resolution) [7].
Structures from SeqFISH+ imaging datasets (1 Mb resolution) [8] provide only intensities of imaged speckle marker antibodies at locations of certain loci. We estimated the speckle distances by assuming speckle intensity decays at a quadratic rate:
where is the imaged speckle intensity at loci
.
The weight of an edge between nodes
and node
is then defined by the product of the following factors:
where and
are the 3D coordinates,
and
are the genomic positions,
and
are the z-scores of speckle distances of chromatin regions
and
.
Because compartments are more likely to be determined by local structural relationships, edges between nodes and node
are removed when their distance is above a threshold, specifically when
, where
is the excluded volume of a sphere representing chromatin regions of 200kb sequence length. For DNA-MERFISH and SeqFISH+ structures, we set
. Finally, min-max normalization is applied to the edge weights so that all weights range between 0 and 1:
The matrix will be the final adjacency matrix of the graph, which is further used to generate the Laplacian matrix of the graph.
Due to the large size of the graph and the long running time, we select 500 copies of each studied chromosome from our structure population to perform the analysis.
Prediction of compartments
We apply our algorithm to each chromosome structure graph individually to calculate compartments for every single-cell structure. Given two sets nodes sets, and
, generated by the MaxComp algorithm, we set the active compartment
and inactive compartment
if the average speckle distance of all nodes in
is smaller than that of
; otherwise
and
.
Compartment profile vector
A compartment profile vector for a given chromosome structure
is calculated based on the prediction
and
. We set
if region
in structure
is assigned to the A compartment otherwise
.
Ensemble compartment vector
We define the ensemble compartment vector as , where each element
represents the fraction of times in the population where region
is assigned to the A compartment. The ensemble compartment vector is therefore the sum of all compartment profile vectors across all chromosome structures divided by the total number of structures.
Compartment frequency
The compartment frequency of a genomic region
is defined as:
where is an indicator function equal to 1 if
, and 0 otherwise. The absolute value of
describes the fraction of times a genomic region
is predicted to be in its majority compartment across all structures. Positive values indicate the compartment frequency of region
in compartment A, when the region is in compartment A in most of the structures. Negative values indicate the absolute value of compartment frequency of region
in compartment B, when the region is in the B compartment in the majority of structures. Subsequently, the compartment frequency profile for a chromosome is defined as:
This vector can be directly compared to the PC1 profile from ensemble Hi-C data.
Normalized compartment frequency
The normalized compartment frequency of a genomic region
is defined as:
The normalized compartment frequency is used to calculate the correlation with PC1 values derived from ensemble Hi-C-based compartment predictions.
Compartment variability
The compartment variability for chromatin region
is calculated as the standard deviation of its compartment profile value across all
structures of the population:
where is either 1 indicating that region
is in compartment A in structure
or 0 indicating that the same region
is in compartment B.
is the average value of
of region
across all
structures:
. The larger the value of
is, the more variable the compartment annotations of region
are across the population of structures.
Structural features
Radial position (RAD).
The radial position of a chromatin region in structure
in a spherical nucleus is calculated as:
where is the the 3D coordinates of bead
in structure s, and
is the nucleus radius.
indicates the region
is at the nuclear center while
means it is at the nuclear surface. The radial position variability (δRAD) of region
in the population is calculated as:
where is the standard deviation of the population of radial positions of region
and
is the mean standard deviation calculated from all regions within the same chromosome of the target region.
Radius of gyration (RG).
The local compaction of the chromatin fiber at the location of a given locus is estimated by the radius of gyration for a 1 Mb region centered at the locus. To estimate the values along an entire chromosome we use a sliding window approach over all chromatin regions in a chromosome. The radius of gyration for a 1 Mb region centered at locus in structure
, is calculated as:
where is the distance between the chromatin region
to the center of mass of the 1-Mb region.
Distances to nuclear bodies.
Distances for each chromatin region to various nuclear bodies—including speckle distance (SpD) and lamina distance (LmD)—are calculated by measuring the distance between the surface of each chromatin region
to the nearest speckle and lamina [30]. In each 3D genome structure model, speckle locations are estimated by the geometric centers of highly connected interaction subgraphs of chromatin regions with the top 10% SON TSA-seq signals following a procedure described in [30], while the locations of lamina is identified as the nuclear boundary. For DNA MERFISH data, SpD and LmD are directly obtained from the datasets [7]. The speckle variability (δSpD) of region
is calculated as the standard deviation of its speckle distances across the population of structures.
Interchromosomal contacts.
The calculation of inter-chromosomal contacts is similar to the calculation of contact frequency matrix but is based on a larger contact range. For a given 200kb region, its interchromosomal contacts is the total number of contacts with any target inter-chromosomal regions from the same genome structure within range .
Compartmentalization score
We define the contact-compartmentalization score (CCS), denoted as , as the ratio of total intra-compartment contacts to all contacts in a structure (intra-compartment contacts + inter-compartment contacts):
where and
are the numbers of unique intra- and inter-compartment contacts, respectively. A contact is defined when the spatial distance between two regions is less than or equal to
for modelled structures and SeqFISH+ coordinates [8] and
for DNA-MERFISH coordinates [7], accounting for different locus resolutions.
The CCS score allows comparison of different compartment annotations (e.g., from MaxComp or other methods). A higher CCS indicates stronger intra-compartment connectivity and thus a more accurate partition.
Similarly, we define the distance-compartmentalization score (DCS), denoted , as:
where and
are the sums of intra- and inter-compartment distances, respectively. A lower
indicates more compact intra-compartment regions compared to inter-compartment ones, reflecting better chromatin subcompartment segregation.
Preprocessing of imaging tracing datasets
For the imaging dataset, 7,000 DNA-MERFISH copies of chromosome 6 and chromosome 10 from the IMR90 cell line are obtained from Su et al [7] together with their corresponding speckle distances, lamina distances, nucleoli distances and transcription profiles with transcription on or off (nascent transcript imaged or not) for genes measured by RNA MERFISH. All datasets are preprocessed by linear interpolation to remove missing values. Structural information of DNAseqFISH+ including coordinates, speckle densities and transcription information containing the number of detected spots corresponding to mRNA transcript for more than 40 genes from 444 cells (888 copies) of the mESC cell line are obtained from Takei et al [8]. We preprocess the datasets to generate reasonable speckle locations for each cell using experimental SON TSA-seq following an approach described in [30]. Similarly, all datasets are preprocessed by linear interpolation to remove missing values. To avoid the impact of zero transcription, we remove the bottom 5% cells in the number of transcription spots for each gene when performing transcription ratio and paired change analysis. The genes are mapped to genomic regions nearest to their promoters in the reference genome [41]. For each gene, we first divide the cells into different groups, where A cells (B cells) indicate the corresponding locus on both copies is predicted to be in the compartment A (B). Then its transcription ratio is calculated by the log fold change of the average transcription level (number of mRNA transcript spots detected) in A cells
over the average transcription level in B cells
:
Normalized and
are calculated in the same way for each gene after min-max normalization on the population-wide transcription levels to conduct paired comparison with paired t-test. To compare with gene density, we calculate the total number of genes from the UCSC Genome Browser RefSeq [41] are located within each imaged loci and construct a gene density profile for each studied chromosome.
Hi-C-based PCA
The Hi-C-based PCA profile are obtained from the in-situ Hi-C dataset for H1-hESC (4DNESX75DD7R) [3], which are directly calculated by the largest principal component (the eigenvector corresponds to the largest eigenvalue) of the covariance matrix from the experimental ensemble Hi-C. For comparison, the PC1 values are mapped and averaged with regards to the nearest 200kb bins from the model. We measure the Pearson’s correlation coefficient
between the non-zero values from the predicted normalized compartment frequency vector
and the experimental profile
for each studied chromosome. The Hi-C-based PCA profile
are obtained from Rao et al for IMR90 (4DNESSM1H92K) [2] and Takei et al [8] for mESC. Similarly, we use the nearest PC1 values for each imaged loci for comparison with predicted profiles.
Distance-based PCA
Principal component analysis is mathematically performed by eigenvector decomposition on the input matrix, which can not only be applied on contact matrices, but also be adapted to pairwise distance analysis. The approach has been previously used by imaging related studies such as Wang et al [4] and Sawh et al [35]. We first normalize the mean distance matrix by pairwise genomic distances through fitted power-law function, and then calculate the pairwise Pearson’s correlation matrix between every row and column pair. Using it as the covariance matrix for PCA analysis, the resulting vector corresponding to the largest principal component with positive and negative entries can be used to generate compartments that are comparable with annotations from other methods.
CG phasing
Another frequently used approach is phasing by genomic information such as CpG density or CG content [22], where we have prior knowledge that high CG contents correspond to active compartments. We obtain CG contents for both hg38 and mm10 reference genomes from the UCSC genome browser [41]. For each locus, we calculate the mean CG contents from all loci (including itself) within its neighborhood (250 nm). The resulting vector measures the smoothed CG contents at single-cell levels, where the higher the value is the more likely the chromatin region belongs to A compartment. Eventually we may calculate the log fold change against the average to get A/B annotations by positive or negative signs as what we have explained in the MaxComp prediction.
Structure visualization
All chromosome structures together with nuclear envelopes and speckles are visualized by UCSF Chimera [42].
Computational requirements
The time and memory requirements for MaxComp depends on the population size and the chromosome size. For a modeled human chromosome 6 which is 171 Mb and at 200 kb resolution forms an input graph with 855 nodes, the algorithm takes on average 30–40 minutes to generate a single-cell compartment profile. In total less than 50GB memory is required to calculate and store 500 single-cell compartment profiles. When using the imaging dataset at 3 Mb resolution which forms an input graph with 55 nodes, the running time reduces to less than 1 minute. For the other chromosomes, the fewer the chromatin regions are, the shorter time and less memory MaxComp will require.
For IGM modeling, the generated population-based models consist of 1,000 genome copies from Boninsegna et al. [2] requires between 10–15 hours of parallel computation using 250 nodes, with a total memory of 4 GB for the controller and 2 GB per processor. The details about IGM requirements are available at https://github.com/alberlab/igm/.
Supporting information
S1 Fig. Correlations between average distances and normalized genomic distances
(A) Scatter plots of average distances from DNA-MERFISH Chr6 and Chr10 [7] against the corresponding normalized genomic distances showed together with the fitted quadratic curves and the Pearson’s correlation coefficients. (B) Scatter plots of average distances from SeqFISH+ Chr5 and Chr15 [8] against the corresponding normalized genomic distances showed together with the fitted quadratic curves and the Pearson’s correlation coefficients.
https://doi.org/10.1371/journal.pcbi.1013114.s001
(TIF)
S2 Fig. Comparison between population sizes on predicted profiles and their correlations
(A) The experimental profile obtained from principal component analysis on ensemble Hi-C matrix. (B) The predicted normalized compartment frequencies by MaxComp on populations of modeled Chr6 structures with different sizes. (C) Scatter plots between PC1 values from the Hi-C-based PCA and various predicted compartment profiles showed with Pearson’s correlation coefficients. We observe increased value in the coefficient as population size grows larger.
https://doi.org/10.1371/journal.pcbi.1013114.s002
(TIF)
S3 Fig. Prediction of model compartments by MaxComp and its comparison with the ground truth on H1-hESC Chr15 and Chr18
(A) The experimental profile obtained from the Hi-C-based principal component analysis and the compartment profile predicted by MaxComp of 500 modeled structures of Chr15. (B) The experimental profile obtained from the Hi-C-based principal component analysis and the compartment profile predicted by MaxComp of 500 modeled structures of Chr18. (C) Scatter plot between the normalized compartment frequencies of MaxComp and the PC1 values showed together with the Pearson’s correlation coefficient between the two samples on each chromosome. (D) Comparison of speckle distances (p-value = 1.84e-153 and 9.64e-102), lamina distances (p-value = 1.03e-27 and 2.35e-56), radial positions (p-value = 1.03e-27 and 2.35e-56), radius of gyration (p-value = 9.54e-41 and 2.25e-54) and interchromosomal contacts (p-value = 1.89e-48 and 2.55e-76) between compartment A beads and compartment B beads on the population of structures of Chr15 and Chr18.
https://doi.org/10.1371/journal.pcbi.1013114.s003
(TIF)
S4 Fig. Prediction of model compartments by MaxComp and its comparison with the ground truth on H1-hESC Chr8 and Chr12
(A) The experimental profile obtained from the Hi-C-based principal component analysis and the compartment profile predicted by MaxComp of 500 modeled structures of Chr8. (B) The experimental profile obtained from the Hi-C-based principal component analysis and the compartment profile predicted by MaxComp of 500 modeled structures of Chr12. (C) Scatter plot between the normalized compartment frequencies of MaxComp and the PC1 values together with the Pearson’s correlation coefficient between the two samples on each chromosome. (D) Comparison of speckle distances (p-value = 8.61e-70 and 5.32e-121), lamina distances (p-value = 8.53e-46 and 1.13e-35), radial position (p-value = 8.53e-46 and 1.13e-35), radius of gyration (p-value = 2.70e-52 and 2.61e-63) and interchromosomal contacts (p-value = 4.81e-68 and 1.37e-61) between compartment A beads and compartment B beads on the population of structures of Chr8 and Chr12.
https://doi.org/10.1371/journal.pcbi.1013114.s004
(TIF)
S5 Fig. Selected example of model compartments predicted by MaxComp for the whole genome of H1-hESC
(A) The predicted compartments for each chromosome copy from structure 0 of H1-hESC. (B) Selected chromosomes from structure 0 showed together with the envelope indicates compartment A and compartment B are segregated within the envelope. The section through the genome shows compartment A and compartment B are clustered with each other in the inferior region. Predicted speckles are basically associated with compartment A beads rather than compartment B beads.
https://doi.org/10.1371/journal.pcbi.1013114.s005
(TIF)
S6 Fig. Transcription analysis on single-cell DNA-MERFISH Chr6 compartments predicted by MaxComp
(A) Selected examples with more than or equal to 8 locus with active transcriptions of compartment prediction and transcription signals on DNA MERFISH structures [7] (Red bars indicate compartment A, blue bars represent compartment B while gray bars are where transcription is on (nascent transcript is imaged)). (B) Comparison between the gene density from RefSeq, the transcription frequency from DNA MERFISH and the compartment profile predicted by MaxComp. (C) Scatter plots between gene density, transcription frequency and predicted compartment profile showed with Spearman’s correlation coefficients.
https://doi.org/10.1371/journal.pcbi.1013114.s006
(TIF)
S7 Fig. Transcription analysis on single-cell DNA-MERFISH Chr10 compartments predicted by MaxComp
(A) Selected examples with more than or equal to 10 locus with active transcriptions of compartment prediction and transcription signals on DNA MERFISH structures [7] (Red bars indicate compartment A, blue bars represent compartment B while gray bars are where transcription is on (nascent transcript is imaged)). (B) Comparison between the gene density from RefSeq, the transcription frequency from DNA MERFISH and the compartment profile predicted by MaxComp. (C) Scatter plots between gene density, transcription frequency and predicted compartment profile showed with Spearman’s correlation coefficients.
https://doi.org/10.1371/journal.pcbi.1013114.s007
(TIF)
S8 Fig. Comparison between distributions of normalized transcription levels from SeqFISH+ Comparison of normalized transcription levels (numbers of mRNA transcript spots detected) between A cells and B cells for genes measured by SeqFISH+ [8] showed together with the paired t-test
We find most of the genes have increased transcription levels when shifting from state B to state A (showed in brown).
https://doi.org/10.1371/journal.pcbi.1013114.s008
(TIF)
References
- 1. Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009;326(5950):289–93. pmid:19815776
- 2. Rao SSP, Huntley MH, Durand NC, Stamenova EK, Bochkov ID, Robinson JT, et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014;159(7):1665–80. pmid:25497547
- 3. Akgol Oksuz B, Yang L, Abraham S, Venev SV, Krietenstein N, Parsi KM, et al. Systematic evaluation of chromosome conformation capture assays. Nat Methods. 2021;18(9):1046–55. pmid:34480151
- 4. Wang S, Su J-H, Beliveau BJ, Bintu B, Moffitt JR, Wu C, et al. Spatial organization of chromatin domains and compartments in single chromosomes. Science. 2016;353(6299):598–602. pmid:27445307
- 5. Bintu B, Mateo LJ, Su J-H, Sinnott-Armstrong NA, Parker M, Kinrot S, et al. Super-resolution chromatin tracing reveals domains and cooperative interactions in single cells. Science. 2018;362(6413):eaau1783. pmid:30361340
- 6. Mateo LJ, Murphy SE, Hafner A, Cinquini IS, Walker CA, Boettiger AN. Visualizing DNA folding and RNA in embryos at single-cell resolution. Nature. 2019;568(7750):49–54. pmid:30886393
- 7. Su J-H, Zheng P, Kinrot SS, Bintu B, Zhuang X. Genome-Scale Imaging of the 3D Organization and Transcriptional Activity of Chromatin. Cell. 2020;182(6):1641-1659.e26. pmid:32822575
- 8. Takei Y, Yun J, Zheng S, Ollikainen N, Pierson N, White J, et al. Integrated spatial genomics reveals global architecture of single nuclei. Nature. 2021;590(7845):344–50. pmid:33505024
- 9. Dixon JR, Selvaraj S, Yue F, Kim A, Li Y, Shen Y, et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 2012;485(7398):376–80. pmid:22495300
- 10. Sanborn AL, Rao SSP, Huang S-C, Durand NC, Huntley MH, Jewett AI, et al. Chromatin extrusion explains key features of loop and domain formation in wild-type and engineered genomes. Proc Natl Acad Sci U S A. 2015;112(47):E6456-65. pmid:26499245
- 11. Goloborodko A, Marko JF, Mirny LA. Chromosome Compaction by Active Loop Extrusion. Biophys J. 2016;110(10):2162–8. pmid:27224481
- 12. Fudenberg G, Imakaev M, Lu C, Goloborodko A, Abdennur N, Mirny LA. Formation of Chromosomal Domains by Loop Extrusion. Cell Rep. 2016;15(9):2038–49. pmid:27210764
- 13. Fudenberg G, Abdennur N, Imakaev M, Goloborodko A, Mirny LA. Emerging Evidence of Chromosome Folding by Loop Extrusion. Cold Spring Harb Symp Quant Biol. 2017;82:45–55. pmid:29728444
- 14. Nuebler J, Fudenberg G, Imakaev M, Abdennur N, Mirny LA. Chromatin organization by an interplay of loop extrusion and compartmental segregation. Proc Natl Acad Sci U S A. 2018;115(29):E6697–706. pmid:29967174
- 15. Yildirim A, Boninsegna L, Zhan Y, Alber F. Uncovering the Principles of Genome Folding by 3D Chromatin Modeling. Cold Spring Harb Perspect Biol. 2022;14(6):a039693. pmid:34400556
- 16. Boninsegna L, Yildirim A, Zhan Y, Alber F. Integrative approaches in genome structure analysis. Structure. 2022;30(1):24–36. pmid:34963059
- 17.
Zheng S, Thakkar N, Harris HL, Zhang M, Liu S, Gerstein M, et al. Predicting A/B compartments from histone modifications using deep learning. Cold Spring Harbor Laboratory. 2022. https://doi.org/10.1101/2022.04.19.488754
- 18. Fortin J-P, Hansen KD. Reconstructing A/B compartments as revealed by Hi-C using long-range correlations in epigenetic data. Genome Biol. 2015;16(1):180. pmid:26316348
- 19. Nagano T, Lubling Y, Stevens TJ, Schoenfelder S, Yaffe E, Dean W, et al. Single-cell Hi-C reveals cell-to-cell variability in chromosome structure. Nature. 2013;502(7469):59–64. pmid:24067610
- 20. Ramani V, Deng X, Qiu R, Gunderson KL, Steemers FJ, Disteche CM, et al. Massively multiplex single-cell Hi-C. Nat Methods. 2017;14(3):263–6. pmid:28135255
- 21. Stevens TJ, Lando D, Basu S, Atkinson LP, Cao Y, Lee SF, et al. 3D structures of individual mammalian genomes studied by single-cell Hi-C. Nature. 2017;544(7648):59–64. pmid:28289288
- 22. Tan L, Xing D, Chang C-H, Li H, Xie XS. Three-dimensional genome structures of single diploid human cells. Science. 2018;361(6405):924–8. pmid:30166492
- 23. Zhang R, Zhou T, Ma J. Multiscale and integrative single-cell Hi-C analysis with Higashi. Nat Biotechnol. 2022;40(2):254–61. pmid:34635838
- 24. Xiong K, Zhang R, Ma J. scGHOST: identifying single-cell 3D genome subcompartments. Nat Methods. 2024;21(5):814–22. pmid:38589516
- 25. Boninsegna L, Yildirim A, Polles G, Zhan Y, Quinodoz SA, Finn EH, et al. Integrative genome modeling platform reveals essentiality of rare contact events in 3D genome organizations. Nat Methods. 2022;19(8):938–49. pmid:35817938
- 26.
Goemans MX, Williamson DP. 879-approximation algorithms for MAX CUT and MAX 2SAT. Proceedings of the twenty-sixth annual ACM symposium on Theory of computing - STOC ’94. Montreal, Quebec, Canada: ACM Press; 1994. pp. 422–31. https://doi.org/10.1145/195058.195216
- 27. Goemans MX, Williamson DP. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J ACM. 1995;42(6):1115–45.
- 28. Spector DL, Lamond AI. Nuclear speckles. Cold Spring Harb Perspect Biol. 2011;3(2):a000646. pmid:20926517
- 29. Chen Y, Zhang Y, Wang Y, Zhang L, Brinkman EK, Adam SA, et al. Mapping 3D genome organization relative to nuclear compartments using TSA-Seq as a cytological ruler. J Cell Biol. 2018;217(11):4025–48. pmid:30154186
- 30. Yildirim A, Hua N, Boninsegna L, Zhan Y, Polles G, Gong K, et al. Evaluating the role of the nuclear microenvironment in gene function by population-based modeling. Nat Struct Mol Biol. 2023;30(8):1193–206. pmid:37580627
- 31. Sahni S, Gonzalez T. P-Complete Approximation Problems. J ACM. 1976;23(3):555–65.
- 32. Haglin DJ, Venkatesan SM. Approximation and intractability results for the maximum cut problem and its variants. IEEE Trans Comput. 1991;40(1):110–3.
- 33. van Steensel B, Belmont AS. Lamina-Associated Domains: Links with Chromosome Architecture, Heterochromatin, and Gene Repression. Cell. 2017;169(5):780–91. pmid:28525751
- 34. Bhat P, Chow A, Emert B, Ettlin O, Quinodoz SA, Takei Y, et al. 3D genome organization around nuclear speckles drives mRNA splicing efficiency. bioRxiv. 2023;:2023.01.04.522632. pmid:36711853
- 35. Sawh AN, Shafer MER, Su J-H, Zhuang X, Wang S, Mango SE. Lamina-Dependent Stretching and Unconventional Chromosome Compartments in Early C. elegans Embryos. Mol Cell. 2020;78: 96–111.e6.
- 36. Bhat P, Honson D, Guttman M. Nuclear compartmentalization as a mechanism of quantitative control of gene expression. Nat Rev Mol Cell Biol. 2021;22(10):653–70. pmid:34341548
- 37. Goronzy IN, Quinodoz SA, Jachowicz JW, Ollikainen N, Bhat P, Guttman M. Simultaneous mapping of 3D structure and nascent RNAs argues against nuclear compartments that preclude transcription. Cell Rep. 2022;41(9):111730. pmid:36450242
- 38. Agrawal A, Verschueren R, Diamond S, Boyd S. A rewriting system for convex optimization problems. Journal of Control and Decision. 2018;5(1):42–60.
- 39. O’Donoghue B. Operator splitting for a homogeneous embedding of the linear complementarity problem. 2020 [cited 12 Feb 2023. ]. Available from:
- 40. Varoquaux N, Ay F, Noble WS, Vert J-P. A statistical approach for inferring the 3D structure of the genome. Bioinformatics. 2014;30(12):i26-33. pmid:24931992
- 41. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, et al. The human genome browser at UCSC. Genome Res. 2002;12(6):996–1006. pmid:12045153
- 42. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, et al. UCSF Chimera--a visualization system for exploratory research and analysis. J Comput Chem. 2004;25(13):1605–12. pmid:15264254