Figures
Abstract
Inside the nucleus, chromosomes are subjected to direct physical interaction between different components, active forces, and thermal noise, leading to the formation of an ensemble of three-dimensional structures. However, it is still not well understood to what extent and how the structural ensemble varies from one chromosome region or cell-type to another. We designed a statistical analysis technique and applied it to single-cell chromosome imaging data to reveal the heterogeneity of individual chromosome structures. By analyzing the resulting structural landscape, we find that the largest dynamic variation is the overall radius of gyration of the chromatin region, followed by domain reorganization within the region. By comparing different human cell-lines and experimental perturbation data using this statistical analysis technique and a network-based similarity quantification approach, we identify both cell-type and condition-specific features of the structural landscapes. We identify a relationship between epigenetic state and the properties of chromosome structure fluctuation and validate this relationship through polymer simulations. Overall, our study suggests that the types of variation in a chromosome structure ensemble are cell-type as well as region-specific and can be attributed to constraints placed on the structure by factors such as variation in epigenetic state.
Author summary
Recent work has revealed principles of how chromosomes are folded and structured inside the human nucleus. It is now even possible to microscopically trace the path of chromosomes in 3D in individual cells. With this data, we can start to examine how much variation exists in chromosome structure and what biological factors may restrict or enhance this variation. Are chromosomes stuck in just a few possible positions or do they move around more freely, sampling many configurations? Here, we use a mathematical approach to compare chromosome structure variation in different cell types, at different locations along the genome, and when key structural proteins are removed. Through these comparisons and dynamic simulations of chromosome behavior, we identify factors that may constrain or promote variation in chromosome structure.
Citation: Das P, Shen T, McCord RP (2022) Characterizing the variation in chromosome structure ensembles in the context of the nuclear microenvironment. PLoS Comput Biol 18(8): e1010392. https://doi.org/10.1371/journal.pcbi.1010392
Editor: Carl Herrmann, Heidelberg University, GERMANY
Received: April 18, 2022; Accepted: July 15, 2022; Published: August 15, 2022
Copyright: © 2022 Das et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Example analysis code is provided on the ChromosomeStructuralEnsembleCharacterization GitHub repository (https://github.com/rpmccordlab/ChromosomeStructuralEnsembleCharacterization).
Funding: This work was supported by the National Institutes of Health [NIGMS grant R35GM133557 to R.P.M]. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
In eukaryotic cells, long genomic DNA is packed within a relatively small nucleus, requiring chromosomes to fold into a multi-scale 3D spatial organization [1]. At the kilobase length scale, loops form between different chromatin regions [2–4]. Some loops are formed by the interaction of cohesin-mediated loop extrusion and boundary factors, a process which can create domains of enriched contacts called Topologically Associated Domains (TADs) at the scale of 100s of kilobases [5–7]. Through processes such as phase separation and tethering to nuclear structures the chromatin is further organized at the megabase scale into separate euchromatin and heterochromatin spatial regions, known as A/B compartments [8–12]. Finally at the scale of the whole nucleus, chromosomes tend to form their own territories [13–15]. Each of these structural features act independently as well as synergistically to contribute to different biological processes, as reported by an ever-increasing number of studies. For example, chromatin loops and TAD boundary positioning can influence enhancer-promoter interactions and gene regulation [16–18]. This multi-layered organization of 3D genome is not static: the chromatin fiber is semiflexible in nature and is subjected to forces generated by ATP-driven molecular machines along with thermal fluctuations which generate chromosome structure heterogeneity on a temporal scale [19–21]. Some chromosome structures (such as the positioning of whole chromosomes or nuclear lamina associations) are stable on a timescale of hours [22,23], but heterogeneity in these structures can arise, for example, due to stochastic variations that occur as each cell rebuilds its chromosome structure after mitosis [23,24]. Overall, therefore, variations observed in a chromosome structure ensemble reflects a combination of both cell-to-cell and temporal structural heterogeneity. Measuring the heterogeneity of the structural ensemble may reveal detailed folding mechanisms and subpopulations of cells with different chromosome structures that mediate different biological outcomes [25].
Several key molecular and imaging techniques to characterize chromosome structure have been developed in recent years [26,27]. Chromosome conformation capture (3C) based techniques, including Hi-C, can map the spatial organization of the whole genome in terms of contacts between genomic regions. However, when Hi-C and its derivatives are applied to millions of cells in bulk, it is challenging to study the heterogeneity of chromosome structures from the resulting population averaged measurements. Several polymer- and restraint- based models have been developed to reconstruct ensembles of structures from such population averaged Hi-C data [28–32]. However, there are usually multiple possible theoretical ensembles that could fit the population average, so without single cell measurements for comparison, it is difficult to know whether models derived from bulk data capture the variation present in true biological systems. To resolve this issue, researchers have started to develop single-cell counterparts of Hi-C and related techniques [33–35]. However, these single cell contact capture techniques often suffer from low coverage, which makes it difficult to characterize the structural heterogeneity within a particular cell-type.
In contrast, oligonucleotide-based FISH imaging techniques have enabled the tracing of chromatin regions at high resolution in thousands of single cells [25,36–40]. These techniques can generate distance distribution maps between imaged points along a chromosome that resemble Hi-C contact maps. By analyzing these single-cell structures at different resolutions, recent studies have started to characterize the heterogeneity associated with the structural ensemble. For example, it has been revealed that the inactive chrX exhibits not only canonically expected compact structures but also more chromosome domain inter-mingling as compared to the active chrX [36]. Another study revealed that domain-like structure also exists on the inactive chrX at the single-cell level, but such structures are averaged out and not visible in bulk experiments [25]. Bintu and co-authors reported the existence of TAD-like structures and sharp domain boundaries at a single cell level [37]. Though the boundary positions vary from one cell to another, they show higher preference for structural protein binding sites. By analyzing the single-cell whole chromosome tracing data at a higher resolution, researchers identified a substantial number of TAD-like domains in single cells that cross compartment boundaries observed in bulk data [38].
While initial studies focused on showing that variability in chromosome structures exists, recent work has begun to turn toward the question of how and why this variability arises [30,41,42]. For example, transcriptional regulation has been proposed to influence an observed two-state transition between closed and open dumbbell-like conformations of a short chromatin segment spanning a TAD boundary [41]. Other groups have analyzed the chromatin tracing data to evaluate the effects of the structural protein cohesin on the dynamics of TAD-like structures. Another study used a folding coordinate to represent the progression of TAD formation. These analyses showed that in the absence of structural protein cohesin, a local chromatin region may still exhibit TAD-like structures in individual snapshots, but these structures are less stable due to the entropy gain of the unfolded conformations. [42]. This has the effect of converting phase separated globules into more random coil-like state [30].
Although these recent studies have started to shed light on the fundamental mechanisms of local chromatin folding at the single-cell level, it is unclear how the chromosome structure variation differs between cell types or chromosome regions. Here, we focus on quantifying the full structure variation landscape rather than identifying multiple discrete structural states. Support for this view comes from a variety of recent studies. A FISH imaging study in Caenorhabditis elegans showed that chrV exhibits four major conformational clusters during early embryonic development, but that the structure ensembles corresponding to different clusters exhibit a broad and continuous distribution rather than a handful of discrete conformational states [39]. Similarly, within a particular cell-type, the single-cell structure ensemble can be described as a distribution of conformations around the bulk average structure (mean). We do not know how this fluctuation around an average structure (variance) varies in a cell-type specific manner. This question becomes particularly interesting when the average structural organization looks similar across different cell types or conditions, but there may be different underlying structural ensembles that lead to similar averages. A logical follow up question is what factors might contribute to the extent of structural variation. There are various essential factors such as epigenetic marks, associated structural proteins, and nuclear lamina associations that vary in a cell-type specific manner and contribute to the organization of 3D genome [43–45]. Therefore, investigating the role of these different factors in the variation of chromosome structural ensemble is of high importance.
To answer these questions about chromosome structural ensembles, we designed a statistical pipeline to analyze structural data obtained from single-cell chromatin tracing. Using this pipeline, we first characterized the landscape of single-cell structures across different cell-types. We then identified cell-type specific differences in the structural ensembles through joint analysis of structural landscapes. To quantify these differences, we designed two different network-based measurements–one to quantify the degree of continuity across the chromosome structural ensemble (ensemble structure similarity index) and another to quantify the degree of structural variation of local genomic regions across the population (bin similarity index). Using these two measurements, we quantified the ensemble variation and local structural variation across cell-types and chromosome regions at different genomic resolutions. We identify overall locus compaction and domain boundary shifting as two primary but independent modes of structural fluctuation that are observed across all length scales. We observe differences in chromosome ensemble structural similarity between regions that have different epigenetic state profiles, and we are able to demonstrate that interactions governed by epigenetic state can help explain these differences in the type of structure variation exhibited by the chromosome region.
Materials and methods
The statistical chromosome conformation analysis pipeline used in this work can be broadly divided into two parts. In the first part, the single-cell chromosome imaging data (of different genomic resolutions) is analyzed statistically to compare the structures in a pairwise fashion. Next, a structural similarity measurement is applied to quantify the degree of continuity present in the chromosome structural ensemble. In addition, the pairwise similarity data is used to construct a structural landscape to arrange the single-cell chromosome structures according to their individual and bulk structural properties (Fig 1).
a) Cartoon representation of potential differences in chromosome structure landscapes (left). Condition A has a continuous landscape including all structures along a trajectory, while in Condition B, chromosomes fall into 3 major structural clusters. (Right) Conditions and factors that may influence this chromosome structure landscape. b) Each pair of median-centered chromosome distance matrices are compared in a pairwise fashion. PCA is then performed on the pairwise structure comparison matrix and transformed data is projected in 2D space using PC1 and PC2 to create the “chromosome structural landscape”. c) To calculate chromosome ensemble structure similarity index (SSI), we construct a Pearson’s correlation matrix from the pairwise structure comparison matrix. This correlation matrix scaled, thresholded, and then converted into a weighted network from which SSI values can be calculated from each node. The median SSI value is plotted for many resamplings of nodes. d) To calculate bin variability, we convert each single-cell distance matrix into a thresholded matrix using a cutoff and then construct a network from each of those matrices with bins as nodes and contacts between bins as edges. We then calculate the bin similarity index for each bin in each single-cell and then take the standard deviation of the bin similarity index across the population as the bin variability.
Construction of distance matrices from single-cell chromosome imaging data
For this work, we have examined experimental single-cell chromosome imaging data at three different resolutions, each of which reflects different levels of structural organization of the 3D genome at different genomic length scales. The single-cell chromosome imaging data obtained from Bintu et al. [37]. has the highest genomic resolution of 30 kb. Here, the authors traced the conformations of multiple ~ 2–2.5 Mb spanning chromatin regions of chr21, partitioned into consecutive 30 kb segments in single-cells across different cell-types (e.g., IMR90 –human fetal lung fibroblast cells, K562 –human erythroleukemic cells, A549 –human lung adenocarcinoma cells, and HCT116 –human colon cancer cells) during different physiological and perturbed conditions. The second set of IMR90 chromatin tracing data analyzed here is obtained from Su et al. [38], where the single-cell conformations of chr21 non-repetitive region were traced using consecutive 50 kb bins. In addition, that dataset also provides chr2 conformations captured at the same 50 kb resolution, however with an interval of 250 kb between two genomic regions, due to very large size of human chr2. Both the Bintu et al. and Su et al. data capture chromatin conformations at a comparable resolution (30 and 50 kb respectively), differing only in the size of the genomic region analyzed. Whereas the Bintu et al. data primarily represents the structural heterogeneity at the level of specific TADs and sub-TADs, the Su et al. data captures the heterogeneity of almost the whole chromosome. Finally, we also analyzed chromosome structural heterogeneity of the whole active and inactive chrX of IMR90, obtained from Wang et al. [36]. In this dataset, TADs are imaged as the basic subunits along the chromosome, so the resolution of this data is lower (ranging from 440 kb to 7.88 Mb) and reports on the conformations of compartments and the whole chromosome rather than the within-TAD level.
For all these different datasets, the authors reported the single-cell conformations of the chromosomes in a 3D Cartesian coordinate system where each basic genomic region (e.g., 30 or 50 kb genomic region or TAD; from here termed as bin) has X-, Y-, Z- coordinate values. However, in some single cell structures the 3D positions of some of the bins are missing due to technical difficulties. Therefore, to decrease the effect of missing data, we consider only structures with more than ~ 90% measured bins, and the values for missing bins in those structures are assigned by nearest-neighbor interpolation. The 3D positions of chromosomes are also associated with the global rotational and translational variations which might affect underlying statistical analysis. To resolve this issue, we convert each 3D structure into a 2D symmetric distance matrix by calculating the pairwise Euclidean distance between all the genomic bins of that chromosome structure. For example, if a particular chromosome has m genomic bins and a total of N copies originating from N single cells, then we will have N symmetric pairwise distance matrices each of size m×m, where dij represents the Euclidean distance between ith and jth genomic bins. Further, the spatial distance can be converted into contacts by applying a genomic resolution specific thresholding cutoff. However, for our analyses, we use distance matrices instead of contact matrices, unless otherwise explicitly mentioned.
Generating chromosome structural landscapes using a statistical analysis technique
We next center all the single-cell matrices around zero by subtracting the median distance matrix from each of them, hij = dij−median(dij). Once we obtain the median centered single-cell distance matrices, we calculate the pairwise dot product between each of those single-cell matrices, such as, , where hα and hβ are median centered distance matrices obtained from single cells (α, β∈{1,…,N}), respectively. For each pair of single-cells, this step compares the deviation from the median between the two structures and yields a single number. In the case of two structures having similar deviations, this dot product would yield a strong positive value and vice versa. The pairwise dot product comparison information between all possible pairs of the single cells is then represented in a symmetric matrix format, termed as pairwise structure comparison matrix, where each element of the matrix represents the pairwise comparison information between a specific pair of chromosome structures. Next, we perform PCA on the pairwise structure comparison matrix and project the single-cell conformations in 2D space using the first two principal components (PCs). This PC projection is termed the chromosome structural landscape.
Quantification of chromosome structural ensemble variation using a network-based similarity index
To quantify the variation associated with the chromosome structural ensemble, we perform a network-based similarity calculation on the pairwise comparison matrix (Fig 1C). Similar approaches have been used to study cellular network heterogeneity associated with cancer and cellular differentiation [46–48]. In this approach, we first calculate Pearson’s correlation coefficient (PCC) between each pair of rows of the comparison matrix, where cij denotes PCC between ith and jth rows of the pairwise similarity matrix. As PCC values range between -1 and 1, next we scale the PCC values in the range of 0 and 1, wij = 0.5*(1+cij). Once we have the rescaled pairwise PCC value matrix, we construct a weighted network from that where each node represents a single chromosome structure or conformation and the edges between them denotes rescaled PCC values. However, we only consider the edges having weights above a certain threshold, representing only connections between highly similar structures. This threshold is calculated by taking 50th percentile of all the edge weights of the network (see Results for an analysis of the impact of this threshold choice). With the constructed network, we calculate the structure similarity index (SSI) of each node Si, using a formula based on the Shannon-Jayne network entropy [46].
where N(i) represents the neighborhood set of the node i and ki denotes the cardinality of that set. pij represents the proportionality of the edge weight between nodes i and j compared to all the edges associated with node i.
Network entropy is a commonly applied concept in the analysis of heterogeneity vs. continuity in biological networks [49]. Conceptually, entropy measures the degree of randomness in a system, or the extent to which all possible outcomes are sampled. Entropy in DNA sequence analysis measures whether all 4 DNA bases are represented uniformly in a motif position (high entropy) or some bases are strongly preferred in certain positions (low entropy) [50,51]. Network entropy in signaling or gene expression networks captures whether the network is more continuous or modular [46,52,53]. Similarly, here this metric will reveal whether the landscape of structures is highly connected with all intermediates sampled vs. more heterogeneous with disconnected structures. This index Si is bound between 0 and 1. In our system, the resulting values are compressed near a value of 1, so for easier visual presentation of values, we transform them by subtracting 0.99 from each SSI and then multiplying by 1000. For our presented analyses, we calculate the median SSI across all nodes in the landscape and call this the ensemble SSI. Specifically, to normalize sample size between conditions, we iteratively resample a subset of nodes from each condition, then plot the distribution of median SSI values from the resampled node sets. Based on the distribution of the median SSI values, we characterize the heterogeneity of the chromosome structural landscape. An SSI distribution with higher mean is considered as having a smooth continuous structural landscape while a distribution with relatively lower mean would have more discrete, disconnected structures. Interpretation of the SSI distribution trends have been discussed in detail in the Results section.
Calculating chromosome structure variability for each genomic bin
To characterize the variability of structure at each genomic bin along the chromosome, we use a modified version of the network-based structural similarity calculation approach, discussed in the previous subsection (Fig 1D). In this step, for each single-cell structure, we first construct a network based on its distance matrix, where each node is a single genomic bin and the edges between them represent the spatial distances. We then threshold the network using an experimentally-derived distance cutoff (330 nm for 30 kb resolution [41] and 500 nm for 50 kb resolution [38]), where edges with spatial distance higher than the cutoff are discarded from the analysis. Next, we calculate the bin similarity index of each of the nodes (bins) in the network using the same formula discussed in the previous section, except here using genomic bins as nodes rather than whole single-cell structures and spatial distances as edge weights rather than structure correlation coefficients. We now have a bin similarity index value for each genomic bin in each single cell structure. To quantify the variability of structural positions occupied by a given bin across all structures, we then calculate the standard deviation of bin similarity index values across all single cell structures for a given bin. We call this measure the bin variability. A higher standard deviation corresponds to a more dynamic or variable genomic bin.
Characterization of chromatin states using epigenetic profiles
In order to characterize chromatin states of the IMR90, K562 and A549 cells, we downloaded a total of eleven histone marks–H3K4Me1, H3K4Me2, H3K4Me3, H3K9Ac, H3K27Ac, H4K20Me1, H3K36Me3, H3K27Me3, H3K9Me3, H2A.Z and DNase for all the three cell types from ENCODE repository (S1 Table) [54]. Next, we used chromHMM software [55] to annotate genomic regions of desired resolutions into five chromatin states using a combinatorial pattern of the eleven histone marks. Based on the combinatorial histone marks, we defined chromatin states as–Active state 1, Active state 2, Inactive Poised, Repressed Polycomb and Heterochromatin (S1 Fig). Briefly, if an inferred state shows strong enrichments for active marks such as H3K4Me1/2/3, H3K27Ac, H3K9Ac and DNase, we term that state as an Active state. Further, depending on the level of enrichment of those different active marks, we characterize them into Active state 1 and Active state 2. We assign the Heterochromatin label to the state for which we observe strong enrichment of H3K9Me3 and for the Repressive Polycomb assignment, we look for strong H3K27Me3 and a mix of H3K9Me3 and H4K20Me1 marks. And the rest of the states are assigned the Inactive Poised level.
Polymer simulation of chromatin segments based on underlying chromatin states
To test the contribution of epigenetic states to chromosome structure heterogeneity, we perform polymer simulations of a specific chromatin segment with the interactions defined based on their epigenetic states. In recent years, polymer simulation-based studies have shown how the compartmental profile or the underlying epigenetic state of the genomic regions can give rise to the population-averaged structural organization of chromatin at different resolutions [9,56,57]. Here, we represent a ~ 1.95 Mb chromatin region as a chain of beads using a bead-spring polymer model. The model system consists of 1950 beads where beads represent genomic regions of 1 kb. Each bead is assigned a specific chromatin state as described above. In addition to that, we also model loops within the specific chromatin regions based on publicly available high-resolution Hi-C derived chromatin loop sets from Rao et al. [58]. For each loop, we connect the loop anchors using a harmonic constraint. The potential energy function of the polymer model is described below,
where UB(r) corresponds to bond stretching potential, modeled by finite extensible nonlinear elastic potential given by,
where
is spring constant and R0 = 1.6σ is the equilibrium bond length.
We use Lennard-Jones potential to model steric repulsion and attraction between the beads of different chromatin state with different parameters. For example, the potential between a bead of chromatin state A and that of B is represented below,
where ϵAB denotes interaction strength between bead types A and B (S2 Table) and σ refers to the size of each bead.
In case of loops, we use harmonic potential between each of the loop anchors UL(r), represented by,
where
is spring constant and
is the equilibrium loop bond length.
Finally, in order to model the repulsive interaction between the chromatin beads and the simulation box boundaries, we use Lennard-Jones potential
where ϵW = 1.0 denotes the interaction strength between boundary walls and chromatin beads irrespective of their types.
The LAMMPS molecular dynamics package is used to perform polymer dynamics simulations [59]. All the simulations are performed in terms of Lennard-Jones reduced units with τ for time, σ for length and ϵ for energy. To mimic the viscoelastic nature of the nucleoplasm, Langevin dynamics with a damping coefficient of γ = 1.0τ is used to maintain the temperature at T = 1.0. We then run the simulation for a total of 1,000,000 steps with a timestep of dt = 0.000001τ. Followed by that, the timestep dt is increased to 0.01τ and a production run of a total 2,000,000 steps is performed and the conformations are saved at intervals of 1,000 steps. We repeated this process for 20 times, each time with different initial conformation and random seeds.
Initial conditions and Equilibration
The chromatin fiber is initialized as a self-avoiding worm like chain using the python package PolymerCpp [60] and is placed in a box of size L = 35σ with fixed repulsive boundaries. We then allow the polymer to equilibrate inside the box using four consecutive phases. In the first phase, a soft potential is used to remove overlap in the polymer for 400,000 steps with a dt = 0.01τ.
where A is pre-factor, varied gradually from 0 to 100ϵ and
.
For the rest three subsequent phases, we switch to potential to Lennard-Jones potential and run the simulation for 400,000, 500,000 and 500,000 steps with timesteps 0.000001τ, 0.0001τ, and 0.01τ respectively.
where ϵG = 1.0 denotes the interaction strength between chromatin beads irrespective of their types.
Loop extrusion simulation of chromatin segments based on underlying RAD21 binding sites
To test the contribution of loop extrusion dynamics to the variability present in chromosome structural ensemble, here we perform loop extrusion polymer simulations of a specific chromatin segment according to the method described in Fudenberg et al. [2]. For these simulations, we define the domain boundaries based on RAD21 peaks present in the desired chromatin segment. Here, we model a 2 Mb spanning chromatin region as a 2000 beads monomer chain where beads represent genomic regions of 1 kb. All the simulation parameters are kept same between different cell types except the position of the boundary elements (here RAD21 peaks) and their strengths. The strength of boundary elements is defined in terms of boundary permeability parameter in the models which represents the boundary insulation. For each cell-type, these parameter values are obtained by calculating the relative strength of the RAD21 peaks. The RAD21 peaks for the IMR90 and K562 were downloaded from the ENCODE repository [54].
Analysis of nucleolus-associated domain (NAD) mapping data
In order to characterize the nucleolar association of the IMR90 chromosomal regions, we downloaded the NAD mapping data of the proliferating IMR90 cells from Dillinger et al. [61]. This microarray data contains log2 normalized ratio of nucleolus-associated and genomic DNA at specific probe locations. For our analysis, we bin that log2 normalized ratio data at 10 kb resolution using multiBigwigSummary functionality of the deepTools software [62].
Results
Chromosome structural landscape reveals major features of the structural ensemble
We first apply our statistical analysis technique to the single-cell chromosome structures measured for a 2 Mb region of IMR90 chr21 (28–30 Mb) (Fig 2A) and produce a chromosome structural landscape (Fig 2B). We observe that the first major source of variation corresponds well to the compactness of the chromosome structures, which can be quantified using the radius of gyration, Rg (S2A Fig). We calculate the Rg of the single-cell structures from their experimentally obtained 3D conformations. When we compare the Rg values of the structures with their PC1 values (from here on we refer to the PC1 projection as “PC1”), we observe that the structures with more negative PC1 projections are highly compact with smaller Rg values while structures with more positive PC1 values are less compact with higher Rg scores. With increasing Rg, the structural divergence also increases, with individual structure points becoming spread further along PC2 and also more spread out from each other.
a) Population-averaged distance map of IMR90 chr21:28–30 Mb at 30 kb resolution. b) Chromosome structural landscape of IMR90 chr21:28–30 Mb chromatin region, generated as described in the Methods section. Each point represents a single-cell structure. Points are colored according to the radius of gyration (Rg) of the corresponding structures (dark blue–smaller Rg and dark red–larger Rg). The percent of variation explained by each PC is shown in parentheses. c) Structures are first divided into five groups based on the PC1 ordering (which correlates highly with Rg). Structures belonging to groups 2, 3 and 5 are shown. Within each group structures are further divided into five subgroups based on their PC2 ordering. Average structures within each subgroup are shown.
Along PC2, we observe that the structure ordering follows a progressive reorganization of the domains within the region and captures several subclusters of structures (Fig 2C). As we move from negative to positive PC2, we observe a shifting of a boundary (weakening of one boundary and gaining of another) that leads to the change in the structure of the domain (S2B Fig). This phenomenon can be seen in different regimes of Rg, showing the existence of domain reorganization within both compact and open structures (S2B Fig).
When the same approach is applied to different chromosomes and cell-types at different resolutions, we found a consistent ordering of the dynamic modes of the structural landscape (S2C, S2D, S3 and S4 Figs). Although the major dynamic variation remains the Rg ordering for all the different situations, the reorganization of domains along PC2 is dependent on the length of the genomic region as well as the local structural properties.
Ensemble structure similarity index captures cell-type- and chromosome-specific features of the chromosome structural ensemble
To compare the single cell structural heterogeneity between different cell types, we constructed a joint structural landscape using the 28–30 Mb chr21 region data from three cell types–IMR90 (lung fibroblast), K562 (leukemia) and A549 (lung adenocarcinoma) collectively. In this joint structural landscape we can visualize the cell-type-specific variation present in structural ensemble (Fig 3A). To quantify the differences between cell types, we divide the landscape in 10 equally sized regimes along PC1 (since PC1 correlates with Rg, different PC1 regimes correspond to different Rg values) and calculated the proportion of different cell types within each regime. We observe that K562 shows a higher proportion of chromosome structures in lower Rg regimes and has an overall more compact structural ensemble compared to A549 and IMR90 (Fig 3A and 3B).
a) Jointly constructed chromosome structural landscape of genomic region chr21:28–30 Mb for cell types IMR90, K562 and A549. The only difference between individual and joint landscape is that all the different cell-type structures are considered together while calculating pairwise structure comparison matrix. Points are colored based on the cell types. Further, the landscape is divided into ten equal sized bins (or regions) based on PC1 values, where the bins on the left represent lower Rg regions and bins on the right are with higher Rg. b) Relative proportion of IMR90, K562 and A549 single-cell structures present in each bin represented in a). c) Chromosome structural ensemble SSI distributions for different 2 Mb chromosome regions from different cell-types and conditions, obtained from Bintu et al. [37]. The SSI value of the structural ensemble is calculated using a network entropy based approach. For each structural ensemble (size of ensemble indicated as n), 200 single-cell structures are selected randomly (except IMR90 S 28–30 Mb where we select 150 structures for resampling), and the median of the SSI values of the nodes are calculated. This step is repeated 1000 times and values are plotted as a distribution. d) Chromosome structural ensemble SSI distributions for active and inactive IMR90 chrX at TAD resolution, obtained from Wang et al. [36]. 50 single-cell structures are selected randomly each time for resampling. e) Chromosome structural ensemble SSI distributions for different IMR90 chromosomes and conditions at higher resolution, obtained from Su et al. [38]. Here, for resampling 300 single-cell structures are selected randomly each time.
While the relative Rg shift between cell types is straightforward to observe and quantify, other features of the chromosome structural landscape are more difficult to compare visually. Therefore, we devised a network-based measurement to quantitatively compare the structural landscapes, as discussed in the Methods section. According to this approach, a network with similar edge weights will exhibit a higher ensemble SSI value and vice-versa. In terms of the chromosome structural ensemble, a set of structures that smoothly transition across the entire conformational space will have a highly connected structural landscape and thus high ensemble SSI (Fig 1A, blue line). On the other hand, if structures coalesce around discrete states (Fig 1A, red line), the discontinuity in the structural ensemble will be higher, and the resulting network will have a larger variation in of edge weights, and a low ensemble SSI score will be produced.
We first applied the network-based similarity quantification approach to structures of 2 Mb regions of chr21 (30 Kb resolution) in different cell-types and conditions obtained from Bintu et al. [37]. There is variation in the sample size (i.e., the number of conformations) obtained for different experiments, so we relied on a resampling approach to make the comparison consistent. Based on this approach, when we compared the 28–30 Mb genomic region from three different cell-types analyzed above, we find that IMR90 has the highest ensemble SSI, followed by K562 and then A549 with the lowest ensemble SSI (Fig 3C). Thus, this region in IMR90 cells has the highest continuity across different structures while the corresponding distribution in A549 cells exists as a relatively discrete set of structures.
To evaluate our interpretation of the ensemble SSI result in an orthogonal fashion, we performed hierarchical clustering of the 3D structures obtained from IMR90 and K562 cells based on a structure similarity metric, Q, as represented in Cheng et al. (S5 Fig). At the same threshold, both cell types showed the same number of structure clusters. However, in IMR90, the vast majority of structures fall into one of these clusters, corresponding to our high ensemble SSI value (S5A and S5B Fig). In K562, the structures are more evenly split into the different clusters, corresponding to our low ensemble SSI value (S5C and S5D Fig). Thus, our ensemble SSI metric indeed captures the degree of continuity within the structural ensemble. By using ensemble SSI rather than a hierarchical clustering approach, we avoid the need to arbitrarily select a cutoff or number of clusters. We further assessed the dependence of the ensemble SSI score on the number of cells measured (S6 Fig). We observed that with an increase in the number of cells, the difference in ensemble SSI between cell-types follows the same trend, but the absolute difference becomes smaller (S6B Fig). It is important to note that as we increase the number of cells, we are including more low-quality measurements (cells with more missing data) and this may also contribute to a broader ensemble SSI distribution. We then examined how robust the ensemble SSI distribution is to changes in the correlation matrix cutoff chosen (S6C Fig). Using a cutoff lower than our initially chosen 50th percentile produces similar results, but increases variability as more non-significant connections between structures are included. A high cutoff, on the other hand, can obscure the results by breaking a heterogeneous structure landscape into too many disconnected networks. Within each sub-network, there is high continuity, so this results in an artificially high SSI being reported for landscapes that are in fact heterogeneous. We conclude that a 50th percentile correlation matrix cutoff excludes noisy connections without creating disconnected subnetworks (see S6C Fig for extended discussion). As a final technical test of our metric, we repeated the ensemble SSI analysis after removing extremely compact or large structures from each cell type to see whether those extreme conformations are disproportionately affecting the results (S7 Fig). We found that those extreme structures actually have little influence on the ensemble SSI calculation. Overall, we conclude that our ensemble SSI calculation is robust.
We next compared two different genomic regions of IMR90 chr21 using the ensemble SSI distribution. We found that although both are from same cell-type and same chromosome, the 28–30 Mb region has lower ensemble SSI values compared to 18–20 Mb region (Fig 3C). This difference indicates that the latter region has a higher degree of continuity in its structural ensemble. This 18–20 Mb region lies entirely in the B compartment (putative inactive, gene poor regions) while the 28–30 Mb region contains a majority of A compartment regions mixed with B compartment regions. To further investigate the potential connection between compact, inactive states and the continuity of the structural landscape, we compared the ensemble structural similarity of the active vs. inactive chrX structural ensembles (440 Kb to 7.8 Mb) (Fig 3D). We found that chrXi has higher ensemble SSI values compared to its active counterpart. This suggests that the chromosome with a highly compact global structure has a more continuous structural landscape compared to the more specific and discrete states of the open and active chromosome. (S8A, S8B and S8C Fig) [63].
When we analyzed the structural variation of the entire chr2 and chr21 q-arm at high resolution [38], we observed that chr21 has a higher ensemble SSI compared to chr2 (Fig 3E). To test whether the difference in length between chr2 and chr21 contributed to differences in the ensemble SSI values, we calculated the ensemble SSI of regions of different lengths selected from chr2. We observed that as the length increases the ensemble similarity decreases (S8D Fig). This effect likely stems from the increased number of possible conformations that a longer polymer can adopt, which corresponds to a sparser coverage of conformational space. We also found that ensemble similarity is lower in S/G2 phase compared to G1 phase for chr21. This result is consistent with the ensemble SSI values of 28–30 Mb IMR90 chr21 region structural ensembles from different cell-cycle stages, where the G1 phase has highest ensemble SSI followed by S and G2 phases, as shown in Fig 3A. One major difference between the chromosomes in the phases of the cell cycle is the addition of tethers between sister chromatids in S and G2 compared to G1 [64]. Increased constraints on the chromosome structure could then lead to more distinct structures and lower structural ensemble similarity.
Contribution of epigenetic properties to the cell-type-specific variation of chromosome structural ensemble
Building on the observation that regions with different A/B spatial compartment identity had different structural landscape SSI, we further examined the influence of local epigenetic state on the resulting structural landscape. Recent studies have shown the association of epigenetic states with the ensemble genome organization [28, 29, 65]. Fig 4A shows chromatin states along the 28–30 Mb genomic region in the three different cell types–IMR90, K562 and A549 (detailed steps in the Methods section). We observe that this region in IMR90 mostly has active marks while in K562 the region is mixed between repressive polycomb and heterochromatin marks as well as active marks. The A549 region shows a mixture of active, repressive, and poised marks. By qualitatively comparing this chromatin state data with the ensemble SSI distributions for these three cell types, we observe that the cell type with the more uniform epigenetic state (IMR90) has the highest ensemble SSI while cell types with mixed epigenetic states exhibit lower SSI.
a) Chromatin states of chr21:28–30 Mb genomic region from three different cell-types–IMR90, K562 and A549 at 30 kb resolution. The chromatin states are inferred based on 11 different histone marks for each cell-type by ChromHMM tool. b) Bin variability of the chr21:28–30 Mb region from IMR90, K562 and A549, where a higher value represents higher degree of structural variation of that genomic position across the population and vice versa. c) Population-averaged distance maps of IMR90 and K562 chr21:28–29.5 Mb genomic region from polymer simulations (see Methods) at 1 kb resolution. d) Chromosome structural ensemble SSI distributions for chr21:28–29.5 Mb from simulated IMR90 and K562. For resampling, 500 single-cell structures are selected randomly each time.
In addition to the overall dynamic properties of the region, we sought to quantitatively compare local genomic bin epigenetic state with the degree of structure fluctuation experienced by that bin. We first calculated the bin variability for each cell type (Methods) which represents the structure variability of a particular genomic region across the population of cells (Fig 4B). We do not see a direct bin-by-bin correlation between epigenetic state and bin variability. Instead, we observe that a higher proportion of active states within the region tends to correspond to higher bin variability across the region while a larger proportion of inactive states, even if focused in one part of the locus, corresponds to restricted bin variability across the whole region. In this comparison, it is important to remember that we are comparing a population averaged epigenetic state with single cell structural fluctuation. It is possible that additional relationships could become evident if single cell variation in epigenetic state were considered.
To test our hypothesis that epigenetic state differences contribute to cell-type-specific variation in the chromosome structural ensemble, we performed a set of polymer simulations of the desired genomic region of 28–29.5 Mb at a resolution of 1 kb, as described in the Methods section. We generated the models for IMR90 and K562 separately by assigning interactions between the polymer beads based on their specific chromatin states (S9A Fig). Once we obtained the 3D conformations from the simulations, we constructed pairwise Euclidean distance matrices for each conformation and further quantified the structural ensemble using our structural similarity approach. For the resampling purpose, we combined all the snapshots from 20 independent simulations and then selected 500 conformations randomly from this set 1000 times. When we compared the simulated IMR90 ensemble SSI distribution with simulated K562 data, we found that the K562 structural ensemble shows a lower similarity, which is the same trend observed in our analysis of the Bintu et al. imaging data (Figs 3C and 4D). Since the only difference between cell types in these simulations was the encoded epigenetic state of the polymer beads, this result corroborates the idea that the epigenetic state is one factor that can contribute to the variation in chromosome structural ensemble.
Structure heterogeneity properties are associated with epigenetic state and boundary positions
We further evaluated the contribution of epigenetic factors to structural heterogeneity by analyzing single-cell structures of six genomic regions, each 2 Mb in length, obtained from IMR90 chr21 whole q-arm imaging data (Fig 5A) [38]. For simplicity, the regions will be denoted as S1 to S6 based on their starting genomic coordinates. The regions S1 (chr21:18–20 Mb) and S2 (chr21:28–30 Mb) were selected in order to match the two IMR90 genomic regions data analyzed in Bintu et al. The three regions S3, S5 and S6 were selected based on their epigenetic state similarity with epigenetic state of the chr21:28–30 Mb region in K562 cells (S9B Fig). To evaluate epigenetic state similarity, we designed a sliding window-based approach where we move a window of size 40 (2 Mb genomic region with 50 kb resolution) over the whole q-arm of chr21 and for each window compare the IMR90 epigenetic state with the K562 28–30 Mb region epigenetic state using binary comparison (S9C Fig). However, as the resolution of the Bintu et al. data (65 bins per 2 Mb region) is different from the Su et al. data (40 bins per 2 Mb region), we take the least common multiple of the two bin numbers and expanded each data to 250 bins by repeating each bin element the required number of times. Here, a higher comparison value shows higher similarity with the K562 epigenetic state. Among the three regions, S3 has the highest score, S6 has the second highest and S5 has the third highest one (S9C Fig). The region S4 was selected randomly.
a) Population-averaged distance maps of 6 different genomic regions of IMR90 chr21 at 50 kb resolution obtained from Su et al. [38]. Locations of these 6 regions are indicated on the top right chromosome ideogram and labeled with colors used throughout the figure. b) Chromosome structural ensemble SSI distributions for the same 6 regions of IMR90 chr21. For resampling, 300 single-cell structures are selected randomly each time. c) HiCRatio score of each genomic bin of chr21 q-arm at 50 kb resolution (calculated with a 300 kb window size). The red points represent the boundary positions inferred by HiCRatio. d) Nucleolus association level of the same 6 regions of IMR90 chr21 [61]. For each genomic region, the 1 Mb upstream and downstream regions are also shown. A value greater than 0 represents higher levels of nucleolus association and vice versa. Colors in each panel correspond to chromosome locations indicated above panel b.
When we quantified the variation in the structural ensemble of these six regions, we first observed that the S1 and S2 regions have ensemble SSI distributions that are consistent with the results from the same regions in the Bintu et al. data (Fig 5B). Next, we found that the S3 region has the lowest ensemble SSI distribution among all the six regions which is also consistent with the low ensemble SSI of the Bintu et al. K562 region. In addition, the ordering of S6 and S5 are also consistent with their similarity with K562 epigenetic state. Finally, the S4 region, which has a very low similarity score with K562 state, exhibits a very high ensemble SSI distribution. Again, this analysis supports the idea that epigenetic patterns contribute to the degree of variation present in the chromosome strucural ensemble.
In addition to the epigenetic states, we also identified domain boundaries along the whole q-arm of chr21 using the HiCRatio approach (with a 300 kb window size) [66]. Since this boundary calling software is designed for contact data, we first converted the single-cell chromosome structural distance data into contact data using a cutoff of 500 nm, as previously reported [38]. We found that regions with a higher number of strong domain boundaries exhibit lower ensemble similarity and vice versa (Fig 5C). For example, the S3 region, which has the lowest ensemble SSI, has three strong domain boundaries and one weak. On the other hand, regions like S1 and S4 which exhibit higher ensemble SSIs have fewer or weaker domain boundaries. This effect is visible across the entire chromosome, in that the average boundary strength in a given region is negatively correlated with the ensemble SSI of that region (S9 and S9E Fig).
Finally, we asked whether chromosome association with the nucleolus is associated with changes in the chromosome structural ensemble SSI (Fig 5D). The NAD data source and processing steps are described in the Methods section. We observe that S1 and S4, the regions with the highest structural ensemble similarity, are the most uniform in their NAD state: S1 is completely associated with the nucleolus, while S4 has almost no association with the nucleolus. As seen with epigenetic state, here we find that regions with uniform nucleolus association exhibit a higher continuity level in the structural ensemble. In contrast, S3 has the lowest ensemble SSI and also alternating levels of nucleolus association in the first half of the genomic region, followed by an unconstrained region. Thus, variation in constraints across the region is again associated with specific distinct structures and thus a lower ensemble similarity.
Loop extrusion alone does not reproduce experimentally detected chromosome structure variation patterns
Above, we observed that chromosome regions with a higher number or strength of domain boundaries showed less continuous structural landscapes and lower ensemble SSI values. Boundaries detected by approaches like HiCRatio can result from cohesin-mediated loop extrusion that is periodically blocked by factors such as CTCF [67], as well as from the borders between A and B compartment regions of chromatin. We next therefore asked whether the loop extrusion process, blocked by different numbers of boundaries, would be expected to generate the patterns of structure variation we observe. We simulated loop extrusion using the chr21:28–30 Mb segment using experimental cohesin binding data from K562 and IMR90 (see Methods). This region has fewer boundaries in K562 (likely due to the portion of the region that is heterochromatic in this cell type). The structure landscapes resulting from a single simulation run show that the cell type with more loop extrusion boundaries (IMR90) would be predicted to have a lower ensemble SSI than the cell type with fewer boundaries if only loop extrusion were at work (S10A and S10B Fig). Once we start the simulation from several different initial conditions, this difference is diminished (S10B Fig) due to variation in structure between different runs. While this result matches our expectations that more boundaries associate with lower ensemble SSI, it does not agree with the experimental observations from these cell types, in which IMR90 actually has the higher ensemble SSI. This suggests that epigenetic state-specific interactions must be considered along with loop extrusion to explain the chromosome variability observed.
The effect of cohesin depletion on structural heterogeneity depends on the underlying epigenetic state of the chromosome region
To further examine the effect of cohesin on the chromosome structural landscape, we analyzed chr21 single-cell structural data from HCT116 cells in wildtype (WT) and cohesin depleted (+Auxin) conditions [37]. This analysis included two different genomic regions (chr21:28–30 Mb and chr21:34–37 Mb) with different underlying structural properties. The jointly constructed chromosome structural landscape for both genomic regions shows that along PC1 the structures are again ordered based on their Rg (S10C and S10D Fig). Thus, the chromosome structural landscape approach orders single-cell structures based on their Rg irrespective of cell-type, genomic regions, and perturbation condition. The WT and cohesin-depleted structural landscapes from the first genomic region mostly overlap with a slight bias for the +Auxin structures in the higher Rg regime (Fig 6C). This suggests that removal of cohesin makes the structures less compact to some extent, as previously observed [68].
a,b) Population-averaged distance maps of HCT116 chr21:28–30 Mb (a) or chr21:34–37 Mb (b) at 30 kb resolution for wild-type (WT) and cohesin-depleted (+Auxin) conditions based on data obtained from Bintu et al. (37]. Compartment status from Hi-C data at these locations shown above each matrix. c,d) Jointly constructed chromosome structural landscapes of chr21:28–30 Mb (c) or chr21:34–37 Mb (d) of HCT116 for WT and +Auxin conditions. e) Chromosome structural ensemble SSI distributions for different local chromosome regions of HCT116 for WT and +Auxin conditions. 200 single-cell structures are selected randomly each time for resampling.
On the other hand, for the second region, the jointly constructed landscape shows a shift upwards along PC2 along with the shift towards the large Rg regime for cohesin-depleted structures (Fig 6D). In addition, we found that after cohesin removal, both genomic regions exhibit more structural variation at the genomic bin level according to BSI standard deviation (S10E and S10F Fig). When we examine the structural ensemble SSI of these two regions, we see apparently conflicting effects of cohesin removal: the ensemble SSI is decreased in the 28–30 Mb region but mostly unchanged in the 34–37 Mb region (Fig 6E). Note that the initial ensemble SSI of these regions in the WT follows the expected pattern for regions of different lengths. We observed that this divergent ensemble SSI effect corresponds to a difference in the underlying compartment state of the regions. The 28–30 Mb region is divided into A and B compartment segments (which in earlier examples tends to correspond to lower SSI) (Fig 6A) while the 34–37 Mb region is all in the A compartment with no compartment boundaries (which in earlier examples tends to correspond to higher SSI) (Fig 6B).
Discussion
As the number and quality of single cell chromosome structural datasets continues to increase, a diverse toolbox of methods to capture properties of chromosome structure variation is needed. Our statistical analysis of single cell chromosome structures allows us to characterize both the major modes of chromosome structure fluctuation and the properties of the structure fluctuation landscape. While some approaches focus on classifying single cell structures into discrete clusters, other recent work has shown that chromosome structures can exist along a continuum [25,30,39,41,42]. Our analysis does not require a decision to be made about how many different clusters of structures exist, but instead can capture the overall types of structure variations and can measure the degree of continuity between the structures.
For all observed length scales of chromosome structures, from local TAD scale to whole chromosome arms, we find that the primary type of single cell structure variation is the overall compaction of the region. For each structure assayed, there exist some cells in which the region being examined is very compact and others in which it is much more extended. This overall size fluctuation is true of both active and inactive regions, though inactive regions and the inactive X chromosome show a shift toward smaller sizes overall. However, we find that this overall region compaction is largely independent of a secondary fluctuation mode: the reorganization of domains. No matter how compact or extended the overall region is, we observe a similar progressive reorganization of the domain boundaries as a secondary type of structure variation.
By quantifying the network-based similarity properties of the structural ensemble, we can measure how discrete vs. continuous the structures adopted by a certain chromosome region are. Combining all of our observations of ensemble SSI in different regions and conditions, we propose that the ensemble SSI of a structural ensemble is influenced by the epigenetic state constraints put on the region by its epigenetic state. These constraints may take the form of domain boundaries, where different parts of the chromosome regions are constrained in self-interacting patterns, or by associations with nuclear structures such as the nucleolus. Overall, constraints on the chromosome structure and boundaries between different types of regions decrease structural ensemble SSI and result in more discrete structures while regions with more uniform linear chromatin states exhibit a more continuous structural landscape with higher SSI.
It might seem likely that the active or inactive chromatin status of a region would be the primary contribution of epigenetics to the structural fluctuation of a region. That is, one might expect an inactive region to have high structural ensemble SSI and an active region low ensemble SSI or vice versa. However, we find that this is not the case. Instead, the uniformity of the epigenetic state appears to be a key factor, such that a uniformly active region shares structural landscape properties with a uniformly inactive region (both have high SSI) while a region divided between alternating states behaves differently (low SSI). This leads us to the idea that alternating epigenetic states place constraints on the possible conformations and thus create a more disjointed, rather than continuous, structural ensemble.
The idea that epigenetic state-mediated constraints are key to shaping dynamic structural landscapes is echoed again by the results we observe in loop extrusion simulations and cohesin depletion data. Previous work has shown that compartmentalization and loop extrusion can act in opposition to each other [69]. When cohesin is removed, we see that a region consisting of multiple epigenetic states can form subdomains and is thus more likely to adopt more discrete structures, resulting in a decrease in ensemble SSI compared to the state where cohesin is active. When cohesin is active, there is a chance of the entire region being pulled into the same loop, creating a more continuous landscape of possible structures (higher SSI). In contrast, when a region exists in a uniform epigenetic state, we see that the removal of cohesin does not result in a reduction of SSI. As before, cohesin removal is expected to reduce the active movement of chromatin, but in this case, the region is entirely in the same compartment, so there are no additional constraints that would lead to the formation of discrete states in the cohesin-depleted structural landscape. Our observations are consistent with the previously published idea that cohesin loop extrusion acts to increase chromosome conformation dynamics, promoting distal contacts and countering the tendency of domains to settle into more discrete sub-structures [69,70]. Our loop extrusion simulations suggest that in a uniform epigenetic state, the region with more boundaries would adopt more discrete structures and thus have less landscape continuity. But, loop extrusion alone, without consideration of interactions between similar chromatin states, cannot reproduce the differences in structure variability we see between different cell types.
The analysis of cohesin depletion results also reiterate the value of considering both the structural landscape and the ensemble SSI of that landscape. The structural landscape of a region can change (adopting a different range of radius of gyration or domain structures) while the continuity (SSI) stays the same. Alternately, the ensemble SSI can change while the structural landscape is overall conserved, indicating cases where the range and types of structures a region can adopt are the same, but the region is more often in certain discrete structures within that range rather than continuously sampling the range.
Supporting information
S1 Table. ENCODE accession IDs for the eleven epigenetic marks from three different cell-types.
https://doi.org/10.1371/journal.pcbi.1010392.s001
(XLSX)
S2 Table. Parameters values for Lennard-Jones Potential UP(r) used in the polymer model.
Each entry represents the interaction strength and distance cutoff (separated by |) between beads of that two specific types. Energy values and distance cutoff values are represented in terms of kBT and σ respectively.
https://doi.org/10.1371/journal.pcbi.1010392.s002
(XLSX)
S1 Fig.
Assignment of different chromatin states based on relative enrichment of different histone marks using ChromHMM tool for chr21:28–30 Mb genomic region from three different cell-types—IMR90, K562 and A549 at 30 Kb resolution.
https://doi.org/10.1371/journal.pcbi.1010392.s003
(PNG)
S2 Fig.
a) Pearson’s correlation between chromosome structural landscape PC1 values and radius of gyration (Rg) of single-cell structures (calculated from 3D position traces) for IMR90 chr21:28–30 Mb genomic region. b) HiCRatio score of PC2 ordered structures belonging to different PC1 groups for IMR90 chr21:28–30 Mb genomic region, calculated by HiCRatio approach (with a 300 kb window size). Here, PC1G1 refers to the group1 (G1; smallest Rg structures) based on PC1. Within PC1G1, then PC2G1 means the group1 (G1) based on PC2. Successive higher numbered PC1 groups have higher Rg structures. c) For K562 chr21:28–30 Mb genomic region, structures are first divided into three groups based on the PC1 ordering. Within each group, structures are further divided into three subgroups based on their PC2 ordering. To display the domain reorganization along PC2 in K562, averaged distance maps from only groups 2 and 3 from PC1 are shown here. d) For the same groups as in c, HiCRatio (with a 300 kb window size) scores of structures are shown. Colored lines represent HiCRatio scores from PC2 subgroups within two different PC1 subgroups (top = smaller Rg and bottom = larger Rg)
https://doi.org/10.1371/journal.pcbi.1010392.s004
(PNG)
S3 Fig.
For IMR90 chr2 structure data, structures are first divided into ten groups based on the PC1 ordering. Within each group, structures are further divided into ten subgroups based on their PC2 ordering. To display the domain reorganization phenomenon along PC2 in IMR90 chr2, only PC1 subgroups 6 (lower Rg) and 10 (higher Rg) are shown here. Heatmaps display the average distances for structures within each PC2 subgroup in each PC1 subgroup.
https://doi.org/10.1371/journal.pcbi.1010392.s005
(PNG)
S4 Fig.
For IMR90 chr21 entire q-arm structure data, structures are first divided into sixteen groups based on the PC1 ordering. Within each group, structures are further divided into sixteen subgroups based on their PC2 ordering. To display the domain reorganization phenomenon along PC2 in IMR90 chr21, only PC1 subgroups 12 (lower Rg) and 16 (higher Rg) are shown here. Heatmaps display the average distances for structures within each PC2 subgroup in each PC1 subgroup.
https://doi.org/10.1371/journal.pcbi.1010392.s006
(PNG)
S5 Fig.
a,c) Hierarchical clustering of IMR90 (a) and K562 (c) structures for chr21:28–29.5 Mb genomic segment, based on Q similarity metric as described in Cheng et al. (1). To assign structures to the different clusters, the trees are cut at 0.7. b,d) Proportion of structures in different clusters obtained from IMR90 (b) and K562 (d) structural ensembles.
https://doi.org/10.1371/journal.pcbi.1010392.s007
(PNG)
S6 Fig.
a) Chromosome structural landscape of IMR90 and K562 chr21:28–30 Mb chromatin regions constructed for different numbers of cells. Points are colored according to the radius of gyration (Rg) of the corresponding structures (dark blue–smaller Rg and dark red–larger Rg). b) Chromosome structural ensemble SSI distributions for chr21:28–30 Mb segment from IMR90 and K562 for different number of cells (shown as n). c) Chromosome structural ensemble SSI distributions for the same chr21:28–30 Mb segment from IMR90 and K562 for a lower (35th percentile) and higher (65th percentile) correlation matrix cutoff compared to the cutoff used in the manuscript (50th percentile). The inset (dotted lines) shows a zoom-in to the comparison between IMR90 (left) and K562 (right) at the 65th percentile cutoff. Here, for resampling 200 single-cell structures are selected randomly each time.
https://doi.org/10.1371/journal.pcbi.1010392.s008
(PNG)
S7 Fig.
a) Chromosome structural landscape of IMR90 and K562 chr21:28–30 Mb chromatin regions constructed for different numbers of cells after removing extremely compact or large conformations. Points are colored according to the radius of gyration (Rg) of the corresponding structures (dark blue–smaller Rg and dark red–larger Rg). b) Chromosome structural ensemble SSI distributions for chr21:28–30 Mb segment from IMR90 and K562 for different numbers of cells, before (green) or after (blue) removing extremely compact or large conformations (“outliers”). 200 single-cell structures are selected randomly each time for resampling.
https://doi.org/10.1371/journal.pcbi.1010392.s009
(PNG)
S8 Fig.
a) Population-averaged distance maps of active and inactive chrX from IMR90 cell at TAD resolution. b) Jointly constructed chromosome structural landscape of active and inactive chrX from IMR90 cell. c) Bin Variability (BSI standard deviation) of active and inactive chrX, where a higher value represents a higher level of structure variation for that bin across the population and vice versa. d) Chromosome structural ensemble SSI distributions for chr2 segments of varying length. Here, for resampling 300 single-cell structures are selected randomly each time.
https://doi.org/10.1371/journal.pcbi.1010392.s010
(PNG)
S9 Fig.
a) Representative polymer models of IMR90 and K562 for chr21:28–29.5 Mb genomic segment, obtained from different time points of the simulation. The color of the beads shows corresponding epigenetic states. b) Chromatin states of six different 2 Mb spanning genomic regions of IMR90 chr21 at 50 kb resolution. The chromatin states are inferred based on 11 different histone marks for each cell-type by ChromHMM tool. c) Epigenetic state similarity of different 2 Mb spanning genomic regions of IMR90 chr21 q-arm with the K562 28–30 Mb region using binary comparison. Here, a higher comparison value shows higher similarity with the K562 epigenetic state. d) Chromosome structural ensemble SSI distributions for 31 different 2 Mb spanning genomic regions of IMR90 chr21. For resampling, 300 single-cell structures are selected randomly each time. Mean boundary strength of those regions inferred by HiCRatio have been shown in green color. e) Pearson’s correlation between the mean boundary strength and mean ensemble SSI of the 31 different genomic regions obtained from IMR90 chr21.
https://doi.org/10.1371/journal.pcbi.1010392.s011
(PNG)
S10 Fig.
a) Population-averaged distance maps of IMR90 and K562 chr21:28–30 Mb genomic region from loop extrusion polymer simulations (see Methods) at 1 kb resolution. b) Chromosome structural ensemble SSI distributions for chr21:28–30 Mb from loop extrusion simulated IMR90 and K562. For resampling, 500 single-cell structures are selected randomly each time from a single run (left) or across multiple runs (right). c,d) Jointly constructed chromosome structural landscapes of chr21:28–30 Mb (c) or chr21:34–37 Mb (d) of HCT116 for WT and +Auxin conditions. Points are colored according to the radius of gyration (Rg) of the corresponding structures (dark blue–smaller Rg and dark red–larger Rg). e) Bin Variability of chr21:28–30 Mb genomic region for wild-type (WT) and cohesin-depleted (+Auxin) conditions. f) Bin Variability of chr21:34–37 Mb genomic region for wild-type (WT) and cohesin-depleted (+Auxin) conditions.
https://doi.org/10.1371/journal.pcbi.1010392.s012
(PNG)
Acknowledgments
The authors would like to thank Dr. Rebeca San Martin for valuable suggestions and insightful discussions in the development of the research project. We thank Dr. Tian Hong for sharing of computational GPU resources.
References
- 1. McCord RP, Kaplan N, Giorgetti L. Chromosome conformation capture and beyond: toward an integrative view of chromosome structure and function. Molecular cell. 2020;77(4):688–708. pmid:32001106
- 2. Fudenberg G, Imakaev M, Lu C, Goloborodko A, Abdennur N, Mirny LA. Formation of chromosomal domains by loop extrusion. Cell reports. 2016;15(9):2038–49. pmid:27210764
- 3. Nichols MH, Corces VG. A CTCF code for 3D genome architecture. Cell. 2015;162(4):703–5. pmid:26276625
- 4. Sanborn AL, Rao SS, Huang S-C, Durand NC, Huntley MH, Jewett AI, et al. Chromatin extrusion explains key features of loop and domain formation in wild-type and engineered genomes. Proceedings of the National Academy of Sciences. 2015;112(47):E6456–E65.
- 5. Sexton T, Yaffe E, Kenigsberg E, Bantignies F, Leblanc B, Hoichman M, et al. Three-dimensional folding and functional organization principles of the Drosophila genome. Cell. 2012;148(3):458–72. pmid:22265598
- 6. Nora EP, Lajoie BR, Schulz EG, Giorgetti L, Okamoto I, Servant N, et al. Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature. 2012;485(7398):381–5. pmid:22495304
- 7. Dixon JR, Selvaraj S, Yue F, Kim A, Li Y, Shen Y, et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 2012;485(7398):376–80. pmid:22495300
- 8. Lieberman-Aiden E, Van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. science. 2009;326(5950):289–93. pmid:19815776
- 9. Falk M, Feodorova Y, Naumova N, Imakaev M, Lajoie BR, Leonhardt H, et al. Heterochromatin drives compartmentalization of inverted and conventional nuclei. Nature. 2019;570(7761):395–9. pmid:31168090
- 10. Sanulli S, Justin N, Teissandier A, Ancelin K, Portoso M, Caron M, et al. Jarid2 methylation via the PRC2 complex regulates H3K27me3 deposition during cell differentiation. Molecular cell. 2015;57(5):769–83. pmid:25620564
- 11. Wang L, Gao Y, Zheng X, Liu C, Dong S, Li R, et al. Histone modifications regulate chromatin compartmentalization by contributing to a phase separation mechanism. Molecular cell. 2019;76(4):646–59. e6. pmid:31543422
- 12. Strom AR, Emelyanov AV, Mir M, Fyodorov DV, Darzacq X, Karpen GH. Phase separation drives heterochromatin domain formation. Nature. 2017;547(7662):241–5. pmid:28636597
- 13. Bolzer A, Kreth G, Solovei I, Koehler D, Saracoglu K, Fauth C, et al. Three-dimensional maps of all chromosomes in human male fibroblast nuclei and prometaphase rosettes. PLoS biology. 2005;3(5):e157. pmid:15839726
- 14. Cremer T, Cremer C. Chromosome territories, nuclear architecture and gene regulation in mammalian cells. Nature reviews genetics. 2001;2(4):292–301. pmid:11283701
- 15. Stack SM, Brown DB, Dewey W. Visualization of interphase chromosomes. Journal of cell science. 1977;26(1):281–99. pmid:562895
- 16. Kadauke S, Blobel GA. Chromatin loops in gene regulation. Biochimica et Biophysica Acta (BBA)-Gene Regulatory Mechanisms. 2009;1789(1):17–25. pmid:18675948
- 17. Braccioli L, de Wit E. CTCF: a Swiss-army knife for genome organization and transcription regulation. Essays in biochemistry. 2019;63(1):157–65. pmid:30940740
- 18. Zuin J, Roth G, Zhan Y, Cramard J, Redolfi J, Piskadlo E, et al. Nonlinear control of transcription through enhancer-promoter interactions. bioRxiv. 2021:2021.04.22.440891.
- 19. Clapier CR, Iwasa J, Cairns BR, Peterson CL. Mechanisms of action and regulation of ATP-dependent chromatin-remodelling complexes. Nature reviews Molecular cell biology. 2017;18(7):407–22. pmid:28512350
- 20. Kim Y, Yu H. Shaping of the 3D genome by the ATPase machine cohesin. Experimental & Molecular Medicine. 2020;52(12):1891–7. pmid:33268833
- 21. Kim JM, Visanpattanasin P, Jou V, Liu S, Tang X, Zheng Q, et al. Single-molecule imaging of chromatin remodelers reveals role of ATPase in promoting fast kinetics of target search and dissociation from chromatin. bioRxiv. 2021. pmid:34313223
- 22. Strickfaden H, Zunhammer A, van Koningsbruggen S, Köhler D, Cremer T. 4D chromatin dynamics in cycling cells: Theodor Boveri’s hypotheses revisited. Nucleus. 2010;1(3):284–97. pmid:21327076; PubMed Central PMCID: PMC3027035.
- 23. Kind J, Pagie L, Ortabozkoyun H, Boyle S, de Vries SS, Janssen H, et al. Single-cell dynamics of genome-nuclear lamina interactions. Cell. 2013;153(1):178–92. pmid:23523135.
- 24. Nagano T, Lubling Y, Stevens TJ, Schoenfelder S, Yaffe E, Dean W, et al. Single-cell Hi-C reveals cell-to-cell variability in chromosome structure. Nature. 2013;502(7469):59–64. pmid:24067610; PubMed Central PMCID: PMC3869051.
- 25. Takei Y, Zheng S, Yun J, Shah S, Pierson N, White J, et al. Single-cell nuclear architecture across cell types in the mouse brain. Science. 2021:eabj1966. pmid:34591592
- 26. Kempfer R, Pombo A. Methods for mapping 3D chromosome architecture. Nature Reviews Genetics. 2020;21(4):207–26. pmid:31848476
- 27. Jerkovic I, Cavalli G. Understanding 3D genome organization by multidisciplinary methods. Nature Reviews Molecular Cell Biology. 2021:1–18.
- 28. Di Pierro M, Cheng RR, Aiden EL, Wolynes PG, Onuchic JN. De novo prediction of human chromosome structures: Epigenetic marking patterns encode genome architecture. Proceedings of the National Academy of Sciences. 2017;114(46):12126–31. pmid:29087948
- 29. Di Pierro M, Potoyan DA, Wolynes PG, Onuchic JN. Anomalous diffusion, spatial coherence, and viscoelasticity from the energy landscape of human chromosomes. Proceedings of the National Academy of Sciences. 2018;115(30):7753–8.
- 30. Conte M, Fiorillo L, Bianco S, Chiariello AM, Esposito A, Nicodemi M. Polymer physics indicates chromatin folding variability across single-cells results from state degeneracy in phase separation. Nature communications. 2020;11(1):1–13.
- 31. Wlasnowolski M, Sadowski M, Czarnota T, Jodkowska K, Szalaj P, Tang Z, et al. 3D-GNOME 2.0: a three-dimensional genome modeling engine for predicting structural variation-driven alterations of chromatin spatial structure in the human genome. Nucleic Acids Research. 2020;48(W1):W170–W6. pmid:32442297
- 32. Tjong H, Li W, Kalhor R, Dai C, Hao S, Gong K, et al. Population-based 3D genome structure analysis reveals driving forces in spatial genome organization. Proceedings of the National Academy of Sciences. 2016;113(12):E1663–E72. pmid:26951677
- 33. Ramani V, Deng X, Qiu R, Gunderson KL, Steemers FJ, Disteche CM, et al. Massively multiplex single-cell Hi-C. Nature methods. 2017;14(3):263–6. pmid:28135255
- 34. Tan L, Ma W, Wu H, Zheng Y, Xing D, Chen R, et al. Changes in genome architecture and transcriptional dynamics progress independently of sensory experience during post-natal brain development. Cell. 2021;184(3):741–58. e17. pmid:33484631
- 35. Arrastia MV, Jachowicz JW, Ollikainen N, Curtis MS, Lai C, Quinodoz SA, et al. Single-cell measurement of higher-order 3D genome organization with scSPRITE. Nature Biotechnology. 2021:1–10.
- 36. Wang S, Su J-H, Beliveau BJ, Bintu B, Moffitt JR, Wu C-t, et al. Spatial organization of chromatin domains and compartments in single chromosomes. Science. 2016;353(6299):598–602. pmid:27445307
- 37. Bintu B, Mateo LJ, Su J-H, Sinnott-Armstrong NA, Parker M, Kinrot S, et al. Super-resolution chromatin tracing reveals domains and cooperative interactions in single cells. Science. 2018;362(6413):eaau1783. pmid:30361340
- 38. Su J-H, Zheng P, Kinrot SS, Bintu B, Zhuang X. Genome-scale imaging of the 3D organization and transcriptional activity of chromatin. Cell. 2020;182(6):1641–59. e26. pmid:32822575
- 39. Sawh AN, Shafer ME, Su J-H, Zhuang X, Wang S, Mango SE. Lamina-dependent stretching and unconventional chromosome compartments in early C. elegans embryos. Molecular cell. 2020;78(1):96–111. e6. pmid:32105612
- 40. Mateo LJ, Murphy SE, Hafner A, Cinquini IS, Walker CA, Boettiger AN. Visualizing DNA folding and RNA in embryos at single-cell resolution. Nature. 2019;568(7750):49–54. pmid:30886393
- 41. Cheng RR, Contessoto VG, Aiden EL, Wolynes PG, Di Pierro M, Onuchic JN. Exploring chromosomal structural heterogeneity across multiple cell lines. Elife. 2020;9:e60312. pmid:33047670
- 42. Xie WJ, Qi Y, Zhang B. Characterizing chromatin folding coordinate and landscape with deep learning. PLoS computational biology. 2020;16(9):e1008262. pmid:32986691
- 43. Zhu C, Zhang Y, Li YE, Lucero J, Behrens MM, Ren B. Joint profiling of histone modifications and transcriptome in single cells from mouse brain. Nature Methods. 2021;18(3):283–92. pmid:33589836
- 44. Chen H, Tian Y, Shu W, Bo X, Wang S. Comprehensive identification and annotation of cell type-specific and ubiquitous CTCF-binding sites in the human genome. PloS one. 2012;7(7):e41374. pmid:22829947
- 45. Bersaglieri C, Kresoja-Rakic J, Gupta S, Bär D, Kuzyakiv R, Santoro R. Genome-wide maps of nucleolus interactions reveal distinct layers of repressive chromatin domains. bioRxiv. 2020:2020.11.17.386797.
- 46. Cheng F, Liu C, Shen B, Zhao Z. Investigating cellular network heterogeneity and modularity in cancer: a network entropy and unbalanced motif approach. BMC systems biology. 2016;10(3):301–11. pmid:27585651
- 47. Banerji CR, Miranda-Saavedra D, Severini S, Widschwendter M, Enver T, Zhou JX, et al. Cellular network entropy as the energy potential in Waddington’s differentiation landscape. Scientific reports. 2013;3(1):1–7. pmid:24154593
- 48. Menichetti G, Bianconi G, Castellani G, Giampieri E, Remondini D. Multiscale characterization of ageing and cancer progression by a novel network entropy measure. Molecular BioSystems. 2015;11(7):1824–31. pmid:25909281
- 49. Chanda P, Costa E, Hu J, Sukumar S, Van Hemert J, Walia R. Information Theory in Computational Biology: Where We Stand Today. Entropy. 2020;22(6). pmid:33286399
- 50. Saha P, Kumar Sarkar B. Entropy based analysis of genetic information. Journal of Physics: Conference Series. 2020;1579(1):012003.
- 51.
Durbin R, Eddy SR, Krogh A, Mitchison G. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge: Cambridge University Press; 1998.
- 52. West J, Bianconi G, Severini S, Teschendorff AE. Differential network entropy reveals cancer system hallmarks. Scientific Reports. 2012;2(1):802. pmid:23150773
- 53. Teschendorff AE, Severini S. Increased entropy of signal transduction in the cancer metastasis phenotype. BMC Systems Biology. 2010;4(1):104. pmid:20673354
- 54. Davis CA, Hitz BC, Sloan CA, Chan ET, Davidson JM, Gabdank I, et al. The Encyclopedia of DNA elements (ENCODE): data portal update. Nucleic acids research. 2018;46(D1):D794–D801. pmid:29126249
- 55. Ernst J, Kellis M. ChromHMM: automating chromatin-state discovery and characterization. Nature methods. 2012;9(3):215–6. pmid:22373907
- 56. Chiang M, Michieletto D, Brackley CA, Rattanavirotkul N, Mohammed H, Marenduzzo D, et al. Polymer modeling predicts chromosome reorganization in senescence. Cell reports. 2019;28(12):3212–23. e6. pmid:31533042
- 57. Shi G, Liu L, Hyeon C, Thirumalai D. Interphase human chromosome exhibits out of equilibrium glassy dynamics. Nature communications. 2018;9(1):1–13.
- 58. Rao SS, Huntley MH, Durand NC, Stamenova EK, Bochkov ID, Robinson JT, et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014;159(7):1665–80. pmid:25497547
- 59. Plimpton S. Fast parallel algorithms for short-range molecular dynamics. Journal of computational physics. 1995;117(1):1–19.
- 60. Stefko M, Douglass KM. PolymerCpp: 2D and 3D wormlike chain generator for Python and written in C++. 2017.
- 61. Dillinger S, Straub T, Nemeth A. Nucleolus association of chromosomal domains is largely maintained in cellular senescence despite massive nuclear reorganisation. PLoS One. 2017;12(6):e0178821. pmid:28575119
- 62. Ramírez F, Dündar F, Diehl S, Grüning BA, Manke T. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic acids research. 2014;42(W1):W187–W91. pmid:24799436
- 63. Cheng Y, Liu M, Hu M, Wang S. TAD-like single-cell domain structures exist on both active and inactive X chromosomes and persist under epigenetic perturbations. bioRxiv. 2021:2021.05.12.443887. pmid:34749781
- 64. Mitter M, Gasser C, Takacs Z, Langer CC, Tang W, Jessberger G, et al. Conformation of sister chromatids in the replicated human genome. Nature. 2020;586(7827):139–44. pmid:32968280
- 65. Szabo Q, Bantignies F, Cavalli G. Principles of genome folding into topologically associating domains. Science advances. 2019;5(4):eaaw1668. pmid:30989119
- 66. Lazaris C, Kelly S, Ntziachristos P, Aifantis I, Tsirigos A. HiC-bench: comprehensive and reproducible Hi-C data analysis designed for parameter exploration and benchmarking. BMC genomics. 2017;18(1):1–16.
- 67. Davidson IF, Peters J-M. Genome folding through loop extrusion by SMC complexes. Nature Reviews Molecular Cell Biology. 2021;22(7):445–64. pmid:33767413
- 68. Hao X, Parmar JJ, Lelandais B, Aristov A, Ouyang W, Weber C, et al. Super-resolution visualization and modeling of human chromosomal regions reveals cohesin-dependent loop structures. Genome biology. 2021;22(1):1–20.
- 69. Nuebler J, Fudenberg G, Imakaev M, Abdennur N, Mirny LA. Chromatin organization by an interplay of loop extrusion and compartmental segregation. Proceedings of the National Academy of Sciences. 2018;115(29):E6697–E706. pmid:29967174
- 70. Schwarzer W, Abdennur N, Goloborodko A, Pekowska A, Fudenberg G, Loe-Mie Y, et al. Two independent modes of chromatin organization revealed by cohesin removal. Nature. 2017;551(7678):51–6. pmid:29094699