Mapping the semi-nested community structure of 3D chromosome contact networks

Mammalian DNA folds into 3D structures that facilitate and regulate genetic processes such as transcription, DNA repair, and epigenetics. Several insights derive from chromosome capture methods, such as Hi-C, which allow researchers to construct contact maps depicting 3D interactions among all DNA segment pairs. These maps show a complex cross-scale organization spanning megabase-pair compartments to short-ranged DNA loops. To better understand the organizing principles, several groups analyzed Hi-C data assuming a Russian-doll-like nested hierarchy where DNA regions of similar sizes merge into larger and larger structures. Apart from being a simple and appealing description, this model explains, e.g., the omnipresent chequerboard pattern seen in Hi-C maps, known as A/B compartments, and foreshadows the co-localization of some functionally similar DNA regions. However, while successful, this model is incompatible with the two competing mechanisms that seem to shape a significant part of the chromosomes’ 3D organization: loop extrusion and phase separation. This paper aims to map out the chromosome’s actual folding hierarchy from empirical data. To this end, we take advantage of Hi-C experiments and treat the measured DNA-DNA interactions as a weighted network. From such a network, we extract 3D communities using the generalized Louvain algorithm. This algorithm has a resolution parameter that allows us to scan seamlessly through the community size spectrum, from A/B compartments to topologically associated domains (TADs). By constructing a hierarchical tree connecting these communities, we find that chromosomes are more complex than a perfect hierarchy. Analyzing how communities nest relative to a simple folding model, we found that chromosomes exhibit a significant portion of nested and non-nested community pairs alongside considerable randomness. In addition, by examining nesting and chromatin types, we discovered that nested parts are often associated with active chromatin. These results highlight that cross-scale relationships will be essential components in models aiming to reach a deep understanding of the causal mechanisms of chromosome folding.


Introduction
Mammalian genomes fold into a network of 3D structures that facilitate and regulate genetic processes such as transcription, DNA repair, and epigenetics [1,2]. Most discoveries derive from chromosome capture methods, such as Hi-C, which measure the number of contacts between DNA segment pairs and allow researchers to construct genome-wide 3D contact maps [3][4][5]. These maps show that chromosomes comprise a spectrum of 3D structures spanning a range of scales: megabase-scale A/B compartments, sub-megabase-scale Topologically Associated Domains (TADs), and short-ranged loops. Some of these structures are associated with epigenetic marks, active genes, and architectural proteins that reshape chromatin, such as CCCTC-binding factors (CTCF), cohesin complexes, and CP190 [6][7][8][9].
At first glance, Hi-C maps appear hierarchical, where DNA regions sharing high contact counts fold into larger and larger structures. This scheme is appealing because it proposes a simple folding mechanism leading to densely packed DNA without over-entanglement. It also predicts the existence of alternating megabase-sized 3D structures appearing in most Hi-C maps as plaid patterns [3,6,10]. More specifically, TADs tend to aggregate into sub-compartments (denoted A1, A2, B1, . . ., B4) [11].
This folding scheme also posits that chromosomes form a perfect hierarchy. In other words, once two DNA regions join, such as two TADs, they remain in the same super-structure throughout the upstream folding hierarchy. This idea is the keystone assumption in several studies [12][13][14][15]. While it can explain how A/B compartments form and foreshadows the co-localization of some functionally similar DNA regions, critical observations question the basic idea.
First, 3D communities are not necessarily contiguous DNA segments [16]. Assembling such disconnected communities into larger and larger building blocks inevitably leads to a non-perfect hierarchy. Second, if the hierarchy is perfect, it suggests that similar folding mechanisms act across several scales. However, this conclusion is inconsistent with the competing mechanisms that seem to form TADs and A/B compartments: loop extrusion and phase-separation [17,18]. Third, in a recent paper [19], researchers fit a Gaussian polymer model to Hi-C data and recovered several established sub-structures-TADs, subTADs, A/B compartments, etc.-showing that they were not perfectly hierarchical.
This paper aims to unveil the actual folding by charting the cross-scale structural relationships from empirical data. In particular, we use data from Hi-C experiments that we recast into a weighted network of 3D interactions and use tools from network science to find the optimal community assembly while scanning through the networks' layers of organization. By mapping the hierarchical relationships between these assemblies, we find that some nest, others segregate, and still others are not significantly different from random. To better understand these results, we propose a minimal folding model mixing perfect and random nesting. We also relate community nesting to established chromatin states. We discovered that communities associated with active transcription are more distinct and show significant nesting relative to the chromosome-wide average.

Hi-C data treatment
We used Hi-C data for the human cell line GM12878 (B-lymphoblastoid) [4], downloaded from the GEO database [20]. We used the MAPQG0 data set at 100 kilobase-pair (kb) resolution. Stored in matrix form, the Hi-C data contains the pairwise contact counts between DNA loci. We omit inter-chromosome contacts due to their low signal-to-noise ratio [21,22].
We treat the Hi-C data as a DNA contact network. Each network node represents a 100 kb DNA segment, and the link weights are proportional to the number of measured Hi-C contacts. We use network methods to extract communities harbouring densely connected nodes that maintain fewer contacts with the rest of the network (the generalized Louvain method, see Methods: The GenLouvain algorithm-detecting 3D communities in Hi-C data).
Before investigating the community structure, we normalized the raw Hi-C counts to reduce biases and to make fair comparisons between chromosomes that may vary in size up to one order of magnitude. In particular, we use the Knight-Ruiz (KR) matrix balancing [23] implemented in gcMapExplorer [24].

The GenLouvain algorithm-Detecting 3D communities in Hi-C data
To find communities in the Hi-C contact network, we use the generalized Louvain method (GenLouvain) [25,26]. It is a community detection method that takes advantage of so-called modularity maximization to find the optimal community division of a network. By "optimal," we mean the community assembly that maximizes the number of internal contacts, measured by a so-called modularity function, with respect to a null hypothesis. A common null model is random rewiring keeping the node degree fixed. GenLouvain is a greedy optimization algorithm that starts with single-node communities and then searches for the optimal solution by generating trial node agglomerates and evaluating the modularity function.
We chose GenLouvain based on principled and practical aspects. First, it is a generalized version of one of the most intensively tested algorithms: Louvain [25]. Second, for practicality, the developer's open-source codes are written in MATLAB, allowing us to modify essential parts such as the null-model term, which we will elaborate on in the forthcoming paragraphs.
One critical feature of GenLouvain is its resolution (or scale) parameter γ. This parameter allows us to sweep through the scales of the network and probe the network's community spectrum. Furthermore, this parameter is closely related to the parameters capturing the relative tendency of intra-versus inter-group connections in the context of the stochastic block model [27]. Mathematically, γ is a part of the modularity function M, defined as where A ij is the network's adjacency matrix representing the weight of the edge connecting nodes i and j, and the summand is counted only if i and j belong to the same community, thus the Kronecker delta δ(g i , g j ). In our case, A ij corresponds to the KR-normalized Hi-C matrix. Furthermore, the second term of the summand in Eq (1) represents our null hypothesis for the network's "background" connectivity. Building on previous work [16], we use the so-called fractal-globule (FG) null-model term P FG ij that assumes that the average interaction strength between two DNA segments, i and j, decays as a power-law with exponent -1. The FG null model term is where the strength k i represents the sum of weights around node i, 2m = ∑ i k i is a normalization constant, and / 1/|i − j| is the expected amount of reduced interaction as a function of the one-dimensional distance separating nodes i and j. This decay follows the fractal-globule scaling [16,28,29] that agrees with chromatin contact decay in Hi-C experiments, where the exponent is approximately −1.08 [3]. Note that we may use any decay exponent in Eq (2). For example, −0.75 matches better with within-TAD contacts, [30], but we kept −1 as our 3D communities are larger than TADs.

Network nestedness
By varying the resolution parameter γ embedded in the GenLouvain algorithm, we scan through the scales of chromosomes' 3D organization. While scanning, we keep track of uninterrupted DNA segments-we will refer to these segments as domains in Results-and how they distribute between the communities as the scale changes. This allows us to chart crossscale folding relationships. To better understand these relationships and quantify the deviations from a perfect hierarchy, we use an approach developed for ecological networks [31,32]. Designed for interacting species pairs, say plants and pollinators, this approach rests on a nestedness metric, called N ij , measuring how many plants two pollinators have in common compared to a random benchmark. In our case, we track how many DNA segments (domains) two 3D communities share, given that they appear at different hierarchical levels (different γ values)-by construction, two communities at the same hierarchical level do not share any domains. We illustrate the philosophy behind N ij in Fig 1. The N ij metric benchmarks the community overlaps to a combinatorial link redistribution, normalized to vary between −1 and +1. These endpoints indicate complete network segregation (N ij = −1) or perfect nesting (N ij = +1). When perfectly nested, the larger community engulfs the domains in the smaller ones. When completely segregated, the communities do not share any domains. In a perfect hierarchy, like a phylogenetic tree, the nestedness is either +1 or −1, indicating full nesting or complete segregation. But in a more complex multi-scale structure, N ij takes any value between these two extremes because the communities may share more or fewer domains relative to a random overlap. We note that N ij is normalized so that the midpoint N ij = 0 represents random overlap and that N ij = ±x indicates the same relative proportion x of segregation or nesting. We exemplify this property in We go through several steps to calculate N ij . First, we extract the overlap S ij between two communities i and j from data-we study nestedness in empirical (Hi-C-derived) and simulated data. Second, we calculate the expected overlap μ ij assuming a random arrangement. Denoting d i as community i's internal number of domains, μ ij is [31] where n is the total and k is the shared number of domains. Next, we shift S ij by μ ij to center the expected overlap for random arrangement at zero and normalize so that N ij 2 [−1, 1]: where O ij is the maximum or minimum achievable overlap, depending on if S ij > μ ij or S ij < μ ij . In these cases, we calculate O ij as Here i and j share one domain more than expected (S ij = 5). This yields N ij = 0.5 because their overlap is at the midpoint between the random and the maximum overlap (S ij = 6) that would result in ideal nesting (N ij = 1). We illustrate this with the orange stripe spanning half of the range μ ij (= 4) � k � 6. This example shows that N ij measures the relative overlap compared to what is achievable given the link density rather than absolute numbers. https://doi.org/10.1371/journal.pcbi.1011185.g001 (b) S ij < μ ij , which is further classified into two cases: Significant community overlap and p-values. In addition to the expected overlap μ ij , we calculate the likelihood that two communities share S ij domains given the random null hypothesis. Under this hypothesis, the probability that S ij = k is [31,32] To assign p-values to the observations, we sum PðS ij ¼ kÞ over k. However, depending on if k is smaller or larger than S ij , we must separate two cases In our analyses, we set the significance threshold to p � 0.025 to distinguish significant from random overlap.
We downloaded the HMM data from ENCODE (human cell line GM12878) [34]. The data is a genome-wide list of start-and-stop coordinates for each HMM state, where each instance is called a "peak". To determine the HMM content in a long DNA stretch, say a community, we count the number of peaks belonging to each of the 15 states. Because some HMM peaks may cross community borders, we count the number of peak starts.
Next, to calculate the enrichment, we use a hypergeometric test that benchmarks the HMM content in a community to the chromosome-wide random expectation (sampling without replacement). The test goes through the following three steps.
1. Get community content from HMM data. We denote the number of peaks for each state as k X , X = S1, . . ., S15.
2. Calculate the expected number of X peaks k 0 X given the community's total peak count n as where N is the total number of peaks in the chromosome (including all HMM states), and K X is the number of X peaks in the chromosome.
3. Calculate the p-value for k 0 X under the hypergeometric null hypothesis. If less than the significance threshold of 0.05, we consider the community enriched or depleted in HMM state X (two-sided test). However, because we make multiple comparisons, one for each HMM state, we correct the p-value to reduce the false discovery rate. We do this using the Benjamini-Hochberg procedure [35] implemented in Python statsmodels [36]. We set the false discovery rate to 0.05.
After going through all communities using this procedure, they get labeled as "enriched" or "depleted" in each of the 15 chromatin states. We point out that one community can be enriched in several HMM states.
In addition, to make our analysis more tractable when studying the nestedness of different chromatin types, we make a coarser classification and partition the communities into four large groups, A-D. These groups reflect the overall HMM state enrichment: A: Active promoters (S1-S2), B: Enhancers (S4-S7), C: Transcribed regions (S9-S11), and D: Heterochromatin (S3 and S12-S15).

Distant domains aggregate into 3D communities spanning a range of scales
To illustrate how the 3D communities partition the chromosome, we superimpose GenLouvain-derived communities as squares along the Hi-C map's diagonal in Fig 2A. By assigning each community a unique color, we see that some 3D communities contain distant DNA segments. This community type-a distributed assembly of DNA segments-widens commonly used 3D partitions, like TADs, that assumes contiguous DNA stretches.
Furthermore, Fig 2B shows that some DNA stretches rarely break across a wide range of γ. We call these indivisible pieces "irreducible domains". We collect them by we collecting borders of intact DNA segments across many γ values into one list. We show the domains in the upper turquoise stripe in Fig 2B and their size distribution in S4 Fig. Admittedly, making γ large enough, we break even the domains into smaller linear DNA pieces so that eventually every Hi-C bin (100 kb) represents one domain. However, we do not cover this extreme limit here. To better visualize such relations, we constructed a hierarchical tree from the same Hi-C data set (chromosome 10), showing how domains, the least divisible DNA regions, join into large 3D structures that, in turn, make up 3D communities (Fig 3).

3D communities do not form perfect hierarchies
To construct the tree in Fig 3, we first extracted chromosome 10's domain list and calculated the optimal community division associated with a few γ values. Next, we stored the folding pathways of all domains by tracing how their community memberships change with γ ( Fig  3A, left). The circular tree illustrates the collection of all these pathways, where the links indicate how domains (filled circles on the outer rim) assemble into 3D structures (filled circles on Hi-C maps, 3D communities, and domains. (a) Hi-C maps where the red-to-blue pixel colors are a proxy for short-to-long 3D distances. The squares decorating the map's diagonals represent GenLouvain-derived 3D communities for three γ values (0.5, 0.6, and 0.7). Above each map, we show the community coverage as a colored stripe. Having unique colors, we observe that the communities comprise scattered linear DNA segments. The white cross shows the centromere. (b) Community borders and coverage across chromosome 10 for 16 γ values. The upper turquoise stripe shows DNA stretches that never split for 0 < γ ≲ 1. We refer to these indivisible regions as domains. The smallest 3D domain is 100 kb long, which is the resolution limit of the Hi-C data set we use.
https://doi.org/10.1371/journal.pcbi.1011185.g002 . Each ring represents one value of GenLouvain's resolution parameter γ, and the diameters of the filled circles are associated with their DNA length (measured by the number of Hi-C bins). The red circles mark delocalized 3D structures forming a single 3D community at γ = 0.9 (denoted 10 0.9 ). The dark links show folding trajectories for the domains passing through 10 0.9 towards the root. The left panel shows a two-domain folding path and defines our label convention. We plotted the tree using RAW Graphs [37]. (b) Joining and splitting of the 13 domains belonging to the community 10 0.9 . These domains (filled dark-blue circles) pass through the 3D communities (open circles), joining other domains (filled light-blue circles). The edges connect 3D communities with dark-blue domains. We also highlighted these folding pathways in (a) (dark links).
The tree in Fig 3A looks hierarchical. But a more complex pattern emerges if factoring in the 3D communities. To this end, we print the community ID next to a few filled circles (black numbers). In addition to the ID, we add a subscript to indicate the γ value (e.g., ID γ ). However, we omit ID numbers for the domains on the outer rim. Instead, their numbering denotes sequential ordering along the chromosome (e.g., domain 2 is next to domains 1 and 3, etc.).
Interestingly, the same community ID appears several times within one γ ring. One example is the community 10 0.9 , highlighted in red, that appears five times on the γ = 0.9 ring. This community contains 13 domains scattered over the chromosome, as seen from their non-consecutive ID numbers. But even if scattered, they belong to the same 3D community that is a part of the optimal network division (according to GenLouvain). Delocalized domains forming communities this way is a hallmark of imperfect hierarchical folding.
To further exemplify this observation, we depict the folding pathways of the 13 individual domains belonging to the community 10 0.9 in Fig 3B. By following the folding paths (edges) from left to right, we see that these domains (filled dark-blue circles) start in the same community and then split apart to become members of other 3D communities having different domain content (light-blue circles). Going even further to the right, 10 out of 13 dark-blue domains join yet again into one huge community (at γ ≳ 0.6). Again, this complex mergingand-splitting behavior is far from a perfect hierarchy. For clarity, we highlighted community 10 0.9 's folding pathways as dark lines in (a) connecting the violet circles.
Quantifying chromosome nestedness Fig 3 shows that domains mix between 3D communities as they approach the tree's root. This finding suggests that the folding mechanics is not perfectly hierarchical. To quantify deviations from being perfect, we calculate the pairwise community-domain overlap relative to random chance between two communities, i and j, belonging to different tree rings. To this end, we use a normalized nestedness metric, denoted N ij , that varies from −1 to +1. These two extreme points indicate complete segregation (N ij = −1) and perfect nesting (N ij = 1). When N ij = 0, the overlap is not different from being random. We outline the explicit calculations and some of N ij 's critical properties in Methods: Network nestedness and show a schematic in Fig 4A. To study the cross-scale nestedness in Hi-C-derived trees, like Fig 3, we calculated N ij across several γ values in four chromosomes (3, 5, 10, and 22); we choose these to mix large, intermediate, and small chromosomes. Plotting the N ij histogram for all chromosomes in one graph, we find that the distribution has two pronounced peaks at ±1 and a flat but slightly rightskewed intermediate region (Fig 4B). These two peaks indicate that some communities segregate (−1) while others nest (+1), just like in a perfect hierarchy that is either completely segregated or fully nested. However, the histogram's intermediate N ij region is not zero and thus differs from an ideal hierarchy. This telltales that the 3D folding blends hierarchy-breaking contacts where some are possibly random.
To separate significant from random overlaps in Fig 4B, we calculated the probability that two 3D communities, having sizes d i and d j , share k domains in a random assignment-we defer all details to Methods: Network nestedness. Based on this probability, we associate p-values to each N ij observation. Setting the threshold to p � 0.025, we count the fraction of significant observations and illustrate the relative proportions in Fig 4C. In orange, we highlight significant overlaps. In blue-green, we indicate overlaps that are indistinguishable from being random. From Fig 4C, we make three key observations. First, the most segregated part (−1) is almost entirely blue-green and thus classified as insignificant. Second, roughly half of the perfectly nested communities (+1) share significant domain overlaps. Third, two regions show substantial overlaps that err on the side of segregation (−0.95 < N ij < −0.8) and nesting (0.65 < N ij < 0.95). The remaining data points appear random, particularly those surrounding N ij = 0.
From Figs 2-4, we conclude that chromosomes fold into complex hierarchies that mix nested and segregated parts. On average, however, the nesting is close to being random N ij � 0 if neglecting the -1 peak that skews the average (if included, it is N ij � À 0:8). But since the distribution is so broad, community pairs show substantial differences where some are completely segregated, others are perfectly nested, and the rest is somewhere in between. This finding sheds new light on the hierarchical chromosome paradigm underlying several papers. Finally, in S5 Fig, we show community overlaps between specific γ pairs.

Modeling non-nested chromosome folding
Several papers assume that linear DNA regions, like TADs, form higher-order structures by folding into each other in a perfect hierarchy (e.g., [12][13][14][15]). However, our data show that the nesting is more complex (Figs 2-4). To better understand this disconnect, we propose a model for semi-nested chromosome folding. At the core, the model assembles ideally nested domain groups, consistent with the significant nestedness seen in the N ij histograms (Fig 4). Then, we break this pattern by reshuffling some domains among the communities. We denote the critical reshuffling parameter Q that represents the probability that two domains will change community memberships. Below, we outline the Q = 0 (perfect hierarchy) and Q > 0 limits separately.
Perfect hierarchical folding (Q = 0). To achieve ideal hierarchical folding, we agglomerate domains into superstructures and superstructures into yet larger superstructures, following a few simple steps. First, we calculate the pairwise domain-domain interaction strength from their average Hi-C contact frequency (domains typically consist of several Hi-C 100 kb bins). Second, we select the domain pair having the strongest interaction and merge them into a superstructure. Then we replace the two merged domains in the list of pairwise interactions with the new superstructure and join the next most interacting pair. Regardless of choice, this scheme yields a new superstructure at each iteration. Notably, the algorithm does not only merge linearly adjacent domains.
Once we merge all domains into a giant superstructure, we use the domains' folding paths to organize the superstructures into a circular tree (Fig 5A). However, unlike the Hi-C derived tree in Fig 3,  To illustrate that this scheme produces an ideal hierarchy, we select a group of 12 domains (red-filled circles, outer rim) and highlight their folding paths across the tree with heavy links. After forming one super structure ('453', dark violet), this domain group stays intact as it joins more and more domains forming increasingly larger superstructures (violet circles). This exemplifies that domains never split apart once they end up in the same superstructure. However, this behaviour contrasts with what we observed in Fig 3, where domains split and merge as they form communities. Therefore, this simple description cannot explain actual chromosome folding. We point out that the Q = 0 limit is nearly identical to the so-called metaTAD algorithm [12].
Hierarchical folding with randomness (Q > 0). The model yields a perfect domain hierarchy when the reshuffling parameter Q is zero. However, the actual folding patterns appear more complex. We exemplified this in Fig 2B, illustrating the cross-scale folding paths of 12 domains in chromosome 10. Following these paths, we note they do not perfectly correlate as they would in a perfect hierarchy. While some domains often stay together, others split only to reunite later. This represents the feature we aim to mimic by considering Q > 0.
To this end, we reshuffle a fraction of domains between the communities, restricting the reshuffling to communities within the same level of organization. The number of domains we interchange is proportional to Q. Algorithmically, we follow these three steps. (1) Go through all superstructures in the same organizational level (one ring in Fig 5A) and identify the domain IDs and superstructure memberships. We exclude domains that do not yet belong to any community. (2) Select two of these domains randomly and swap their superstructure memberships with probability Q. (3) Repeat (1)-(2) until we exhaust all domain pairs, excluding those we already interchanged. If one domain remains without a pair, we keep its superstructure membership. Next, we pick another tree ring and repeat steps (1)-(3).
By varying the parameter Q, we retrieved several N ij distributions. To find the optimal Q opt . -the Q that produces the N ij distribution that is most similar to the real data-we utilized a Kolmogorov-Smirnov test. This test gave Q opt . � 0.3 (Supplementary Material, S8 Fig and S7 Text). Fig 5B shows the associated domain folding paths.
In contrast to Q = 0 in Fig 5A and 5B shows that the hierarchy breaks when Q > 0. We highlighted the domains forming the same superstructure we studied in (a) ('453', dark violet) to better see the difference. Like in (a), this superstructure has 12 domains (11 out of 12 are the same). But unlike (a), superstructure 453 appears in different tree branches. This better reassembles the Hi-C derived tree in Fig 3B, where domains merge that do not have identical domain-to-root folding paths.
We point out that the tree's backbone formed in this way is identical to the Q = 0 case, but the domain memberships differ. Therefore, we foreshadow that this model is valid for small Q. . Filled circles on the outer rim represent domains; the root symbolizes the entire chromosome. We align the domain aggregates (superstructures) with the inner tree rings, each defining a scale of organization. We select a few domains (red-filled circles) and show their domain-to-root paths with thick edges. These domains assemble into yet larger structures (violet) at every inner ring. As soon as the domains merge into a superstructure labeled '453' (dark violet), they never split apart. (b) Semi-hierarchical folding (Q = 0.30). As in (a), we color the domains in red that merge into a superstructure '453' and highlight their folding paths with thick edges going from the outer rim to the root. Unlike (a), node '453' is scattered across seven tree branches. Thus, '453' only partially nests into larger structures and the domains split and reunite when approaching the root. (c) Nestedness histogram when Q = 0 (ideal hierarchy, red bars) and Q = 1 (random nesting, open bars). When Q = 0, we see two peaks at N ij ± 1, indicating complete segregation and full nestedness. When Q = 1, the domains are fully randomized between the superstructures. While there is still perfect nesting and segregation (as we expect from the random null hypothesis in Methods: Network nestedness), there is also partial overlap for −0.8 < N ij < 0.8. (d) Nestedness histogram with some randomness (Q = 0.30, light-blue bars) overlaying the actual GenLouvain-derived data for chromosome 10 (dark-grey bars). We produced (a) and (b) using RAW Graphs [37]. But as we show in the following section, this is enough to reproduce the actual nestedness distribution in Fig 4. Nestedness for hierarchical and semi-hierarchical folding. To study how the Q parameter in the model affects superstructure nestedness, we calculated and studied the N ij histograms. Just as in Results: Quantifying chromosome nestedness, we calculate these histograms by going through all superstructure pairs, omitting those belonging to the same tree ring, and counting the number of shared domains (Methods: Network nestedness). We show three cases in Fig 5C and 5D: perfect hierarchy (Q = 0), full randomness (Q = 1), and intermediate randomness (Q = 0.30). All three cases build on domains derived from chromosome 10. Below, we discuss each case separately.
As expected for the ideal hierarchy, Fig 5C has two isolated peaks at ±1 (red bars), indicating that the communities are either fully nested or fully segregated. Put differently, the structure is "modular." These peaks also appear in the complete randomness limit (Q = 1). However, the +1 bar is lower relative to the Q = 0 case, and there is a distribution of N ij values surrounding N ij = 0, albeit not as wide as the actual data. We interpret this as the domain reshuffling split several nested communities while keeping the segregation primarily intact.
To better mimic the real data, we tweaked Q to reassemble the actual N ij histogram. In Fig  5D, we show the Q = 0.30 case overlaying the empirical data for chromosome 10. Apart from underestimating the histogram for negative N ij values and overestimating it for large values, the two histogram lies on top of each other for the most part. This shows that the reshuffling parameter must not be large for the model to reproduce the nestedness data in Fig 4B. About 30% domain redistribution seems enough.

Nestedness and chromatin states
In Fig 4, we found that some communities nest and others segregate. Also, Fig 5 showed that we could reproduce the chromosome-wide nestedness distribution by slightly breaking an otherwise perfect folding hierarchy. This section analyzes if this behavior is associated with specific chromatin types.
To this end, we take advantage of published data that partition the genome into 15 chromatin states [33]. However, to make the analysis more tractable, we aggregate these states into four groups A-D, and study their pairwise nestedness. The groups are: promoters (A), enhancers (B), transcribed regions (C), and heterochromatin (D) (see Methods: for complete definitions). To assign communities to these groups, we calculate folds of enrichment for each of the 15 chromatin states relative to the chromosome-wide average. We then use the hypergeometric statistical test to judge the enrichment significance (see Methods: Chromatin states and folds of enrichment for details). Notably, because one community may enrich several chromatin states, it can belong to several A-D categories.
Next, we go through all community pairs to extract their nestedness N ij and chromatin group (A-D). Then we plot N ij histograms for all paired combinations-AA, AB, AC, AD, BB, etc. We show these histograms as panels in Fig 6B, where the light blue background portrays the entire chromosome's nestedness (we use data from chromosome 10). The diagonal panels represent community pairs having the same chromatin type (AA, BB, CC, and DD). These pairs nest more than the rest of the chromosome as the N ij distributions skew to the right. This observation differs from DD, which seems to follow the chromosome's overall nestedness distribution. To lend quantitative support to these observations, we performed a Kolmogorov-Smirnov test, which compares the cumulative distribution functions of the histograms AA, BB, CC, and DD ( Supplementary Material, S6 Fig). However, we could argue that the folding structure is segregated rather than nested because the average N ij is negative in all diagonal panels; it becomes negative due to the large peak at −1 skewing the average. But as we showed in Fig 4, this peak represents mostly random segregation (admittedly, roughly half of the +1 peak is also random), and the significant overlaps mostly appear for N ij > 0 where AA-CC histograms carry heavy weight. Therefore, we conclude that these chromatin groups nest more than the chromosome average and that the nesting is significant.
Furthermore, the off-diagonal panels in Fig 6B show the N ij histograms for the six cross pairs, AB, AC, etc. But as noted above, some communities may enrich two groups simultaneously, say A and B. So when studying the AB cross-pair, it is natural to analyze separately communities enriched in A, B, or both, those enriched in A and B simultaneously, or those enriched in only A or only B. We depict these combinations and the color coding in Fig 6A, where the large dashed circles encompass all communities flagged as A or B. At the circles' intersection, communities are enriched in A and B (half-filled circles).
The blue histograms in the lower triangular part of Fig 6B show the nestedness among communities belonging to the broadest class (e.g., A, B, or both). These off-diagonal histograms show that A, B, and C types tend to nest with each other (panels AB, AC, and BC), similar to AA-CC along the diagonal. In contrast, their overlap with D shows a wider variability reassembling the chromosome-wide N ij distribution, apart from the dip close to N ij = 1, hinting that A-C nest less with D than is expected. This observation likely reflects that A-C broadly belongs to what is commonly referred to as "active chromatin" and D is "inactive chromatin" (e.g., measured by low or high RNA expression levels). In addition, a more granular study analyzing all 15 chromatin states showed that the five states making up group D rarely enrich more than random alongside the others in A, B, and C. This differs from the A-C communities, where the internal chromatin states often co-appear. This explains why the significant nesting with group D is relatively scarce (see AD, BD, and CD panels).
In the upper triangular part in Fig 6B, we show stacked N ij histograms for the other two more restricted cross pairs (e.g., communities simultaneously enriched in A and B, or only A or B). In blue-green, we represent the intersection (e.g., A and B), and orange symbolizes the difference (e.g., only A or only B). Admitting that the sample size is relatively small, we note that the AB, AC, and BC histograms have a more significant fraction of data points for positive N ij values than those in the lower triangle, indicating more nesting. However, the AD, BD, and CD histograms remained almost identical.
In summary, when studying the cross-scale community nestedness, our data suggest that 3D communities belonging to "active chromatin" tend to nest more than the chromosome-wide average and appear to segregate from "inactive chromatin." The data also indicates that communities embedded in inactive chromatin seem to have substantial random cross-scale overlaps.

Active chromatin appears more hierarchical than inactive
To better understand the implications of the results in Fig 6 regarding the chromosome's 3D organization, we quantified how well different 3D communities partition the Hi-C network and if solid or weak divisions are associated with the chromatin groups A-D. To this end, we calculated the modularity associated with the GenLouvain-derived communities (M c ). To calculate M c , we use Eq (1) and sum only those terms belonging to the same community. If the modularity scores high, the internal nodes interconnect more than the background. If scoring low, they connect less (see Methods: The GenLouvain algorithm-detecting 3D communities in Hi-C data for an explanation). We recover the global modularity M in Eq (1) by summing over all communities, The community modularity varies significantly within and between chromatin groups A-D (S9 Fig). We also found that M c grows linearly with the community sizes (number of domains) (S9 Fig). Therefore, to make a fair comparison, we plotted the median modularity rescaled with the community sizes (Fig 7). The solid lines represent each chromatin group, including the global median modularity as a reference curve (dashed). These curves show that A-C communities ("active chromatin") have higher modularity than chromatin group D ("inactive chromatin") and the entire Hi-C network. This implies that A-C communities partition the network better than the D communities. To complement Fig 7, we show the M variability in S10 These findings argue that active chromatin is hierarchical. At least it is more hierarchical than the D communities that form less convincing communities with substantial random nestedness N ij . As we concluded from our simple folding model (Results: Modeling non-nested chromosome folding), random nesting breaks ideal hierarchies.

Conclusion
In this paper, we have mapped out the semi-hierarchical organization of chromatin in human cells. Viewing the Hi-C data as a DNA contact network, we extracted significant 3D structures using the GenLouvain community detection algorithm that allows us to scan seamlessly through different organization scales. Contrasting common assumptions, the communities form non-hierarchical structures, where some organizational levels show substantial randomness. To better understand this result, we developed a model blending hierarchical folding and random contacts. This model reproduces the degree of nestedness we observe in actual data. We also study the nestedness in terms of chromatin states. We uncover that transcriptionally active states tend to nest more with each other and form more distinct 3D communities relative to the chromosome-wide average and inactive or repressed chromatin.
Our results derive from 100 kb Hi-C data. However, our approach is not restricted to any specific resolution or interaction matrix. It can efficiently analyze various chromatin interaction matrices such as single-cell Hi-C (scHi-C [38]), HiCap [39], HiChIP [40], and distance matrices [41]. Nevertheless, modifications to the GenLouvain null model may be necessary for some of these scenarios. For instance, if using Hi-C at higher resolution, e.g., 1 kb, numerous contacts appear inside TADs where the contacts decay as a power-law with an exponent of −0.75, rather than approximately −1 [42].
Next, we use the GenLouvain method to extract 3D communities. However, generating communities this way is a random process, meaning that the nodes' community membership will differ between realizations even if the scale parameter γ is the same. Interestingly, the community-node correlation varies with γ, indicating that some communities are more stable than others. This problem exists in most complex networks where multi-scale interactions govern the organization [43]. Therefore, depending on algorithm design, two community detection methods focusing on slightly different connectivity features may disagree on the optimal node assembly. This aspect is mainly unexplored for chromosome organization and worth pursuing in future work.
Furthermore, different community detection algorithms may disagree on the hierarchical levels. For instance, the Leiden algorithm [44] could yield a more strict community hierarchy than GenLouvain (the Leiden algorithm was developed to "solve" the non-hierarchical features of the original Louvain method). One of the authors of this paper tried the Leiden algorithm to study general community inconsistency problems [43]. That experience showed that the specific property of reducing the inconsistency does hamper proper detection of scale-dependent inconsistency. While studying the effects of other algorithms is a reasonable research direction, it does not invalidate our approach, which is agnostic to the specific algorithm choice.
There are several TAD-finding methods [45]. These can be broadly categorized into feature-based algorithms, clustering methods, and graph-partitioning tools [46]. In this paper, we use a technique that falls into the graph-partitioning category, which includes perhaps the most popular community-detection algorithm based on modularity maximization. For example, Ref. [47] finds TADs using Louvain but assumes that the background connectivity is a random network under given node degrees (the Newman-Girvan model). In contrast, we use the fractal-globule null model, which better agrees with the empirical distance decay in contact probability in human Hi-C maps. Although the approach is similar to ours-varying γ and extracting communities-there are meaningful quantitative differences. For example, the fractal globule model tends to capture widely spread and delocalized 3D communities, while the Newman-Girvan model typically groups contiguous DNA stretches into local communities, like TADs. In addition, another reference combines maximum modularity and Hi-C-like distance decay and extracts communities for different γ values [48]. However, they treat TADs as unbroken DNA stretches, not delocalized as we do here. Using polymer simulations [16], we demonstrated that this generalization partitions spatially close monomers into meaningful 3D communities.
We interpret our data as active chromatin being more hierarchical than inactive chromatin. From a biological standpoint, this has exciting implications. In active chromatin, there is a menagerie of specific proteins, like transcription factors, that coordinate transcription regulation. These proteins interact with chromatin elements at all distances, bringing some in 3D proximity to regulate transcription. These interactions are not random, so they could contribute to shaping the 3D structure toward a perfect hierarchy. While we lack data to validate this hypothesis, we note that our simple folding model requires fine-tuned interactions to create an ideal hierarchical order and that a slight degree of arbitrary nesting causes noticeable deviations.
While specific proteins regulate transcription in active chromatin, inactive chromatin is often epigenetically repressed. The proteins managing epigenetic repression decorate chromatin with chemical tags over large DNA regions (e.g., methylation of specific histone sites). In this respect, epigenetic repression is not relying on characteristic long-range attractions. It is enough if the right chromatin type is close. If so, this idea foreshadows many random contacts manifesting as a broad nestedness distribution, as in Fig 4, and a less hierarchical structure than active chromatin.
Several lines of evidence indicate that compartments and TADs emerge from distinct mechanisms, like loop extrusion and phase separation, and are not a hierarchy that stems from identical phenomena operating on different scales. Our paper sheds new light on the hierarchical organization resulting from these phenomena. While large sections nest, others segregate, and there is a significant portion of randomness. We anticipate that cross-scale relationships capturing these features will be essential components in future models aiming to reach a deep understanding of the causal mechanisms of chromosome folding. In the short perspective, our results further open questions worthy of research, including the reliability of 3D communities or what biological factors govern the different organization scales, such as DNA-binding proteins, epigenetic marks, or general chromatin types.  Fig. N ij histograms for individual γ-pairs (stacked bars with colors specified in legend). To better illustrate range −1 < N ij < 1, we truncate the bars at N ij ± 1 if their counts exceed 100. a) pairs between γ = 0.6 and all other γ. Cross-scale interaction between communities found at γ = 0.6 and all other communities are fully nested, fully segregated, but also their nestedness is close to random (−0.5 < N ij < 0.5); b) for γ = 0.7 and all other γ. The distribution changes for N ij * 0), showing that communities tend to be more nested (more counts in the range: 0 < N ij < 0.5); c) and d) for γ = 0.8 and 0.9 versus all other γ. We observe that distribution peaks near N ij = 0 when comparing to other distributions.