Transient chromatin properties revealed by polymer models and stochastic simulations constructed from Chromosomal Capture data

Chromatin organization can be probed by Chromosomal Capture (5C) data, from which the encounter probability (EP) between genomic sites is presented in a large matrix. This matrix is averaged over a large cell population and reveals diagonal blocks called Topologically Associating Domains (TADs), which represent a sub-chromatin organization. To study the relation between chromatin organization and gene regulation, we introduce a computational procedure to construct a bead-spring polymer model based on the EP matrix. The model permits exploring transient properties constrained by the statistics of the 5C data. We construct the polymer model in two steps: first, we introduce a minimal number of random connectors inside restricted regions to account for the diagonal blocks; second, we account for frequent, specific long-range genomic interactions. Using the constructed polymer, we compute the first encounter time distribution and the conditional probability of three key genomic sites. By simulating single-particle trajectories of loci located on the polymers constructed from 5C data, we found a large variability of the anomalous exponent, which is used to interpret live-cell imaging trajectories. The present polymer construction provides a generic tool to study steady-state and transient properties of chromatin constrained by some physical properties embedded in 5C data.

Table 1 summarizes the values of the spring constants $\kappa_{m,n}$ that we computed from the EP matrix (see Materials and methods) for simulating long-range connectors between monomers m and n. The right column indicates which TADs are connected by the monomer pairs. The values are 1.1 to 3 times higher than the spring constant used both for randomly added connectors and for adjacent monomers in the linear backbone (nearest-neighbor connections) of the polymer ($\kappa = 3 \times 10^{-5}$ N m$^{-1}$).
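As a quick worked range, reading "1.1 to 3 times higher" as a multiplicative factor applied to the backbone spring constant (an interpretation, not stated explicitly in the text), the long-range spring constants would fall approximately in the following interval:

```latex
\kappa_{m,n} \in [1.1,\ 3] \times 3 \times 10^{-5}\ \mathrm{N\,m^{-1}}
            = [3.3 \times 10^{-5},\ 9 \times 10^{-5}]\ \mathrm{N\,m^{-1}}
```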

Comparison of the experimental and simulation encounter data
We now compare the steady-state EP matrices of our models with the experimental data. We compare the experimental EP matrix (S1A Fig) with the one simulated from the model with persistent and random connectors (S1B Fig). The simulated EP matrix is much smoother than the experimental 5C EP matrix.
To estimate the similarity between the EP matrices, we computed the cumulative distribution function (CDF) of the monomer encounters. The CDF is computed by averaging the EP between all monomers m and n from an encounter frequency matrix $M$. For each row of $M$, we start from the diagonal entry and average the counts at symmetrical positions, which gives the EP for that row (Eq 1). We then average over the $N = 307$ rows of the matrix to obtain the CDF (Eq 2). We use Eq 2 to compare the EP matrices by computing the CDF for the experimental data, $F_{Exp}(k)$, and for the simulation data, $F_{Sim}(k)$. Finally, we use the Kolmogorov-Smirnov (KS) distance, defined by
$$D_{max} = \max_{k} \left| F_{Exp}(k) - F_{Sim}(k) \right|.$$
The KS distance is computed for the simulation of the model with persistent long-range connectors (subsection 3.3 of the main text), which we compare with the model with persistent long-range and random connectors (subsection 3.4 of the main text). S1C Fig shows that the CDF of the model with only persistent connectors gives $D_{max} = 0.15$. The CDF of the model that contains both persistent and random connectors (S1D Fig) leads to an improved agreement with the experimental data, indicated by the value $D_{max} = 0.06$ at a level of 0.001 (P-value = 0.06).
Table 1. Spring constants for long-range interactions. Long-range fixed connectors added between monomer pairs (indices in the left column), with their computed spring constants (middle column, see Methods in main text), form connections within and between TADs as indicated in the right column.
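The CDF comparison described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' MATLAB code: the function names `encounter_cdf`, `ks_distance` and `toy_matrix` are hypothetical, each row is assumed to be normalized to unit sum before averaging over rows, and positions falling outside the matrix are simply skipped when averaging symmetric entries. The exact normalization used in the original equations may differ.

```python
def encounter_cdf(M):
    """CDF of the row-averaged encounter probability versus distance k.

    M is a square encounter-frequency matrix (list of lists).
    Assumption: each row is normalized to sum to 1 before averaging.
    """
    n = len(M)
    mean_ep = [0.0] * (n - 1)  # mean EP for k = 1 .. n-1
    used_rows = 0
    for m in range(n):
        row = []
        for k in range(1, n):
            # Average counts at the symmetric positions m-k and m+k,
            # keeping only the positions that lie inside the matrix.
            vals = [M[m][j] for j in (m - k, m + k) if 0 <= j < n]
            row.append(sum(vals) / len(vals) if vals else 0.0)
        total = sum(row)
        if total == 0:
            continue  # skip rows without any recorded encounter
        used_rows += 1
        for i, v in enumerate(row):
            mean_ep[i] += v / total
    if used_rows == 0:
        raise ValueError("matrix contains no encounters")
    cdf, acc = [], 0.0
    for v in mean_ep:
        acc += v / used_rows
        cdf.append(acc)
    return cdf


def ks_distance(f_a, f_b):
    """Kolmogorov-Smirnov distance: largest absolute gap between two CDFs."""
    if len(f_a) != len(f_b):
        raise ValueError("CDFs must have the same length")
    return max(abs(a - b) for a, b in zip(f_a, f_b))


def toy_matrix(n, beta):
    """Hypothetical test matrix with encounter counts decaying as |i-j|^-beta."""
    return [[0.0 if i == j else abs(i - j) ** -beta for j in range(n)]
            for i in range(n)]


# Toy usage: two synthetic matrices standing in for experiment and simulation.
F_exp = encounter_cdf(toy_matrix(6, 1.5))
F_sim = encounter_cdf(toy_matrix(6, 1.0))
print(ks_distance(F_exp, F_sim))
```

With real data, the two toy matrices would be replaced by the experimental and simulated encounter matrices of equal size.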

MSD and anomalous exponent statistics for single polymer realization
We describe in this section the MSD along single monomer trajectories for polymer realizations whose connectivity matches the one calibrated from the 5C data, based on 6 and 10 random connectors for TAD D and E, respectively (this construction is described in Fig 4 of the main text). Long-range connectors are included, as listed in Table 1. The anomalous exponent for each single monomer trajectory is estimated by fitting a power law to the MSD:
$$\mathrm{MSD}(t) = A t^{\alpha}.$$
For each MSD curve, we computed the two parameters $A$ and $\alpha$ (anomalous exponent). We repeated the MSD analysis and the computation of the anomalous exponent when TAD E was removed. We found that the values of the anomalous exponent remained distributed in the range [0,1] (S2B Fig left column), similar to the case where TAD E is included. The MSD curves were divided into 3 classes as described above (S2B Fig right column). The majority of them (87%) saturate (low class, blue). The medium class (green) included 12% of the curves, which displayed a mix of saturation and rapid MSD increase. Only 1% of the MSD curves were classified into the high class (red), displaying a rapid MSD increase, similar to the case with TAD E included.
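The power-law fit of a single-trajectory MSD can be illustrated as follows. This is a sketch under assumptions rather than the procedure actually used: it computes a time-averaged MSD over lags and fits the parameters by linear least squares in log-log coordinates, whereas the original analysis may have used a different estimator or fitting method. The function names `time_averaged_msd` and `fit_power_law` are hypothetical.

```python
import math


def time_averaged_msd(traj, max_lag):
    """Time-averaged MSD of one trajectory for lags 1 .. max_lag.

    traj is a list of coordinate tuples sampled at a fixed time step;
    max_lag must be smaller than len(traj).
    """
    msd = []
    for lag in range(1, max_lag + 1):
        sq = [sum((b - a) ** 2 for a, b in zip(traj[i], traj[i + lag]))
              for i in range(len(traj) - lag)]
        msd.append(sum(sq) / len(sq))
    return msd


def fit_power_law(times, msd):
    """Fit MSD(t) = A * t**alpha by least squares on log(MSD) vs log(t).

    All times and MSD values must be strictly positive.
    """
    xs = [math.log(t) for t in times]
    ys = [math.log(v) for v in msd]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    A = math.exp(my - alpha * mx)
    return A, alpha


# Toy usage: a synthetic subdiffusive MSD curve with A = 2 and alpha = 0.5.
times = [0.1 * i for i in range(1, 51)]
A, alpha = fit_power_law(times, [2.0 * t ** 0.5 for t in times])
print(A, alpha)
```

Fitting in log-log space gives equal weight to short and long lags. For saturating MSD curves, restricting the fit to short lags is a common choice, and the appropriate fitting window is an analysis decision not specified here.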
We conclude that the distribution of anomalous exponents extracted from SPTs depends on the polymer configuration generated by random connectors. In addition, the distribution of MSDs in TAD D was sensitive to the influence of TAD E (as shown by comparing S2A Fig with S2B Fig). The exact scaling law that connects the anomalous exponent measured in SPTs with the EP decay exponent of the Chromosomal Capture data remains unknown, although it is now clear that increasing the chromatin connectivity, which leads to a higher decay exponent, is characterized by a smaller anomalous exponent.

Computational tools
The data analysis and stochastic simulations were performed using our own code in MATLAB 2015. The source code and its description are available on our website http://bionewmetrics.org/. For the chromatin visualizations in Figs 2-5 of the main text, we used the UCSF Chimera software version 1.11.
S2 Fig. A Distribution of the anomalous exponent for 3 polymer realizations (left column). The MSD curves for each realization are shown in the right column. MSD curves are classified into 3 classes: low (blue), medium (green) and high (red), based on the MSD value at time 35 sec. B Distribution of the anomalous exponent (right column) for TAD D when TAD E is removed. The distribution in each class is given by low (blue, 85%), medium (green, 12% of curves) and high (red, 1% of curves).