4D nucleome equation predicts gene expression controlled by long-range enhancer-promoter interaction

Recent experimental evidence strongly supports that three-dimensional (3D) long-range enhancer-promoter (E-P) interactions have important influences on gene-expression dynamics, but it is unclear how the interaction information is translated into gene expression over time (4D). To address this question, we developed a general theoretical framework (named as a 4D nucleome equation), which integrates E-P interactions on chromatin and biochemical reactions of gene transcription. With this equation, we first present the distribution of mRNA counts as a function of the E-P genomic distance and then reveal a power-law scaling of the expression level in this distance. Interestingly, we find that long-range E-P interactions can induce bimodal and trimodal mRNA distributions. The 4D nucleome equation also allows for model selection and parameter inference. When this equation is applied to the mouse embryonic stem cell smRNA-FISH data and the E-P genomic-distance data, the predicted E-P contact probability and mRNA distribution are in good agreement with experimental results. Further statistical inference indicates that the E-P interactions prefer to modulate the mRNA level by controlling promoter activation and transcription initiation rates. Our model and results provide quantitative insights into both spatiotemporal gene-expression determinants (i.e., long-range E-P interactions) and cellular fates during development.

one nucleosome.When fitting the model to the mESC data, a monomer became a 5kb region, which is more than 10-fold longer than a region spanned by a nucleosome.The confusing usages of the monomer's meaning needs to be addressed.While the authors can make the interpretation flexible, doing so would mean that the biological meanings for all parameters of the model will change accordingly.This can be problematic when interpreting the simulation results.In addition, it will be helpful if the authors can use realistic distributions of genomic E-P distances to explain the scales of their structural model.
Response: Thanks for your comment.To model a chromatin region, most work has used a much lower level of detail whereby a chromatin fiber is represented by a connected chain of monomers.
Each monomer represents a segment of DNA, whose length depends on the levels of coarsegrained chromatin or depends on the question being asked.During the fitting of mESC data, different monomer lengths do not change the biological meanings of the parameters or the fitting results.The Additional Table 1 shows the best-fit parameters using different monomer lengths (5kb, 2kb,1kb), corresponding to different d_G.We find that the change in d_G only leads to the elasticity coefficient k_NN multiplicative changes, which results in basically unchanged of k_NN/d_G (or the parameter in the Maxwell-Boltzmann distribution).For example, for cell line C2, the values of the d_G for different monomer lengths are 22 (5kb), 56 (2kb) and 112 (1kb).Therefore, the ratio k_NN/d_G is 0.0353 (=0.7758/22), 0.0356 (=1.9916/56) and 0.0349 (=3.9046/112), three being approximately equal.From a physical viewpoint, this can be explained as a coarsening process.That is, the elasticity coefficient k_NN in the 5kb model can be viewed as the equivalent spring coefficient of 5 springs in series in the 1kb system.
The distribution of E-P genomic distances is generally in an unimodal form, and as the d_G increases, the probability distribution first increases and then decreases.In mouse, the median genomic distances are about 280 kb and the most possible distances are about 150 kb [1].For example, if one monomer represents 5kb regions, then 30~60 monomers are required to simulate the E-P genomic distance.And if one monomer represents 2kb regions, then 70~150 monomers are required.Importantly, based on the above results, the length of the monomers does not affect the final theoretical and inferred results, so selecting an appropriate length of the monomers for the study is sufficient to study the gene expression controlled by the E-P interaction.2. The comparison of multimodality of mRNA distribution from simulations and experimental data seems weak.Out of the three references mentioned in that section on Page 16, none of them showed the relationship between E-P regulation and modality of mRNA distribution.It is therefore unclear whether the simulation results are realistic.

Additional
Response: Thanks for your comment.Gene expression is regulated by E-P communication, so the regulatory information from E-P communication is already contained within the experimentally observed mRNA distribution.In Ref [2], the researchers studied the transcriptional behavior of GREB1 changes with estrogen dose.Generally, addition of estrogen (17b-estradiol, E2) may increase contacts between the GREB1 gene and the estrogen-receptor-alpha-bound enhancer [3], thus altering the E-P interaction strength in our model.When increasing the E2 concentration (increasing the E-P interaction strength), we find that the mRNA level grows, and the mRNA distribution changes from the single peak to a bimodal distribution.The changing behavior is consistent with what we described in the paper.A new peak emerges at a larger mRNA level and gradually increases and the peak near the origin gradually decreases.In addition, we have cited other related papers to show that our model can produce the experimentally observed bimodal phenomenon.To make the description more concise and accurate, we have modified the corresponding parts of the main text.
Additional Figure 1.Increasing E2 increases the proportion and productivity of transcriptional ).
3. I suggest that the authors use their framework to explain the source of the multimodality.Is this a result of the chromatin dynamics which produced multiple structural configurations that are more stable than others?If yes, some analysis of the structural model component will be helpful.
Response: Good question!Only the downstream ON-OFF model could not produce bimodality with a non-origin peak, even less likely trimodality.Our model of gene transcription has the potential to produce multimodality.We think that the regulation of chromatin dynamics, especially E-P regulation, is a main source of multimodality in gene expression distribution.In our model, we used a unimodal E-P stationary spatial distance (Maxwell-Boltzmann distribution) to modulate switching between gene states.
However, whether chromatin dynamics produce multiple stable structural configurations and thus lead to multimodal gene expression distributions is a good question.For this, we do not have a clear answer.Our model represents that a single stable structural configuration can lead to bimodal and trimodal.This is mainly due to differences in timescales between upstream and downstream.It seems that the multiple structural configurations are likely to lead to a multimodal gene expression distribution.We need a deeper investigation of this question in future research.
Response: Thanks.This has been corrected.

Reviewer #2
The paper introduces the 4D nucleome equation as a comprehensive theoretical framework for investigating the impact of long-range enhancer-promoter (E-P) interactions on gene expression dynamics.The framework incorporates upstream chromatin motion, downstream mRNA production, and the temporal connection between these processes.The authors derive analytical mRNA distribution and explore how E-P interactions qualitatively influence the characteristics of mRNA distribution.The 4D nucleome equation provides a valuable modeling framework for studying the influence of chromatin dynamics on gene expression kinetics.

Response:
We are pleased that the referee appreciates our work.Response: Thanks for your comment.We compare to previous work and emphasize our specific five contributions as follows.
First, we developed a general theoretical framework to study gene expression regulated by E-P communication, which transforms a complex biological issue into a mathematically solvable one.Gene expression is regulated by spatial chromosome topology and previous models commonly ignored this process.Meanwhile, we would point out that the random organization of chromatin can cause fluctuations in E-P spatial distance and regulate gene transcription in realtime [1].Therefore, time-varying state-switching rates between promoter states may be a more realistic and practical modeling approach to promoter state switching regulated by E-P interaction with cumulative effects (multi-steps) described in Ref. Fourth, we can use less information or directly measured data to infer gene expression dynamics regulated by E-P interactions, and this will have wide applications.In fact, only the E-P genomic distance and the gene expression distribution are needed for inference, without needing contact probability data which are more difficult to measure than the E-P genome distance.Therefore, our model has a wider range of applications than Ref. 10. Fifth, our model can illustrate the nonlinear relationship observed in the experiment between transcriptional levels and E-P contact probabilities.We find that the Hill or linear upstream-todownstream connection can effectively fit the experimental data.Therefore, the nonlinear dependence of the activation rate on E-P contacts does not seem to be a dominant cause of the nonlinear relationship observed in the experiment (as pointed out in the reviewer's comment).
Rather, the coupling of the fast-slow chromatin dynamics may be an important factor contributing to the nonlinearity.And in particularly, for the slow chromatin dynamics, the regulation of upstream to downstream is inseparable, which may cause the gene expression levels of the slow limit are not linear dependence on E-P contacts.In addition, the Ref. 10 also pointed out that nonlinearity might arise from the transient E-P interaction being translated to the slower transcription dynamics.However, we have carefully considered the different time scales between upstream and downstream based on real processes, provided a clear definition of scale gap between upstream and downstream, and elaborated on the role of fast and slow timescales in gene expression regulation.
Finally, we point out that in the case of fast chromatin dynamics, upstream-to-downstream regulation is performed through the expectation of the E-P spatial distance, which can be equated, in a general way, to the contact probability in Ref. 10.However, the slow chromatin dynamics also plays an important role in regulating the distribution of mRNAs and we derive the corresponding distribution for this case (as pointed out in the reviewer's next comment).The general regulation of gene expression by E-P communication should be the coupling of fast and slow scales, and we derive an approximate formula for mRNA distribution.In this revision, we have discussed the model presented in Ref. 10 in the main text Discussion part.
Additionally, the authors introduce expressions for the slow chromatin dynamics limit, which are new.However, they do not discuss the contribution of slow dynamics to the fitting of experimental data.While the authors emphasize that their model fits experimental data better than the expression in Ref. 10, this improvement may be expected since the new model has more parameters.It would be interesting if the authors could provide insight into the obtained value for w in Eq. 12 and its impact on fitting with experimental data.Is the contribution from slow chromatin dynamics important for the observed improvement in fitting?
Response: Thanks for your suggestion.First, the basic fit results allow us to calculate the weight (or contribution) of the slow-scale dynamics to the data fit (1/(1+ω)).We find that the weight of the progressively larger slow chromatin dynamics as d_G decreases (Additional Figure 2A) plays an important role in the modification of the fitted distribution.For example, in cell line C2, we can see that the contribution of the slow dynamics is 32% and the coupled kinetics can adjust the nonoriginal single peak under the fast to an original single peak that is closer to the experimental data (Additional Figure 2C).Second, we calculate the KS distances between the fast distribution (or coupled distribution) and experimental distribution, finding that the average KS distance of the coupled distribution is smaller than that of the fast one (13% decrease on average and 33% decrease for cell cline C2, Additional Figure 2B), which also illustrates the role of the slow chromatin dynamics in correcting the distribution.It should be noted, however, that the fact that these two KS distances are essentially the same for cell lines C5 and C6 does not mean that the slow kinetics has no contribution (the weights are 22% and 16% respectively), but rather that the distributions produced by fast and slow are almost identical under the same parameters (Additional Figure 2D).In a word, the slow chromatin dynamics are crucial for the improvement of the data fitting.We have explicitly emphasized the importance of the contribution of slow dynamics in the main text.
Although more parameters may fit the data better, we clarify the following points.First, fast chromatin dynamics only require the average E-P distance or connection probability, but slow dynamics need richer information and more parameters of the upstream chromatin motion.Second, we can infer the upstream and downstream parameters together using d_G data (that can be measured directly by experiments) rather than connection probability data (that needs to be computed) as well as gene expression distributions.Third, upstream parameters can effectively obtain the contact probabilities.To some extent, the number of downstream parameters is basically the same as the model in Ref. 10. Finally, the results of the inference show that the model not only fits the data well but also infers the connection probabilities as a by-product, which reinforces the validity and reliability of the inference results.To justify the publication of the manuscript, the authors need to highlight novel observations that were not reported in Ref. 10.Many of the analytical expressions presented in the manuscript are standard for polymer physics and stochastic gene expression models.While it is commendable that the authors report these expressions and combine the two models, the expressions alone may not present sufficiently interesting results for publication.Prob.Finally, we point out that since the system has many parameters, including upstream and downstream scale, which would collectively regulate the resulting mRNA distribution, it seems difficult to find the critical conditions giving rise to different scenarios, and large-scale sampling in high-dimensional parameters space is needed.However, we are only interested in effects of the parameters related to E-P communication on mRNA distribution, and find a consistent pattern of modulating mRNA distribution across parameter changes, as shown in our main text and Supplementary Text.

Response
On page 17 line 355, different values of d_G were chosen by fitting the experimental data.This information should be provided to clarify how those values were obtained.
Response: Thanks for your suggestion.In Zuin's paper, the author provided E-P genomic distances for different cell lines.They are 112.1710kb, 39.4530 kb, 23.1110 kb, 17.0190 kb and 6.0600 kb.We supposed that one monomer in our coarse-grained polymer model represented 5kb, and then we obtained corresponding d_Gs.We have added this information in the main text.Response: Thanks for your careful review.We have made corrections.
However, it is unclear what specific contributions the authors have made compared to previous work.Although the authors briefly discuss a comparison with the model presented in Ref. 10, most of the discussion is relegated to the Supplementary Information (SI).It appears that the authors' model is identical to the one in Ref. 10 in the fast chromatin dynamics regime.In this limit, the contact probability used in Ref. 10 can be directly replaced with the distance used in Eq. 3 of the SI.
10. Second, we obtained analytical results for the E-P spatial distance distribution and mRNA distributions regulated by E-P communication.We explicitly derived the E-P spatial distance distribution as Maxwell-Boltzmann (MB) distribution and the data in living mouse embryonic stem cells have indicated the validity.The characteristic parameter of MB distribution clearly defines an analytical relationship between E-P spatial distance and E-P interaction parameters (including genomic distance and interaction strength).Meanwhile, we obtain a high-accuracy approximation of mRNA steady distribution which is a useful and simple formula for predicting the dynamics of mRNA over a broad range of timescale separation.Third, our model demonstrated that E-P communication is a flexible regulation strategy to explain possible implications of bimodal distributions and even trimodal distributions.It is important to point out that without the contribution of slow chromatin dynamics, the obtained Poisson-Beta distribution of fast dynamics can produce neither the bimodal with two non-origin peaks (observed in the experiment [2]) nor the trimodal, implying the importance of slow chromatin dynamics in regulating gene-product distributions (as pointed out in the reviewer's next comment).This provides insights into complex mechanisms of biological processes and is essential for cellular decision-making and environmental adaptation.Note that in spite of the complexity of the expression of the switching rate in Ref. 10, it is essentially a constant and the mRNA distribution is a Poisson-Beta distribution.So, it is impossible that the bimodal with two non-origin peaks or the trimodal appears in the model in Ref. 10.

Figure 2 .
The contribution of slow chromatin dynamics to data fitting.(A) The weight of slow chromatin dynamics in fitting distribution with factor 1/(1+ω). (B) KS distances of different cell lines.The blue dots and shadow line stand for the KS distance between the fast distribution and experimental data, and the red shows the KS distance between interpolation distribution and experimental data.The dashed lines represent the mean KS distance for all cell lines.(C) The distribution of Cell line 2. (D) The distribution of Cell line 6.

:
Thanks for your comment.In addition to the general theoretical framework established to study gene expression regulated by E-P communication and the theoretical results obtained Each line divides the entire region into two parts, with U indicating a single peak area and The solid line divides the entire region into two parts, with the part marked U indicating unimodal and the part marked M indicating multimodal.
205, mothod -> method Page 11 line 223, challenge-> challenging Page 11 line 236, or a a frozen E-P -> or a frozen E-P Page 13 line 260, given vs -> given as Page 14 line 285, CV downregulated by -> CV is downregulated by Page 14 line 297, the mRNA level decrease and trends -> the mRNA level decreases and tends Page 16 line 329, it should be 'still changes from unimodal to the bimodal and then to unimodal again'.

Table 1 .
Parameter values for the best fitting parameter in different monomer lengths.