A Looping-Based Model for Quenching Repression

We model the regulatory role of proteins bound to looped DNA using a simulation in which dsDNA is represented as a self-avoiding chain, and proteins as spherical protrusions. We simulate long self-avoiding chains using a sequential importance sampling Monte-Carlo algorithm, and compute the probabilities for chain looping with and without a protrusion. We find that a protrusion near one of the chain’s termini reduces the probability of looping, even for chains much longer than the protrusion–chain-terminus distance. This effect increases with protrusion size, and decreases with protrusion-terminus distance. The reduced probability of looping can be explained via an eclipse-like model, which provides a novel inhibitory mechanism. We test the eclipse model on two possible transcription-factor occupancy states of the D. melanogaster eve 3/7 enhancer, and show that it provides a possible explanation for the experimentally-observed eve stripe 3 and 7 expression patterns.


Numerical implementation overview
In order to evaluate the looping probabilities, we used a Monte-Carlo algorithm to generate a large number (10 8 − 10 10 ) of self-avoiding DNA chains. The Monte-Carlo simulation was written in CUDA C++, and was executed on two NVIDIA GeForce GTX TITAN cards.

Derivation of the sampling probability and Rosenbluth weights
The probability density of a chain configuration x is given by: where x is a single chain configuration, corresponding to {θ, φ} N in equation (1) in the manuscript, E (x) ≡ E ({θ, φ} N ) in the same equation, and is the partition function. Following the notations in [6], for a chain x we defined a sequence of auxiliary distributions π t (x t ), t = 1, 2, ..., N − 1 for the sub-chain (partial sample) x t = (x 1 , x 2 , ..., x t ) with t links: where Z t is the partition function of a self-avoiding worm-like chain with t links. The marginal distribution of the partial sample x t−1 = (x 1 , ..., x t−1 ) is where E t (x t |x t−1 ) is the potential energy of the tth link x t given t − 1 previous links x t−1 . The second transition is possible due to the fact that the energy of the chain can be broken to contributions from individual links: The conditional distribution is therefore given by: We set the incremental sampling distribution g t (x t |x t−1 ) in the growth method for sampling the tth link x t given x t−1 exactly equal to the conditional distribution π t (x t |x t−1 ): The importance weight of link t is now given by: And the total importance weight is given by: Due to the fact that we do not know Z (or the individual Z t ) we define the weights up to a multiplicative constant and use a biased estimator for the chain properties (see Section 3.1): and The probability to draw a chain x sequentially is: The expectation value of a property v is computed from an ensemble of m chains as: where x (i) is the ith chain in the ensemble. This estimate is asymptotically unbiased, but is biased for finite m.

Sampling procedure
Each self-avoiding worm-like chain (SAWLC) in our ensemble was grown one link at a time by selecting the link's orientation according to the incremental sampling distribution g t (x t |x t−1 ) (8): where We sampled the link orientation angles (cos θ i , φ i ) using inversion sampling [2]. Inversion sampling of a single random variable X from a probability distribution function (P DF (X)) is carried out by, first, integrating the P DF (X) over the entire range of X, resulting in a cumulative distribution function (CDF (X)). Then, a random number y in the range (0, 1) is generated from a continuous uniform distribution. Finally, the desired variable value x is extracted from the inverse function of the CDF (X) at y: x = CDF −1 (y). We first consider the case of the chain without the excluded volume constraint. Namely, we compute the inverse of the following function: which allows us to draw cos θ i . Then, we generate φ i from a uniform distribution over the entire range φ i [0, 2π).
When we reintroduce the hard-wall potential, the above integral now changes to: As a result, the integral over φ is no longer over the entire [0, 2π) range, as some directions in space cannot be assumed by the new link due to the excluded volume constraint. Thus, for each possible θ angle of the new link, there is a different range of φ values that are allowed. Since there can be more than one colliding object, the set of allowed values for φ i need not be one continuous range. Rather, these values can be organized into m ≥ 1 consecutive ranges [r a i,0 , r b i,0 ], ..., [r a i,m−1 , r b i,m−1 ], such that Thus, for each θ i the φ i angle can be sampled from a step-wise uniform distribution: We computed the CDF (cos θ i ) numerically, using the adaptive Gaussian integration method [9], and subsequently inverted it.

Simulation details
Simulation parameters Simulated ensembles varied in size depending on the desired accuracy, with the smallest ensembles containing 10 8 samples and the largest containing 5 × 10 9 . The length of the generated chains in most cases (except for the cases described in the section titled "Looping at chain ends vs. looping in the middle of the chain" in the manuscript) was limited to 10 4 bp, which is 850-970 links (100-250 base-pair sized links for the enhancer region and 750-720 w = 4.6 nm sized links for the region between the enhancer and the promoter). As can be seen in Figure 1 this two stage procedure has no effect on looping probability for chains longer than ∼500 bp, and no effect on looping probability ratio. The running time changed significantly, depending on the location of the protrusion and the number of links generated, with an average sampling rate of approximately 10 7 samples per minute. Figure 2B  An additional simulation was run without a protrusion, containing 5×10 9 chains in order to estimate the probability of looping without a protrusion. The large amount of samples in this simulation was needed to improve the accuracy of F (L) estimation, as the probability of looping with a protrusion was divided by the probability of looping without a protrusion. For the plots in Figure 3 we run two simulations in which the protrusion was positioned in a static location at (41.45 nm)û 0 , and its radius was R 0 = 23 nm. Each simulation contained 2 × 10 9 samples. In one simulation we removed chain-chain interactions, leaving only chain-static object interactions, while in the second simulation we introduced both chain-chain and chain-static object interactions. For the plots in Figure 5 we ran two simulations in which we added a leading segment of one Kuhn length b. Each simulation contained 5 × 10 8 samples. In one simulation we positioned the protrusion on the leading segment (upstream) at a distance of 95 bp from the looping region. In the other simulation we positioned the protrusion after the looping region (downstream) at a distance of 95 bp from it. We ran an additional simulation of 5 × 10 9 samples without a protrusion in order to compute F (L).
D. melanogaster eve 3/7 enhancer For the plot in Figure 8 we ran a simulation modeling the 3/7 enhancer in D. melanogaster. The structure of the enhancer was modeled based on [11]. In particular, the locations of the protein complexes were 16, 43, 68, 88, 112, 268, 310, 342, 409, 446 and 483 bp. The locations of the two dStat activators were 202 and 323 bp, and the location of the Zld activator was 416 bp. The chain was simulated as 512 base-pair sized links, followed by 237 links of size w = 4.6 nm, followed by 150 base-pair sized links, giving a total of 3900 bp between both ends of the chain. We used R dStat = 3.04 nm, R Zld = 3.75 nm and three protein complex sizes of R o = 2.38 nm, R o = 3.59 nm and R o = 5.83 nm. The activator and protein complex protrusions were located around the chain axis according to the native DNA twist of 2π/10.5. We ran 4 simulations in total: one simulation with only the three activator protrusions present, containing 5 × 10 9 samples, and three simulations with all activators present as well as different sized protrusions corresponding to the three values of R o , containing 2 × 10 8 samples each.
Looping in the middle of the chain simulation In order to compute the probability of looping for a generic chain, where the loop now takes place between two widely separated internal links of the chain, we extended the original length of the chain L by two segments of 10 Kuhn lengths (10b) from each end of the chain. We denote the links added before the looping region (chain origin) as links −10b...0 and the links added after the chain terminus as N + 1, ..., N + 10b. The procedure of the chain generation was changed in the following way. First, we generated a leading trail of 10b links prior to the location of the enhancer, computing its cumulative Rosenbluth weight as usual. We proceeded to generate the rest of the links of the chain as before. When a new link in the range n ∈ [10b, N + 10b] was generated, its cumulative Rosenbluth weight W n ≡ n i=−10b w i was assigned to link n − 10b.
This effectively resulted in an enhancer-promoter region of length n − 10b, with two flanking regions of length 10b each, as required. This scheme allowed us to generate the looping probability data for all lengths of chains in a single run as before.
Confining sphere simulation In the case of chains confined by a sphere, the simulation process was broken into several stages: 1. The initial links of the chain were generated as before, with the bounding sphere of the links generated thus far being continuously computed.
2. When the size of the bounding sphere reached the confining size, the sphere's location was frozen.
3. The frozen bounding sphere acted as an additional hard-wall potential for all links generated henceforth.
The computation of a minimal sphere encompassing spheres is a very demanding task [3]. Computation of a minimal bounding sphere encompassing points is slightly less demanding, but nevertheless very costly. In our case, however, we do not require precision in the determination of the bounding sphere, especially since the results show that the size of the confining sphere has no effect on the looping probability ratio. Therefore, we used a very simple algorithm for the bounding sphere computation: 1. We begin with a sphere with the center at the origin c = (0, 0, 0) and radius R = 0.

When a new joint
x i is added to the chain, we check whether it is inside the sphere.
3. Repeat step 2 until R is bigger or equal to the radius of the confining sphere.
The result of this algorithm is a confining sphere of the target size around an initial segment of the chain. For a freely jointed chain this method produces a bounding sphere with radius 5%-25% larger than the radius of the minimal bounding sphere, with an average of 20%, independent of the number of links generated (data not shown). Thus, the placement of the confining sphere is not ideal, with the chain link that caused the sphere freeze touching the surface of the sphere, and the origin of the sphere distanced 0.1R ≤ s ≤ 0.5R away from the sphere's surface. We partially compensate for this problem by advancing the sphere center c in the direction xi−c |xi−c| by min {0.15R, s}. While this is a slight improvement, the placement of the frozen sphere is still far from ideal. However, since our results show no dependency at all on the size of the bounding sphere, this inaccuracy is inconsequential.

Computation of F ∞ and its error estimation
To compute the value of F ∞ for a specific simulation we averaged the values of F (L) for L ≥ 10b. According to the model and the numerical simulations themselves, at this separation and onward F (L) was constant. We denote the first link satisfying this condition as N 10b and the last link of the chain as N . We considered each pair F (N i ), F (N j ) independent for all i = j . Thus: where l is the length of the link. In our simulations N − N 10b + 1 = 480. The standard error of the mean is In our plots we plot the error bars with 1.96δF ∞ .

Blob sizes for model organisms
Chromosomal DNA is compacted into a globular state. For low enough volume fractions it can be described as a semi-dilute polymer solution. This semi-dilute solution may be pictured as a system of blobs of characteristic size ξ. Inside each blob, the chain behaves as an isolated macromolecule with excluded volume, and different blobs are statistically independent of one another. The blob size can be roughly estimated as [4]: where Φ is the polymer volume fraction, which for DNA is organism-dependent. The number of consecutive chain Kuhn lengths located in a blob is roughly given by [4]: Therefore for a volume fraction of Φ ≈ 0.01 in E. coli [5] and b = 100 nm for bare DNA [12] we get a blob size of ξ E.coli ∼ 3200 nm and bg E.coli ∼ 320b ≈ 90 kbp inside a blob. Similarly, for a volume fraction of Φ ≈ 0.015 in D. melanogaster [5] and b = 200 nm for chromatin [12] we get an even larger blob size of ξ D.melanogaster ∼ 4600 nm. These rough estimates suggest that for realistic volume fractions, a typical blob will contain many Kuhn lengths of polymer.
3 Estimation of the bias in the expectation value of the looping probability

Bias origin
Given the distribution of x, π (x), the mean value of a function v (x) is evaluated as: There is an inherent bias in estimating the mean value with importance sampling. The bias results from the fact that instead of the unbiased one (40): where w x (i) is the importance weight of the configuration i (12) and Z is the partition function of π (x) ≡ Zπ (x) (3). The unbiased estimator's expectation value is equal to the mean value of the corresponding function (40): where g (x) is the sampling distribution (13). However, the unbiased estimator cannot be used, since it requires the knowledge of π (x), while usually we only know the distribution π (x) up to a constant Z.

Bias analysis
In this section we follow the derivations of [7] and adapt them to the SAWLC case. The true expectation value of a property v is computed from π (x): The partition function Z (3) and the mean value v are estimated from the numerical ensemble of m samples by: and (29): We shall continue to use v for the mean value estimated from m samples, to distinguish it from v the true mean value of the ensemble. First, we show that the estimator ofẐ (33) is unbiased. For an estimated propertyû we can compute the expectation value (integrating over all possible configurations, taking into account the probability to draw those configuration numerically g (x)) as: Thus we can compute the expectation value ofẐ. Plugging (13) into (33), since The variance ofẐ is related to the variance of w, becauseẐ is just the average of m independent values of w: (37) For a parameter v we define an unbiased (up to a constant Z) estimator of v : Then the expectation value of R: Thus the estimator v is asymptotically unbiased. For finite m we expand v around R 0 , +higher order terms. (41) We take the expectation value of v . The second and third terms vanish, we plug in (36),(40) and we're left with: But where we used (37),(36). In addition: Thus, plugging (43) and (47) into (42): Hence the bias: Similar to [7] we obtained that the bias scales as m −1 , thus we expect it to be very small for the typical m = 10 8 we used in our simulations.

Practical estimation of the bias
While (50) is a very nice formula, it is not practical, as it requires integration over the entire phase space of a chain.
[8] develops a practical estimate of the bias, which we adapt to the SAWLC case following his derivations. For p independent runs of m chains each, the covariance of v andẐ can be computed as: However, by definition Using (39) and 40: while from (36): Therefore Thus: where the cov v ,Ẑ is estimated from p independent runs, and Z is estimated byẐ and further refined as

Estimation of the looping probability bias
We applied the procedure described in the previous section to the estimation of looping probability bias. The smallest ensemble that we used in this work contained 10 8 chains. We performed p = 50 runs with m = 10 8 chains in each run, for maximal length L = 10 4 bp. We used 51 and 56 to estimate the bias. As can be seen in Figure  2 the bias in the looping probability estimation is smaller than the standard deviation of the looping probability estimation by at least 4 orders of magnitude. We repeated the bias estimation and standard deviation estimations for the simulations including a confining sphere. Both the bias and the standard deviation were significantly smaller than the looping probability (exhibiting similar behavior to Figure 2) for R ≥ 250 nm. For R = 125 nm our simulations are no longer valid for L ≥ 4400 bp, due to an increase in standard deviation to the level of the looping probability. The estimated bias reaches the levels of the looping probability and the standard deviation only at higher values of L. P looped (L) σ(P looped (L)) B m (L) Figure 3: Comparison between the bias in estimation of the looping probability (red) and the standard deviation of the looping probability (green) as a function of generated chain length for chains confined by a sphere with radius R = 125 nm. The looping probability for a sample run is shown in blue. The estimation is done using p = 40 runs with m = 2 × 10 7 chains in each run, for maximal chain length of L = 10 4 bp.