A Stochastic Model of DNA Fragments Rejoining

When cells are exposed to ionizing radiation, DNA damages in the form of single strand breaks (SSBs), double strand breaks (DSBs), base damage or their combinations are frequent events. It is known that the complexity and severity of DNA damage depends on the quality of radiation, and the microscopic dose deposited in small segments of DNA, which is often related to the linear transfer energy (LET) of the radiation. Experimental studies have suggested that under the same dose, high LET radiation induces more small DNA fragments than low-LET radiation, which affects Ku efficiently binding with DNA end and might be a main reason for high-LET radiation induced RBE [1] since DNA DSB is a major cause for radiation-induced cell death. In this work, we proposed a mathematical model of DNA fragments rejoining according to non-homologous end joining (NHEJ) mechanism. By conducting Gillespie's stochastic simulation, we found several factors that impact the efficiency of DNA fragments rejoining. Our results demonstrated that aberrant DNA damage repair can result predominantly from the occurrence of a spatial distribution of DSBs leading to short DNA fragments. Because of the low efficiency that short DNA fragments recruit repair protein and release the protein residue after fragments rejoining, Ku-dependent NHEJ is significantly interfered with short fragments. Overall, our work suggests that inhibiting the Ku-dependent NHEJ may significantly contribute to the increased efficiency for cell death and mutation observed for high LET radiation.


Appendix S1
Supplementary Note 1: Irreversible Approximation Consider a reaction system consisting of a reversible reaction and an irreversible reaction, It is known that system (1) eventually tends to the equilibrium state x = z = 0 no matter whether the first reaction is reversible (β > 0) or irreversible (β = 0), suggesting that the reversibility of the first reaction does not change the long term dynamics of the reaction in the qualitative manner. However, the reversibility does affect the efficiency of the completion of reaction in the quantitative manner because the reverse reaction lowers the production of species z. From the dynamical point of view, system (1) is analogous to the following reaction system where α ∼ α and γ ∼ γ as β α. In the model, we assume that the fragments rejoining is an irreversible process. In other words, after two pieces of fragments join together, the resulting fragment does not split. Thus the combination of recruitment process and rejoining process forms a system similar to (1) if the recruitment is reversible, while the irreversible approximation of recruitment process leads to system (2). Therefore, we conclude that the reversibility of the recruitment process may change the biphasic profile of the mean rejoining time slightly in the quantitative manner, but not in the qualitative manner.

Supplementary Note 2: System of Biochemical Reactions
The DNA fragments rejoining involves three steps of reactions and leads to a large system of biochemical reactions. To study the model, we need to identify all the involving species and reactions.

Biochemical Reactions
Taking into consideration all the reactions involving in the three steps of DNA fragments rejoining, we have a series of biochemical reactions listed below where * and are repair protein E, protein residue r or none ∅. Note that some species can be produced through different reactions. For instance, as n, m > L * .

Reacting Species
Assume that DNA fragments of different length are different species. When the DNA fragments of the same length are treated to be distinct with or without repair protein E, the protein residue r or R attached, more species will be introduced into the system. Collectively, all the species involving in the biochemical reactions of DNA fragment rejoining are listed below X = X n , X E n , X EE n , X rE n , X r n , X rr n , X R n .
Because extremely short fragments (n < L m ) cannot recruit Ku, X E n exists only if n ≥ L m and similarly X EE n exists for n > L * . By considering how other species are produced, we may notice that X R n is well defined for 2L m ≤ n ≤ 2L * , X r n and X rE n for n > L m + L * and X rr n for n > 2L * . Because of these seven types of derivative species with the same length, a large amount of species have to be included into the system when different lengths are taken into consideration.

Large Network of Reactions
If we let # X n be the number of DNA fragments of length n and assume that the total length of a DNA sequence is L T , then whenever X * n is well defined, where # (X * n ) may be zero. As L T is large, the DNA fragments rejoining leads to a large network system consisting of approximately 6L T species and 9 16 L 2 T reactions. Note that short fragments of length n < L m do not participate in any reaction.

Supplementary Note 3: Stochastic Simulation by Gillespie's Algorithm
Simulation of such a large system is computationally expensive, especially when stochastic simulation is adopted. It is known that 1Gy ionizing radiation produces 25 − 35 DSBs [1] and hence similar amount of DNA fragments. Therefore, the computational cost can be tremendously reduced if we consider only the possible involving reactions associated with the prescribed initial fragments distribution, instead of all the reactions.

Gillespie's Algorithm
Precisely, for a given positive integer K 0 , let {(n 0 k , m 0 k )} be a sequence of positive integer pairs such that n 0 k , m 0 k ≥ 1 for 1 ≤ k ≤ K 0 . Then a prescribed initial DNA fragments distribution can be written as which consists of m 0 k pieces of DNA fragments of length n 0 k . Denote by R the set of all possible reactions associated to a given set S of species, then it is readily to see that .
with the associated propensity functions defined by Set By the Gillespie's direct algorithm of stochastic simulation [2], pick randomly a pair of real numbers (r 0 1 , r 0 2 ) satisfying uniform distribution on the interval [0, 1]. Then the first reaction time is given by and the first firing reaction is reaction j 1 satisfying Consequently at time t = τ 1 when the first reaction step is completed, the set of reacting species is updated to Hence the associated reaction set R 1 and propensity functions {a 1 k } are revised correspondingly. Repeat this process of Gillespie's stochastic simulation until the reaction set R is an empty set, that is, no more reactions can happen. Note that the propensity functions of the second order reactions for fragments rejoining are given by as X E * n and X E m are different or the same species, respectively. In the release step that is a first order reaction, the propensity functions are of the forms k 3 X r * n or k 3 X R n . As a summary, the Gillespie's stochastic simulation is preformed in the following steps.  3. Calculate τ and j as in (5) and (6), respectively. 4. Update the set of reacting species S and reaction set R at t = t + τ . 5. End simulation if R is empty or return to step 1 otherwise.

Kinetics of DNA Fragments Rejoining
To mimic the kinetics of DNA fragment rejoining, we need to record the number or the fraction of DNA fragments in each step of reactions. In other words, we count M T (t i ) = 1≤k≤K i * , ∈{E,r,R,∅} # X * n i k after reaction step i is completed, where #(X n k ) is the number of fragments of length n k . Because DNA fragments rejoining is assumed to be irreversible, its kinetic profile must be in the non-increasing manner and undergo no change after all the repairable fragments are rejoined.
Moreover, due to the stochastic features of the DNA fragment rejoining, the kinetic changes and total rejoining time may be very different although the simulations start from the same initial fragments distribution and share the same decreasing profile. Therefore, the mean value of total rejoining times obtained by numerous simulations is considered as the mean rejoining time associated to a specific initial fragments distribution, as used in Figures 3(a-d).

Parameters
Ku is an abundant heterodimeric nuclear protein (Ku70/Ku80). It was estimated that around half a million copies of Ku70/80 complex are produced in the human cells [3]. Hence, as suggested in [4] that there is an optimal level of repair enzymes within cells for optimizing DNA repair and the precise value of the repair enzyme concentration is of some relevance, we suppose, in this model, that the amount of repair protein E remains constant in a relative normal human cell line and Because the recruitment of repair proteins and fragments rejoining are all second order reactions, the reaction rate will be dependent on the volume of the reacting solution. Since DNA is located in the nucleus and all the reactions take place in the nucleus, we take the volume V of the nucleus for this purpose and assume that V = 2 × 10 −10 ml, where ml is milliliter. It is known that Ku starts accumulation at DSBs within a few seconds and reaches its maximum at around 3 minutes [5]. Thus we assume this rate is given byk 1 = 10/minute = 600/h, where h is hour. Becausek 1 depends on the concentration of Ku and the volume V , we havē Recent study revealed that Ku releasing from DNA is through ubiquitylation and dependent on the length of DNA. Effective removal of Ku from DNA requires DNA longer than 50bp and may take 30-60 minutes [6]. Then we assume that the repair protein residue is released in the similar time range and set Up to date, there is no published data for measuring or estimating the second order reaction rate constant k 2 , which is related to the fragments rejoining rate k2 V X E * n X E m . Alternately, DNA ligation rates were proposed and set to be 0.03 in the fast repair kinetics and 0.003 in the slow kinetics [7] (Supplementary  Document: Table S2). Thereby, we set

Comparison with Experimental Data
The kinetic data of DSBs repair is achieved by taking the average value of fractions of foci observed in the experiments. In order to compare with experimental data, we need the average kinetics of fragment rejoining. Assume that there are N sets of kinetic data by N simulations, is a set of P j time points that is either regular (equally spaced) or irregular (not equally spaced) and any two sets of time points may have no overlap. Because we know that each kinetic profile is non-increasing, we can choose N decreasing functions f j (t), each of which interpolates one set of kinetic data and is well defined at the given P time points, say {t i } P i=1 , where we like to compare with experimental data. Hence we take the averaged function to describe the averaged kinetics from simulation data. Finally, the comparison is conducted between the experimental data set and interpolated simulation data set {f (t i )}, as seen in Figure 4.
On the other hand, in the biological experiments, DNA double strand breaks (DSBs) are marked by 53BP1 foci. DSBs repair is testified by counting the numbers of foci at specific time points to recover the repair kinetics [8]. Complete DSBs repair results in zero foci present. In contrast, our model is for DNA fragments rejoining, whose kinetics shows the change of number of fragments and that complete rejoining leads to one fragment at last. To compare with experimental data, therefore, we consider the kinetic change of the ratio M T (t) − 1 M T (0) − 1 which represents the ratio of DSBs and hence foci remaining. Here M T (t) is the total number of DNA fragments and M T (t) − 1 the total number of DSBs at time t.