UvA-DARE ( Digital Academic Repository ) Robustness of DNA Repair through Collective Rate Control

DNA repair and other chromatin-associated processes are carried out by enzymatic macromolecular complexes that assemble at specific sites on the chromatin fiber. How the rate of these molecular machineries is regulated by their constituent parts is poorly understood. Here we quantify nucleotide-excision DNA repair in mammalian cells and find that, despite the pathways’ molecular complexity, repair effectively obeys slow first-order kinetics. Theoretical analysis and databased modeling indicate that these kinetics are not due to a singular rate-limiting step. Rather, first-order kinetics emerge from the interplay of rapidly and reversibly assembling repair proteins, stochastically distributing DNA lesion repair over a broad time period. Based on this mechanism, the model predicts that the repair proteins collectively control the repair rate. Exploiting natural cell-to-cell variability, we corroborate this prediction for the lesion-recognition factor XPC and the downstream factor XPA. Our findings provide a rationale for the emergence of slow time scales in chromatin-associated processes from fast molecular steps and suggest that collective rate control might be a widespread mode of robust regulation in DNA repair and transcription. Citation: Verbruggen P, Heinemann T, Manders E, von Bornstaedt G, van Driel R, et al. (2014) Robustness of DNA Repair through Collective Rate Control. PLoS Comput Biol 10(1): e1003438. doi:10.1371/journal.pcbi.1003438 Editor: Jorg Stelling, ETH Zurich, Switzerland Received August 12, 2013; Accepted November 29, 2013; Published January 30, 2014 Copyright: 2014 Verbruggen et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Funding: THö acknowledges support through the BMBF SysTec consortium EpiSys (grant no. 0315502A) and the Initiative and Networking Fund of the Helmholtz Association within the Helmholtz Alliance on Systems Biology/SBCancer. This work was supported by a grant from the Netherlands Organisation for Health Research and Development (ZonMW grant no. 40-00812-98-08031) to RvD and PV. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing Interests: The authors have declared that no competing interests exist. * E-mail: r.vandriel@uva.nl (RvD); t.hoefer@dkfz-heidelberg.de (THö) . These authors contributed equally to this work.


Analytical model of DNA repair
We consider the simplified model of DNA repair depicted in Fig. 3: N proteins assemble to form a repair complex which then executes the repair reaction with rate constant ρ. Assembly of the individual components can be in a particular order (sequential) or without a specific order (random) or by a mixed mechanism with random and sequential subsystems. We consider completely sequential and completely random mechanisms as the extreme cases; Fig. 3 illustrates these two extremes for the case N = 3 (with the components of the repair complex A, B and C). Note that similar 'recruitment-reaction' schemes apply to other chromatin-associated processes including transcription initiation.
Sequential assembly proceeds via a single pathway compared to the large number of pathways for random assembly, which amount to N ! different N -step pathways for N protein components. For this reason, random assembly has an intrinsic advantage with respect to assembly speed. However, the sequential scheme has fewer incomplete protein complexes (N −1) than the random scheme (2 N − 1), thus favouring the complete, enzymatically active complex. The efficiency of random and sequential schemes in assembling the complete repair complex and carrying out the repair reaction will be affected by both these structural properties.
A second level of control is provided by the kinetic properties of the individual steps, measured by the on-rates of the proteins, their off-rates as well as the catalytic rate constant (ρ). Experimental measurements suggest that the apparent first-order on-rate constants (on-rate constants × free concentration of the component) are of the same order of magnitude as the off-rate constants (∼ 1/min; balanced reversibility of assembly), while catalysis of the reaction will be faster (∼ 1/s) [1]. We will contrast this realistic case with the hypothetical scenario that all proteins bind irreversibly.
To model the dynamics of repair, denote the composition of a macromolecular (multiprotein) complex in the assembly pathway by an index vector p, where a component p i = 1 (p i = 0) if protein i is present (absent) in the complex. For example, p = (0, 1, 1, 0, . . .) would denote a complex in which molecular component 1 and 4 are absent and components 2 and 3 are present. Furthermore, let p (i ) denote a 'neighboring' index vector that differs from p only in position i (Hamming distance 1). The concentration of the complex with composition p, y p is determined by the balance equation where the κ and l denote the association and dissociation rate constants, with units M −1 s −1 and s −1 , respectively, and c i is the concentration of unbound component i . For the fully assembled complex (p = (1, 1, 1, . . .)), whose concentration we denote byx(t), the additional term −ρx appears on the right-hand side of equation (1). For the concentrations of unbound components, we have The rate of the reaction catalysed by the multiprotein complex is given by v(t ) = ρx(t ). The mean time at which the reaction occurs can be defined by denoting the k t h moment of the reaction rate. The mean reaction time τ can be calculated analytically when only a small fraction of any protein is bound to DNA at any time (i.e., c i (t ) ≈ constant), and all binding and dissociation rate constants are assumed equal (κ i c i = k and l i = l ). Then integrating Eq. 1 to find the required zeroth and first moments of the catalytic rate yields linear systems of algebraic equations. We have solved them explicitly for N ≤ 20; the solutions can be expressed as reassembly and reaction (5) for a complex of N proteins. When catalysis is fast compared to protein binding and dissociation (ρ k, l ), the reaction is limited by the rate of complex assembly, and only the first term is significant on the right-hand side in Eq. (5). This term, with coefficients A i , gives the average time until the protein complex is fully assembled for the first time. The second term, with coefficients B i , gives the additional time needed to carry out the reaction. It takes into account the possibility that the complete complex disassembles before the reaction takes place and must reassemble. The coefficients A i and B i depend on the assembly mechanism. We have for random and sequential assembly respectively. These analytical expressions provide insight into the dependence of the reaction time on the complex assembly mechanism and its parameters. In the limit of irreversible component binding, the sequential assembly time grows linearly with N while the random assembly time increases only logarithmically, ( Figure 3A). Thus irreversible assembly is always faster with a random mechanism, which is due to the larger number of potential assembly pathways. The larger number of incompletely assembled complexes in the random scheme are not detrimental to assembly speed when the components bind irreversibly. For reversible component binding, the reaction is slower because protein complexes may disassemble and reassemble before the reaction takes place. This effect can become particularly pronounced with a large number of components N . However, for sequential assembly the reaction time increases only as an algebraic function of N up to the case of balanced irreversibility, l = k, where For stronger reversibility l > k, the reaction time of the sequential mechanism grows exponentially with N . For reversible binding of the proteins, random complex formation can still be faster than sequential assembly for a small number of components but eventually becomes much slower as the number of components grows ( Figure 3B). In particular, for fast catalysis (ρ l , k), random and sequential assembly mechanisms are of comparable efficiency for N ≤ 10. Eqs (5) and (6) have been used to compute Fig. 3B; Eq. 1 with the simplifications κ i c i = k (c i (t ) = constant) and l i = l and N = 9 components has been used to compute Fig. 3C. 2 Data-based model of DNA repair

Model formulation
The model describes 5 DNA intermediates (I, damaged; II, unwound; III, incised; IV, resynthesized and V, rechromatinized). In total 7 repair proteins can bind to one or more repair intermediates (see Figure 4A). From now on we will use the following notation: XPC = C, TFIIH = T, XPG = G, XPF = F, XPA = A, RPA = R and PCNA = P. Apart from the restriction that TFIIH can bind only after lesions were detected by XPC [2] protein binding is assumed random and characterized by a protein binding constants k i and a dissociation constants l i . Transitions between different repair intermediates are described by catalytic rate constants ( α, unwinding; , re-annealing; β, dual incision; γ, resynthesis; δ rechromatinization). We have the following system of equations for the protein complexes at the different repair intermediates (see also Luijsterburg et al. [1]): The y R π variables denote the concentrations of the repair intermediate R to which the proteins described by index vector π are bound (see Section 1). We define p ∈ {C , T,G, F, A, R, P } and have π(p) = 1 if the corresponding repair factor is bound and π(p) = 0 otherwise. The protein binding kinetics are governed additionally by the unbound nuclear concentrations C p (analogous to Eq. 2). The enzymatic reaction rate E (y) from one repair intermediate to the next is catalysed by a specific complex of repair proteins. If a state has no in-or outgoing enzymatic reactions then E = 0. For the transition from damaged to unwound DNA, we have E (y I 00 ) = y I I 000000 , E (y I 11 ) = −α y I

11
(10) For simplicity, we assume that the rate of incision is very fast and hence that as soon as all necessary enzymes are bound to the unwound state the lesion strand is rapidly incised (β → ∞). This has the following consequences for unwound and incised DNA (R = II and R = (14) and for rechromatinized DNA (R = V): The free nuclear protein concentrations are governed by 7 additional differential equations: where we have used the Kronecker δ to ensure binding of the proteins to the correct intermediate. The factor r takes account of the volume ratio of local damage to nuclear volume (estimated to be about 0.1). In total the model comprises 206 distinct DNA repair states and has 31 binding and dissociation parameters. As initial conditions we used the nuclear concentrations shown in Table SI and an average concentration of 3.33 µM for the amount of inflicted damages [1]. To quantify the accumulation of a particular repair protein we summed up the states where the protein is bound. For the representation of the FLIP experiments we assumed that each protein dissociating from the repair intermediate is rapidly bleached. Thus, in the model representation all k R i for the respective protein were set to zero from the time when bleaching was started. The starting point of the FLIP experiment depends on the time when the repair protein reached maximal accumulation (600s for XPC and ERCC1-XPF, 900s for XPG and TFIIH, 2,000s for XPA, and 7,200s for RPA and PCNA).

Parameter fitting and identifiability analysis
We performed the identifiability analysis by calculating the profile likelihood estimate (PLE) for all 31 parameters. Detailed description how to implement and perform PLE can be found in Raue et al. [3]. Initial fitting was performed by maximizing the likelihood with the MATLAB implementation of the trust-region method and user-supplied derivatives [4]. To reach a global minimum we began the fitting procedure from distant locations in parameter space by Latin-Hypercube sampling of the initial parameter values. Starting from the best fit for each PLE the current parameter was fixed stepwise in ascending and descending direction. At every step all other parameters were locally re-fitted simultaneously. A parameter was determined as identifiable if the likelihood profile crossed the 95 % confidence level drawn from a χ 2 distribution (all binding and dissociation rate constants of the proteins). For the enzymatic rate constants, only a lower bound was found, implying that these parameters need to be sufficiently fast. This case of practical non-identifiability proved without consequence for the goodness of the predictions made with the model (Section 2.3).

Prediction profile likelihoods
To determine the confidence bounds for the responds coefficients (RC) we calculated their Prediction Profile Likelihood (PPL). To this end, we fixed the value for the RC prediction. This value is increased or decreased stepwise, and this nonlinear constraint is used as an additional penalty during the least square fitting procedure. For each step all remaining model parameters are fitted simultaneously until the convergence is reached. A detailed description about implementation of this method can be found in the supplemental material of Kreutz et al. [5].

Estimating the measurement error for fluorescence-microscopy quantification of repair factors
To dissect the natural variability of XPC from the measurement error we correlated the GFPtagged expression values (I x ) with the immunofluorescence signal (I y ). We assume that both quantities have a normalized error x err ∼ y err → Normal(0, p(I x,y )) that can be decomposed into measurement error and natural variability. Measuring GFP and immunofluorescence signal independently at the same time, the measurement error should be orthogonal to the natural variability, which is equal in both measurements. Thus, we approximate the measurement error geometrically by

Derivation of Equation [4]
Consider the repair rate v as a function of the concentration of the repair factors C i and the initial amount of DNA lesions L: v = L f (C 1 , . . . ,C N ) The dependence on L is linear as we have shown here that the repair is first-order in the amount of inflicted lesions. Let us assume that L and C i vary independently between different cells. According to the standard law for propagation of uncertainty, the resulting variability in v is approximated by where σ(x) denotes the standard deviation of x. Introducing the response coefficients and noting that σ(v)/v = CV (v) and σ(C i )/C i = CV (C i ) are the coefficients of variation, Eq. 21 can be rewritten in the form Obviously, the 'response coefficient' for the initial amount of inflicted lesions is 1.